# Mailing List Archive

I have found the sum() function to be much slower than to loop over the
operands myself:

def sum_products(seq1, seq2):
return sum([a * b for a, b in zip(seq1, seq2)])

def sum_products2(seq1, seq2):
sum = 0
for a, b in zip(seq1, seq2):
sum += a * b
return sum

In a program I generate about 14 million pairs of sequences of ints each
of length 15 which need to be summed. The first version with sum() needs
44 seconds while the second version runs in 37 seconds.

Can someone explain this difference?

Steve
--
https://mail.python.org/mailman/listinfo/python-list
Am 10.10.21 um 10:49 schrieb Steve Keller:
> I have found the sum() function to be much slower than to loop over the
> operands myself:
>
> def sum_products(seq1, seq2):
> return sum([a * b for a, b in zip(seq1, seq2)])
>
> def sum_products2(seq1, seq2):
> sum = 0
> for a, b in zip(seq1, seq2):
> sum += a * b
> return sum
>
> In a program I generate about 14 million pairs of sequences of ints each
> of length 15 which need to be summed. The first version with sum() needs
> 44 seconds while the second version runs in 37 seconds.

The first version constructs a list, sums it up and throws the list
away, while the second version only keeps the running sum in memory. How

sum((a * b for a, b in zip(seq1, seq2)))

(untested) ?

Christian
--
https://mail.python.org/mailman/listinfo/python-list
Christian Gollwitzer <auriocus@gmx.de> writes:

> > def sum_products(seq1, seq2):
> > return sum([a * b for a, b in zip(seq1, seq2)])
> > def sum_products2(seq1, seq2):
> > sum = 0
> > for a, b in zip(seq1, seq2):
> > sum += a * b
> > return sum
> > [...]
>
> The first version constructs a list, sums it up and throws the list
> away, while the second version only keeps the running sum in
>
>
> sum((a * b for a, b in zip(seq1, seq2)))

Ah, of course. Much cleaner and I should have seen that myself.
Thanks.

BUT

> (untested) ?

I have tested it and with () instead of [] it's even slower:

explicit loop: 37s ? .5s
sum([...]) 44s ? .5s
sum((...)) 47.5s ? .5s

Now completely surprised.

Steve
On Tue, Oct 12, 2021 at 8:55 AM Steve Keller <keller.steve@gmx.de> wrote:
>
> I have found the sum() function to be much slower than to loop over the
> operands myself:
>
> def sum_products(seq1, seq2):
> return sum([a * b for a, b in zip(seq1, seq2)])
>
> def sum_products2(seq1, seq2):
> sum = 0
> for a, b in zip(seq1, seq2):
> sum += a * b
> return sum
>
> In a program I generate about 14 million pairs of sequences of ints each
> of length 15 which need to be summed. The first version with sum() needs
> 44 seconds while the second version runs in 37 seconds.
>
> Can someone explain this difference?
>

When you use sum, you're constructing a list. Try removing the square brackets:

return sum(a * b for a, b in zip(seq1, seq2))

It's also possible that the genexp has more overhead than simply
iterating directly, but at very least, this will reduce the memory
consumption.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
On Tue, Oct 12, 2021 at 9:02 AM Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>
> Steve Keller <keller.steve@gmx.de> writes:
> >Now completely surprised.
>
> I have observed that here the generator-based sum() call
> is slower if both seq1 and seq2 have a length of 1000, but
> faster if both seq1 and seq2 have 10000000 entries each
> (with float entries).
>
> However, in any case, the direct for-loop still is fastest.
> Maybe, calling "next()" (actually, "PyIter_Next" in C) on
> an iterator has some overhead compared to just evaluating
> an expression in a loop.
>

There's overhead whichever way you do it, and ultimately, it depends
on what you're doing. The genexp is effectively a callback into Python
code. The list comp requires preconstruction of the entire list. The
plain for loop involves summing by picking up and putting down in
Python code rather than doing that in C. The only way to know which is
fastest is to try them all :)

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
On Mon, Oct 11, 2021 at 2:54 PM Steve Keller <keller.steve@gmx.de> wrote:

> I have found the sum() function to be much slower than to loop over the
> operands myself:
>
> def sum_products(seq1, seq2):
> return sum([a * b for a, b in zip(seq1, seq2)])
>
> def sum_products2(seq1, seq2):
> sum = 0
> for a, b in zip(seq1, seq2):
> sum += a * b
> return sum
>
> In a program I generate about 14 million pairs of sequences of ints each
> of length 15 which need to be summed. The first version with sum() needs
> 44 seconds while the second version runs in 37 seconds.
>
> Can someone explain this difference?
>
I can't explain it. It might help to try a line-by-line profiler.

If you need speed, maybe try Cython, numpy and/or numba.

It seems like the generator expression should be the fastest to me. But
writing for speed instead of writing for clarity is usually not a great
idea.
--
https://mail.python.org/mailman/listinfo/python-list
Am 12.10.21 um 05:41 schrieb Dan Stromberg:
> On Mon, Oct 11, 2021 at 2:54 PM Steve Keller <keller.steve@gmx.de> wrote:
>
>> I have found the sum() function to be much slower than to loop over the
>> operands myself:
>>
>> def sum_products(seq1, seq2):
>> return sum([a * b for a, b in zip(seq1, seq2)])
>>
>> def sum_products2(seq1, seq2):
>> sum = 0
>> for a, b in zip(seq1, seq2):
>> sum += a * b
>> return sum

> It seems like the generator expression should be the fastest to me. But
> writing for speed instead of writing for clarity is usually not a great
> idea.
>

Maybe, unless this was just a test, it can be fastest AND clearest with

numpy.dot(seq1, seq2)

- in case that the sequence consists of floats and, in the best case, is
already stored in a numpy array. Then you cannot beat this with Python
code.

Christian
--
https://mail.python.org/mailman/listinfo/python-list
On Mon, 11 Oct 2021 at 23:00, Christian Gollwitzer <auriocus@gmx.de> wrote:
>
> Am 10.10.21 um 10:49 schrieb Steve Keller:
> > I have found the sum() function to be much slower than to loop over the
> > operands myself:
> >
> > def sum_products(seq1, seq2):
> > return sum([a * b for a, b in zip(seq1, seq2)])
> >
> > def sum_products2(seq1, seq2):
> > sum = 0
> > for a, b in zip(seq1, seq2):
> > sum += a * b
> > return sum
> >
> > In a program I generate about 14 million pairs of sequences of ints each
> > of length 15 which need to be summed. The first version with sum() needs
> > 44 seconds while the second version runs in 37 seconds.
>
> The first version constructs a list, sums it up and throws the list
> away, while the second version only keeps the running sum in memory. How
>
>
> sum((a * b for a, b in zip(seq1, seq2)))

What really matters in cases like this is interpreter overhead.
Typically a generator expression has slightly more overhead compared
to a list comprehension. You get a slightly slower per-item speed in
exchange for O(1) memory consumption.

A list comprehension like

result = sum([expr for foo in bar])

Translates internally to something like:

def _func():
data = []
for foo in bar:
data.append(foo)
return foo

_stuff = _func()
result = sum(_stuff)

Running this really does bring in the overhead of a function call
_func() because it needs to create a frame with locals and so on.
However it is only the cost of a single function call. Afterwards if
the elements in the list _stuff are built in types like int etc with
builtin __add__ methods then sum(_stuff) does not involve the
interpreter at all: it's a built-in function operating on a built-in
container of built-in objects.

With a generator expression like

result = sum(expr for foo in bar)

This translates internally to

def _genfunc():
for foo in bar:
yield foo

_gen = _genfunc()
result = sum(_gen)

Now the _genfunc() function call simply creates the generator and sets
up its frame which I think is more or less equivalent to the overhead
of the _func() function call. However the sum(_gen) line is no longer
a pure built-in operation. Internally each time sum calls
_gen.__next__() the execution frame for _genfunc() is reactivated so
that the interpreter can execute the bytecodes taking the frame from
one yield to the next. This is almost the overhead of a function call
for each item in bar although this is often unnoticeable in practice
and other factors can affect this.

The fastest version eliminates the interpreter altogether at least
when operating on pure built-in elements:

In : nums1 = nums2 = list(range(10**5))

In : %timeit sum([a*b for a, b in zip(nums1, nums2)])
9.83 ms ± 21.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In : %timeit sum((a*b for a, b in zip(nums1, nums2)))
10.3 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In : from operator import mul

In : %timeit sum(map(mul, nums1, nums2))
7.25 ms ± 33 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Of course if the elements being multiplied and summed have pure-Python
__add__/__mul__ methods or the container has a pure Python
__iter__/__next__ then any of those methods will typically dominate
the runtime over any of the overheads considered here.

--
Oscar
--
https://mail.python.org/mailman/listinfo/python-list
Yes, Stefan, I realized that and did not test more thoroughly. Chris pointed
it out too. I indeed deleted the names but a second reference remained so I
may be wrong and the function may just keep track of the position and return
one tuple at a time. That would be a good design, unless there was a worry
the original might be changed out from under.

My apologies if it was understood to mean I had shown it was copied.

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net@python.org> On
Behalf Of Stefan Ram
Sent: Tuesday, October 12, 2021 9:49 PM
To: python-list@python.org
Subject: Re: sum() vs. loop

"Avi Gross" <avigross@verizon.net> writes:
>I did a test where I made two list called A and B and used zip to make
>an object holding the two and then removed A and B. I was able to print
>the list of tuples just fine with print(list(C)) so clearly it is not
>so much a pure iterator as one that holds yet another copy of both lists!

You may have "deleted the names", but not the lists.

a =[ 10, 11, 12 ]
b =[ 20, 21, 22 ]
c = zip( a, b )
del a[ 0 ], a[ 0 ], a[ 0 ]
del b[ 0 ], b[ 0 ], b[ 0 ]
print( a, b, list( c ))

prints

[] [] []

here.

--
https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list