Alan,

I am also wondering about that zip() function call to bind the two lists

into a sort of iterator object. Presumably that calls the iterator N times.

I did a test where I made two list called A and B and used zip to make an

object holding the two and then removed A and B. I was able to print the

list of tuples just fine with print(list(C)) so clearly it is not so much a

pure iterator as one that holds yet another copy of both lists!

Now the old-fashioned C way, might simply use old fashioned but highly

efficient pointer arithmetic to move through two lists or arrays or whatever

of the same length adding as it goes so one pass period. Or it could use

indexing as in sum += A[i] * B[i] ...

But yes, there are many good reasons to use clearer code with well-tested

parts even at the cost of some efficiency.

Is there a version of zip() or an alternative that might be faster? In

particular, I was wondering the usual question of what happens if you might

abort early if something is noticed, like say a negative number. If you

pre-calculated and copied 100,000 items and abort after 5, ...

I note an approach in some languages using a vectorized approach may be

faster. Forget lists and as long as A and B are equal length vectors or

arrays as in numpy, there are ways to ask to multiply two arrays in

vectorized fashion so effectively you can say something akin to:

sum(A*B)

The above may be highly efficient especially if the underlying code is in

C/C++.

If there is no objection in your code to using the numpy module and the

corresponding objects you can make a similar test with something like this:

And the function they seem to want is the dot product of two such arrays as

in numpy.dot(A, B) provides the sum of the many products of corresponding

entries in A and B.

-----Original Message-----

From: Python-list <python-list-bounces+avigross=verizon.net@python.org> On

Behalf Of Alan Gauld

Sent: Tuesday, October 12, 2021 6:56 PM

To: python-list@python.org

Subject: Fwd: Re: sum() vs. loop

On 10/10/2021 09:49, Steve Keller wrote:

> I have found the sum() function to be much slower than to loop over

> the operands myself:

>

> def sum_products(seq1, seq2):

> return sum([a * b for a, b in zip(seq1, seq2)])

>

> def sum_products2(seq1, seq2):

> sum = 0

> for a, b in zip(seq1, seq2):

> sum += a * b

> return sum

>

> In a program I generate about 14 million pairs of sequences of ints

> each of length 15 which need to be summed. The first version with

> sum() needs

> 44 seconds while the second version runs in 37 seconds.

>

> Can someone explain this difference?

I'm no expert on Python innards but it looks to me like the first version

loops over the length of the list twice, once to generate the list of

products and the second time to add them together.

The pure Python version only loops once because it does the addition in

transit.

Presumably, especially for large numbers, a single python loop is faster

than 2 C loops?

But that's purely my speculation.

--

Alan G

Author of the Learn to Program web site

http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at:

http://www.flickr.com/photos/alangauldphotos --

https://mail.python.org/mailman/listinfo/python-list --

https://mail.python.org/mailman/listinfo/python-list