Mailing List Archive

Origins of iterators and iterables
On Sun, May 30, 2021 at 9:10 AM Julien Palard <julien@palard.fr> wrote:

>
> > is the fact some things (like generators) give iterators instead of
> > iterables as a hint they're not "rewindable" was initially thought
> > of and part of the design, or it emerged later.
>

Hm... I don't think that was a big part of the original design. The true
difference between iterable and iterator is that the iterator stores the
state needed to iterate over a given iterable with a for-loop. So if you
have an array, and you have two loops over it (e.g. nested, like this:

for x in a:
    for y in a:
        print(x, y, x+y)

) then you need separate iterator objects so that advancing the inner
iterator doesn't affect the outer iterator. This is why you can't store the
iteration state in the iterable (the array) but must use a separate object.
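
As an illustration, here is a minimal sketch, assuming a small list `a`
(the names `outer` and `inner` are made up for this example). It shows
that each call to iter() on the list returns a separate iterator with its
own position:

```python
a = [1, 2, 3]

outer = iter(a)   # first iterator over the list
inner = iter(a)   # second, independent iterator over the same list

# Advance the inner iterator twice; the outer one is not affected.
next(inner)
next(inner)

first_outer = next(outer)   # the outer iterator still starts at the beginning
third_inner = next(inner)   # the inner iterator continues from its own position

print(first_outer, third_inner)
print(outer is inner)       # two distinct iterator objects
```

The list itself holds no iteration state, so the nested loops above can
each keep their own place.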

Iterators themselves cannot rewind -- at least, there's no standard API for
it, and although nothing stops you from adding such an API to a *specific*
iterator type, it's not a common pattern.
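
For illustration only, a sketch of what such a type-specific API could
look like. The class `RewindableIterator` and its `rewind()` method are
hypothetical; they are not part of the iterator protocol or of any
standard library type:

```python
class RewindableIterator:
    """Hypothetical iterator over a sequence with a non-standard rewind()."""

    def __init__(self, seq):
        self._seq = seq
        self._pos = 0

    def __iter__(self):
        # Iterators return themselves from __iter__.
        return self

    def __next__(self):
        if self._pos >= len(self._seq):
            raise StopIteration
        value = self._seq[self._pos]
        self._pos += 1
        return value

    def rewind(self):
        # Non-standard extension: reset the stored position.
        self._pos = 0


it = RewindableIterator("abc")
first_pass = list(it)    # consumes the iterator
exhausted = list(it)     # nothing left without rewinding
it.rewind()
second_pass = list(it)   # works again after the explicit rewind
```

Code that only knows the standard protocol (for-loops, list(), next())
has no way to ask for the rewind, which is one reason the pattern stays
rare.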

Returning "self" as the iterator was originally only intended to paper over
the case where you want to write

it = iter(a)
<maybe call next(it) a few times>
for x in it:
    ...

-- basically we wanted 'for x in iter(a)' and 'for x in a' to have the same
meaning.
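
A small runnable sketch of that case, assuming a list whose first element
is a header to be skipped (the data is made up for this example):

```python
a = ["header", "row 1", "row 2"]

it = iter(a)
returns_self = iter(it) is it   # the list iterator is its own iterator

header = next(it)               # consume the first element by hand

rows = []
for x in it:                    # the loop calls iter(it), gets `it` back,
    rows.append(x)              # and continues from the current position

# Looping over iter(a) and over a visits the same elements.
same_elements = list(iter(a)) == list(a)
```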

IIRC iterables returning "self" as the iterator in other cases first came
up for files, where we had long been struggling to find the best API to get
all the lines of the file while still benefiting from buffering (calling
f.readline() in a loop was too slow).

The first version of this API was f.readlines(), which returned a list of
strings. But we realized this could potentially use up too much memory, so
we added an optional "hint" argument so you could say f.readlines(100000)
and get a number of lines approximately corresponding to 100000 bytes. This
required people to write fairly tedious double loops to loop over all lines
efficiently, e.g.

while 1:
    lines = f.readlines(100000)
    if not lines:
        break
    for line in lines:
        <do the thing per line>

Maybe there was an intermediate step (I vaguely recall a special dunder?),
but eventually we realized that the best way to write this was just

for line in f:
    <do the thing per line>

(the iterator can buffer internally) and we accepted that you can only
iterate once over a file -- we just told people "if you double-iterate over
a file it doesn't work right".
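
The double-iteration behavior can be sketched as follows. This uses
io.StringIO as a stand-in for a real file so the example is
self-contained; the contents are made up:

```python
import io

f = io.StringIO("first\nsecond\nthird\n")

is_own_iterator = iter(f) is f   # the file object is its own iterator

first_pass = [line.rstrip("\n") for line in f]
second_pass = [line.rstrip("\n") for line in f]   # already at end of file

# Seeking is a file-specific API, not part of the iterator protocol.
f.seek(0)
third_pass = [line.rstrip("\n") for line in f]
```

The second loop sees no lines because the only iteration state is the
file's own position, which the first loop already moved to the end.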

--
--Guido van Rossum (python.org/~guido)
Re: Origins of iterators and iterables
On 30.05.21 at 19:08, Guido van Rossum wrote:
> Returning "self" as the iterator was originally only intended to paper
> over the case where you want to write
>
> it = iter(a)
> <maybe call next(it) a few times>
> for x in it:
>     ...
>
> -- basically we wanted 'for x in iter(a)' and 'for x in a' to have the
> same meaning.

The above use case (iterators being iterable themselves) was a very good
design decision. In fact, I have a blog post dating back to 2006 where
I berated Java for not doing the same:
https://rittau.org/2006/11/java-iterators-are-not-iterable/. To take my
example from there, converted to Python:

class Tree:
    def depth_first(self) -> Iterator[...]: ...
    def breadth_first(self) -> Iterator[...]: ...

for item in tree.depth_first(): ...

This example would not work if "iter(it)" did not return "self".
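
A runnable sketch of the same idea, assuming a simple tree of string
values. The constructor, the `value` and `children` attributes, and the
use of generators for the traversals are assumptions made for this
example, not taken from the blog post:

```python
from collections import deque
from typing import Iterator, List, Optional


class Tree:
    def __init__(self, value: str,
                 children: Optional[List["Tree"]] = None) -> None:
        self.value = value
        self.children = children or []

    def depth_first(self) -> Iterator[str]:
        # Visit this node, then each subtree in order.
        yield self.value
        for child in self.children:
            yield from child.depth_first()

    def breadth_first(self) -> Iterator[str]:
        # Visit nodes level by level using a FIFO queue.
        queue = deque([self])
        while queue:
            node = queue.popleft()
            yield node.value
            queue.extend(node.children)


tree = Tree("a", [Tree("b", [Tree("d")]), Tree("c")])

dfs = tree.depth_first()
returns_self = iter(dfs) is dfs          # the generator is its own iterator

depth_order = [item for item in dfs]     # the for clause calls iter(dfs)
breadth_order = list(tree.breadth_first())
```

Both methods return iterators, and the loop can consume them directly
only because calling iter() on an iterator hands back the same object.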

 - Sebastian