Mailing List Archive

Should the definition of an "(async) iterator" include __iter__?
Over in https://github.com/python/typeshed/issues/6030 I have managed to
kick up a discussion over what exactly an "iterator" is. If you look at
https://docs.python.org/3/library/functions.html#iter you will see the docs
say it "Return[s] an iterator
<https://docs.python.org/3/glossary.html#term-iterator> object." Great, but
if you go to the glossary definition of "iterator" at
https://docs.python.org/3/glossary.html#term-iterator you will see it says
"[i]terators are required to have an __iter__()
<https://docs.python.org/3/reference/datamodel.html#object.__iter__>
method", which neither `for` nor `iter()` actually enforces.

Is there something to do here? Do we loosen the definition of "iterator" to
say they *should* define __iter__? Leave it as-is with an understanding
that we know that it's technically inaccurate for iter() but that we want
to encourage people to define __iter__? I'm assuming people don't want to
change `for` and `iter()` to start requiring that __iter__ be defined, given
that we decided last week to go down the "remove the __aiter__ requirement"
route for aiter().

BTW all of this applies to async iterators as well.
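
For illustration, here is a minimal sketch of the async analogue. It is not from the original message: the `Ticker` and `Source` classes and the `collect` helper are hypothetical names. It assumes an async iterator that defines `__anext__` but no `__aiter__`, which works when an async iterable hands it out but fails when `async for` is applied to it directly.

```python
import asyncio


class Ticker:
    """Hypothetical async iterator defining __anext__ but not __aiter__."""

    def __init__(self, n):
        self.n = n

    async def __anext__(self):
        if self.n <= 0:
            raise StopAsyncIteration
        self.n -= 1
        return self.n


class Source:
    """Hypothetical async iterable whose __aiter__ returns a Ticker."""

    def __aiter__(self):
        return Ticker(3)


async def collect(aiterable):
    # "async for" looks up __aiter__ on the argument, then calls __anext__.
    return [x async for x in aiterable]


values = asyncio.run(collect(Source()))

try:
    asyncio.run(collect(Ticker(3)))   # Ticker itself has no __aiter__
except TypeError as exc:
    error = str(exc)
```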
Re: Should the definition of an "(async) iterator" include __iter__?
My view of this is:

A. It's not an iterator if it doesn't define `__next__`.

B. It is strongly recommended that iterators also define `__iter__`.

In "standards" language, I think (A) is MUST and (B) is merely OUGHT or
maybe SHOULD.

On Tue, Sep 14, 2021 at 12:30 PM Brett Cannon <brett@python.org> wrote:

> Over in https://github.com/python/typeshed/issues/6030 I have managed to
> kick up a discussion over what exactly an "iterator" is. If you look at
> https://docs.python.org/3/library/functions.html#iter you will see the
> docs say it "Return[s] an iterator
> <https://docs.python.org/3/glossary.html#term-iterator> object." Great,
> but if you go to the glossary definition of "iterator" at
> https://docs.python.org/3/glossary.html#term-iterator you will see it
> says "[i]terators are required to have an __iter__()
> <https://docs.python.org/3/reference/datamodel.html#object.__iter__>
> method", which neither `for` nor `iter()` actually enforces.
>
> Is there something to do here? Do we loosen the definition of "iterator"
> to say they *should* define __iter__? Leave it as-is with an
> understanding that we know that it's technically inaccurate for iter() but
> that we want to encourage people to define __iter__? I'm assuming people
> don't want to change `for` and `iter()` to start requiring __iter__ be
> defined if we decided to go down the "remove the __aiter__ requirement"
> from aiter() last week.
>
> BTW all of this applies to async iterators as well.
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/3W7TDX5KNVQVGT5CUHBK33M7VNTP25DZ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Re: Should the definition of an "(async) iterator" include __iter__?
I think there is also a distinction to be made about the *current* meaning of "required" in "[i]terators are required to have an __iter__() <https://docs.python.org/3/reference/datamodel.html#object.__iter__> method": "required" doesn't specify whether this is:

1. by convention, and doing otherwise is just some form of undefined behaviour; for a human (or perhaps type-checker) reading it to think it's an iterator, it needs `__iter__`, but it's really something like passing an object of the wrong type to an unbound method - unenforced by the language (it used to be illegal in Py2)

2. in some way actually enforced: the iterator is required to have `__iter__` that returns self, and

While 1 is clearly what actually happens in CPython, was that the intended meaning? I'd think so - 1 is still a perfectly acceptable interpretation of "required" (even if "required" isn't the most clear way of expressing it). Even if it wasn't the original meaning, that's how I think it should now be interpreted because that's what it is de facto.

Do we know who originally wrote that line, so we could ask them? (The furthest I've traced it is https://github.com/python/cpython/commit/f10aa9825e49e8652f30bc6d92c736fe47bb134c but I don't have any knowledge of SVN or CVS (whichever was used at the time) to go further.)

Also, any user-defined iterator that doesn't also define __iter__ would be considered wrong and nobody would refuse to fix that. If it's already a bug anyway, why bother changing the behaviour to check for it?

> A. It's not an iterator if it doesn't define `__next__`.
>
> B. It is strongly recommended that iterators also define `__iter__`.
>
> In "standards" language, I think (A) is MUST and (B) is merely OUGHT or maybe SHOULD.
>
> On Tue, Sep 14, 2021 at 12:30 PM Brett Cannon <brett@python.org> wrote:
>
> Over in https://github.com/python/typeshed/issues/6030 I have managed to kick up a discussion over what exactly an "iterator" is. If you look at https://docs.python.org/3/library/functions.html#iter you will see the docs say it "Return[s] an iterator <https://docs.python.org/3/glossary.html#term-iterator> object." Great, but if you go to the glossary definition of "iterator" at https://docs.python.org/3/glossary.html#term-iterator you will see it says "[i]terators are required to have an __iter__() <https://docs.python.org/3/reference/datamodel.html#object.__iter__> method", which neither `for` nor `iter()` actually enforces.
>
> Is there something to do here? Do we loosen the definition of "iterator" to say they /should/ define __iter__? Leave it as-is with an understanding that we know that it's technically inaccurate for iter() but that we want to encourage people to define __iter__? I'm assuming people don't want to change `for` and `iter()` to start requiring __iter__ be defined if we decided to go down the "remove the __aiter__ requirement" from aiter() last week.
>
> BTW all of this applies to async iterators as well.
>
Patrick
Re: Should the definition of an "(async) iterator" include __iter__?
I think it's also worth noting that a missing "`__iter__` that returns self" is trivial to recover from... just use a new reference to the iterator instead. The overhead of a method call for this convention almost seems silly.

What worries me most about changing the current "requirement" is that it may create either confusion or backward compatibility issues for `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus requires `__iter__`).
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RSV6MOIBVNEFKL4NDHTKDVSGVABVY65Q/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 3:49 PM Brandt Bucher <brandtbucher@gmail.com>
wrote:

> I think it's also worth noting that a missing "`__iter__` that returns
> self" is trivial to recover from... just use a new reference to the
> iterator instead. The overhead of a method call for this convention almost
> seems silly.
>

The use case is this:

def foo(it):
    for x in it:
        print(x)

def main():
    it = iter([1, 2, 3])
    next(it)
    foo(it)

Since "for x in it" calls iter(it), if the argument is an iterator that
doesn't define __iter__, it would fail. But this is all about convention --
we want to make it convenient to do this kind of thing, so all standard
iterators define __iter__ as well as __next__.
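
A minimal sketch of the failure mode described above, for illustration only. The `NextOnly`, `SelfIter` and `consume` names are hypothetical and not from the thread. `NextOnly` defines `__next__` but no `__iter__`, and `SelfIter` adds the conventional `__iter__` that returns self.

```python
class NextOnly:
    """Hypothetical iterator that defines __next__ but not __iter__."""

    def __init__(self, items):
        self._it = iter(items)

    def __next__(self):
        return next(self._it)


class SelfIter(NextOnly):
    """Same iterator, plus the conventional __iter__ returning self."""

    def __iter__(self):
        return self


def consume(it):
    # Looping over `it` calls iter(it) first, so `it` must define __iter__.
    return [x for x in it]


it = SelfIter([1, 2, 3])
next(it)                 # pre-process the first element
rest = consume(it)       # the loop resumes where next() left off

try:
    consume(NextOnly([1, 2, 3]))
except TypeError as exc:
    error = str(exc)     # the object is reported as not iterable
```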


> What worries me most about changing the current "requirement" is that it
> may create either confusion or backward compatibility issues for
> `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus
> requires `__iter__`).
>

If you explicitly inherit from Iterator, you inherit a default
implementation of __iter__ (that returns self, of course). If you merely
register, it's up to you to comply. And sometimes people register things
that don't follow the letter of the protocol, just to get things going.
(This is common for complex protocols like Mapping, where some function you
have no control over insists on a Mapping but only calls one or two common
methods.)

Duck typing is alive and kicking!
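
To make the inherit-versus-register distinction concrete, here is a small sketch. The `Countdown` and `Registered` classes are hypothetical examples, not taken from the thread; only `collections.abc.Iterator` and its `register()` method are real API.

```python
from collections.abc import Iterator


class Countdown(Iterator):
    """Inherits from the ABC, so only __next__ has to be written."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class Registered:
    """Hypothetical class registered as an Iterator without __iter__."""

    def __next__(self):
        raise StopIteration


Iterator.register(Registered)

c = Countdown(3)
inherited_returns_self = iter(c) is c       # the mixin __iter__ returns self
values = list(c)
claims_iterator = isinstance(Registered(), Iterator)   # true via register()
has_iter = hasattr(Registered, "__iter__")  # nothing was added for us
```

Registration only affects `isinstance()` and `issubclass()`; it does not add the mixin method, so complying with the protocol is left to the class author.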

--
--Guido van Rossum (python.org/~guido)
Re: Should the definition of an "(async) iterator" include __iter__?
Guido van Rossum wrote:
> On Tue, Sep 14, 2021 at 3:49 PM Brandt Bucher brandtbucher@gmail.com
> wrote:
> > I think it's also worth noting that a missing "`__iter__` that returns
> > self" is trivial to recover from... just use a new reference to the
> > iterator instead. The overhead of a method call for this convention almost
> > seems silly.
> The use case is this:

Yeah, I understand that. But what I'm hinting at is that the `GET_ITER` opcode and `iter` builtin *could* gracefully handle this situation when called on something that doesn't define `__iter__` but does define `__next__`. Pseudocode:

def iter(o):
    if hasattr(o, "__iter__"):
        return o.__iter__()
    elif hasattr(o, "__next__"):
        # Oh well, o.__iter__() would have just returned o anyways...
        return o
    raise TypeError

This would be implemented at the lowest possible level, in `PyObject_GetIter`.
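
A runnable pure-Python emulation of that idea, as a sketch only. `lenient_iter` and `NextOnly` are hypothetical names, and this is not how `PyObject_GetIter` is actually written. It assumes special methods are looked up on the type, as the interpreter does, and delegates to the real `iter()` for everything that already works.

```python
def lenient_iter(obj):
    """Hypothetical iter() variant that accepts __next__-only objects."""
    cls = type(obj)
    if hasattr(cls, "__iter__") or hasattr(cls, "__getitem__"):
        return iter(obj)     # normal path, including the sequence fallback
    if hasattr(cls, "__next__"):
        return obj           # an __iter__ would have returned obj anyway
    raise TypeError(f"{cls.__name__!r} object is not iterable")


class NextOnly:
    """Hypothetical iterator with __next__ but no __iter__."""

    def __init__(self):
        self.items = [1, 2, 3]

    def __next__(self):
        if not self.items:
            raise StopIteration
        return self.items.pop(0)


obj = NextOnly()
same_object = lenient_iter(obj) is obj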

> > What worries me most about changing the current "requirement" is that it
> > may create either confusion or backward compatibility issues for
> > `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus
> > requires `__iter__`).
> If you explicitly inherit from Iterator, you inherit a default
> implementation of __iter__ (that returns self, of course). If you merely
> register, it's up to you to comply. And sometimes people register things
> that don't follow the letter of the protocol, just to get things going.
> (This is common for complex protocols like Mapping, where some function you
> have no control over insists on a Mapping but only calls one or two common
> methods.)

Yeah, I was thinking about cases like `isinstance(o, Iterator)`, where `o` defines `__iter__` but not `__next__`. Even though this code might start returning the "right" answer, it's still a backward-compatibility break. Not sure what the true severity would be, though...
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SDZDMAF4MJDZHKIIWO2UUNRG6ZV2EU55/
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 4:33 PM Brandt Bucher <brandtbucher@gmail.com>
wrote:

> Guido van Rossum wrote:
> > On Tue, Sep 14, 2021 at 3:49 PM Brandt Bucher brandtbucher@gmail.com
> > wrote:
> > > I think it's also worth noting that a missing "`__iter__` that returns
> > > self" is trivial to recover from... just use a new reference to the
> > > iterator instead. The overhead of a method call for this convention
> > > almost seems silly.
> > The use case is this:
>
> Yeah, I understand that. But what I'm hinting at is that the `GET_ITER`
> opcode and `iter` builtin *could* gracefully handle this situation when
> called on something that doesn't define `__iter__` but does define
> `__next__`. Pseudocode:
>
> def iter(o):
>     if hasattr(o, "__iter__"):
>         return o.__iter__()
>     elif hasattr(o, "__next__"):
>         # Oh well, o.__iter__() would have just returned o anyways...
>         return o
>     raise TypeError
>
> This would be implemented at the lowest possible level, in
> `PyObject_GetIter`.
>

That seems like violating the Zen: "Errors should never pass silently." It
would certainly have a ripple effect, since everyone who currently defines
a __iter__ (in C or Python) that returns self would want to remove it, and
documentation would need to be updated everywhere. I don't see this issue
as important enough to do that. There are also probably multiple things
that emulate iter() that would have to be updated to match, if builtin
iter() starts changing its behavior.

TBH I don't think there is an *actual* problem here. I think it's just
about choosing the right wording for the glossary (which IMO does not have
status as a source of truth anyway).


> > > What worries me most about changing the current "requirement" is that
> > > it may create either confusion or backward compatibility issues for
> > > `collections.abc.Iterator` (which is a subtype of `Iterable`, and thus
> > > requires `__iter__`).
> > If you explicitly inherit from Iterator, you inherit a default
> > implementation of __iter__ (that returns self, of course). If you merely
> > register, it's up to you to comply. And sometimes people register things
> > that don't follow the letter of the protocol, just to get things going.
> > (This is common for complex protocols like Mapping, where some function
> > you have no control over insists on a Mapping but only calls one or two
> > common methods.)
>
> Yeah, I was thinking about cases like `isinstance(o, Iterator)`, where `o`
> defines `__iter__` but not `__next__`.


(Did you mean the other way around? __iter__ without __next__ is an
Iterable but not an Iterator. And isinstance() returns the right answer
for this.)
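
As a quick check of how the ABCs classify these cases today, here is a sketch with two hypothetical classes, `OnlyIter` and `OnlyNext`, which are not from the thread. The structural `isinstance()` test for `Iterator` looks for both methods, so neither class counts as an Iterator.

```python
from collections.abc import Iterable, Iterator


class OnlyIter:
    """Defines __iter__ but not __next__: an Iterable, not an Iterator."""

    def __iter__(self):
        return iter(())


class OnlyNext:
    """Defines __next__ but not __iter__."""

    def __next__(self):
        raise StopIteration


checks = {
    "OnlyIter is Iterable": isinstance(OnlyIter(), Iterable),
    "OnlyIter is Iterator": isinstance(OnlyIter(), Iterator),
    "OnlyNext is Iterable": isinstance(OnlyNext(), Iterable),
    "OnlyNext is Iterator": isinstance(OnlyNext(), Iterator),
}
```

Only the first entry is true. A `__next__`-only object is not recognised by either ABC, even though `next()` accepts it.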


> Even though this code might start returning the "right" answer, it's still
> a backward-compatibility break. Not sure what the true severity would be,
> though...
>

The ABC Iterator does not define the concept Iterator though. And static
type checking is not meant to exactly follow all the rules of the language
anyway -- there are many approximations being made by static type checkers.

Regarding the meaning of "requires", not all requirements are checked at
runtime either.

But I expect we won't be able to make everyone happy here.

--
--Guido van Rossum (python.org/~guido)
Re: Should the definition of an "(async) iterator" include __iter__?
Guido van Rossum wrote:
> TBH I don't think there is an *actual* problem here. I think it's just
> about choosing the right wording for the glossary (which IMO does not have
> status as a source of truth anyway).

Good point. I'm probably approaching this from the wrong angle (by trying to "fix" the language, rather than the docs).
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZUESGSZ2BIBZZI42ZUMCQVWUX3STIO6V/
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 12:33:32PM -0700, Guido van Rossum wrote:
> My view of this is:
>
> A. It's not an iterator if it doesn't define `__next__`.
>
> B. It is strongly recommended that iterators also define `__iter__`.
>
> In "standards" language, I think (A) is MUST and (B) is merely OUGHT or
> maybe SHOULD.

That's not what the docs say :-)

https://docs.python.org/3/library/stdtypes.html#iterator-types

Part of the problem is that there are two kinds of thing that we call
"iterator":

1. Objects that we implicitly or explicitly pass to `iter()` in order to
return an iterator object; they only need to define an `__iter__`
method that returns the actual iterator object itself.

(That's a slight simplification, because iter() will fall back on the
Sequence Protocol if `__iter__` isn't defined. But to my mind, that
makes Sequence Protocol objects *iterables* not iterators.)


2. Iterator objects themselves, which are defined by a protocol, not a
type. The iterator object MUST define both `__iter__` and `__next__`,
and the `__iter__` method MUST return self.
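
The two kinds of object can be told apart at the prompt. This is a small sketch: the `Squares` class is a hypothetical example of an object that only supports the Sequence Protocol, and the rest uses built-ins.

```python
class Squares:
    """Hypothetical old-style sequence: __getitem__ only, no __iter__."""

    def __getitem__(self, index):
        if index >= 4:
            raise IndexError(index)
        return index * index


seq_iter = iter(Squares())       # iter() falls back on the Sequence Protocol
squares = list(seq_iter)

list_iter = iter([1, 2, 3])
returns_self = iter(list_iter) is list_iter        # built-in iterators return self
list_has_next = hasattr([1, 2, 3], "__next__")     # a list is only an iterable
```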


--
Steve
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RBM2IBQUKA6NGYIEW4WQRQ2EJ3I4OZY2/
Re: Should the definition of an "(async) iterator" include __iter__?
If it helps, I have tons of code that tests for iterators using:

iter(obj) is obj

That has been a documented requirement for the iterator protocol
forever. It's in the PEP.

"A class that wants to be an iterator should implement two methods: a
next() method that behaves as described above, and an __iter__() method
that returns self."

https://www.python.org/dev/peps/pep-0234/

We have objects such that:

iter(obj)

returns an iterator, but aren't themselves iterators. The most common
example of that would be, I think, classes that define __iter__ as a
generator method:

class A:
    def __iter__(self):
        for x in range(10):
            yield x

Then we have actual iterators, like iter(A()). They define `__iter__`
that returns self.

I don't know what I would call an object that only has __next__,
apart from "broken" :-(
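
For reference, the `iter(obj) is obj` test can be wrapped in a small helper. This is a sketch; `is_iterator` is a hypothetical name, and the `TypeError` handling for non-iterables is an added assumption rather than part of the test quoted above.

```python
def is_iterator(obj):
    """Return True if obj is its own iterator (the `iter(obj) is obj` test)."""
    try:
        return iter(obj) is obj
    except TypeError:
        return False          # not iterable at all


class A:
    def __iter__(self):
        for x in range(10):
            yield x


results = (
    is_iterator(A()),         # iterable: iter() returns a new generator
    is_iterator(iter(A())),   # generator: its __iter__ returns self
    is_iterator([1, 2, 3]),   # list: iterable, not an iterator
    is_iterator(42),          # int: not iterable
)
```

Note that this test also reports False for an object that has only `__next__`, because `iter()` raises `TypeError` for it.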


--
Steve
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6FB5AT2IQENUSRZZT7G2CE3LDEDN2WNQ/
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 9:03 PM Steven D'Aprano <steve@pearwood.info> wrote:

> On Tue, Sep 14, 2021 at 12:33:32PM -0700, Guido van Rossum wrote:
> > My view of this is:
> >
> > A. It's not an iterator if it doesn't define `__next__`.
> >
> > B. It is strongly recommended that iterators also define `__iter__`.
> >
> > In "standards" language, I think (A) is MUST and (B) is merely OUGHT or
> > maybe SHOULD.
>
> That's not what the docs say :-)
>
> https://docs.python.org/3/library/stdtypes.html#iterator-types
>

Huh, so it does. And in very clear words as well. I still don't think this
should be enforced by checks for the presence of __iter__ in situations
where it's not going to be called (e.g. in iter() itself and in "for x in
it"). But since this is a longstanding convention and matches
collections.abc.Iterator (and typing.Iterator) we might as well *document*
it to be the case.


> Part of the problem is that there are two kinds of thing that we call
> "iterator":
>
> 1. Objects that we implicitly or explicitly pass to `iter()` in order to
> return an iterator object; they only need to define an `__iter__`
> method that returns the actual iterator object itself.
>

No, we don't call that an iterator. That's an *Iterable*. In this respect
the docs you point to are actually weak:

- It doesn't use the term Iterable at all but describes it as "container
objects" or "containers".

- It says "Sequences, described below in more detail, always support the
iteration methods." That's wrong, or at the very least misleading, since a
sequence itself *only* supports __iter__ -- it's the Iterator returned by
s.__iter__() that supports __next__.


> (That's a slight simplification, because iter() will fall back on the
> Sequence Protocol if `__iter__` isn't defined. But to my mind, that
> makes Sequence Protocol objects *iterables* not iterators.)
>

Right, it's wrong.


> 2. Iterator objects themselves, which are defined by a protocol, not a
> type. The iterator object MUST define both `__iter__` and `__next__`,
> and the `__iter__` method MUST return self.
>

So you say. I will compromise and agree that Iterators MUST have __next__
and SHOULD have __iter__ returning self. The distinction is that without
__next__ it's not an Iterator. But without __iter__ it's merely a broken
Iterator (that nevertheless works in most situations).

--
--Guido van Rossum (python.org/~guido)
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 9:31 PM Steven D'Aprano <steve@pearwood.info> wrote:

> If it helps, I have tons of code that tests for iterators using:
>
> iter(obj) is obj
>
> That has been a documented requirement for the iterator protocol
> forever. It's in the PEP.
>
> "A class that wants to be an iterator should implement two methods: a
> next() method that behaves as described above, and an __iter__() method
> that returns self."
>
> https://www.python.org/dev/peps/pep-0234/
>

However, the description clarifies that the reason for requiring __iter__
is weaker than the reason for requiring __next__.


> We have objects such that:
>
> iter(obj)
>
> returns an iterator, but aren't themselves iterators.


Yeah, those are Iterables.


> The most common
> example of that would be, I think, classes that define __iter__ as a
> generator method:
>
> class A:
>     def __iter__(self):
>         for x in range(10):
>             yield x
>
> Then we have actual iterators, like iter(A()). They define `__iter__`
> that returns self.
>
> I don't know what I would call an object that only has __next__,
> apart from "broken" :-(
>

It's still an iterator, since it duck-types in most cases where an iterator
is required (notably "for", which is the primary use case for the iteration
protocols -- it's in the first sentence of PEP 234's abstract).

--
--Guido van Rossum (python.org/~guido)
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 09:38:38PM -0700, Guido van Rossum wrote:

> > I don't know what I would call an object that only has __next__,
> > apart from "broken" :-(
> >
>
> It's still an iterator, since it duck-types in most cases where an iterator
> is required (notably "for", which is the primary use case for the iteration
> protocols -- it's in the first sentence of PEP 234's abstract).

I don't think it duck-types as an iterator. Here's an example:


class A:
    def __init__(self): self.items = [1, 2, 3]
    def __next__(self):
        try: return self.items.pop()
        except IndexError: raise StopIteration


class B:
    def __iter__(self):
        return A()


It's fine to iterate over B() directly, but you can't iterate over
A() at all. If you try, you get a TypeError:

>>> for item in A(): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'A' object is not iterable


In practice, this impacts some very common techniques. For instance,
pre-calling iter() on your input.


>>> x = B()
>>> it = iter(x)
>>> for value in it: pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'A' object is not iterable


There are all sorts of reasons why one might pre-call iter(). One common
one is to pre-process the first element:

it = iter(obj)
first = next(it, None)
for item in it: ...

Another is to test for an iterable. iter(obj) will raise TypeError if
obj is not a sequence, collection, iterator, iterable etc.

Another is to break out of one loop and then run another:

it = iter(obj)
for x in it:
    if condition: break
    do_something()

for x in it:
    something_else()


I'm sure there are others I haven't thought of.
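
Two of these techniques, written out as a runnable sketch. `split_at` is a hypothetical helper invented for illustration. Both rely on the iterator returned by `iter()` being usable again where an iterable is expected, which is exactly what an `__iter__` returning self provides.

```python
def split_at(iterable, stop):
    """Consume items up to and including `stop`, then return the rest."""
    it = iter(iterable)
    head = []
    for x in it:
        head.append(x)
        if x == stop:
            break
    tail = list(it)      # a second pass over `it` resumes after the break
    return head, tail


it = iter("abc")
first = next(it, None)   # peel off the first element
rest = list(it)          # list() calls iter(it), so `it` needs __iter__

head, tail = split_at(range(6), 2)
```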


I believe that iterable objects that define `__next__` but not
`__iter__` are fundamentally broken. If they happen to work in some
circumstances but not others, that's because the iterator protocol is
relaxed enough to work with broken iterators :-)



--
Steve
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/54WUEDKDIT7FXH3JHL34VZDJCFV5Q3FH/
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 04:50:05PM -0700, Guido van Rossum wrote:

> TBH I don't think there is an *actual* problem here. I think it's just
> about choosing the right wording for the glossary (which IMO does not have
> status as a source of truth anyway).

+1


--
Steve
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BIY52VZAIX7LE5CYNBLZDORJJ6II6MGL/
Re: Should the definition of an "(async) iterator" include __iter__?
On Tue, Sep 14, 2021 at 11:44 PM Steven D'Aprano <steve@pearwood.info>
wrote:

> On Tue, Sep 14, 2021 at 09:38:38PM -0700, Guido van Rossum wrote:
>
> > > I don't know what I would call an object that only has __next__,
> > > apart from "broken" :-(
> > >
> >
> > It's still an iterator, since it duck-types in most cases where an
> > iterator is required (notably "for", which is the primary use case for
> > the iteration protocols -- it's in the first sentence of PEP 234's
> > abstract).
>
> I don't think it duck-types as an iterator. Here's an example:
>
>
> class A:
>     def __init__(self): self.items = [1, 2, 3]
>     def __next__(self):
>         try: return self.items.pop()
>         except IndexError: raise StopIteration
>
>
> class B:
>     def __iter__(self):
>         return A()
>
>
> It's fine to iterate over B() directly, but you can't iterate over
> A() at all. If you try, you get a TypeError:
>
> >>> for item in A(): pass
> ...
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: 'A' object is not iterable
>

Yes, we all understand that. The reason I invoked "duck typing" is that as
long as you don't use the iterator in a situation where iter() is called on
it, it works fine. Just like a class with a readline() method works fine in
some cases where a file is expected.


> In practice, this impacts some very common techniques. For instance,
> pre-calling iter() on your input.
>
>
> >>> x = B()
> >>> it = iter(x)
> >>> for value in it: pass
> ...
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: 'A' object is not iterable
>
>
> There are all sorts of reasons why one might pre-call iter(). One common
> one is to pre-process the first element:
>
> it = iter(obj)
> first = next(it, None)
> for item in it: ...
>
> Another is to test for an iterable. iter(obj) will raise TypeError if
> obj is not a sequence, collection, iterator, iterable etc.
>
> Another is to break out of one loop and then run another:
>
> it = iter(obj)
> for x in it:
>     if condition: break
>     do_something()
>
> for x in it:
>     something_else()
>
>
> I'm sure there are others I haven't thought of.
>

No-one is arguing that an iterator that doesn't define __iter__ is great.
And the docs should continue to recommend strongly to add an __iter__
method returning self.

My only beef is with over-zealous people who might preemptively want to
reject an iterator at runtime that only has __next__; in particular "for"
and iter() have no business checking for this attribute ("for" only needs
__next__, and iter() should only check for the minimal version of the
protocol to reject things without __next__).
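
To show why "for" only needs __next__ once it has its iterator, here is a rough hand-written equivalent of the loop. This is a sketch: `manual_for` is a hypothetical helper, and `A` and `B` are Steven's classes from above, reformatted.

```python
def manual_for(iterable, body):
    """Rough equivalent of `for x in iterable: body(x)`."""
    it = iter(iterable)          # the only step that uses __iter__
    while True:
        try:
            x = next(it)         # from here on, only __next__ is needed
        except StopIteration:
            break
        body(x)


class A:
    def __init__(self):
        self.items = [1, 2, 3]

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration


class B:
    def __iter__(self):
        return A()


seen = []
manual_for(B(), seen.append)     # A never needs __iter__ on this path
```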


> I believe that iterable objects that define `__next__` but not
> `__iter__` are fundamentally broken. If they happen to work in some
> circumstances but not others, that's because the iterator protocol is
> relaxed enough to work with broken iterators :-)
>

Your opinion is loud and clear. I just happen to disagree.

--
--Guido van Rossum (python.org/~guido)
Re: Should the definition of an "(async) iterator" include __iter__?
On 9/15/2021 12:33 AM, Guido van Rossum wrote:
> On Tue, Sep 14, 2021 at 9:03 PM Steven D'Aprano <steve@pearwood.info> wrote:
>
> > On Tue, Sep 14, 2021 at 12:33:32PM -0700, Guido van Rossum wrote:
> > > My view of this is:
> > >
> > > A. It's not an iterator if it doesn't define `__next__`.
> > >
> > > B. It is strongly recommended that iterators also define `__iter__`.
> > >
> > > In "standards" language, I think (A) is MUST and (B) is merely OUGHT or
> > > maybe SHOULD.
> >
> > That's not what the docs say :-)
> >
> > https://docs.python.org/3/library/stdtypes.html#iterator-types

Like Steven, I consider 'iterators are iterables' to be a very positive
feature.

> Huh, so it does. And in very clear words as well. I still don't think
> this should be enforced by checks for the presence of __iter__ in
> situations where it's not going to be called (e.g. in iter() itself and
> in "for x in it").

I agree with this also as I consider 'duck typing' (delayed type
checking by use) and 'consenting adults' (break rules at one's own risk)
to also be features. If iter were to check for __iter__ on the return
object, it might as well call it to see if it returns the same object.
That might be appropriate for a 'SargentPython' implementation, but, to
me, not for CPython.
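
For comparison, here is a sketch of what iter() does check on the returned object in CPython today. All class names are hypothetical. A return value without `__next__` is rejected, while one with `__next__` but no `__iter__` is accepted.

```python
class NoNext:
    """Hypothetical object with neither __iter__ nor __next__."""


class BadIterable:
    def __iter__(self):
        return NoNext()      # not an iterator: no __next__


class NextOnly:
    def __next__(self):
        raise StopIteration


class OkIterable:
    def __iter__(self):
        return NextOnly()    # has __next__ but no __iter__


try:
    iter(BadIterable())
except TypeError as exc:
    rejected = str(exc)      # complains about a non-iterator return value

accepted = list(OkIterable())    # works; the iterator is simply empty
```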

--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DJPR23CY6N3GP6DDULPBMSHWH7JKWMFT/
Re: Should the definition of an "(async) iterator" include __iter__?
Guido:
> It's still an iterator, since it duck-types in most cases where an iterator
> is required (notably "for", which is the primary use case for the iteration
> protocols -- it's in the first sentence of PEP 234's abstract).

D'Aprano:
> I don't think it duck-types as an iterator. Here's an example:
>
> class A:
>     def __init__(self): self.items = [1, 2, 3]
>     def __next__(self):
>         try: return self.items.pop()
>         except IndexError: raise StopIteration
>
> >>> for item in A(): pass
> ...
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> TypeError: 'A' object is not iterable

Guido:
> Yes, we all understand that. The reason I invoked "duck typing" is that as
> long as you don't use the iterator in a situation where iter() is called
> on it, it works fine.


I'm confused.

- a "broken" iterator should be usable in `for`;
- `A` is a broken iterator;

but

- `A()` is not usable in `for`.

What am I missing?

--
~Ethan~
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZMDWM7ICFLD5R7URT2ME4WNYBVQZKNUT/
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
On Wed, Sep 15, 2021 at 3:54 PM Ethan Furman <ethan@stoneleaf.us> wrote:

> Guido:
> > It's still an iterator, since it duck-types in most cases where an
> iterator
> > is required (notably "for", which is the primary use case for the
> iteration
> > protocols -- it's in the first sentence of PEP 234's abstract).
>
> D'Aprano:
> > I don't think it duck-types as an iterator. Here's an example:
> >
> > class A:
> >     def __init__(self): self.items = [1, 2, 3]
> >     def __next__(self):
> >         try: return self.items.pop()
> >         except IndexError: raise StopIteration
> >
> > >>> for item in A(): pass
> > ...
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: 'A' object is not iterable
>
> Guido:
> > Yes, we all understand that. The reason I invoked "duck typing" is that
> as
> > long as you don't use the iterator in a situation where iter() is called
> > on it, it works fine.
>
>
> I'm confused.
>
> - a "broken" iterator should be usable in `for`;
> - `A` is a broken iterator;
>
> but
>
> - `A()` is not usable in `for`.
>
> What am I missing?
>

Steven's class A is the kind of class a custom sequence might return from
its __iter__ method. E.g.

class S:
    def __iter__(self):
        return A()

Now this works:

for x in S(): ...

However this doesn't:

for x in iter(S()): ...

In Steven's view, A does not deserve to work in the former case: Because A
is a "broken" iterator, he seems to want it rejected by the iter() call
that is *implicit* in the for-loop.

Reminder about how for-loops work:

This:

for x in seq:
    <body>

translates (roughly) to this:

_it = iter(seq)
while True:
    try:
        x = next(_it)
    except StopIteration:
        break
    <body>
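
For anyone who wants to run the translation, here is a rough sketch of it as a plain function. The name for_each and the body callback are made up for illustration; the real for statement is implemented in bytecode, not like this.

```python
def for_each(seq, body):
    """Rough equivalent of `for x in seq: body(x)` (illustrative only)."""
    _it = iter(seq)            # the implicit iter() call
    while True:
        try:
            x = next(_it)
        except StopIteration:
            break
        body(x)


class A:
    """Defines __next__ only."""
    def __init__(self):
        self.items = [1, 2, 3]

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration


class S:
    def __iter__(self):
        return A()


seen = []
for_each(S(), seen.append)     # works: iter(S()) hands back an A

try:
    for_each(iter(S()), print) # fails: the implicit iter() is applied to an A
    failed = False
except TypeError:
    failed = True
```

The only call that ever looks for __iter__ is the iter(seq) on the first line, so A is fine as the result of that call but not as its argument.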

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
Note: I am all for not enforcing anything here -- let's keep duck typing
alive!

If static type checkers want to be more pedantic, they can be -- that's
kinda what they are for :-)

But the OP wrote:

"""
"[i]terators are required to have an __iter__()
<https://docs.python.org/3/reference/datamodel.html#object.__iter__> method"
which neither `for` nor `iter()` actually enforce.
"""

I'm confused -- as far as I can tell `for` does enforce this -- well, it
doesn't enforce it, but it does require it, which is the same thing, yes?
But does it need to?

On Wed, Sep 15, 2021 at 4:07 PM Guido van Rossum <guido@python.org> wrote:

> Reminder about how for-loops work:
>
> This:
>
> for x in seq:
>     <body>
>
> translates (roughly) to this:
>
> _it = iter(seq)
> while True:
>     try:
>         x = next(_it)
>     except StopIteration:
>         break
>     <body>
>

exactly -- that call to iter is always made, yes?

The "trick" here is that we want it to be easy to use a for loop with
either an iterable or an iterator. Otherwise, we would require people to
write:

for i in iter(a_sequence):
    ...

which I doubt anyone would want, backward compatibility aside.

And since iter() is going to always get called, we need __iter__ methods
that return self.

However, I suppose one could do a for loop something like this instead.

_it = seq
while True:
    try:
        x = next(_it)
    except TypeError:
        _it = iter(_it)
        x = next(_it)
    except StopIteration:
        break
    <body>

That is, instead of making every iterator an iterable, keep the two
concepts more distinct:

An "Iterator" has a __next__ method that returns an item or raises
StopIteration.

An "Iterable" has an __iter__ method that returns an iterator.

That would mean that one couldn't write a single class that is both an
iterable and an iterator, and uses (abuses) __iter__ to reset itself. But
would that be a bad thing?

Anyway, this is just a mental exercise, I am not suggesting changing
anything.
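
For the curious, a runnable variant of that mental exercise might look like the sketch below. It is an assumption-laden rewrite, not the loop as written above: it checks for __next__ up front instead of catching TypeError, so that an empty iterable cannot leak a StopIteration from inside the except clause. The name loose_for is hypothetical.

```python
def loose_for(seq, body):
    """Hypothetical for-loop that accepts a bare iterator or an iterable."""
    # Use the object directly if it already has __next__;
    # only call iter() on things that do not.
    it = seq if hasattr(type(seq), "__next__") else iter(seq)
    while True:
        try:
            x = next(it)
        except StopIteration:
            break
        body(x)


class A:
    """Defines __next__ only; no __iter__."""
    def __init__(self):
        self.items = [1, 2, 3]

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration


out = []
loose_for(A(), out.append)         # a bare iterator is accepted
from_list = []
loose_for([1, 2], from_list.append)  # an ordinary iterable still works
```

With a loop like this, iterators would no longer need an __iter__ that returns self.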

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
On Wed, Sep 15, 2021 at 08:57:58AM -0700, Guido van Rossum wrote:

[...]
> Yes, we all understand that. The reason I invoked "duck typing" is that as
> long as you don't use the iterator in a situation where iter() is called on
> it, it works fine. Just like a class with a readline() method works fine in
> some cases where a file is expected.

Okay, you've convinced me that perhaps duck typing is an appropriate
term to use.

But I hope we wouldn't be arguing that a class with only a readline()
method *is* a file object and changing the docs to support that view :-)


[...]
> No-one is arguing that an iterator that doesn't define __iter__ is great.

I'm arguing that it's not an iterator at all, even if you can use it in
place of an iterator under some circumstances. As you pointed out, there
is already a name for that: iterable.


> And the docs should continue to recommend strongly to add an __iter__
> method returning self.

Agreed. That's part of the iterator protocol.

If some objects don't need to support the full iterator protocol in
order to get the job done, then that's great, and people should be
allowed to support only the part of the protocol they need.


> My only beef is with over-zealous people who might preemptively want to
> reject an iterator at runtime that only has __next__; in particular "for"
> and iter() have no business checking for this attribute ("for" only needs
> __next__, and iter() only should check for the minimal version of the
> protocol to reject things without __next__).

Again, I agree. `for` and iter() should only check for the minimum of
what they need.


> > I believe that iterable objects that define `__next__` but not
> > `__iter__` are fundamentally broken. If they happen to work in some
> > circumstances but not others, that's because the iterator protocol is
> > relaxed enough to work with broken iterators :-)
> >
>
> Your opinion is loud and clear. I just happen to disagree.

I think we're in violent agreement here :-)

Obligatory Argument Sketch video:

https://www.youtube.com/watch?v=ohDB5gbtaEQ


--
Steve
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5PCJG3L725HTIANQDQRQAL2S6XE3IQM2/
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
On Wed, Sep 15, 2021 at 04:01:31PM -0700, Guido van Rossum wrote:

> Steven's class A is the kind of class a custom sequence might return from
> its __iter__ method. E.g.
>
> class S:
>     def __iter__(self):
>         return A()

Correct, where A itself has a `__next__` method but no `__iter__`
method.


> Now this works:
>
> for x in S(): ...

Agreed.


> However this doesn't:
>
> for x in iter(S()): ...

Correct, but *in practice* nobody would actually write it like that,
since that would be silly. But what can happen is that one might have
earlier called iter() directly, and only afterwards used the result in a
for loop.

it = iter(S())
# assert isinstance(it, A)
...
for x in it: ...

Or we can short-cut the discussion and just write it like this:

for x in A(): ...

which clearly fails because A has no `__iter__` method. When we write it
like that, it is clear that A is not an iterator. The waters are only
muddied because *most of the time* we don't write it like that, we do
the simplest thing that can work:

for x in S(): ...

which does work.

So the question is, in that last snippet, the version that *does* work,
what are we iterating over? Are we iterating over S() or A()?

I think the answer is Yes :-)


> In Steven's view, A does not deserve to work in the former case: Because A
> is a "broken" iterator, he seems to want it rejected by the iter() call
> that is *implicit* in the for-loop.

No, I'm not arguing that.

1. It's not a matter of "deserves", it is that A instances cannot be
used *directly* in a for loop, because they have no `__iter__` method.

2. I don't want iter() or the for loop to reject *S* instances just
because A instances don't have `__iter__`.

3. I don't need to propose that for loops reject A instances, since
they already do that. That's the status quo, and it's working correctly
according to the iterator protocol.

The bottom line here is that I'm not asking for any runtime changes here
at all. Perhaps improving the docs would be a good thing, and honestly
I'm unsure what typeshed should do. I suppose that depends on whether
you see the role of static type checking to be as strict as possible or
as forgiving as possible.

If you want your type checking to be strict, then maybe you want it to
flag A as not an iterator. If you want it to accept anything that works,
maybe you want it to allow S as an iterator.

On the typeshed issue, Akuli comments that they have a policy of
preferring false negatives. So I think that nothing needs to be done?

https://github.com/python/typeshed/issues/6030#issuecomment-918544344
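
As a sketch of the strict-versus-forgiving choice, here are two ways a function could annotate its parameter. SupportsNext, first_strict and first_lenient are names invented for this illustration, and what a given static checker reports for them is my expectation, not something verified here.

```python
from typing import Iterator, Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class SupportsNext(Protocol[T_co]):
    """Hypothetical protocol: anything with __next__, __iter__ not required."""
    def __next__(self) -> T_co: ...


class A:
    """Defines __next__ only."""
    def __init__(self) -> None:
        self.items = [1, 2, 3]

    def __next__(self) -> int:
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration


def first_strict(it: Iterator[int]) -> int:
    # Strict annotation: a checker would be expected to flag an A here,
    # because typing.Iterator also demands __iter__.
    return next(it)


def first_lenient(it: SupportsNext[int]) -> int:
    # Forgiving annotation: only __next__ is demanded.
    return next(it)


strict_result = first_strict(A())    # runs fine; annotations are not enforced
lenient_result = first_lenient(A())
```

At runtime both calls behave identically; the difference would only show up in what the type checker is willing to accept.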



--
Steve
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3BQ2KIYFVRDKRK3HFLAFPS2GCBEAT24R/
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
On Thu, 16 Sept 2021 at 01:30, Chris Barker via Python-Dev
<python-dev@python.org> wrote:
> """
> "[i]terators are required to have an __iter__() method" which neither `for` nor `iter()` actually enforce.
> """
>
> I'm confused -- as far as I can tell `for` does enforce this -- well, it doesn't enforce it, but it does require it, which is the same thing, yes? But does it need to?

for enforces that *iterables* have an __iter__ method, not
*iterators*. (for takes an iterable, not an iterator, and uses
__iter__ to *get* an iterator from it).
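
A small sketch of that distinction, using the ABCs in collections.abc and the A and S classes from earlier in the thread (redefined here so the snippet is self-contained):

```python
from collections.abc import Iterable, Iterator


class A:
    """Defines __next__ only."""
    def __init__(self):
        self.items = [1, 2, 3]

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration


class S:
    """Defines __iter__ only."""
    def __iter__(self):
        return A()


a = A()
s_is_iterable = isinstance(S(), Iterable)   # True: S has __iter__
a_is_iterable = isinstance(a, Iterable)     # False: A has no __iter__
a_is_iterator = isinstance(a, Iterator)     # False: the ABC wants both methods
first = next(a)                             # yet next() is happy with it
```

So the ABCs side with the glossary definition, while next() itself only cares about __next__.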

The debate here is (I think!) whether an *iterator* that is not also
an *iterable* is a valid iterator.

IMO it is valid (because that's what the definitions say, basically)
but it may not be *useful* in certain circumstances, and it definitely
may not be *expected* (because nearly all iterators are iterables).
"Broken" is a strong word to use, though, and that might be why the
debate is continuing this long...

Paul
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UIKBKWT5G4ME2LVZ3W6RYRK5ESNBEZBQ/
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
On Wed, Sep 15, 2021 at 4:06 PM Guido van Rossum <guido@python.org> wrote:

> [SNIP]
> Reminder about how for-loops work:
>
> This:
>
> for x in seq:
>     <body>
>
> translates (roughly) to this:
>
> _it = iter(seq)
> while True:
>     try:
>         x = next(_it)
>     except StopIteration:
>         break
>     <body>
>

And if anyone wants more details on this, I have a blog post about it at
https://snarky.ca/unravelling-for-statements/ .
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
On 9/16/2021 3:02 AM, Paul Moore wrote:

> The debate here is (I think!) whether an *iterator* that is not also
> an *iterable* is a valid iterator.

This framing of the question seems biased in that it initially uses
'iterator' to mean 'object with __next__ but not __iter__' when the
propriety of that equating is at least half of the debate.

> IMO it is valid (because that's what the definitions say, basically)

The definitions pretty much answer the question above in the negative.

https://www.python.org/dev/peps/pep-0234/
C-API:
"Iterators ought to implement the tp_iter slot as returning a
reference to themselves; this is needed to make it possible to use an
iterator (as opposed to a sequence) in a for loop."
Python-API:
" A class that wants to be an iterator should implement two methods: a
next() method that behaves as described above, and an __iter__() method
that returns self." ... "Iterators are currently required to support
both protocols."

The clear intention is that iterators be usable as iterables.

https://docs.python.org/3/glossary.html
iterator:
" Iterators are required to have an __iter__() method that returns the
iterator object itself so every iterator is also iterable and may be
used in most places where other iterables are accepted."

> but it may not be *useful* in certain circumstances, and it definitely
> may not be *expected* (because nearly all iterators are iterables).
> "Broken" is a strong word to use, though, and that might be why the
> debate is continuing this long...

I think 'semi-iterator' might be a better term, definitely more neutral,
for an object that is maybe duck-type usable as an iterator and maybe not.

For Python code, I currently do not see a reason to omit the minimal
"def __iter__(self): return self". I don't know about C code.
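
As an illustration of how little is needed, here is the thread's class A with that one-line __iter__ added. The name B is just for this sketch.

```python
class B:
    """Like class A from the thread, plus the minimal __iter__."""
    def __init__(self):
        self.items = [1, 2, 3]

    def __iter__(self):
        return self            # the one extra method

    def __next__(self):
        try:
            return self.items.pop()
        except IndexError:
            raise StopIteration


b = B()
same_object = iter(b) is b          # iter() hands back the iterator itself
direct = [x for x in B()]           # usable directly in a for loop
twice = list(iter(iter(B())))       # and after any number of iter() calls
```

With __iter__ in place the object works both as an iterator and as an iterable, which is the behaviour the PEP and glossary text quoted above describe.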


--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XHGL3BWV5UNZYOEPQLVT7H5YRRO3UQBU/
Re: Should the definition of an "(async) iterator" include __iter__? [ In reply to ]
I understood that _iterables_ are required to have an __iter__ method, not
iterators.

Therefore, are we simply discussing whether all iterators should be
iterable? At the moment the CPython implementation doesn't require that
AFAIK.

regards
Steve

On Tue, Sep 14, 2021 at 8:39 PM Guido van Rossum <guido@python.org> wrote:

> My view of this is:
>
> A. It's not an iterator if it doesn't define `__next__`.
>
> B. It is strongly recommended that iterators also define `__iter__`.
>
> In "standards" language, I think (A) is MUST and (B) is merely OUGHT or
> maybe SHOULD.
>
> On Tue, Sep 14, 2021 at 12:30 PM Brett Cannon <brett@python.org> wrote:
>
>> Over in https://github.com/python/typeshed/issues/6030 I have managed to
>> kick up a discussion over what exactly an "iterator" is. If you look at
>> https://docs.python.org/3/library/functions.html#iter you will see the
>> docs say it "Return[s] an iterator
>> <https://docs.python.org/3/glossary.html#term-iterator> object." Great,
>> but you go the glossary definition of "iterator" at
>> https://docs.python.org/3/glossary.html#term-iterator you will see it
>> says "[i]terators are required to have an __iter__()
>> <https://docs.python.org/3/reference/datamodel.html#object.__iter__>
>> method" which neither `for` nor `iter()` actually enforce.
>>
>> Is there something to do here? Do we loosen the definition of "iterator"
>> to say they *should* define __iter__? Leave it as-is with an
>> understanding that we know that it's technically inaccurate for iter() but
>> that we want to encourage people to define __iter__? I'm assuming people
>> don't want to change `for` and `iter()` to start requiring __iter__ be
>> defined if we decided to go down the "remove the __aiter__ requirement"
>> from aiter() last week.
>>
>> BTW all of this applies to async iterators as well.
>> _______________________________________________
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-leave@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/3W7TDX5KNVQVGT5CUHBK33M7VNTP25DZ/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
> *Pronouns: he/him **(why is my pronoun here?)*
> <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/OICGRBPLXO6WXO4CHTGUK46WIHO7PDUU/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
