Mailing List Archive

__contains__ hook
Here's a very preliminary, very hackish version of a hook for the "in"
operator.

Basically, use like:

class spam:
    def __contains__(self, o):
        return 1
6 in spam() (answers 1)

I must say I was horrified by the current way the operator was handled:
very non-OO-ish. I'd much rather there'd be a slot in the sequence
interface for this method. This is why there's still no way to use the
hook with regular C extension types.
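
A minimal sketch of the hook as it behaves in current Python, where
__contains__ is part of the language. The class names Spam and
EvenNumbers are illustrative, not part of the patch, and the result of
'in' is coerced to a bool (True) rather than the 1 shown above.

```python
class Spam:
    """Claims to contain everything, like the example above."""

    def __contains__(self, item):
        return 1  # any truthy value counts as "contained"


class EvenNumbers:
    """A container with no stored elements: membership is computed."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


print(6 in Spam())             # 'in' calls Spam.__contains__
print(4 in EvenNumbers())      # membership decided by the hook
print(3 not in EvenNumbers())  # 'not in' negates the same hook
```

Note that 'not in' needs no hook of its own; it is the negation of
whatever __contains__ reports.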

Have fun!

(BTW: I've tested it only minimally, so it might break your Python. Use
with caution).

PS.
Eric, you can implement sets the *right* way this time.

--
Moshe Zadka <mzadka@geocities.com>.
INTERNET: Learn what you know.
Share what you don't.
Re: __contains__ hook
Moshe seems eager to get comments on this post :-)

> Here's a very preliminary, very hackish version of a hook for the "in"
> operator.
>
> Basically, use like:
>
> class spam:
>     def __contains__(self, o):
>         return 1
> 6 in spam() (answers 1)
>
> I must say I was horrified by the current way the operator was handled:
> very non-OO-ish. I'd much rather there'd be a slot in the sequence
> interface for this method. This is why there's still no way to use the
> hook with regular C extension types.
>
> Have fun!
>
> (BTW: I've tested it only minimally, so it might break your Python. Use
> with caution).
>
> PS.
> Eric, you can implement sets the *right* way this time.

For those who, like me, are too lazy to unpack attachments, here's the
text of Moshe's patch:

> *** ../../../python/dist/src/Objects/abstract.c Fri Oct 15 14:09:02 1999
> --- Objects/abstract.c Tue Feb 1 10:34:34 2000
> ***************
> *** 1110,1115 ****
> --- 1110,1140 ----
> }
> return 0;
> }
> +     /* special case for instances. Basically emulating Python code,
> +        but optimizations will come later */
> +     if (PyInstance_Check(w)) {
> +         PyObject *py__contains__, *py_ret, *py_args;
> +         int ret;
> +
> +         py__contains__ = PyObject_GetAttrString(w, "__contains__");
> +         if(py__contains__ == NULL)
> +             return -1;
> +         py_args = PyTuple_New(1);
> +         if(py_args == NULL) {
> +             Py_DECREF(py__contains__);
> +             return -1;
> +         }
> +         Py_INCREF(v);
> +         PyTuple_SET_ITEM(py_args, 0, v);
> +         py_ret = PyObject_CallObject(py__contains__, py_args);
> +         Py_DECREF(py__contains__);
> +         Py_DECREF(py_args);
> +         if(py_ret == NULL)
> +             return -1;
> +         ret = PyObject_IsTrue(py_ret);
> +         Py_DECREF(py_ret);
> +         return ret;
> +     }
>
>       sq = w->ob_type->tp_as_sequence;
>       if (sq == NULL || sq->sq_item == NULL) {

I like the idea of overloading 'in' (and 'not in') with __contains__.
There are several issues with this patch though (apart from the fact
that he left out the disclaimer from
http://www.python.org/1.5/bugrelease.html :-).

First of all, it actually breaks 'in' for descendants of UserList, and
other classes that define __getitem__ but not __contains__. That's
easily fixed by clearing the error and jumping forward instead of
returning an error when the GetAttrString() call fails.
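
The fallback described here can be seen in current Python, which does
fall through to the sequence protocol when no __contains__ is defined.
A small sketch, with OldStyleSequence and WithHook as made-up stand-ins
for a UserList-like class:

```python
class OldStyleSequence:
    """Defines __getitem__ only, like a class written before the hook."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # 'in' probes index 0, 1, 2, ... until IndexError is raised.
        return self._items[index]


class WithHook(OldStyleSequence):
    """Adds __contains__, which takes priority over the fallback scan."""

    def __contains__(self, item):
        return item == "anything"


seq = OldStyleSequence(["a", "b", "c"])
print("b" in seq)                 # found by scanning via __getitem__
print("z" in seq)                 # scan ends at IndexError, so not found
print("a" in WithHook(["a"]))     # the hook is asked, not the items
```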

Second, it's customary to define a static object variable initialized
to NULL, which is set to the interned string object; this speeds up
the lookup a bit using PyObject_GetAttr().

Micro-nit: I want a space between 'if' and '('. It just looks better.

But the real issue is what Moshe himself already brings up: contains
should have a slot in the type struct, so extension types can also
define this.

Moshe, do you feel like doing this right?

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: __contains__ hook
On 02 February 2000, Guido van Rossum said:
> I like the idea of overloading 'in' (and 'not in') with __contains__.
> There are several issues with this patch though (apart from the fact
> that he left out the disclaimer from
> http://www.python.org/1.5/bugrelease.html :-).

I agree at a language level; the current way to "overload" 'in' is...
ummm... weird. It seems like there's a simple and natural "magic
method" corresponding to almost every operator, so any operators that
*don't* get that treatment are second-class citizens.

As for the implementation of __contains__, I'm just not familiar enough
with Python internals to comment. I'll let the rest of you argue over
that.

Greg
Re: __contains__ hook
On Wed, 2 Feb 2000, Guido van Rossum wrote:

> I like the idea of overloading 'in' (and 'not in') with __contains__.
> There are several issues with this patch though (apart from the fact
> that he left out the disclaimer from
> http://www.python.org/1.5/bugrelease.html :-).

Sorry: I'll download it and mail it later...

> Micro-nit: I want a space between 'if' and '('. It just looks better.

Sorry: old habits die hard. Change as you will.

> But the real issue is what Moshe himself already brings up: contains
> should have a slot in the type struct, so extension types can also
> define this.
>
> Moshe, do you feel like doing this right?

Yes, but not in the near future. Wouldn't adding a new slot break old
extension types? I'm a bit ignorant on the subject.


--
Moshe Zadka <mzadka@geocities.com>.
INTERNET: Learn what you know.
Share what you don't.
Re: __contains__ hook
> > But the real issue is what Moshe himself already brings up: contains
> > should have a slot in the type struct, so extension types can also
> > define this.
> >
> > Moshe, do you feel like doing this right?
>
> Yes, but not in the near future. Wouldn't adding a new slot break old
> extension types? I'm a bit ignorant on the subject

There are some spare slots in PyTypeObject:

    /* More spares */
    long tp_xxx5;
    long tp_xxx6;
    long tp_xxx7;
    long tp_xxx8;

These can be used for binary compatibility; old extensions will simply
not have the new feature.

There's also a more sophisticated feature, implemented through
tp_flags, which can indicate that an extension is aware of a
particular feature. These comments in object.h may explain this:

/*

Type flags (tp_flags)

These flags are used to extend the type structure in a backwards-compatible
fashion. Extensions can use the flags to indicate (and test) when a given
type structure contains a new feature. The Python core will use these when
introducing new functionality between major revisions (to avoid mid-version
changes in the PYTHON_API_VERSION).

Arbitration of the flag bit positions will need to be coordinated among
all extension writers who publically release their extensions (this will
be fewer than you might expect!)..

Python 1.5.2 introduced the bf_getcharbuffer slot into PyBufferProcs.

Type definitions should use Py_TPFLAGS_DEFAULT for their tp_flags value.

Code can use PyType_HasFeature(type_ob, flag_value) to test whether the
given type object has a specified feature.

*/
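
The flag mechanism described in that comment can be modelled in a few
lines. This is only a toy model in Python, not the C API: TypeRecord,
has_feature, contains and the two flag constants are hypothetical names
chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical feature bits; the real values live in object.h.
HAVE_GETCHARBUFFER = 1 << 0
HAVE_SEQUENCE_IN = 1 << 1
DEFAULT_FLAGS = HAVE_GETCHARBUFFER | HAVE_SEQUENCE_IN


@dataclass
class TypeRecord:
    """Toy stand-in for a type struct with a flags word and one slot."""
    name: str
    flags: int = 0
    sq_contains: Optional[Callable[[object, object], bool]] = None


def has_feature(type_record, flag):
    """Model of a feature test: is the given bit set in the flags?"""
    return (type_record.flags & flag) != 0


def contains(type_record, container, element):
    """Use the slot only if the type declares that it has one."""
    if (has_feature(type_record, HAVE_SEQUENCE_IN)
            and type_record.sq_contains is not None):
        return type_record.sq_contains(container, element)
    # Old types: never look at the slot, fall back to a linear scan.
    for item in container:
        if item == element:
            return True
    return False


old_type = TypeRecord("old_extension", flags=HAVE_GETCHARBUFFER)
new_type = TypeRecord("new_extension", flags=DEFAULT_FLAGS,
                      sq_contains=lambda c, e: e in c)
print(contains(old_type, [1, 2, 3], 2))
print(contains(new_type, [1, 2, 3], 2))
```

The point of the flag test is that the slot is never read for a type
that does not declare it, so whatever happens to sit at that offset in
an old struct is ignored.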

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: __contains__ hook
Guido van Rossum wrote:
>
> > > But the real issue is what Moshe himself already brings up: contains
> > > should have a slot in the type struct, so extension types can also
> > > define this.
> > >
> > > Moshe, do you feel like doing this right?
> >
> > Yes, but not in the near future. Wouldn't adding a new slot break old
> > extension types? I'm a bit ignorant on the subject
>
> There are some spare slots in PyTypeObject:
>
> /* More spares */
> long tp_xxx5;
> long tp_xxx6;
> long tp_xxx7;
> long tp_xxx8;
>
> These can be used for binary compatibility; old extensions will simply
> not have the new feature.
>
> There's also a more sophisticated feature, implemented through
> tp_flags, which can indicate that an extension is aware of a
> particular feature. These comments in object.h may explain this:
>
> /*
>
> Type flags (tp_flags)

Shouldn't 'in' be a slot of the sequence methods ? I'd suggest
creating a new tp_flag bit and then extending tp_as_sequence
with:

binaryfunc sq_contains;

plus of course add an abstract function to abstract.c:

PySequence_Contain(PyObject *container, PyObject *element)

which uses the above slot after testing the tp_flag setting.
Python instances, lists, tuples should then support this new
slot. We could even sneak in support for dictionaries once we
decide whether semantics would be
1. key in dict
2. value in dict
or 3. (key,value) in dict

:-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: __contains__ hook
> Shouldn't 'in' be a slot of the sequence methods ? I'd suggest
> creating a new tp_flag bit and then extending tp_as_sequence
> with:
>
> binaryfunc sq_contains;
>
> plus of course add an abstract function to abstract.c:
>
> PySequence_Contain(PyObject *container, PyObject *element)

That function already exists, spelled "PySequence_Contains". Currently
it does the C equivalent of

    for i in container:
        if element == i: return 1
    return 0

I'm not entirely sure whether the 'contains' slot should be part of
the as_sequence struct, but I suppose it makes sense historically.
(The as_number, as_sequence, as_mapping structs don't make sense at all
in the grand scheme of things, but we're stuck with them for the time
being.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: __contains__ hook
M.-A. Lemburg writes:
> Shouldn't 'in' be a slot of the sequence methods ? I'd suggest
> creating a new tp_flag bit and then extending tp_as_sequence

Only if we want to restrict set-like behavior to sequences, and I
don't think that's clear, though it does mirror the current
situation.
Regardless of the location of the slot, why should a flag be needed?
Testing the slot for NULL is necessary to avoid core dumps anyway.

> plus of course add an abstract function to abstract.c:
>
> PySequence_Contain(PyObject *container, PyObject *element)

There's already PySequence_In(...); see:

http://www.python.org/doc/current/api/sequence.html#l2h-135

I'm inclined to add PyObject_In(...) (or ..._Contains(); I like
Contains better than In, but there's precedent for In and that's more
important) and define the new slot on the Object using one of the
reserved spaces. That allows a clean interface for "pure" sets that
don't have to "look like" sequences or mappings.

> which uses the above slot after testing the tp_flag setting.
> Python instances, lists, tuples should then support this new
> slot. We could even sneak in support for dictionaries once we
> decide whether semantics would be

Bait!


-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
Corporation for National Research Initiatives
Re: __contains__ hook
> M.-A. Lemburg writes:
> > Shouldn't 'in' be a slot of the sequence methods ? I'd suggest
> > creating a new tp_flag bit and then extending tp_as_sequence

Fred Drake:
> Only if we want to restrict set-like behavior to sequences, and I
> don't think that's clear, though it does mirror the current
> situation.
> Regardless of the location of the slot, why should a flag be needed?

Because if we add a slot to the as_sequence struct, old extensions
that haven't been recompiled will appear to have garbage in that slot
(because they don't actually have it). When we use a spare slot in
the main type struct, that problem doesn't exist; but the as_sequence
struct and friends don't have spares.

> Testing the slot for NULL is necessary to avoid core dumps anyway.
>
> > plus of course add an abstract function to abstract.c:
> >
> > PySequence_Contain(PyObject *container, PyObject *element)
>
> There's already PySequence_In(...); see:

That's just a backwards compatibility alias for PySequence_Contains;
see abstract.h. (PySequence_In() was a bad name, because it has its
arguments reversed with respect to the 'in' operator:
PySequence_In(seq, item) is equivalent to item in seq; you would
expect PySequence_In(item, seq). The PySequence_Contains name
correctly suggests the (seq, item) argument order.)
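
The same (container, item) argument order is visible from Python in the
standard library's operator.contains(), which makes a convenient
illustration of the naming point (the variable names here are
arbitrary):

```python
import operator

seq = ["spam", "eggs"]

# operator.contains(container, item) is the function form of
# "item in container": the container comes first, the item second.
print(operator.contains(seq, "eggs"))
print("eggs" in seq)
```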

> http://www.python.org/doc/current/api/sequence.html#l2h-135

Maybe the docs need to be updated? (Hint, hint.)

> I'm inclined to add PyObject_In(...) (or ..._Contains(); I like
> Contains better than In, but there's precedent for In and that's more
> important) and define the new slot on the Object using one of the
> reserved spaces. That allows a clean interface for "pure" sets that
> don't have to "look like" sequences or mappings.
>
> > which uses the above slot after testing the tp_flag setting.
> > Python instances, lists, tuples should then support this new
> > slot. We could even sneak in support for dictionaries once we
> > decide whether semantics would be
>
> Bait!

Yuck. The same argument for disallowing 'x in dict' applies to the C
API. There's already PyMapping_HasKey().

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: __contains__ hook
<meta-comment>
I like to be personally CC'ed on mails here,
and I assume arbitrarily everyone else is like me.
If you don't want to be CC'ed, please mention it
personally.
</meta-comment>

On Thu, 3 Feb 2000, Guido van Rossum wrote:

> > > which uses the above slot after testing the tp_flag setting.
> > > Python instances, lists, tuples should then support this new
> > > slot. We could even sneak in support for dictionaries once we
> > > decide whether semantics would be
> >
> > Bait!
>
> Yuck. The same argument for disallowing 'x in dict' applies to the C
> API. There's already PyMapping_HasKey().

I totally agree with Guido -- for me, the whole point of this hack is
to avoid people asking for 'in' in dicts: this way we can code a class
'set' (as I've demonstrated), and have rational semantics to 'in' which
is just as efficient as 'dict.has_key'.

I'm not quite sure where we want to put the C API version of __contains__
- I'd add a tp_as_set, but the only method seems to be 'in', so it seems
like a waste of valuable real-estate before we are driven into
non-backwards-compatibility. I think I should at least ask permission from
the owner before I move over there, trampling everything in my way <wink>

What does everyone think about that?

--
Moshe Zadka <mzadka@geocities.com>.
INTERNET: Learn what you know.
Share what you don't.
Re: __contains__ hook
Guido van Rossum writes:
> Because if we add a slot to the as_sequence struct, old extensions
> that haven't been recompiled will appear to have garbage in that slot
> (because they don't actually have it). When we use a spare slot in
> the main type struct, that problem doesn't exist; but the as_sequence
> struct and friends don't have spares.

Good point. I still think a spare slot should be used so sets don't
have to look like sequences.

> Maybe the docs need to be updated? (Hint, hint.)

Done in CVS.

> Yuck. The same argument for disallowing 'x in dict' applies to the C
> API. There's already PyMapping_HasKey().

Yep! Poorly conceived notions aren't dependent on syntax.


-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
Corporation for National Research Initiatives
Re: __contains__ hook
Moshe Zadka writes:
> I totally agree with Guido -- for me, the whole point of this hack is
> to avoid people asking for 'in' in dicts: this way we can code a class

That's not a good enough reason to add it.

> I'm not quite sure where we want to put the C API version of __contains__
> - I'd add a tp_as_set, but the only method seems to be 'in', so it seems
> like a waste of valuable real-estate before we are driven into
> non-backwards-compatibility. I think I should at least ask permission from
> the owner before I move over there, trampling everything in my way <wink>

I suspect there will be fairly few set implementations in C; there
will be something like a dictionary (kjSet might be updated, for
instance), but that's probably about it.
The "in"/"not in" operation can work off the contains slot, and I
expect set union would be expressed as +, which is already in the
as_number structure. Everything else should probably be implemented
as a method or a function rather than as an operator overload.
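
A rough Python-level sketch of that division of labour, assuming a
dictionary-backed set: 'in' goes through the contains hook, union is
spelled +, and everything else is an ordinary method. DictSet and its
method names are invented for the example.

```python
class DictSet:
    """Set backed by a dictionary, so membership is a key lookup."""

    def __init__(self, items=()):
        self._members = {item: None for item in items}

    def __contains__(self, item):
        # Key lookup on the dict; has_key() in the Python of this thread.
        return item in self._members

    def __add__(self, other):
        # Union expressed with the + operator.
        if not isinstance(other, DictSet):
            return NotImplemented
        return DictSet(list(self._members) + list(other._members))

    def intersection(self, other):
        # Not an operator overload: a plain method.
        return DictSet(item for item in self._members if item in other)

    def members(self):
        return sorted(self._members)


a = DictSet([1, 2])
b = DictSet([2, 3])
print((a + b).members())
print(a.intersection(b).members())
print(3 in a + b)
```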


-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
Corporation for National Research Initiatives
Re: __contains__ hook
Guido van Rossum wrote:
>
> > Shouldn't 'in' be a slot of the sequence methods ? I'd suggest
> > creating a new tp_flag bit and then extending tp_as_sequence
> > with:
> >
> > binaryfunc sq_contains;
> >
> > plus of course add an abstract function to abstract.c:
> >
> > PySequence_Contain(PyObject *container, PyObject *element)
>
> That function already exists, spelled "PySequence_Contains". Currently
> it does the C equivalent of
>
>     for i in container:
>         if element == i: return 1
>     return 0

Hmm, I must have overlooked that one... the above only works
for sequences, while 'in'ness only needs an unordered set
to work. Perhaps we do need an abstraction for unordered
object containers after all, just like Moshe suggested.

I don't think it's top-priority, though...

> I'm not entirely sure whether the 'contains' slot should be part of
> the as_sequence struct, but I suppose it makes sense historically.
> (The as_number, as_sequence, as_mapping structs don't make sense at all
> in the grand scheme of things, but we're stuck with them for the time
> being.)

Doesn't really matter where we put it -- the type object is
a mess already ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: __contains__ hook
> > Because if we add a slot to the as_sequence struct, old extensions
> > that haven't been recompiled will appear to have garbage in that slot
> > (because they don't actually have it). When we use a spare slot in
> > the main type struct, that problem doesn't exist; but the as_sequence
> > struct and friends don't have spares.
>
> Good point. I still think a spare slot should be used so sets don't
> have to look like sequences.

But they won't have to -- all the other pointers in the as_sequence
struct can be NULL. (This used to be not the case, but I've finally
given in and added NULL tests everywhere -- it was a recurring
complaint from extension writers.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: __contains__ hook
Guido van Rossum writes:
> But they won't have to -- all the other pointers in the as_sequence
> struct can be NULL. (This used to be not the case, but I've finally
> given in and added NULL tests everywhere -- it was a recurring
> complaint from extension writers.)

Good enough; sounds like the thing to do is to declare a set to be
a sequence that supports sq_contains, sets a flag in tp_flags, and
doesn't support the irrelevant slots in the sequence structure.
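
At the Python level the equivalent is a class that defines only
__contains__. The sketch below (PureSet and supports are illustrative
names) shows that membership works while the other sequence operations
are simply absent:

```python
class PureSet:
    """Supports 'in' and nothing else from the sequence interface."""

    def __init__(self, *members):
        self._members = frozenset(members)

    def __contains__(self, item):
        return item in self._members


def supports(operation, obj):
    """Return True if operation(obj) runs without a TypeError."""
    try:
        operation(obj)
    except TypeError:
        return False
    return True


s = PureSet("spam", "eggs")
print("spam" in s)                        # membership works
print(supports(len, s))                   # no length
print(supports(iter, s))                  # no iteration
print(supports(lambda obj: obj[0], s))    # no indexing
```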


-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
Corporation for National Research Initiatives
Re: __contains__ hook
On Thu, 3 Feb 2000, Fred L. Drake, Jr. wrote:

> > I totally agree with Guido -- for me, the whole point of this hack is
> > to avoid people asking for 'in' in dicts: this way we can code a class
>
> That's not a good enough reason to add it.

Well, in the metaphorical sense it is -- the reason people were asking for
'in' in dicts was usually that they wanted to use dictionaries as
sets. Not having a way to express that with 'in' certainly seems like a wart.

> > I'm not quite sure where we want to put the C API version of __contains__
> > - I'd add a tp_as_set, but the only method seems to be 'in', so it seems
> > like a waste of valuable real-estate before we are driven into
> > non-backwards-compatibility. I think I should at least ask permission from
> > the owner before I move over there, trampling everything in my way <wink>
>
> I suspect there will be fairly few set implementations in C; there
> will be something like a dictionary (kjSet might be updated, for
> instance), but that's probably about it.
> The "in"/"not in" operation can work off the contains slot, and I
> expect set union would be expressed as +, which is already in the
> as_number structure. Everything else should probably be implemented
> as a method or a function rather than as an operator overload.

Fred, I'm afraid I didn't understand you /at all/. Can you just say what
it is you're proposing? There isn't a "contains" slot right now, and what
I'm wondering is where to put it.
--
Moshe Zadka <mzadka@geocities.com>.
INTERNET: Learn what you know.
Share what you don't.