Mailing List Archive

Seeking deeper understanding of python equality (==)
Hi,

I was recently trying to explain how Python equality works and ran into a
gap in my knowledge. I haven't found any good pages that go beneath a
surface-level explanation of Python equality comparison.

I'll post my investigations below. What I think I'm looking for is where in
the source code (https://github.com/python/cpython) the equality
comparison occurs. I have an idea but wanted to ask first.


Using the dis module, we see the comparison operator is a single bytecode,
which is expected.

$ docker run -it --rm ubuntu:jammy
root@919d94c98191:/# apt-get update
root@919d94c98191:/# apt-get --yes install python3
root@919d94c98191:/# cat >play.py <<EOF
import dis
import uuid

def test():
    x = uuid.uuid4()
    y = str(x)
    x == y
    return

dis.dis(test)
EOF
root@f33b02fef026:/# python3 play.py
... snip ...
  7          16 LOAD_FAST                0 (x)
             18 LOAD_FAST                1 (y)
             20 COMPARE_OP               2 (==)
             22 POP_TOP
... snip ...


Stepping through the code with gdb, we see it jump from the compare
operator to the dunder-eq method on the UUID object. What I want to be able
to do is explain the in-between steps. Also, if you change `x == y` to
`y == x`, you still see the same behavior, which I assume has to do with
dunder-eq being defined on the UUID class and thus given priority.

$ docker run -it --rm ubuntu:jammy
root@919d94c98191:/# apt-get update
root@919d94c98191:/# apt-get --yes install dpkg-source-gitarchive
root@919d94c98191:/# sed -i 's/^# deb-src/deb-src/' /etc/apt/sources.list
root@919d94c98191:/# apt-get update
root@919d94c98191:/# apt-get --yes install gdb python3.10-dbg
root@919d94c98191:/# apt-get source python3.10-dbg
root@919d94c98191:/# cat >play.py <<EOF
import uuid
x = uuid.uuid4()
y = str(x)
breakpoint()
x == y
EOF
root@919d94c98191:/# gdb python3.10-dbg
(gdb) dir python3.10-3.10.4/Python
(gdb) run play.py
Starting program: /usr/bin/python3.10-dbg play.py

warning: Error disabling address space randomization: Operation not
permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> //play.py(5)<module>()
-> x == y
(Pdb) s
--Call--
> /usr/lib/python3.10/uuid.py(239)__eq__()
-> def __eq__(self, other):


Thank you,
Jonathan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Seeking deeper understanding of python equality (==) [ In reply to ]
Perhaps these source references are useful:

Python/ceval.c (_PyEval_EvalFrameDefault)
https://github.com/python/cpython/blob/main/Python/ceval.c#L3754-L3768
Objects/object.c (do_richcompare)
https://github.com/python/cpython/blob/42fee931d055a3ef8ed31abe44603b9b2856e04d/Objects/object.c#L661-L713
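
For readers who prefer Python to C, the control flow of do_richcompare for
the == case can be approximated as below. This is a simplified sketch based
on a reading of the CPython 3.10 source, not the actual implementation. The
name richcompare_eq_model is made up for illustration, and calling
type(v).__eq__ stands in for invoking the C-level tp_richcompare slot.

```python
import uuid


def richcompare_eq_model(v, w):
    """Approximate do_richcompare(v, w, Py_EQ) in pure Python (illustrative only)."""
    tv, tw = type(v), type(w)
    checked_reverse = False

    # 1. If w's type is a proper subclass of v's type, w gets the first try.
    if tv is not tw and issubclass(tw, tv):
        checked_reverse = True
        result = tw.__eq__(w, v)
        if result is not NotImplemented:
            return result

    # 2. Otherwise (or if that returned NotImplemented), try the left operand.
    result = tv.__eq__(v, w)
    if result is not NotImplemented:
        return result

    # 3. Then the reflected comparison on the right operand, unless already tried.
    if not checked_reverse:
        result = tw.__eq__(w, v)
        if result is not NotImplemented:
            return result

    # 4. Neither side implements it: == falls back to identity.
    return v is w


x = uuid.uuid4()
y = str(x)
print(richcompare_eq_model(x, y), x == y)
print(richcompare_eq_model(1, 1.0), 1 == 1.0)
```

The model covers only ==. For the ordering operators, the real function
raises TypeError in the final step instead of falling back to identity.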

Kind regards,
Sam Ezeh


On Fri, 6 May 2022 at 18:12, Jonathan Kaczynski
<jonathan.kaczynski@guildeducation.com> wrote:
Re: Seeking deeper understanding of python equality (==) [ In reply to ]
On 7/05/22 12:22 am, Jonathan Kaczynski wrote:
> Stepping through the code with gdb, we see it jump from the compare
> operator to the dunder-eq method on the UUID object. What I want to be able
> to do is explain the in-between steps.

Generally what happens with infix operators is that the interpreter
first looks for a dunder method on the left operand. If that method
doesn't exist or returns NotImplemented, it then looks for a dunder
method on the right operand.

There is an exception if the right operand's type is a subclass of the
left operand's type -- in that case the right operand's dunder method
takes precedence.
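
The ordering described above can be observed with a few throwaway classes.
This is only a sketch: Left, Right and SubLeft are hypothetical names, and
the `calls` list exists purely to record which method ran.

```python
calls = []


class Left:
    def __eq__(self, other):
        calls.append("Left.__eq__")
        return NotImplemented  # decline, so the other operand is asked


class Right:
    def __eq__(self, other):
        calls.append("Right.__eq__")
        return True


class SubLeft(Left):
    def __eq__(self, other):
        calls.append("SubLeft.__eq__")
        return True


# Unrelated types: the left operand is asked first, then the right one.
result_unrelated = Left() == Right()
order_unrelated = list(calls)

calls.clear()

# The right operand's type is a subclass of the left operand's type:
# the subclass method runs first and Left.__eq__ is never reached.
result_subclass = Left() == SubLeft()
order_subclass = list(calls)

print(order_unrelated, result_unrelated)
print(order_subclass, result_subclass)
```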

> Also, if you change `x == y` to `y
> == x`, you still see the same behavior, which I assume has to do with
> dunder-eq being defined on the UUID class and thus given priority.

No, in that case the comparison method of str will be getting
called first, but you won't see that in pdb because it doesn't
involve any Python code. Since strings don't know how to compare
themselves with uuids, it will return NotImplemented and the
interpreter will then call uuid's method.
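
One way to check this without a C debugger is to call the methods directly
and to record the reflected call with a small subclass. The TracingUUID
class below is a hypothetical helper written for this illustration. It only
logs the call and then defers to the standard uuid.UUID.__eq__.

```python
import uuid

calls = []


class TracingUUID(uuid.UUID):
    """Hypothetical helper: a UUID that records calls to __eq__."""

    def __eq__(self, other):
        calls.append(type(other).__name__)
        return super().__eq__(other)


x = TracingUUID(str(uuid.uuid4()))
y = str(x)

# Calling the methods directly shows what each side answers on its own.
str_says = str.__eq__(y, x)
uuid_says = uuid.UUID.__eq__(x, y)

# For y == x, str's comparison runs first in C and is invisible to pdb;
# the reflected call then reaches the Python-level __eq__ and is recorded.
result = (y == x)
print(str_says, uuid_says, result, calls)
```

Both direct calls return NotImplemented here, because UUID's own method also
declines to compare with a str, so the final result comes from the
interpreter's identity fallback.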

--
Greg
Re: Seeking deeper understanding of python equality (==) [ In reply to ]
Thank you for your responses, Sam and Greg.

The do_richcompare function is where my research originally took me, but I
feel like I'm still missing some pieces to the puzzle.

Here is my updated research since you posted your responses (I'll attach a
pdf copy too):
https://docs.google.com/document/d/10zgOMetEQtZCiYFnSS90pDnNZD7I_-MFohSy83pOieA/edit#
The summary section, in the middle, is where I've summarized my reading of
the source code.

Greg, your response here,

> Generally what happens with infix operators is that the interpreter
> first looks for a dunder method on the left operand. If that method
> doesn't exist or returns NotImplemented, it then looks for a dunder
> method on the right operand.

reads like the contents of the do_richcompare function.

What I think I'm missing is how the dunder methods relate to
the tp_richcompare function.

Thank you,
Jonathan


On Fri, May 6, 2022 at 11:55 PM Greg Ewing <greg.ewing@canterbury.ac.nz>
wrote:

Re: Seeking deeper understanding of python equality (==) [ In reply to ]
Trying some new searches, I came across slotdefs in ./Objects/typeobject.c,
and those are used in the resolve_slotdups function.

The comment preceding the function says, "Note that multiple names may map
to the same slot (e.g. __eq__, __ne__ etc. all map to tp_richcompare)".

So, I'm still wondering how Py_TYPE(v)->tp_richcompare resolves to __eq__
on a user-defined class. Conversely, my understanding is that a type
defined in CPython itself, like str, usually has an explicitly
defined tp_richcompare function.
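
Part of this relationship is visible from Python without reading the C
source. The sketch below only shows observable behavior. That the type's
tp_richcompare slot is rewritten when __eq__ is assigned is an inference
consistent with the slotdefs table, not something the code proves.

```python
class C:
    pass


class D:
    def __eq__(self, other):
        return False


# C inherits object's slot wrapper; D carries a plain Python function.
inherits_default = C.__eq__ is object.__eq__
wrapper_kind = type(object.__eq__).__name__
d_kind = type(D.__dict__["__eq__"]).__name__

c = C()
before = (c == object())

# Assigning __eq__ on the class after creation changes what == does,
# which suggests the type's comparison slot is refreshed on assignment.
C.__eq__ = lambda self, other: True
after = (c == object())

print(inherits_default, wrapper_kind, d_kind, before, after)
```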

Thank you,
Jonathan


On Fri, May 13, 2022 at 8:23 PM Jonathan Kaczynski <
jonathan.kaczynski@guildeducation.com> wrote:

Re: Seeking deeper understanding of python equality (==) [ In reply to ]
On 5/14/22, Jonathan Kaczynski <jonathan.kaczynski@guildeducation.com> wrote:
>
> So, I'm still wondering how Py_TYPE(v)->tp_richcompare resolves to __eq__
> on a user-defined class. Conversely, my understanding is, for a type
> defined in cpython, like str, there is usually an explicitly
> defined tp_richcompare function.

Sometimes it's simplest to directly examine an object using a native
debugger (e.g. gdb in Linux; cdb/windbg in Windows).

With a debugger attached to the interpreter, create two classes, one
that doesn't override __eq__() and one that does:

>>> class C:
...     pass
...
>>> class D:
...     __eq__ = lambda s, o: False
...

In CPython, the id() of an object is its address in memory:

>>> hex(id(C))
'0x2806a705790'
>>> hex(id(D))
'0x2806a6bbfe0'
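
The id-is-address relationship can be checked from Python itself with
ctypes. This is a CPython-specific sketch and is not expected to work on
other implementations.

```python
import ctypes


class C:
    pass


address = id(C)
# CPython-specific: reinterpret the address as a PyObject* and fetch the object.
same_object = ctypes.cast(address, ctypes.py_object).value
print(hex(address), same_object is C)
```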

Break into the attached debugger to examine the class objects:

>>> kernel32.DebugBreak()

(1968.1958): Break instruction exception - code 80000003 (first chance)
KERNELBASE!wil::details::DebugBreak+0x2:
00007ffd`8818fd12 cc int 3

Class C uses the default object_richcompare():

0:000> ?? *((python310!PyTypeObject *)0x2806a705790)->tp_richcompare
<function> 0x00007ffd`55cac288
_object* python310!object_richcompare+0(
_object*,
_object*,
int)

Class D uses slot_tp_richcompare():

0:000> ?? *((python310!PyTypeObject *)0x2806a6bbfe0)->tp_richcompare
<function> 0x00007ffd`55cdef1c
_object* python310!slot_tp_richcompare+0(
_object*,
_object*,
int)

Source code of slot_tp_richcompare():

https://github.com/python/cpython/blob/v3.10.4/Objects/typeobject.c#L7610-L7626
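
As a rough reading aid for the linked C function, its behavior can be
modeled in Python: map the comparison opcode to a dunder name, look that
name up on the type, and call it, or return NotImplemented if it is
missing. The function name and the MRO walk below are assumptions made for
this sketch; the C code uses its own internal lookup helpers.

```python
# Rich-comparison opcodes in CPython: Py_LT=0, Py_LE=1, Py_EQ=2, Py_NE=3, Py_GT=4, Py_GE=5
OP_NAMES = ["__lt__", "__le__", "__eq__", "__ne__", "__gt__", "__ge__"]
PY_LT = 0
PY_EQ = 2


def slot_tp_richcompare_model(self, other, op):
    """Illustrative model: map op to a dunder name, find it on the type, call it."""
    name = OP_NAMES[op]
    # Look the name up on the type (not the instance), following the MRO.
    for klass in type(self).__mro__:
        if name in vars(klass):
            method = vars(klass)[name]
            break
    else:
        return NotImplemented
    # Bind through the descriptor protocol, as a normal method call would.
    return method.__get__(self, type(self))(other)


class D:
    __eq__ = lambda s, o: False


d = D()
print(slot_tp_richcompare_model(d, d, PY_EQ), d == d)
```

The opcode value 2 for == matches the `COMPARE_OP 2 (==)` line in the dis
output from the original post.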
Re: Seeking deeper understanding of python equality (==) [ In reply to ]
Thank you for your response Eryk.

I did try using gdb in my original post, but using breakpoint() left me in
the Python layer (pdb), not the CPython layer (gdb), and I couldn't figure
out how to drop down.

I see you're using the kernel32 library, so I assume you're on Windows. I
only have a Mac, and getting gdb functional on a Mac seems to be onerous
(https://gist.github.com/mike-myers-tob/9a6013124bad7ff074d3297db2c98247),
so I'm performing my testing in a Linux container running Ubuntu.

The closest equivalent to putting kernel32.DebugBreak() in front of the x
== y line in my test script is os.kill(os.getpid(), signal.SIGTRAP).
Though, this drops me into the signal handling code in the cpython layer,
and after several dozen steps I did not reach the richcompare-related
functions. I'll make another attempt later today.

Thanks to everyone's responses, I have a great idea of what's going on,
now. I appreciate you all.

My goal now is to be able to work with the debugger, like Eryk is, so that
next time I am able to perform this investigation in full. Should I create
a new thread for this question?

Thank you,
Jonathan



On Sat, May 14, 2022 at 1:51 PM Eryk Sun <eryksun@gmail.com> wrote:
