Mailing List Archive: Memory address vs serial number in reprs

Memory address vs serial number in reprs

storchaka at gmail

Jul 19, 2020, 8:38 AM

Post #1 of 15 (1260 views)

I have a problem with the location of the hexadecimal memory address in custom
reprs.

<threading.BoundedSemaphore: 2/3 at 0x7ff4c26b3eb0>

vs

<threading.BoundedSemaphore at 0x7ff4c26b3eb0: 2/3>

The long hexadecimal number makes the repr longer and distracts
attention from other useful information. We could get rid of it, but it
is useful if we want to distinguish objects of the same type, although
it is hard to distinguish long hexadecimal numbers which differ only by
a few digits in the middle.

What if we used serial numbers to differentiate instances?

<threading.BoundedSemaphore #5: 2/3>

where the serial number starts at 1 and is incremented for every new
instance of that type.

The advantages are:

* Shorter repr.
* Easier to distinguish different objects.
* The serial number is unique for the life of the program and cannot be
reused (in contrast to the id/memory address).

The disadvantages are:

* Increased object size and creation time.

I do not propose to use serial numbers for all objects, because it would
increase the size of objects, and a fixed-size integer could
overflow for short-lived objects created en masse (like numbers,
strings, tuples). I propose it only for some custom objects implemented in
Python, for which size and creation time are not critical. I want to
start with synchronization objects in threading and multiprocessing
that do not have custom reprs, then change the reprs of locks and asyncio
objects.

Is it worth doing?
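
A minimal sketch of the idea, not an implementation from the standard
library. The names SerialNumbered and the toy BoundedSemaphore below are
hypothetical, and the counter is an assumed itertools.count kept on each
subclass:

```python
import itertools


class SerialNumbered:
    """Hypothetical mixin: numbers the instances of each subclass from 1."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._next_serial = itertools.count(1)  # one counter per subclass

    def __init__(self):
        # The extra cost discussed above: one attribute and one int per instance.
        self._serial = next(type(self)._next_serial)


class BoundedSemaphore(SerialNumbered):
    """Toy stand-in, not threading.BoundedSemaphore."""

    def __init__(self, value=1):
        super().__init__()
        self._initial_value = value
        self._value = value

    def __repr__(self):
        name = type(self).__qualname__
        return f"<{name} #{self._serial}: {self._value}/{self._initial_value}>"


first = BoundedSemaphore(3)
del first                      # its memory (and id) may now be reused
second = BoundedSemaphore(3)
print(second)                  # <BoundedSemaphore #2: 3/3>
```

Unlike id(), the serial number of a destroyed object is never handed out
again, because the counter only moves forward.
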
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E6YEXMQ4OE5YGZGRP62JOLTAGBCL6RCX/
Code of Conduct: http://python.org/psf/codeofconduct/

Re: Memory address vs serial number in reprs

steve at pearwood

Jul 19, 2020, 9:33 AM

Post #2 of 15 (1260 views)

On Sun, Jul 19, 2020 at 06:38:30PM +0300, Serhiy Storchaka wrote:

> What if we used serial numbers to differentiate instances?

I like this idea. It is similar to how Jython and IronPython object IDs
work:

# Jython
>>> id(None)
2
>>> id(len)
3
>>> object()
<object object at 0x4>

> I do not propose to use serial numbers for all objects, because it would
> increase the size of objects, and a fixed-size integer could
> overflow for short-lived objects created en masse (like numbers,
> strings, tuples). I propose it only for some custom objects implemented in
> Python, for which size and creation time are not critical. I want to
> start with synchronization objects in threading and multiprocessing
> that do not have custom reprs, then change the reprs of locks and asyncio
> objects.

This sounds reasonable to me. +1

--
Steven

Re: Memory address vs serial number in reprs

guido at python

Jul 19, 2020, 10:02 AM

Post #3 of 15 (1260 views)

That looks expensive, esp. for objects implemented in Python — an extra
dict entry plus a new unique int object. What is the problem you are trying
to solve for these objects specifically? Just that the hex numbers look
distracting doesn’t strike me as sufficient motivation.

--
--Guido (mobile)

Re: Memory address vs serial number in reprs

solipsis at pitrou

Jul 19, 2020, 12:17 PM

Post #4 of 15 (1259 views)

On Sun, 19 Jul 2020 18:38:30 +0300
Serhiy Storchaka <storchaka@gmail.com> wrote:
> I have a problem with the location of the hexadecimal memory address in custom
> reprs.
>
> <threading.BoundedSemaphore: 2/3 at 0x7ff4c26b3eb0>
>
> vs
>
> <threading.BoundedSemaphore at 0x7ff4c26b3eb0: 2/3>

How about putting it in parentheses, to signal more clearly that it can
be ignored most of the time:

<threading.BoundedSemaphore: 2/3 (at 0x7ff4c26b3eb0)>

> I do not propose to use serial numbers for all objects, because it would
> increase the size of objects, and a fixed-size integer could
> overflow for short-lived objects created en masse (like numbers,
> strings, tuples). I propose it only for some custom objects implemented in
> Python, for which size and creation time are not critical. I want to
> start with synchronization objects in threading and multiprocessing
> that do not have custom reprs, then change the reprs of locks and asyncio
> objects.
>
> Is it worth doing?

I would like it if it applied to all objects, but doing it only for
certain objects will be distracting and confusing (does the serial
number point to a specific feature? it turns out it doesn't, it's just
an arbitrary aesthetic choice).

Regards

Antoine.


Re: Memory address vs serial number in reprs

thomas.moreau.2010 at gmail

Jul 19, 2020, 1:30 PM

Post #5 of 15 (1259 views)

Dear all,

While it would be nice to have simpler identifiers for objects, it would be
hard to make it work for multiprocessing, as objects in different
interpreters would end up having the same repr. Shared objects (locks) might
also have different serial numbers depending on how many objects have been
created before they are communicated to the child process.

regards
Thomas


Re: Memory address vs serial number in reprs

Richard at Damon-Family

Jul 19, 2020, 1:51 PM

Post #6 of 15 (1259 views)

On 7/19/20 4:30 PM, Thomas Moreau wrote:
> Dear all,
>
> While it would be nice to have simpler identifiers for objects, it
> would be hard to make it work for multiprocessing, as objects in
> different interpreters would end up having the same repr. Shared
> objects (locks) might also have different serial numbers depending on
> how many objects have been created before it is communicated to the
> child process.
>
> regards
> Thomas
>
>
My guess is that these numbers are the 'id()' of the object, which as an
implementation detail in CPython is the object address. If some other
method were chosen for generating the object id, then by necessity there
would need to be a way to let multiple interpreters keep the number
unique, perhaps with some bits reserved for an interpreter id and the
rest used as a serial number.
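
A rough sketch of that bit layout. The 48-bit split and the function names
are assumptions made for illustration; the post does not specify either:

```python
SERIAL_BITS = 48  # assumed width of the serial part; not specified in the post


def make_object_id(interpreter_id: int, serial: int) -> int:
    """Pack an interpreter id into the high bits and a serial into the low bits."""
    if not 0 <= serial < (1 << SERIAL_BITS):
        raise OverflowError("serial number does not fit in the reserved bits")
    return (interpreter_id << SERIAL_BITS) | serial


def split_object_id(object_id: int) -> tuple[int, int]:
    """Recover (interpreter_id, serial) from a packed id."""
    return object_id >> SERIAL_BITS, object_id & ((1 << SERIAL_BITS) - 1)


# The same serial number in two interpreters yields two different ids.
print(hex(make_object_id(1, 5)), hex(make_object_id(2, 5)))
```
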

--
Richard Damon

Re: Memory address vs serial number in reprs

carl.shapiro at gmail

Jul 19, 2020, 4:53 PM

Post #7 of 15 (1258 views)

On Sun, Jul 19, 2020 at 1:34 PM Thomas Moreau <thomas.moreau.2010@gmail.com>
wrote:

> While it would be nice to have simpler identifiers for objects, it would
> be hard to make it work for multiprocessing, as objects in different
> interpreters would end up having the same repr. Shared objects (locks) might
> also have different serial numbers depending on how many objects have been
> created before it is communicated to the child process.
>

Adding to what was said here, there are serious implications outside of the
multiprocessing case, too...

1) In a multi-threaded Python, threads will need to contend over a per-type
counter, serializing the allocation of those counted types.

2) In a Python with tagged immediates (like fixnums, etc.) the added space
cost would disqualify counted types from being implemented as an immediate
value. This would force counted types to be heap-allocated and suffer from
the aforementioned serialization.
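
To make point 1 concrete, here is a small sketch of a shared per-type
counter guarded by a lock. SerialCounter and allocate_from_threads are
hypothetical names, and a plain threading.Lock is only one possible way to
keep the numbers unique:

```python
import threading


class SerialCounter:
    """Hypothetical per-type counter shared by all threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next(self):
        # Every allocating thread has to pass through this lock in turn.
        with self._lock:
            self._last += 1
            return self._last


def allocate_from_threads(counter, n_threads=8, per_thread=1000):
    """Draw serial numbers from several threads and return all of them."""
    results = [None] * n_threads  # one slot per thread, so no shared appends

    def worker(index):
        results[index] = [counter.next() for _ in range(per_thread)]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [serial for chunk in results for serial in chunk]


serials = allocate_from_threads(SerialCounter())
print(len(serials), len(set(serials)))  # equal counts mean no duplicates
```

The lock is what guarantees uniqueness, and it is also the point of
contention described above.
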

Re: Memory address vs serial number in reprs

storchaka at gmail

Jul 25, 2020, 2:03 AM

Post #8 of 15 (1246 views)

19.07.20 19:33, Steven D'Aprano wrote:
> On Sun, Jul 19, 2020 at 06:38:30PM +0300, Serhiy Storchaka wrote:
>
>> What if we used serial numbers to differentiate instances?
>
> I like this idea. It is similar to how Jython and IronPython object IDs
> work:
>
>
> # Jython
> >>> id(None)
> 2
> >>> id(len)
> 3
> >>> object()
> <object object at 0x4>

No, I do not propose to change object IDs. I proposed only to use serial
numbers instead of IDs in reprs of some classes.

Re: Memory address vs serial number in reprs

steve at pearwood

Jul 25, 2020, 2:21 AM

Post #9 of 15 (1246 views)

On Sat, Jul 25, 2020 at 12:03:55PM +0300, Serhiy Storchaka wrote:
> 19.07.20 19:33, Steven D'Aprano wrote:

> No, I do not propose to change object IDs. I proposed only to use serial
> numbers instead of IDs in reprs of some classes.

Yes, I understood that you were only talking about reprs, and only for a
few classes. I was pointing out a similarity, that was all.

I'm sorry if I wasn't clear enough.

--
Steven

Re: Memory address vs serial number in reprs

storchaka at gmail

Jul 25, 2020, 2:26 AM

Post #10 of 15 (1246 views)

19.07.20 20:02, Guido van Rossum wrote:
> That looks expensive, esp. for objects implemented in Python — an extra
> dict entry plus a new unique int object. What is the problem you are
> trying to solve for these objects specifically? Just that the hex
> numbers look distracting doesn’t strike me as sufficient motivation.

It is the main problem that I want to solve. " at 0x7ff4c26b3eb0" is 18
characters long, and the first and last digits are usually the same for
different objects.

Also, since objects can reuse memory after other objects are destroyed,
a unique identifier can help when analyzing logs.

It is not so expensive. A new dict entry does not cost anything if the
object already has a dict (space for 5 entries is reserved from the
start). The size of a small integer up to 2**30 is 28 bytes, and integers
up to 255 do not cost anything. This is minor in comparison with the
Python object size (48 bytes), dict size (104 bytes), and the size of
other object attributes (locks, counters, etc.). It is very unlikely that a
program will have millions of semaphores or event objects at one time;
most likely it will use tens of them.
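
The figures above depend on the interpreter version and platform. A small
sketch for checking them locally with sys.getsizeof; the helper name
serial_overhead_report and the class Plain are made up for this example,
and the printed numbers may differ from the ones quoted:

```python
import sys


def serial_overhead_report():
    """Return sizes in bytes, as reported by sys.getsizeof on this interpreter."""

    class Plain:
        pass

    obj = Plain()
    obj.serial = 5  # the extra attribute a serial number would add
    return {
        "int 5": sys.getsizeof(5),
        "int 2**30 - 1": sys.getsizeof(2**30 - 1),
        "int 2**30": sys.getsizeof(2**30),
        "instance": sys.getsizeof(obj),
        "instance __dict__": sys.getsizeof(obj.__dict__),
    }


for label, size in serial_overhead_report().items():
    print(f"{label:>18}: {size} bytes")
```
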

Re: Memory address vs serial number in reprs

storchaka at gmail

Jul 25, 2020, 2:30 AM

Post #11 of 15 (1246 views)

19.07.20 22:17, Antoine Pitrou wrote:
> How about putting it in parentheses, to signal more clearly that it can
> be ignored most of the time:
>
> <threading.BoundedSemaphore: 2/3 (at 0x7ff4c26b3eb0)>

It will just make the repr 2 characters longer and will not solve the other
problems (that the first and last digits of the identifiers of different
objects are usually the same, and that the same identifier can be used
for different objects at different times).

Re: Memory address vs serial number in reprs

storchaka at gmail

Jul 25, 2020, 2:32 AM

Post #12 of 15 (1246 views)

19.07.20 23:30, Thomas Moreau wrote:
> While it would be nice to have simpler identifiers for objects, it would
> be hard to make it work for multiprocessing, as objects in different
> interpreters would end up having the same repr. Shared objects (locks)
> might also have different serial numbers depending on how many objects
> have been created before it is communicated to the child process.

Multiprocessing synchronization objects can include PID in the repr.
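
One way this could look, as a sketch only. The toy Event class and the
"pid=" layout are assumptions; nothing here is taken from the
multiprocessing module:

```python
import itertools
import os


class Event:
    """Toy stand-in for a multiprocessing synchronization object."""

    _next_serial = itertools.count(1)

    def __init__(self):
        self._serial = next(Event._next_serial)

    def __repr__(self):
        # os.getpid() is evaluated when repr() is called, so a copy inherited
        # by a forked child would show the child's PID with the same serial.
        return f"<Event #{self._serial} pid={os.getpid()}>"


event = Event()
print(event)
```

This keeps reprs from different processes apart, but it does not by itself
address the case of one shared object showing different serial numbers in
different processes.
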

Re: Memory address vs serial number in reprs

ja.py at farowl

Jul 25, 2020, 2:57 AM

Post #13 of 15 (1246 views)

On 19/07/2020 16:38, Serhiy Storchaka wrote:
> I have a problem with the location of the hexadecimal memory address in
> custom reprs.
>
I agree they are "noise" mostly and difficult to distinguish when you
need to.
> What if we used serial numbers to differentiate instances?
>
> <threading.BoundedSemaphore #5: 2/3>
>
> where the serial number starts at 1 and is incremented for every new
> instance of that type.

What would happen at a __class__ assignment?

IIUC class assignability is an equivalence relation amongst types:
serial numbers would have to be unique within the equivalence class, not
within the type. Otherwise, they would have to change (unlike id()) and
might not round-trip if __class__ were assigned there and back.
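
A small demonstration of the collision, assuming a naive scheme with one
counter per class. Numbered, A and B are hypothetical classes written for
this example:

```python
import itertools


class Numbered:
    """Hypothetical base: one serial counter per subclass."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._next_serial = itertools.count(1)

    def __init__(self):
        self._serial = next(type(self)._next_serial)

    def __repr__(self):
        return f"<{type(self).__qualname__} #{self._serial}>"


class A(Numbered):
    pass


class B(Numbered):
    pass


a = A()          # <A #1>
b = B()          # <B #1>
a.__class__ = B  # allowed: A and B have compatible layouts
print(a, b)      # <B #1> <B #1>
```

After the assignment, two distinct live objects share a repr unless the
serial number is changed or the counter is shared across the whole
equivalence class.
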

Jeff Allen

Re: Memory address vs serial number in reprs

Jul 25, 2020, 7:28 AM

Post #14 of 15 (1245 views)

Hi Serhiy,

Can I suggest using a short hash of the id as a prefix to the id?

<object object at 0x7fc16c0a2ed0>

would become something like:

<object object #71 at 0x7fc16c0a2ed0>

This approach uses no extra memory in the object and makes similar
objects more visually distinct.

It fails to make the repr shorter, and the hashed ids are not globally
unique.

The hash doesn't need to be secure, just have a good spread.
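
A possible shape for this, as a sketch. The helper names, the use of
hashlib.blake2b, and the two-digit range are all assumptions; any hash with
a good spread would do:

```python
import hashlib


def short_tag(obj, modulus=100):
    """Hypothetical helper: a small number derived from id(obj)."""
    digest = hashlib.blake2b(str(id(obj)).encode(), digest_size=4).digest()
    return int.from_bytes(digest, "little") % modulus


def tagged_repr(obj):
    """Build a default-style repr with the short tag placed before the address."""
    cls = type(obj)
    return f"<{cls.__qualname__} object #{short_tag(obj)} at {id(obj):#x}>"


print(tagged_repr(object()))
```

Nothing is stored on the object; the tag is recomputed from the id each
time, so it is reused whenever the id is reused.
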

Cheers,
Mark.


Re: Memory address vs serial number in reprs

random832 at fastmail

Jul 25, 2020, 8:46 PM

Post #15 of 15 (1245 views)

On Sun, Jul 19, 2020, at 13:02, Guido van Rossum wrote:
> That looks expensive, esp. for objects implemented in Python — an extra
> dict entry plus a new unique int object. What is the problem you are
> trying to solve for these objects specifically? Just that the hex
> numbers look distracting doesn’t strike me as sufficient motivation.

Could the numbers be kept outside the object, perhaps in a weak* dictionary that's maintained in the __repr__ method, so you don't pay for it if you don't use it? *if the object's hash/eq use identity, anyway... a "weak identity-keyed dictionary" might be a nice thing to add anyway