Mailing List Archive: Accuracy of multiprocessing.Queue.qsize before any Queue.get invocations?

The documentation says[1]

> Return the approximate size of the queue. Because of
> multithreading/multiprocessing semantics, this number is not
> reliable.

Are there any circumstances under which it *is* reliable? Most
germane, if I've added a bunch of items to the Queue, but not yet
launched any processes removing those items from the Queue, does
Queue.qsize accurately (and reliably) reflect the number of items in
the queue?

q = Queue()
for fname in os.listdir():
q.put(fname)
file_count = q.qsize() # is this reliable?
# since this hasn't yet started fiddling with it
for _ in range(os.cpu_count()):
Process(target=myfunc, args=(q, arg2, arg3)).start()

I'm currently tracking the count as I add them to my Queue,

file_count = 0
for fname in os.listdir():
q.put(fname)
file_count += 1

but if .qsize is reliably accurate before anything has a chance to
.get data from it, I'd prefer to tidy the code by removing the
redunant counting code if I can.

I'm just not sure what circumstances the "this number is not
reliable" holds. I get that things might be asynchronously
added/removed once processes are running, but is there anything that
would cause unreliability *before* other processes/consumers run?

Thanks,

-tkc

[1]
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue.qsize

--
https://mail.python.org/mailman/listinfo/python-list

If the queue was not shared to any other process, I would guess that its
size is reliable.

However, a plain counter could be much simpler/safer.

The developer of multiprocessing.Queue, implemented
size() thinking in how to share the size and maintain a reasonable
consistency between process.
He/she probably didn't care how well works in a single-process scenario
as this is a very special case.

Thanks,
Martin.

On Thu, May 12, 2022 at 06:07:02PM -0500, Tim Chase wrote:
>The documentation says[1]
>
>> Return the approximate size of the queue. Because of
>> multithreading/multiprocessing semantics, this number is not
>> reliable.
>
>Are there any circumstances under which it *is* reliable? Most
>germane, if I've added a bunch of items to the Queue, but not yet
>launched any processes removing those items from the Queue, does
>Queue.qsize accurately (and reliably) reflect the number of items in
>the queue?
>
> q = Queue()
> for fname in os.listdir():
> q.put(fname)
> file_count = q.qsize() # is this reliable?
> # since this hasn't yet started fiddling with it
> for _ in range(os.cpu_count()):
> Process(target=myfunc, args=(q, arg2, arg3)).start()
>
>I'm currently tracking the count as I add them to my Queue,
>
> file_count = 0
> for fname in os.listdir():
> q.put(fname)
> file_count += 1
>
>but if .qsize is reliably accurate before anything has a chance to
>.get data from it, I'd prefer to tidy the code by removing the
>redunant counting code if I can.
>
>I'm just not sure what circumstances the "this number is not
>reliable" holds. I get that things might be asynchronously
>added/removed once processes are running, but is there anything that
>would cause unreliability *before* other processes/consumers run?
>
>Thanks,
>
>-tkc
>
>[1]
>https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue.qsize
>
>
>
>
>
>--
>https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list