Mailing List Archive

What kind of "thread safe" are deque's actually?
A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:

"Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.”

(https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)

Earlier today, looking through some server logs I noticed that from time to I’m getting a

RuntimeError: deque mutated during iteration

I guess this surprised me. When I see “thread safe”, I don’t expect to get errors.

Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a median?

I’ve got a couple ways I can work around this. But I was surprised.
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On Tue, 28 Mar 2023 at 12:26, Travis Griggs <travisgriggs@gmail.com> wrote:
>
> A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:
>
> "Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.”
>
> (https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)
>
> Earlier today, looking through some server logs I noticed that from time to I’m getting a
>
> RuntimeError: deque mutated during iteration
>
> I guess this surprised me. When I see “thread safe”, I don’t expect to get errors.
>

I'd like to see your code, but here's an example with no threads
whatsoever that has the same error:

>>> from collections import deque
>>> q = deque([1, 2, 3, 4])
>>> for item in q:
... if item % 2: q.append(item * 2)
... print(item)
...
1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: deque mutated during iteration

This error comes from having an active iterator, then mutating the
deque, then continuing to iterate. That MAY be a result of threading,
but it isn't necessarily.

For threaded usage, I would recommend restricting yourself to
append/appendleft/pop/popleft (the standard mutators), with any
operations on the entire queue being done on a copy instead (either
q.copy() or list(q) depending on what you need). The act of taking a
copy should itself be thread-safe, and obviously anything done on a
separate copy will be independent of changes to the original.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On 2023-03-27 at 18:25:01 -0700,
Travis Griggs <travisgriggs@gmail.com> wrote:

> "Deques support thread-safe, memory efficient appends and pops from
> either side of the deque with approximately the same O(1) performance
> in either direction.”

> (https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)

[...]

> I guess this surprised me. When I see “thread safe”, I don’t expect to
> get errors.

Even without threads, mutating a collection while iterating over it
usually results in bad things happening.

$ python
Python 3.10.10 (main, Mar 5 2023, 22:26:53) [GCC 12.2.1 20230201] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import collections
>>> x = collections.deque()
>>> x.append(44)
>>> x.append(55)
>>> x.append(66)
>>> x.append(77)
>>> x
deque([44, 55, 66, 77])
>>> for y in x:
x.pop()

77
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: deque mutated during iteration

Concurrency just increases the likeliness of mutating while iterating.
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On 2023-03-28, Travis Griggs <travisgriggs@gmail.com> wrote:
> A while ago I chose to use a deque that is shared between two threads. I did so because the docs say:
>
> "Deques support thread-safe, memory efficient appends and pops from
> either side of the deque with approximately the same O(1)
> performance in either direction.”
>
> (https://docs.python.org/3.11/library/collections.html?highlight=deque#collections.deque)
>
> Earlier today, looking through some server logs I noticed that from
> time to I’m getting a
>
> RuntimeError: deque mutated during iteration
>
> I guess this surprised me. When I see “thread safe”, I don’t expect
> to get errors.

Well, I guess it doesn't say that iteration of a deque is thread
safe. It only claims that appends and pops from either end are thread
safe. It doesn't even claim that inserts, removes, clear, copy, or any
other operations are thread-safe.

--
Grant




--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On 28/03/23 2:25 pm, Travis Griggs wrote:
> Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a median?

It may be a matter of whether the GIL is held or not. I had a look
at the source for deque, and it doesn't seem to explicitly do
anything about locking, it just relies on the GIL.

So maybe statistics.median() is implemented in C and statistics.mean()
in Python, or something like that?

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
<python-list@python.org> wrote:
>
> On 28/03/23 2:25 pm, Travis Griggs wrote:
> > Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a median?
>
> It may be a matter of whether the GIL is held or not. I had a look
> at the source for deque, and it doesn't seem to explicitly do
> anything about locking, it just relies on the GIL.
>
> So maybe statistics.median() is implemented in C and statistics.mean()
> in Python, or something like that?
>

Both functions are implemented in Python, but median() starts out with
this notable line:

data = sorted(data)

which gives back a copy, iterated over rapidly in C. All subsequent
work is done on that copy.

The same effect could be had with mean() by taking a snapshot using
list(q) and, I believe, would have the same effect (the source code
for the sorted() function begins by calling PySequence_List).

In any case, it makes *conceptual* sense to do your analysis on a copy
of the queue, thus ensuring that your stats are stable. The other
threads can keep going while you do your calculations, even if that
means changing the queue.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On 3/29/23 02:08, Chris Angelico wrote:
> On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
> <python-list@python.org> wrote:
>> On 28/03/23 2:25 pm, Travis Griggs wrote:
>>> Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a median?
>> It may be a matter of whether the GIL is held or not. I had a look
>> at the source for deque, and it doesn't seem to explicitly do
>> anything about locking, it just relies on the GIL.
>>
>> So maybe statistics.median() is implemented in C and statistics.mean()
>> in Python, or something like that?
>>
> Both functions are implemented in Python, but median() starts out with
> this notable line:
>
> data = sorted(data)
>
> which gives back a copy, iterated over rapidly in C. All subsequent
> work is done on that copy.
>
> The same effect could be had with mean() by taking a snapshot using
> list(q) and, I believe, would have the same effect (the source code
> for the sorted() function begins by calling PySequence_List).
>
> In any case, it makes *conceptual* sense to do your analysis on a copy
> of the queue, thus ensuring that your stats are stable. The other
> threads can keep going while you do your calculations, even if that
> means changing the queue.
>
> ChrisA
Sorry for any injected confusion here, but that line "data =
sorted(data)" appears as though it takes the value of the variable named
_data_, sorts it and returns it to the same variable store, so no copy
would be created. Am I missing something there?
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On 2023-03-29, Jack Dangler <tdldev@gmail.com> wrote:
>
>> data = sorted(data)
>
> Sorry for any injected confusion here, but that line "data =
> sorted(data)" appears as though it takes the value of the variable named
> _data_, sorts it and returns it to the same variable store, so no copy
> would be created. Am I missing something there?

Yes, you're missing the basics of what an assignment does in Python
and how objects work. Python doesn't have such a thing as "a variable
st store".

The assignment operator binds a name to an object.

The 'sorted(data)' expression creates a new object containing a sorted copy of 'data'.

The assignment then binds the name "data" to that new object.

The old, unsorted object then becomes inaccessable (unless there are
other names bound to it, or it's being otherwise used). At some point
that old, unsorted object will "go away" completely and cease to exist.

--
Grant





--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdldev@gmail.com> wrote:
>
>
> On 3/29/23 02:08, Chris Angelico wrote:
> > On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
> > <python-list@python.org> wrote:
> >> On 28/03/23 2:25 pm, Travis Griggs wrote:
> >>> Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a median?
> >> It may be a matter of whether the GIL is held or not. I had a look
> >> at the source for deque, and it doesn't seem to explicitly do
> >> anything about locking, it just relies on the GIL.
> >>
> >> So maybe statistics.median() is implemented in C and statistics.mean()
> >> in Python, or something like that?
> >>
> > Both functions are implemented in Python, but median() starts out with
> > this notable line:
> >
> > data = sorted(data)
> >
> > which gives back a copy, iterated over rapidly in C. All subsequent
> > work is done on that copy.
> >
> > The same effect could be had with mean() by taking a snapshot using
> > list(q) and, I believe, would have the same effect (the source code
> > for the sorted() function begins by calling PySequence_List).
> >
> > In any case, it makes *conceptual* sense to do your analysis on a copy
> > of the queue, thus ensuring that your stats are stable. The other
> > threads can keep going while you do your calculations, even if that
> > means changing the queue.
> >
> > ChrisA
> Sorry for any injected confusion here, but that line "data =
> sorted(data)" appears as though it takes the value of the variable named
> _data_, sorts it and returns it to the same variable store, so no copy
> would be created. Am I missing something there?

The variable name "data" is the parameter to median(), so it's
whatever you ask for the median of. (I didn't make that obvious in my
previous post - an excess of brevity on my part.)

The sorted() function, UNlike list.sort(), returns a sorted copy of
what it's given. I delved into the CPython source code for that, and
it begins with the PySequence_List call to (effectively) call
list(data) to get a copy of it. It ought to be a thread-safe copy due
to holding the GIL the entire time. I'm not sure what would happen in
a GIL-free world but most likely the lock on the input object would
still ensure thread safety.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On Wed, 29 Mar 2023 10:50:49 -0400, Jack Dangler <tdldev@gmail.com>
declaimed the following:

>Sorry for any injected confusion here, but that line "data =
>sorted(data)" appears as though it takes the value of the variable named
>_data_, sorts it and returns it to the same variable store, so no copy
>would be created. Am I missing something there?

The entire Python object/data model.

Data is not "stored in" variables -- rather names are "attached to"
data.

sorted() creates a new data object, allocating memory for it. THEN the
name "data" is attached to this new data object (and disconnected from the
previous object). If there are no names left connected to the original
object, the garbage collector reaps its memory.
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On 3/29/23 13:13, Chris Angelico wrote:
> On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdldev@gmail.com> wrote:
>>
>> On 3/29/23 02:08, Chris Angelico wrote:
>>> On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
>>> <python-list@python.org> wrote:
>>>> On 28/03/23 2:25 pm, Travis Griggs wrote:
>>>>> Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a median?
>>>> It may be a matter of whether the GIL is held or not. I had a look
>>>> at the source for deque, and it doesn't seem to explicitly do
>>>> anything about locking, it just relies on the GIL.
>>>>
>>>> So maybe statistics.median() is implemented in C and statistics.mean()
>>>> in Python, or something like that?
>>>>
>>> Both functions are implemented in Python, but median() starts out with
>>> this notable line:
>>>
>>> data = sorted(data)
>>>
>>> which gives back a copy, iterated over rapidly in C. All subsequent
>>> work is done on that copy.
>>>
>>> The same effect could be had with mean() by taking a snapshot using
>>> list(q) and, I believe, would have the same effect (the source code
>>> for the sorted() function begins by calling PySequence_List).
>>>
>>> In any case, it makes *conceptual* sense to do your analysis on a copy
>>> of the queue, thus ensuring that your stats are stable. The other
>>> threads can keep going while you do your calculations, even if that
>>> means changing the queue.
>>>
>>> ChrisA
>> Sorry for any injected confusion here, but that line "data =
>> sorted(data)" appears as though it takes the value of the variable named
>> _data_, sorts it and returns it to the same variable store, so no copy
>> would be created. Am I missing something there?
> The variable name "data" is the parameter to median(), so it's
> whatever you ask for the median of. (I didn't make that obvious in my
> previous post - an excess of brevity on my part.)
>
> The sorted() function, UNlike list.sort(), returns a sorted copy of
> what it's given. I delved into the CPython source code for that, and
> it begins with the PySequence_List call to (effectively) call
> list(data) to get a copy of it. It ought to be a thread-safe copy due
> to holding the GIL the entire time. I'm not sure what would happen in
> a GIL-free world but most likely the lock on the input object would
> still ensure thread safety.
>
> ChrisA
Aah - thanks, Chris! That makes much more sense.
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On 30/03/23 6:13 am, Chris Angelico wrote:
> I'm not sure what would happen in
> a GIL-free world but most likely the lock on the input object would
> still ensure thread safety.

In a GIL-free world, I would not expect deque to hold a lock
the entire time that something was iterating over it. That
would require holding the lock as long as an iterator object
existed referencing it, which could be a long time, even
longer than the caller expects (no reference counting,
remember!)

So for future-proofing I would recommend using deque's
copy() method to copy it before doing anything that iterates
over it. Hopefully that would be implemented in a thread-safe
way (although the docs don't currently promise that).

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
Re: What kind of "thread safe" are deque's actually? [ In reply to ]
On Thu, 30 Mar 2023 at 07:36, Greg Ewing via Python-list
<python-list@python.org> wrote:
>
> On 30/03/23 6:13 am, Chris Angelico wrote:
> > I'm not sure what would happen in
> > a GIL-free world but most likely the lock on the input object would
> > still ensure thread safety.
>
> In a GIL-free world, I would not expect deque to hold a lock
> the entire time that something was iterating over it. That
> would require holding the lock as long as an iterator object
> existed referencing it, which could be a long time, even
> longer than the caller expects (no reference counting,
> remember!)

Certainly not, but I *would* expect the sorted() call to retain a lock
on the input object while it copies it (or, more precisely, for the
PySequence_List() call to do that).

> So for future-proofing I would recommend using deque's
> copy() method to copy it before doing anything that iterates
> over it. Hopefully that would be implemented in a thread-safe
> way (although the docs don't currently promise that).
>

Probably? It's actually less clear there, since a deque's copy method
is built on top of basic iteration and broadly looks like this (though
in C, not in Python):

def copy(self):
ret = deque()
ret.extend(self)
return ret

Simplified, but mostly accurate. And extending is done by getting an
iterator, then repeatedly appending. So.... probably safe? Question
mark?

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list