Mailing List Archive

tstate invalid crash with threads
I have been playing with a multithreaded tcp based server for a while,
and during some serious stress testing continually hit a problem
involving the interpreter crashing with "Fatal Python error:
PyThreadState_Delete: invalid tstate".

I have stripped the server down to a pretty basic SocketServer
based process which replies to a client request. It still shows the
bug, and also crashes with a Segmentation fault sometimes.
I have tested the same code on Linux 2.0.36/Python 1.5.1, Linux 2.0.36/
Python 1.5.2 and Solaris 2.6/Python 1.5.2; all crash the same way.
Unfortunately, to produce the crash I need to run 12 clients on various
other machines via 100Mbit Ethernet; the server usually dies between
150k and 1.4M transactions later (it could take an hour).
Sometimes it crashes within 15 minutes.

It appears to be a subtle race in PyThreadState_Delete ....
Interestingly, if I uncomment the small sleep in "handle" in the server,
i.e. make the server slower, it seems to work for ever ... 4M transactions
before I gave up. I think the problem only occurs if you are creating and
destroying threads quickly in parallel.

Any comments would be appreciated. Note, however, that the following is
seriously stripped down; it isn't supposed to do anything useful.
The queue is used to make sure that I'm not keeling over due
to thread exhaustion. It isn't required; I just wanted to know if
the process goes thread crazy prior to crashing ... it doesn't.
All this was done on an otherwise idle system with 384MB of memory,
so resource exhaustion seems unlikely. I've seen this bug before,
but that particular application didn't really require threads,
so I converted it to a synchronous server by using an ordinary
TCPServer.

The server is:
from SocketServer import ThreadingTCPServer, StreamRequestHandler
from socket import *
import Queue
import cPickle
import time
from threading import *

server_address = ('', 8000)

cnt = 0L

threadq = Queue.Queue(200)
for i in range(200):
    threadq.put(i)

class BankHandler(StreamRequestHandler):
    def handle(self):
        global cnt

        val = self.connection.recv(1024)
        if (cnt % 10000) == 0:
            print time.ctime(time.time()), cnt
        cnt = cnt + 1
        #time.sleep(0.01)
        self.connection.send('OK')

class MyServer(ThreadingTCPServer):
    def process_request(self, request, client_address):
        """Start a new thread to process the request."""
        import thread

        if threadq.empty():
            self.handle_error(request, client_address)
        else:
            thread.start_new_thread(self.finish_request,
                                    (request, client_address))

    def finish_request(self, request, client_address):
        """Finish one request by instantiating RequestHandlerClass."""
        t = threadq.get()
        self.RequestHandlerClass(request, client_address, self)
        threadq.put(t)

    def server_bind(self):
        """Called by constructor to bind the socket.
        May be overridden.
        """
        self.socket.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        self.socket.bind(self.server_address)

s = MyServer(server_address, BankHandler)

s.serve_forever()

*********************************************************************************************
And the client is: (client_speed has a single line with .01 in it)

from socket import *
import string
import cPickle
import sys
import time
from select import select

TIMEOUT=30
trans = 0L
while 1:
    try:
        f = open('client_speed')
        delay = string.atof(f.readline())
        f.close()
        sok = socket(AF_INET, SOCK_STREAM)
        sok.connect('bozo', 8000)
        sok.send('Hello')
        x = select([sok.fileno()], [], [], TIMEOUT)
        if x != ([],[],[]):
            data = sok.recv(2048)
            if (trans % 10240) == 0:
                print time.ctime(time.time()), trans, data
        else:
            print 'Recv timeout'
        sok.close()
    except:
        print 'failure %s'%(sys.exc_info()[1])
    trans = trans + 1
    time.sleep(0.01)
tstate invalid crash with threads [ In reply to ]
Followups to the Thread-SIG, please.

[Ray Loyzaga]
> I have been playing with a multithreaded tcp based server for a while,
> and during some serious stress testing continually hit a problem
> involving the interpreter crashing with "Fatal Python error:
> PyThreadState_Delete: invalid tstate".
> ...
> It appears to be a subtle race in PyThreadState_Delete ....
> interestingly, if I uncomment the small sleep in "handle" in the server,
> ie. make the server slower, it seems to work for ever ... 4m transactions
> before I gave up. I think the problem only comes if you are creating and
> destroying threads quickly in parallel.

PyThreadState_Delete is called from very few places, and one of them strikes
me as suspicious: at the end of threadmodule.c's t_bootstrap, we have:

    PyThreadState_Clear(tstate);
    PyEval_ReleaseThread(tstate);
    PyThreadState_Delete(tstate);
    PyThread_exit_thread();

The suspicious thing here is that PyEval_ReleaseThread releases the global
interpreter lock, so nothing is serializing calls to PyThreadState_Delete
made from the following line. PyThreadState_Delete in turn does no locking
of its own either, but mutates a shared list.

If this isn't plain wrong, it's certainly not plain right <wink>. Matches
your symptoms, too (very rare blowups during high rates of thread death).

Guido? I haven't been able to provoke Ray's problem under Win95, but the
above just doesn't smell right.
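
[Editor's sketch] The kind of interleaving described above can be modelled
in a few lines of modern Python. This is a toy, not the code in pystate.c:
it assumes only that the shared list is singly linked and that a delete is
"find the link pointing at my node, then overwrite that link". The names
Node, ThreadList, find_slot and unlink are made up for the illustration, and
the two "threads" are interleaved by hand so the outcome is repeatable.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


class ThreadList:
    """Toy singly linked list standing in for a shared list of thread states."""

    def __init__(self, names):
        self.head = None
        for name in reversed(names):
            node = Node(name)
            node.next = self.head
            self.head = node

    def names(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.name)
            node = node.next
        return out

    def find_slot(self, name):
        """Step 1 of a delete: find the link that points at the node.

        Returns (prev, node); prev is None when the link is self.head.
        """
        prev, node = None, self.head
        while node is not None:
            if node.name == name:
                return prev, node
            prev, node = node, node.next
        raise LookupError("invalid entry: %s" % name)

    def unlink(self, prev, node):
        """Step 2 of a delete: overwrite the link found in step 1."""
        if prev is None:
            self.head = node.next
        else:
            prev.next = node.next


def interleaved_delete():
    """Two deleters both finish step 1 before either performs step 2."""
    lst = ThreadList(["A", "B", "C"])
    slot_a = lst.find_slot("A")  # deleter 1, step 1
    slot_b = lst.find_slot("B")  # deleter 2, step 1 (its link lives in A)
    lst.unlink(*slot_a)          # deleter 1, step 2: head now points at B
    lst.unlink(*slot_b)          # deleter 2, step 2: patches A, already unlinked
    return lst.names()


def serialized_delete():
    """The same two deletes, each run to completion before the next starts."""
    lst = ThreadList(["A", "B", "C"])
    lst.unlink(*lst.find_slot("A"))
    lst.unlink(*lst.find_slot("B"))
    return lst.names()
```

In the interleaved run, B is still reachable from the head after it has been
"deleted"; in C that node would already have been freed, so the next walk of
the list would read freed memory. The serialized run leaves only C.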

win95-didn't-crash-but-the-TAB-and-ESC-keys-did-swap-their-
meanings!-ly y'rs - tim
tstate invalid crash with threads [ In reply to ]
In article <000101beb237$88b71d80$999e2299@tim>,
"Tim Peters" <tim_one@email.msn.com> intoned:
> Followups to the Thread-SIG, please.
>
> [Ray Loyzaga]
>> I have been playing with a multithreaded tcp based server for a while,
>> and during some serious stress testing continually hit a problem
>> involving the interpreter crashing with "Fatal Python error:
>> PyThreadState_Delete: invalid tstate".
>> ...
>> It appears to be a subtle race in PyThreadState_Delete ....
>> interestingly, if I uncomment the small sleep in "handle" in the server,
>> ie. make the server slower, it seems to work for ever ... 4m transactions
>> before I gave up. I think the problem only comes if you are creating and
>> destroying threads quickly in parallel.
>
> PyThreadState_Delete is called from very few places, and one of them strikes
> me as suspicious: at the end of threadmodule.c's t_bootstrap, we have:
>
> PyThreadState_Clear(tstate);
> PyEval_ReleaseThread(tstate);
> PyThreadState_Delete(tstate);
> PyThread_exit_thread();
>
> The suspicious thing here is that PyEval_ReleaseThread releases the global
> interpreter lock, so nothing is serializing calls to PyThreadState_Delete
> made from the following line. PyThreadState_Delete in turn does no locking
> of its own either, but mutates a shared list.
>
> If this isn't plain wrong, it's certainly not plain right <wink>. Matches
> your symptoms, too (very rare blowups during high rates of thread death).
>
> Guido? I haven't been able to provoke Ray's problem under Win95, but the
> above just doesn't smell right.
>
> win95-didn't-crash-but-the-TAB-and-ESC-keys-did-swap-their-
> meanings!-ly y'rs - tim

I've put a lock in pystate.c around anything that touches tstate,
and it seems to have fixed the problem (although reproducing it is
time-consuming).

This seems to be the same solution that Tim suggests, but at one remove.
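
[Editor's sketch] The patch itself is not shown in this thread. As a rough
Python analogue of the idea, assuming one lock guards every operation on the
shared list, something like the following would do; LockedRegistry and
run_churn are hypothetical names for the illustration and have nothing to do
with the real pystate.c.

```python
import threading


class LockedRegistry:
    """Toy linked list whose every mutation happens under a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._head = None  # each entry is a two-item list: [value, next_entry]

    def add(self, value):
        with self._lock:
            entry = [value, self._head]
            self._head = entry
            return entry

    def remove(self, entry):
        # The search and the unlink sit inside the same critical section,
        # so no other thread can change the list between the two steps.
        with self._lock:
            prev, cur = None, self._head
            while cur is not None:
                if cur is entry:
                    if prev is None:
                        self._head = cur[1]
                    else:
                        prev[1] = cur[1]
                    return
                prev, cur = cur, cur[1]
            raise LookupError("entry is not in the registry")

    def count(self):
        with self._lock:
            n, cur = 0, self._head
            while cur is not None:
                n, cur = n + 1, cur[1]
            return n


def run_churn(workers=8, rounds=2000):
    """Register and unregister entries from many threads at once."""
    registry = LockedRegistry()
    errors = []

    def worker():
        try:
            for i in range(rounds):
                registry.remove(registry.add(i))
        except LookupError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return registry.count(), errors
```

Calling run_churn() should return a count of zero and an empty error list,
since every entry that was added is also removed under the lock.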

Stephen
--
Stephen Norris srn@fn.com.au
PGP key available via finger srn@flibble.fn.com.au.
Farrow Norris Pty. Ltd. http://www.fn.com.au/