Mailing List Archive

Is there another way to solve the continuation problem?
Okay, from my feeble understanding of the problem it appears that
coroutines/continuations and threads are going to be problematic at best for
Sam's needs. Are there other "solutions"? We know about state machines.
They have the problem that the number of states grows exponentially (?) as
the number of state variables increases.

Can exceptions be coerced into providing the necessary structure without
botching up the application too badly? Seems that at some point where you
need to do some I/O, you could raise an exception whose second expression
contains the necessary state to get back to where you need to be once the
I/O is ready to go. The controller that catches the exceptions would use
select or poll to prepare for the I/O then dispatch back to the handlers
using the information from exceptions.

class IOSetup:
pass

class WaveHands:
"""maintains exception raise info and selects one to go to next"""
def choose_one(r,w,e):
pass

def remember(info):
pass

def controller(...):
waiters = WaveHands()
while 1:
r, w, e = select([...], [...], [...])
# using r,w,e, select a waiter to call
func, place = waiters.choose_one(r,w,e)
try:
func(place)
except IOSetup, info:
waiters.remember(info)


def spam_func(place):
if place == "spam":
# whatever I/O we needed to do is ready to go
bytes = read(some_fd)
process(bytes)
# need to read some more from some_fd. args are:
# function, target, fd category (r, w), selectable object,
raise IOSetup, (spam_func, "eggs" , "r", some_fd)

elif place == "eggs":
# that next chunk is ready - get it and proceed...

elif yadda, yadda, yadda...


One thread, some craftiness needed to construct things. Seems like it might
isolate some of the statefulness to smaller functional units than a pure
state machine. Clearly not as clean as continuations would be. Totally
bogus? Totally inadequate? Maybe Sam already does things this way?


Skip Montanaro | Mojam: "Uniting the World of Music" http://www.mojam.com/
skip@mojam.com | Musi-Cal: http://www.musi-cal.com/
518-372-5583
RE: Is there another way to solve the continuation problem? [ In reply to ]
> Sam's needs. Are there other "solutions"? We know about
> state machines.
> They have the problem that the number of states grows
> exponentially (?) as
> the number of state variables increases.

Well, I can give you my feeble understanding of "IO Completion Ports", the
technique Win32 provides to "solve" this problem.

My experience is limited to how we used these in a server product designed
to maintain thousands of long-term client connections each spooling large
chunks of data (MSOffice docs - yes, that large :-). We too could
obviously not afford a thread per connection. Searching through NT's
documentation, completion ports are the technique they recommend for
high-performance IO, and it appears to deliver.

NT has the concept of a completion port, which in many ways is like an
"inverted semaphore". You create a completion port with a "max number of
threads" value. Then, for every IO object you need to use (files, sockets,
pipes etc) you "attach" it to the completion port, along with an integer
key. This key is (presumably) unique to the file, and usually a pointer to
some structure maintaing the state of the file (ie, connection)

The general programming model is that you have a small number of threads
(possibly 1), and a large number of io objects (eg files). Each of these
threads is executing a state machine. When IO is "ready" for a particular
file, one of the available threads is woken, and passed the "key"
associated with the file. This key identifies the file, and more
importantly the state of that file. The thread uses the state to perform
the next IO operation, then immediately go back to sleep. When that IO
operation completes, some other thread is woken to handle that state
change. What makes this work of course is that _all_ IO is asynch - not a
single IO call in this whole model can afford to block. NT provides asynch
IO natively.

This sounds very similar to what Medusa does internally, although the NT
model provides a "thread pooling" scheme built-in. Although our server
performed very well with a single thread and hundreds of high-volume
connections, we chose to run with a default of 5 threads here.

For those still interested, our project has the multi-threaded state
machine I described above implemented in C. Most of the work is
responsible for spooling the client request data (possibly 100s of kbs)
before handing that data off to the real server. When the C code
transitions the client through the state of "send/get from the real
server", we actually set a different completion port. This other
completion port wakes a thread written in Python. So our architecture
consists of a C implemented thread-pool managing client connections, and a
different Python implemented thread pool that does the real work for each
of these client connections. (The Python side of the world is bound by the
server we are talking to, so Python performance doesnt matter as much - C
wouldnt buy enough)

This means that our state machines are not that complex. Each "thread
pool" is managing its own, fairly simple state. NT automatically allows
you to associate state with the IO object, and as we have multiple thread
pools, each one is simple - the one spooling client data is simple, the one
doing the actual server work is simple. If we had to have a single,
monolithic state machine managing all aspects of the client spooling, _and_
the server work, it would be horrid.

This is all in a shrink-wrapped relatively cheap "Document Management"
product being targetted (successfully, it appears) at huge NT/Exchange
based sites. Australia's largest Telco are implementing it, and indeed the
company has VC from Intel! Lots of support from MS, as it helps compete
with Domino. Not bad for a little startup - now they are wondering what to
do with this Python-thingy they now have in their product that noone else
has ever heard off; but they are planning on keeping it for now :-)
[.Funnily, when they started, they didnt think they even _needed_ a server,
so I said "Ill just knock up a little one in Python", and we havent looked
back :-]

Mark.
Is there another way to solve the continuation problem? [ In reply to ]
Skip Montanaro writes:
> Can exceptions be coerced into providing the necessary structure
> without botching up the application too badly? Seems that at some
> point where you need to do some I/O, you could raise an exception
> whose second expression contains the necessary state to get back to
> where you need to be once the I/O is ready to go. The controller
> that catches the exceptions would use select or poll to prepare for
> the I/O then dispatch back to the handlers using the information
> from exceptions.

> [... code ...]

Well, you just re-invented the 'Reactor' pattern! 8^)

http://www.cs.wustl.edu/~schmidt/patterns-ace.html

> One thread, some craftiness needed to construct things. Seems like
> it might isolate some of the statefulness to smaller functional
> units than a pure state machine. Clearly not as clean as
> continuations would be. Totally bogus? Totally inadequate? Maybe
> Sam already does things this way?

What you just described is what Medusa does (well, actually, 'Python'
does it now, because the two core libraries that implement this are
now in the library - asyncore.py and asynchat.py). asyncore doesn't
really use exceptions exactly that way, and asynchat allows you to add
another layer of processing (basically, dividing the input into
logical 'lines' or 'records' depending on a 'line terminator').

The same technique is at the heart of many well-known network servers,
including INND, BIND, X11, Squid, etc.. It's really just a state
machine underneath (with python functions or methods implementing the
'states'). As long as things don't get too complex. Python
simplifies things enough to allow one to 'push the difficulty
envelope' a bit further than one could reasonably tolerate in C. For
example, Squid implements async HTTP (server and client, because it's
a proxy) - but stops short of trying to implement async FTP. Medusa
implements async FTP, but it's the largest file in the Medusa
distribution, weighing in at a hefty 32KB.

The hard part comes when you want to plug different pieces and
protocols together. For example, building a simple HTTP or FTP server
is relatively easy, but building an HTTP server *that proxied to an
FTP server* is much more difficult. I've done these kinds of things,
viewing each as a challenge; but past a certain point it boggles.

The paper I posted about earlier by Matthew Fuchs has a really good
explanation of this, but in the context of GUI event loops... I think
it ties in neatly with this discussion because at the heart of any X11
app is a little guy manipulating a file descriptor.

-Sam