
Newbie Threads question
Hello everyone-

I have never done any threads programming whatsoever. I did take an Operating
Systems class back in College ('94) so I understand semaphores used to protect
resources, but haven't ever implemented them.

What I'd like to do is learn how to use threads by "parallelizing" a simple
set of now currently serialized tasks. No shared resources here. I
understand that in all likelihood I will not get any performance increase
(since I'm on a uni-processor system) but I want to learn how to do this
stuff.

hypothetically my code looks like this:

def func1(param):
    result = []
    # do a bunch of work, takes about 3 seconds of wall-clock time
    return result

def func2(param):
    result = []
    # do a bunch of work, takes about 5 seconds of wall-clock time
    return result

def func3(list1, list2):
    result = []
    # does work on the two lists passed in, takes 0.5 secs of wall-clock time
    return result


#my main loop simplified
list1 = func1(x)
list2 = func2(y)
reallist = func3(list1,list2)
#do something with reallist.

Since func1 and func2 are completely independent - IE do not use any of the
same resources, to me this would be a great place to do the work in parallel.
What I want to do is something like this:

#my new main loop
list1 = thread.start_new_thread(func1,(x))
list2 = thread.start_new_thread(func2,(y))
reallist = func3(list1,list2)

I have read the documentation a couple of times, but what I don't know:

A) Is the return value of start_new_thread the same as the return value of the
function it calls?

B) Do I need to do anything fancy to make the func3 call wait until the calls
to func1 and func2 have both returned? For example, should I have the func1
and func2 calls acquire a lock at the start of the function and release it at
the end, and then make the main loop try to acquire both locks before
proceeding to the reallist = func3() call? If I did this, could the main
loop acquire the locks before the new threads do, causing deadlock?

C) I'm assuming I have to recompile python (since I didn't compile it "with
threads")

Thanks for any help...
-Fred

----
Michael "Fred" Fredericks
Graduate Student, University of Maryland Dept of Computer Science
fred-at-cs-dot-umd-dot-edu
I never read my deja news e-mail

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Re: Newbie Threads question
Michael "Fred" Fredericks writes:

> What I'd like to do is learn how to use threads by "parallelizing" a
> simple set of now currently serialized tasks. No shared resources
> here. I understand that in all likelihood I will not get any
> performance increase (since I'm on a uni-processor system) but I
> want to learn how to do this stuff.

If your processing is CPU-bound, this is correct. If not (say, the work
spends much of its time waiting on I/O), you may well see a performance boost.

> hypothetically my code looks like this:
>
> def func1(param):
>     result = []
>     # do a bunch of work, takes about 3 seconds of wall-clock time
>     return result
>
> def func2(param):
>     result = []
>     # do a bunch of work, takes about 5 seconds of wall-clock time
>     return result
>
> def func3(list1, list2):
>     result = []
>     # does work on the two lists passed in, takes 0.5 secs of wall-clock time
>     return result
>
>
> #my main loop simplified
> list1 = func1(x)
> list2 = func2(y)
> reallist = func3(list1,list2)
> #do something with reallist.
>
> Since func1 and func2 are completely independent - IE do not use any
> of the same resources, to me this would be a great place to do the
> work in parallel.
> What I want to do is something like this:
>
> #my new main loop
> list1 = thread.start_new_thread(func1,(x))
> list2 = thread.start_new_thread(func2,(y))
> reallist = func3(list1,list2)
>
> I have read the documentation a couple of times, but what I don't
> know:
>
> A) Is the return value of start_new_thread the same as the return
> value of the function it calls?

If it were, threads wouldn't be of much use, would they? You'll have
to stash your intermediate results some place for the main thread to
pick them up.
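One place to stash them is on the thread object itself. The sketch below assumes a modern Python 3 interpreter and uses a hypothetical `ResultThread` subclass of `threading.Thread` whose `run()` saves the target's return value in `self.result`; `func1` and `func2` are trivial stand-ins for the real work.

```python
import threading

class ResultThread(threading.Thread):
    """Hypothetical helper: a Thread that keeps its target's return value."""

    def __init__(self, func, *args):
        super().__init__()
        self.func = func
        self.args = args
        self.result = None

    def run(self):
        # Runs in the new thread; store the return value where the
        # main thread can find it after join().
        self.result = self.func(*self.args)

def func1(param):
    return [param * 2]   # stand-in for the 3-second job

def func2(param):
    return [param + 1]   # stand-in for the 5-second job

t1 = ResultThread(func1, 10)
t2 = ResultThread(func2, 20)
t1.start()
t2.start()
t1.join()   # wait for both threads before reading their results
t2.join()
list1, list2 = t1.result, t2.result
```

Reading `t1.result` is only safe after `t1.join()` returns; before that the thread may not have run at all.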

> B) Do I need to do anything fancy to make the func3 call wait until
> the calls to func1 and func2 have both returned?

If you're using the low-level thread module, yes. If you use the
higher level threading module, Guido's done most of the fancy stuff
for you. You *will* have to wait on the 2 parallel threads before
going on to func3.

If you're determined to go the low-level route, look at the
test_thread script. Then go the high-level route!

> For example, should
> I have the func1 and func2 calls acquire a lock at the start of the
> function and release it at the end, and then make the main loop try
> to acquire both locks before proceeding to the reallist = func3()
> call? If I did this, could the main loop acquire the locks before
> the new threads do, causing deadlock?

If you're mucking about at this level, you'll find you need the
occasional "sleep(0.000001)". This will force the current thread to
release the interpreter lock. If _anything_ else is ready to go, it
will go.
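As an illustration of that low-level style, here is a sketch (modern Python 3, where the old `thread` module is named `_thread`) in which the main thread polls for completion instead of joining. The `worker` function and the `results` dict are hypothetical; the short `time.sleep` in the polling loop is what keeps the main thread from hogging the interpreter while it waits.

```python
import _thread
import time

results = {}
lock = _thread.allocate_lock()   # protects the shared dict

def worker(name, n):
    value = sum(range(n))        # stand-in for real work
    with lock:
        results[name] = value

_thread.start_new_thread(worker, ("a", 1000))
_thread.start_new_thread(worker, ("b", 2000))

# Poll until both workers have reported in. Sleeping briefly on each pass
# releases the interpreter lock so the worker threads get to run.
while True:
    with lock:
        if len(results) == 2:
            break
    time.sleep(0.001)
```

Polling like this works, but it wastes wakeups; the `threading` module's `join()` is the cleaner way to wait.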

> C) I'm assuming I have to recompile python (since I didn't compile
> it "with threads")

Assuming you're on *nix, yes (it's on by default in Win32). You'll
also need to make clean and start over from configure. Depending on
your *nix, you may or may not have some cursing to do...
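For anyone reading this today: as of CPython 3.7, thread support can no longer be compiled out, so no rebuild is needed. On an older build, a quick way to find out is simply to try the import. The `HAVE_THREADS` flag below is just an illustrative name.

```python
import sys

try:
    import threading  # fails on interpreters built without thread support
    HAVE_THREADS = True
except ImportError:
    HAVE_THREADS = False

print(sys.version_info[:2], "thread support:", HAVE_THREADS)
```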

- Gordon
Re: Newbie Threads question
Michael "Fred" Fredericks wrote in message
<7gfdrm$69t$1@nnrp1.dejanews.com>...
>What I'd like to do is learn how to use threads by "parallelizing" a simple
>set of now currently serialized tasks.
> [sample code snipped]
>#my new main loop
>list1 = thread.start_new_thread(func1,(x))
>list2 = thread.start_new_thread(func2,(y))
>reallist = func3(list1,list2)
>
>I have read the documentation a couple of times, but what I don't know:
>[...]
>B) Do I need to do anything fancy to make the func3 call wait until the calls
>to func1 and func2 have both returned? For example, should I have the func1
>and func2 calls acquire a lock at the start of the function and release it at
>the end, and then make the main loop try to acquire both locks before
>proceeding to the reallist = func3() call? If I did this, could the main
>loop acquire the locks before the new threads do, causing deadlock?


There is a nifty class called WorkQ that will wait on N threads in
python\python-1.5.2\demo\threads\find.py
courtesy of Guido. You may need to download the source-code to get this --
not sure.
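The general idea of a work queue that waits on N threads can be sketched in modern Python 3 with `queue.Queue` and `threading.Thread`. This is a hypothetical `run_all` helper written for illustration, not the `WorkQ` class from find.py: worker threads pull jobs off a shared queue, a `None` sentinel tells each worker to stop, and joining every worker waits for all of the work.

```python
import queue
import threading

def run_all(jobs, num_workers=2):
    """Run each (func, arg) pair on a pool of threads; return results in job order."""
    tasks = queue.Queue()
    results = [None] * len(jobs)

    def worker():
        while True:
            item = tasks.get()
            if item is None:        # sentinel: no more work for this worker
                return
            index, func, arg = item
            results[index] = func(arg)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for index, (func, arg) in enumerate(jobs):
        tasks.put((index, func, arg))
    for _ in threads:
        tasks.put(None)             # one sentinel per worker
    for t in threads:
        t.join()                    # returns once every worker has exited
    return results

def square(n):
    return n * n

squares = run_all([(square, i) for i in range(5)])
```

Storing each result by its job index keeps the output in submission order even though the workers may finish in any order.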
--
Phil Mayes pmayes AT olivebr DOT com
Olive Branch Software - home of Arranger http://www.olivebr.com/
Re: Newbie Threads question
[mbf2y@my-dejanews.com]
> ...
> What I'd like to do is learn how to use threads by "parallelizing" a
> simple set of now currently serialized tasks.
> ...
> hypothetically my code looks like this:
>
> def func1(param):
>     result = []
>     # do a bunch of work, takes about 3 seconds of wall-clock time
>     return result

When a thread function returns, the thread dies, and the return value is
tossed into the bit bucket. You'll need to stuff the result away in a
non-local variable of some kind.
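If you would rather leave func1 and func2 untouched, one option is to wrap them. In this modern Python 3 sketch the hypothetical `keep_result` wrapper calls the original function and writes its return value into a dict owned by the main thread, so the value no longer ends up in the bit bucket.

```python
import threading

def keep_result(func, store, key):
    """Hypothetical helper: wrap func so its return value lands in store[key]."""
    def wrapper(*args):
        store[key] = func(*args)
    return wrapper

def func1(param):
    return list(range(param))

def func2(param):
    return [c.upper() for c in param]

results = {}
t1 = threading.Thread(target=keep_result(func1, results, "list1"), args=(3,))
t2 = threading.Thread(target=keep_result(func2, results, "list2"), args=("ab",))
t1.start(); t2.start()
t1.join(); t2.join()   # only read results after both threads are done
```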

> def func2(param):
>     result = []
>     # do a bunch of work, takes about 5 seconds of wall-clock time
>     return result

Ditto.

> def func3(list1, list2):
>     result = []
>     # does work on the two lists passed in, takes 0.5 secs of wall-clock time
>     return result
>
>
> #my main loop simplified
> list1 = func1(x)
> list2 = func2(y)
> reallist = func3(list1,list2)
> #do something with reallist.
>
> Since func1 and func2 are completely independent - IE do not use
> any of the same resources, to me this would be a great place to do
> the work in parallel.

Yup! In the biz, this is what's called "embarrassingly parallel". I worked
for at least one now-defunct startup that tried to get rich off non-problems
exactly like that <wink>.
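For readers on a current Python 3 (3.2 or later, long after this thread), the standard library's `concurrent.futures` module handles this embarrassingly parallel shape directly, including getting the return values back. A sketch, with shortened sleeps standing in for the 3- and 5-second jobs:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def func1(param):
    time.sleep(0.3)      # stand-in for 3 seconds of work
    return [param] * 2

def func2(param):
    time.sleep(0.5)      # stand-in for 5 seconds of work
    return [param] * 3

def func3(list1, list2):
    return list1 + list2

with ThreadPoolExecutor(max_workers=2) as pool:
    future1 = pool.submit(func1, "x")
    future2 = pool.submit(func2, "y")
    # result() blocks until that call finishes, then returns its value.
    reallist = func3(future1.result(), future2.result())
```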

> What I want to do is something like this:
>
> #my new main loop
> list1 = thread.start_new_thread(func1,(x))
> list2 = thread.start_new_thread(func2,(y))
> reallist = func3(list1,list2)
>
> I have read the documentation a couple of times, but what I don't know:
>
> A) Is the return value of start_new_thread the same as the return
> value of the function it calls?

Expanding on Gordon's hint, if start_new_thread waited for func1 to return a
value, nothing at all would happen in parallel (the second call to
start_new_thread couldn't begin before func1 returned).
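A small experiment makes this concrete. In modern Python 3 the low-level module is `_thread`, and `start_new_thread` hands back an integer thread identifier right away, long before the (hypothetical) `slow` function has finished:

```python
import _thread
import time

finished = []

def slow(n):
    time.sleep(0.2)
    finished.append(n)

ident = _thread.start_new_thread(slow, (1,))
# ident is a thread identifier, not slow()'s return value, and slow()
# is almost certainly still sleeping at this point.
print("start_new_thread returned", ident, "- finished so far:", finished)

time.sleep(0.5)   # crude wait so the thread can finish
print("after waiting:", finished)
```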

Note too that you need to pass a tuple of arguments in the call, and (x)
isn't a tuple. A 1-tuple is a degenerate case that needs to be spelled (x,)
(note the silly-looking trailing comma there).
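The difference is easy to see at the interpreter prompt. In modern Python 3, passing `(x)` to `_thread.start_new_thread` is rejected with a `TypeError` because the second argument is not a tuple:

```python
import _thread

x = 42
not_a_tuple = (x)   # parentheses only group: this is just the int 42
one_tuple = (x,)    # the trailing comma is what makes a 1-tuple

print(type(not_a_tuple).__name__, type(one_tuple).__name__)

error = None
try:
    _thread.start_new_thread(print, (x))   # wrong: args is an int
except TypeError as exc:
    error = exc
```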

> B) Do I need to do anything fancy to make the func3 call wait
> until the calls to func1 and func2 have both returned?

Absolutely.

> For example, should I have the func1 and func2 calls acquire a lock at
> the start of the function and release it at the end, and then make the
> main loop try to acquire both locks before proceeding to the
> reallist = func3() call?

What you "should do" is use the higher-level threading module's "join"
method. Rolling your own is fraught with peril. For example, consider your
suggestion:

def func1(...):
    acquire lock1
    do work
    release lock1

# func2 similarly, but with lock2

# main loop
start func1
start func2
acquire lock1
acquire lock2

It's quite possible that the main thread will start func1 and func2, and
acquire both lock1 and lock2 before any code in func1 or func2 gets a chance
to execute. Then func1 and func2 hang waiting to acquire locks that will
never get released, and your main loop hangs too on the next trip around.

The kind of gimmick you're thinking of *can* work, but requires acquiring
the locks in the main loop *before* starting the threads; then it's
guaranteed that the thread function is entered with its lock in the acquired
state:

def func1(...):
    do work
    release lock1

# func2 similarly, but with lock2

acquire lock1
acquire lock2
# main loop
start func1
start func2
acquire lock1
acquire lock2
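Translated into runnable modern Python 3 (where the low-level module is `_thread`), the pre-acquire scheme might look like the sketch below. The `results` dict and the three-trip loop are illustrative assumptions. A plain lock may be released by a thread other than the one that acquired it, which is what lets each worker hand its lock back to the main thread.

```python
import _thread

lock1 = _thread.allocate_lock()
lock2 = _thread.allocate_lock()
results = {}

def func1(param):
    results["list1"] = [param] * 2   # do work
    lock1.release()                  # signal the main thread: func1 is done

def func2(param):
    results["list2"] = [param] * 3
    lock2.release()

# Acquire both locks *before* starting any threads.
lock1.acquire()
lock2.acquire()

for trip in range(3):                # the "main loop"
    _thread.start_new_thread(func1, (trip,))
    _thread.start_new_thread(func2, (trip,))
    lock1.acquire()                  # blocks until func1 releases lock1
    lock2.acquire()                  # blocks until func2 releases lock2
    reallist = results["list1"] + results["list2"]
```

At the end of each trip the main thread again holds both locks, so the next trip starts in the same state as the first.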

I'll attach a less painful alternative using the "threading" module.
"threading" presents a scheme more-or-less like Java's thread API, so
getting a book on Java threads would be a good idea.

threads-tend-to-unravel-ly y'rs - tim

This is executable as-is:

from threading import Thread
import time

def square(n, answer):
    print("in square")
    for i in range(n):
        answer.append(i**2)
    time.sleep(3)              # pretend the work took 3 seconds
    print("returning from square")

def cube(n, answer):
    print("in cube")
    for i in range(n):
        answer.append(i**3)
    time.sleep(5)              # pretend the work took 5 seconds
    print("returning from cube")

for i in range(4):
    # Each thread appends its results to a list owned by the main thread.
    t1result = []
    t2result = []
    t1 = Thread(target=square, args=(2*i, t1result))
    t2 = Thread(target=cube, args=(3*i, t2result))
    print("starting threads with i =", i)
    t1.start(); t2.start()
    # join() blocks until that thread has finished.
    t1.join(); t2.join()
    print("back from joins")
    print("square returned", t1result)
    print("cube returned", t2result)