Mailing List Archive

Parallel(?) programming with python
I would like to write a program that reads from the network a fixed
amount of bytes and appends them to a list. This should happen once a
second.

Another part of the program should take the list, as it has been filled
so far, every 6 hours or so, and do some computations on the data (a FFT).

Every so often (say once a week) the list should be saved to a file,
shortened in the front by so many items, and filled further with the
data coming from the network. After the first saving of the whole list,
only the new part (the data that have come since the last saving) should
be appended to the file. A timestamp is in the data, so it's easy to say
what is new and what was already there.

I'm not sure how to do this properly: can I write a part of a program
that keeps doing its job (appending data to the list once every second)
while another part computes something on the data of the same list,
ignoring the new data being written?

Basically the question boils down to whether it is possible to have parts
of a program (could be functions) that keep doing their job while other
parts do something else on the same data, and what is the best way to do
this.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
Thanks for your reply.

On 08.08.22 13:20, Stefan Ram wrote:

> Yes, but this is difficult. If you ask this question here,
> you might not be ready for this.

Indeed.

>
> I haven't learned it yet myself, but nevertheless tried to
> write a small example program quickly, which might still
> contain errors because of my lack of education.
>
> import threading
> import time
>
> def write_to_list( list, lock, event ):
>     for i in range( 10 ):
>         lock.acquire()
>         try:
>             list.append( i )
>         finally:
>             lock.release()
>         event.set()
>         time.sleep( 3 )
>
> def read_from_list( list, lock, event ):
>     while True:
>         event.wait()
>         print( "Waking up." )
>         event.clear()
>         if len( list ):
>             print( "List contains " + str( list[ 0 ]) + "." )
>             lock.acquire()
>             try:
>                 del list[ 0 ]
>             finally:
>                 lock.release()
>         else:
>             print( "List is empty." )
>
> list = []
> lock = threading.Lock()
> event = threading.Event()
> threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
> threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

If I understand some things correctly, a "lock" would be something that,
as the name says, locks, meaning prevents parts of the program from
executing on the locked resource until other parts have finished doing
their things and have released the lock. If this is correct, it's not
exactly what I wanted, because this way "parts of the program" would not
"keep doing their things, while other parts do other things on the same
data".

I'm in principle ok with locks, if it must be. What I fear is that the
lock could last long and prevent the function that writes into the list
from doing so every second. With an FFT on a list that contains a few
bytes taken every second over one week's time (604,800 samples), I believe
it's very likely that the FFT function takes longer than a second to return.

Then I would have to import all the data I have missed since the lock
was acquired, which is doable, but I would like to avoid it if possible.

>
> In basketball, first you must learn to dribble and pass,
> before you can begin to shoot.

Sure.

>
> With certain reservations, texts that can be considered
> to learn Python are:
>
> "Object-Oriented Programming in Python Documentation" - a PDF file,
> Introduction to Programming Using Python - Y Daniel Liang (2013),
> How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
> The Coder's Apprentice - Pieter Spronck (2016-09-21), and
> Python Programming - John Zelle (2009).
>

Thank you for the list. I am currently taking a Udemy course and at the
same time reading the tutorials on python.org. I hope I will some day
get to some of the books you suggest (I'm doing this only in my spare
time and it will take forever).
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
Thank you for your reply.

On 08.08.22 14:55, Julio Di Egidio wrote:

> Concurrent programming is quite difficult, plus you better think
> in terms of queues than shared data...

Do you mean queues in the sense of deque (the data structure)? I ask
because I can see the advantage there when I try to pop data from the
front of it, but I don't see the sense of the following statement ("than
shared data"). I mean, I called my structure a list, but it may well be
a queue instead. That wouldn't prevent it from being shared in the idea
I described: one function would still append data to it while the other
is reading what is there up to a certain point and calculating the FFT of it.

> But, an easier and often
> better option for concurrent data access is use a (relational)
> database, then the appropriate transaction isolation levels
> when reading and/or writing.
>

That would obviously save some coding (but would introduce the need to
code the interaction with the database), but I'm not sure it would speed
things up. Would the RDBMS allow reading a table while something else
is writing to it? I doubt it, and I'm not sure it doesn't flush the cache
before letting you read, which would include a normally slow disk access.

Andreas

> Julio

--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On Mon, 8 Aug 2022 12:47:26 +0200, Andreas Croci <andrea.croci@gmx.de>
declaimed the following:

>I would like to write a program that reads from the network a fixed
>amount of bytes and appends them to a list. This should happen once a
>second.
>

Ignoring leap seconds, there are 86400 seconds in a day -- how many
bytes are you planning to read each second?

Maybe more important? Is this a constant network connection feeding you
bytes (in which case the bytes available to read will be controlled by the
sender -- which may be sending continuously and building up a backlog if
you don't empty the stream)? Or are you planning to make a socket
connection, read n-bytes, close socket?

>Another part of the program should take the list, as it has been filled
>so far, every 6 hours or so, and do some computations on the data (a FFT).
>

"6 hours or so"? That leaves one open to all sorts of variable timing.
In either event, a 6 hour interval is more suited to a process started by a
cron job (Linux/Unix) or Task Scheduler (Windows). Having a thread sleep
for 6 hours means no safeguard if the parent process should die at some
point (and if you are keeping the data in an internal list, you lose all
that data too)

>Every so often (say once a week) the list should be saved to a file,

This REQUIRES the process to not fail at any point, nor any system
restarts, etc. And (see prior paragraphs) how much data are you
accumulating? In one week you have 604800 "reads". If you are reading 10
bytes each time, that makes 6MB of data you could potentially lose (on most
modern hardware, 6MB is not a memory concern... even a 32-bit OS should be
able to find space for 600MB of data...).

Much better would be to write the file as you read each chunk. If the
file is configured right, a separate process should be able to do read-only
processing of the file even while the write process is on-going. OR, you
attempt an open/write/close cycle which could be blocked while your FFT is
processing -- you'd have to detect that situation and buffer the read data
until you get a subsequent successful open, at which time you'd write all
the backlog data.

Or you could even have your FFT process copy the data to the long term
file, while the write process just starts a new file when it finds itself
blocked (and the FFT deletes the file it was reading).

>shortened in the front by so many items, and filled further with the
>data coming from the network. After the first saving of the whole list,
>only the new part (the data that have come since the last saving) should
>be appended to the file. A timestamp is in the data, so it's easy to say
>what is new and what was already there.
>

Personally, this sounds more suited for something like SQLite3...
Insert new records as the data is read, with timestamps. FFT process
selects records based upon last data ID (that it processed previously) to
end of new data. SQLite3 database IS the long-term storage. Might need a
second table to hold the FFT process "last data ID" so on start up it can
determine where to begin.
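
A minimal sketch of that scheme (the table and column names are
invented for illustration; only the standard-library sqlite3 module is
used):

import sqlite3
import time

con = sqlite3.connect("samples.db")
con.execute("""CREATE TABLE IF NOT EXISTS samples (
                   id INTEGER PRIMARY KEY,
                   ts REAL NOT NULL,
                   payload BLOB NOT NULL)""")

def store_sample(payload):
    # One short transaction per 1-second read; "with con" commits.
    with con:
        con.execute("INSERT INTO samples (ts, payload) VALUES (?, ?)",
                    (time.time(), payload))

def samples_since(last_id):
    # The FFT side selects only what it has not processed yet.
    cur = con.execute(
        "SELECT id, ts, payload FROM samples WHERE id > ? ORDER BY id",
        (last_id,))
    return cur.fetchall()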

>I'm not sure how to do this properly: can I write a part of a program
>that keeps doing its job (appending data to the list once every second)
>while another part computes something on the data of the same list,
>ignoring the new data being written?
>
Well, if you really want ONE program -- you'll likely be looking at the
Threading module (I don't do "async", and your task doesn't seem suited for
async-type callbacks -- one thread that does the fetching of data, and a
second that does the FFT processing, which will be sleeping most of the
time).

But either way, I'd suggest not keeping the data in an internal list;
use some RDBMS to keep the long-term data, accumulating it as you fetch it,
and letting the FFT read from the database for its processing.


--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On 8/8/2022 4:47 AM, Andreas Croci wrote:
> I would like to write a program that reads from the network a fixed
> amount of bytes and appends them to a list. This should happen once a
> second.
>
> Another part of the program should take the list, as it has been
> filled so far, every 6 hours or so, and do some computations on the
> data (a FFT).
>
> Every so often (say once a week) the list should be saved to a file,
> shortened in the front by so many items, and filled further with the
> data coming from the network. After the first saving of the whole list,
> only the new part (the data that have come since the last saving)
> should be appended to the file. A timestamp is in the data, so it's
> easy to say what is new and what was already there.
>
> I'm not sure how to do this properly: can I write a part of a program
> that keeps doing its job (appending data to the list once every
> second) while another part computes something on the data of the same
> list, ignoring the new data being written?
>
> Basically the question boils down to whether it is possible to have
> parts of a program (could be functions) that keep doing their job
> while other parts do something else on the same data, and what is the
> best way to do this.

You might be able to do what you need by making the file system work for
you:

Use numbered files, something like DATA/0001, DATA/0002, etc.

Start by initializing a file number variable to 1 and creating an empty
file, DATA/0001. The current time will be your start time.

In an infinite loop, just as in Stefan's example:

Read from the network and append to the current data file. This
shouldn't take long unless the file is on a remote system.

If six hours have gone by (compare the current time to the start time),
close the current data file, create a thread (see Stefan's example) to
call your FFT with the name of the current file, increment the file
number, and open a new empty data file.

If you want to, you can consolidate files every week or so. The Python
library has functions that will let you get a list of files in a directory.
If you're on a Linux or UNIX system, you can use shell commands to
append, copy or rename files.
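
A rough sketch of that loop (read_from_network and run_fft stand in
for your own socket read and FFT routine; the DATA directory is
assumed to exist already):

import threading
import time

SIX_HOURS = 6 * 60 * 60

def read_from_network():
    # Placeholder: return one second's worth of bytes.
    return b"\x00" * 16

def run_fft(filename):
    pass  # placeholder: load the file and compute the FFT

file_number = 1
start_time = time.time()
data_file = open("DATA/%04d" % file_number, "ab")

while True:
    data_file.write(read_from_network())
    if time.time() - start_time >= SIX_HOURS:
        data_file.close()
        # Hand the finished file to the FFT in its own thread.
        threading.Thread(target=run_fft,
                         args=["DATA/%04d" % file_number]).start()
        file_number += 1
        start_time = time.time()
        data_file = open("DATA/%04d" % file_number, "ab")
    time.sleep(1)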

Have fun.

Louis

--
https://mail.python.org/mailman/listinfo/python-list
RE: Parallel(?) programming with python [ In reply to ]
>> But, an easier and often
>> better option for concurrent data access is use a (relational)
>> database, then the appropriate transaction isolation levels
>> when reading and/or writing.
>>
>
> That would obviously save some coding (but would introduce the need to
> code the interaction with the database), but I'm not sure it would speed
> things up. Would the RDBMS allow reading a table while something else
> is writing to it? I doubt it, and I'm not sure it doesn't flush the cache
> before letting you read, which would include a normally slow disk access.

SQLite for example allows only 1 write transaction at a time, but in WAL mode you can have as many read transactions as you want all going along at the same time as that 1 writer. It also allows you to specify how thorough it is in flushing data to disk, including not forcing a sync to disk at all and just leaving that to the OS to do on its own time.
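
For reference, switching SQLite into that mode from Python is just a
couple of pragmas (a sketch using only the standard sqlite3 module):

import sqlite3

con = sqlite3.connect("samples.db")
# Write-ahead logging: readers no longer block the single writer.
con.execute("PRAGMA journal_mode=WAL")
# Relax fsync behaviour, trading some durability for speed.
con.execute("PRAGMA synchronous=NORMAL")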
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On 2022-08-08 12:20, Stefan Ram wrote:
> Andreas Croci <andrea.croci@gmx.de> writes:
>>Basically the question boils down to whether it is possible to have parts
>>of a program (could be functions) that keep doing their job while other
>>parts do something else on the same data, and what is the best way to do
>>this.
>
> Yes, but this is difficult. If you ask this question here,
> you might not be ready for this.
>
> I haven't learned it yet myself, but nevertheless tried to
> write a small example program quickly, which might still
> contain errors because of my lack of education.
>
> import threading
> import time
>
> def write_to_list( list, lock, event ):
>     for i in range( 10 ):
>         lock.acquire()
>         try:
>             list.append( i )
>         finally:
>             lock.release()
>         event.set()
>         time.sleep( 3 )
>
> def read_from_list( list, lock, event ):
>     while True:
>         event.wait()
>         print( "Waking up." )
>         event.clear()
>         if len( list ):
>             print( "List contains " + str( list[ 0 ]) + "." )
>             lock.acquire()
>             try:
>                 del list[ 0 ]
>             finally:
>                 lock.release()
>         else:
>             print( "List is empty." )
>
> list = []
> lock = threading.Lock()
> event = threading.Event()
> threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
> threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()
>
> In basketball, first you must learn to dribble and pass,
> before you can begin to shoot.
>
> With certain reservations, texts that can be considered
> to learn Python are:
>
> "Object-Oriented Programming in Python Documentation" - a PDF file,
> Introduction to Programming Using Python - Y Daniel Liang (2013),
> How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
> The Coder's Apprentice - Pieter Spronck (2016-09-21), and
> Python Programming - John Zelle (2009).
>
When working with threads, you should use queues, not lists, because
queues do their own locking and can wait for items to arrive, with a
timeout, if desired:


import queue
import threading
import time

def write_to_item_queue(item_queue):
    for i in range(10):
        print("Put", i, "in queue.", flush=True)
        item_queue.put(i)
        time.sleep(3)

    # Using None to indicate that there's no more to come.
    item_queue.put(None)

def read_from_item_queue(item_queue):
    while True:
        try:
            item = item_queue.get()
        except queue.Empty:
            print("Queue is empty; shouldn't have got here!", flush=True)
        else:
            print("Queue contains " + str(item) + ".", flush=True)

            if item is None:
                # Using None to indicate that there's no more to come.
                break

item_queue = queue.Queue()

write_thread = threading.Thread(target=write_to_item_queue,
                                args=[item_queue])
write_thread.start()

read_thread = threading.Thread(target=read_from_item_queue,
                               args=[item_queue])
read_thread.start()

# Wait for the threads to finish.
write_thread.join()
read_thread.join()

print("Finished.")
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
> On 8 Aug 2022, at 20:24, MRAB <python@mrabarnett.plus.com> wrote:
>
> On 2022-08-08 12:20, Stefan Ram wrote:
>> Andreas Croci <andrea.croci@gmx.de> writes:
>>> Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.
>> Yes, but this is difficult. If you ask this question here,
>> you might not be ready for this.
>> I haven't learned it yet myself, but nevertheless tried to
>> write a small example program quickly, which might still
>> contain errors because of my lack of education.
>> import threading
>> import time
>> def write_to_list( list, lock, event ):
>>     for i in range( 10 ):
>>         lock.acquire()
>>         try:
>>             list.append( i )
>>         finally:
>>             lock.release()
>>         event.set()
>>         time.sleep( 3 )
>> def read_from_list( list, lock, event ):
>>     while True:
>>         event.wait()
>>         print( "Waking up." )
>>         event.clear()
>>         if len( list ):
>>             print( "List contains " + str( list[ 0 ]) + "." )
>>             lock.acquire()
>>             try:
>>                 del list[ 0 ]
>>             finally:
>>                 lock.release()
>>         else:
>>             print( "List is empty." )
>> list = []
>> lock = threading.Lock()
>> event = threading.Event()
>> threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
>> threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()
>> In basketball, first you must learn to dribble and pass,
>> before you can begin to shoot.
>> With certain reservations, texts that can be considered
>> to learn Python are:
>> "Object-Oriented Programming in Python Documentation" - a PDF file,
>> Introduction to Programming Using Python - Y Daniel Liang (2013),
>> How to Think Like a Computer Scientist - Peter Wentworth (2012-08-12),
>> The Coder's Apprentice - Pieter Spronck (2016-09-21), and
>> Python Programming - John Zelle (2009).
> When working with threads, you should use queues, not lists, because queues do their own locking and can wait for items to arrive, with a timeout, if desired:

Lists do not need to be locked in Python because of the GIL.
However, you need locks to synchronise between threads.
And, as you say, a queue has all that locking built in.

Barry

>
>
> import queue
> import threading
> import time
>
> def write_to_item_queue(item_queue):
>     for i in range(10):
>         print("Put", i, "in queue.", flush=True)
>         item_queue.put(i)
>         time.sleep(3)
>
>     # Using None to indicate that there's no more to come.
>     item_queue.put(None)
>
> def read_from_item_queue(item_queue):
>     while True:
>         try:
>             item = item_queue.get()
>         except queue.Empty:
>             print("Queue is empty; shouldn't have got here!", flush=True)
>         else:
>             print("Queue contains " + str(item) + ".", flush=True)
>
>             if item is None:
>                 # Using None to indicate that there's no more to come.
>                 break
>
> item_queue = queue.Queue()
>
> write_thread = threading.Thread(target=write_to_item_queue, args=[item_queue])
> write_thread.start()
>
> read_thread = threading.Thread(target=read_from_item_queue, args=[item_queue])
> read_thread.start()
>
> # Wait for the threads to finish.
> write_thread.join()
> read_thread.join()
>
> print("Finished.")
> --
> https://mail.python.org/mailman/listinfo/python-list
>

--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On 2022-08-08 13:53:20 +0200, Andreas Croci wrote:
> I'm in principle ok with locks, if it must be. What I fear is that the lock
> could last long and prevent the function that writes into the list from
> doing so every second. With an FFT on a list that contains a few bytes taken
> every second over one week time (604.800 samples), I believe it's very
> likely that the FFT function takes longer than a second to return.

You wouldn't lock the part performing the FFT, of course, only the part
manipulating the shared list.

That said, CPython (the reference implementation of Python) has what is
called the Global Interpreter Lock (GIL) which locks every single Python
instruction. So you can't have two threads actually computing anything
at the same time - at least not if the computation is written in Python.
Math packages like Numpy may or may not release the lock while they are
busy.

hp

PS: I also agree with what others have said about the perils of
multi-threaded programming.

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Re: Parallel(?) programming with python [ In reply to ]
On 08Aug2022 11:20, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
>Andreas Croci <andrea.croci@gmx.de> writes:
>>Basically the question boils down to whether it is possible to have parts
>>of a program (could be functions) that keep doing their job while other
>>parts do something else on the same data, and what is the best way to do
>>this.
>
> Yes, but this is difficult. If you ask this question here,
> you might not be ready for this.

This is a very standard requirement for any concurrent activity and the
typical approach is a mutex (mutual exclusion). You've already hit on
the "standard" approach: a `threading.Lock` object.

> lock.acquire()
> try:
>     list.append( i )
> finally:
>     lock.release()

Small note, which makes writing this much clearer. Lock objects are
context managers. So:

    with lock:
        list.append(i)

is all you need.

Cheers,
Cameron Simpson <cs@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On Mon, 8 Aug 2022 at 19:01, Andreas Croci <andrea.croci@gmx.de> wrote:
>
> I would like to write a program that reads from the network a fixed
> amount of bytes and appends them to a list. This should happen once a
> second.
>
> Another part of the program should take the list, as it has been filled
> so far, every 6 hours or so, and do some computations on the data (a FFT).
>
> Every so often (say once a week) the list should be saved to a file,
> shortened in the front by so many items, and filled further with the
> data coming from the network. After the first saving of the whole list,
> only the new part (the data that have come since the last saving) should
> be appended to the file. A timestamp is in the data, so it's easy to say
> what is new and what was already there.
>
> I'm not sure how to do this properly: can I write a part of a program
> that keeps doing its job (appending data to the list once every second)
> while another part computes something on the data of the same list,
> ignoring the new data being written?
>
> Basically the question boils down to whether it is possible to have parts
> of a program (could be functions) that keep doing their job while other
> parts do something else on the same data, and what is the best way to do
> this.

Why do these "parts of a program" need to be part of the *same*
program? I would write this as just two separate programs. One
collects the data and writes it to a file. The other periodically
reads the file and computes the DFT.

Note that a lot of the complexity discussed in other posts to do with
threads and locks etc comes from the supposed constraint that this
needs to be done with threads or something else that can work in
parallel *within the same program*. If you relax that constraint the
problem becomes a lot simpler.
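
For instance, the collector half could be this small (a sketch; the
socket read is a placeholder, and the DFT program would just read
samples.log on its own schedule):

import time

def read_from_network():
    # Placeholder: return one second's worth of bytes from the socket.
    return b"\x00" * 16

with open("samples.log", "ab") as f:
    while True:
        record = ("%f " % time.time()).encode() + read_from_network() + b"\n"
        f.write(record)
        f.flush()  # so the reader program only ever sees complete records
        time.sleep(1)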

--
Oscar
--
https://mail.python.org/mailman/listinfo/python-list
RE: Parallel(?) programming with python [ In reply to ]
Stefan,

You are correct that the goal of a lock is to do something rather quickly
and atomically, so your design should not do something complex or long
before releasing the lock.

In your example, you have a producer adding data as regularly as every
second and a consumer that wakes up rarely and processes all the data since
the last time. So you may want to augment your code to do something fast
under the lock, like pointing another variable at the data gathered so far
and rebinding the original variable to an empty list. Then you release the
lock within fractions of a second and let the regular job keep adding to the
initially empty list while the other part of the code processes without a
lock.
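
In code, that "swap it under the lock" idea is tiny (a sketch, with
invented names):

import threading

lock = threading.Lock()
samples = []

def producer_append(item):
    with lock:
        samples.append(item)

def take_snapshot():
    # Swap the shared list for a fresh one; the 1-second writer is
    # blocked only for the swap, not for the whole FFT.
    global samples
    with lock:
        snapshot = samples
        samples = []
    return snapshot  # run the FFT on this, outside the lock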

A design like the above has the busy worker constantly checking the lock. An
alternative if you are sure the other process will only show up almost
exactly at 6 hours on the clock, is to have the busy one check the time
instead, but that may be more expensive.

Still other architectures are possible, such as writing not to a single list
for six hours, but to some data structure with multiple sub-lists, such as
one where you switch every minute or so. The second process can note how many
entries there are at the moment, process all but the last, and note the
location so that next time it starts there. This would work if you did not
need every last bit of data, as the two do not interfere with each other. And
no real locks would be needed, as the only thing the two parts share is the
position or identity of the current last fragment, which only one process
actually touches.

Just some ideas. Lots of other variations are very possible.



-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
Behalf Of Stefan Ram
Sent: Monday, August 8, 2022 7:21 AM
To: python-list@python.org
Subject: Re: Parallel(?) programming with python

Andreas Croci <andrea.croci@gmx.de> writes:
>Basically the question boils down to whether it is possible to have
>parts of a program (could be functions) that keep doing their job while
>other parts do something else on the same data, and what is the best
>way to do this.

Yes, but this is difficult. If you ask this question here,
you might not be ready for this.

I haven't learned it yet myself, but nevertheless tried to
write a small example program quickly, which might still
contain errors because of my lack of education.

import threading
import time

def write_to_list( list, lock, event ):
    for i in range( 10 ):
        lock.acquire()
        try:
            list.append( i )
        finally:
            lock.release()
        event.set()
        time.sleep( 3 )

def read_from_list( list, lock, event ):
    while True:
        event.wait()
        print( "Waking up." )
        event.clear()
        if len( list ):
            print( "List contains " + str( list[ 0 ]) + "." )
            lock.acquire()
            try:
                del list[ 0 ]
            finally:
                lock.release()
        else:
            print( "List is empty." )

list = []
lock = threading.Lock()
event = threading.Event()
threading.Thread( target=write_to_list, args=[ list, lock, event ]).start()
threading.Thread( target=read_from_list, args=[ list, lock, event ]).start()

In basketball, first you must learn to dribble and pass,
before you can begin to shoot.

With certain reservations, texts that can be considered
to learn Python are:

"Object-Oriented Programming in Python Documentation" - a PDF file,
Introduction to Programming Using Python - Y Daniel Liang (2013), How to
Think Like a Computer Scientist - Peter Wentworth (2012-08-12), The Coder's
Apprentice - Pieter Spronck (2016-09-21), and Python Programming - John
Zelle (2009).


--
https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On Mon, 8 Aug 2022 19:39:27 +0200, Andreas Croci <andrea.croci@gmx.de>
declaimed the following:

>
>Do you mean queues in the sense of deque (the data structure)? I ask
>because I can see the advantage there when I try to pop data from the
>front of it, but I don't see the sense of the following statement ("than

Most likely this was a reference to the Queue module -- which is used
to pass data from one thread to another. Your "fetch" thread would package
up the "new" data to be processed by the FFT thread. The FFT thread is
blocked waiting for data to appear on the queue -- when it appears, the FFT
thread reads the entire packet of data and proceeds to process it.

Note that in this scheme, the FFT thread is NOT on a timer -- the fetch
thread controls the timing by when it puts data into the queue.

cf:
https://docs.python.org/3/library/threading.html
https://docs.python.org/3/library/queue.html
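
Sketched out, with read_from_network and do_fft as placeholders, that
arrangement could look like:

import queue
import threading
import time

BATCH_SECONDS = 6 * 60 * 60
work_queue = queue.Queue()

def read_from_network():
    # Placeholder: one second's worth of bytes.
    return b"\x00" * 16

def do_fft(packet):
    pass  # placeholder for the actual transform

def fetch_loop():
    batch, started = [], time.monotonic()
    while True:
        batch.append(read_from_network())
        if time.monotonic() - started >= BATCH_SECONDS:
            work_queue.put(batch)  # hand the whole packet to the FFT thread
            batch, started = [], time.monotonic()
        time.sleep(1)

def fft_loop():
    while True:
        packet = work_queue.get()  # blocks until the fetcher sends a packet
        do_fft(packet)

threading.Thread(target=fetch_loop, daemon=True).start()
threading.Thread(target=fft_loop, daemon=True).start()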

>
>That would obviously save some coding (but would introduce the need to
>code the interaction with the database), but I'm not sure it would speed
>things up. Would the RDBMS allow reading a table while something else
>is writing to it? I doubt it, and I'm not sure it doesn't flush the cache
>before letting you read, which would include a normally slow disk access.
>

Depends upon the RDBMS. Some are "multi-version concurrency" -- they
snapshot the data at the time of the read, while letting new writes
proceed. But if one is doing read/modify/write, this can cause a problem, as
the RDBMS will detect that a record was modified by someone else and prevent
you from changing it -- you have to reselect the data to get the current
version.

You will want to treat each of your network fetches as a transaction --
and close the transaction fast. Your FFT process would need to select all
data in the range to be processed, and load it into memory so you can free
that transaction.

https://www.sqlite.org/lockingv3.html See section 3.0 and section 5.0



--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On 09Aug2022 00:22, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
>On Mon, 8 Aug 2022 at 19:01, Andreas Croci <andrea.croci@gmx.de> wrote:
>> Basically the question boils down to whether it is possible to have
>> parts
>> of a program (could be functions) that keep doing their job while other
>> parts do something else on the same data, and what is the best way to do
>> this.

Which is of course feasible, as others have outlined.

>Why do these "parts of a program" need to be part of the *same*
>program. I would write this as just two separate programs. One
>collects the data and writes it to a file. The other periodically
>reads the file and computes the DFT.

I would also write these as separate programmes, or at least as distinct
modes of the same programme (eg "myprog poll" and "myprog archive" etc).
Largely because you might run the "poll" phase regularly and briefly, and the
"process" phase separately and less frequently. You don't need to keep a
single programme lurking around forever - fire it up as required.

However, I want to point out that this _in no way_ removes the need for
access control and mutexes. It will change the mechanism (because your
two programmes are now operating separately) and makes it more concrete
in your mind what _actually and precisely_ needs protection.

For example, you probably want to avoid _processing_ a data file at the
same time as _updating_ that file. Depending on what you're doing this
can be as simple as keeping "to be updated" files with distinct names
from "available to be processed/archived" files. This is a standard
difficulty with "hot folder" upload areas.

A common approach might be to write a file with a "temp" style name (eg
".tmp*") until completed, then rename it to its official name (eg
"datafile*"). And then your processing/archiving side can simply ignore
the "in progress" files because they do not match the names it cares
about.
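
As a sketch (names invented; os.rename is atomic within a single POSIX
filesystem):

import os

def publish_datafile(name, payload):
    # Write under a temp name that the archiver ignores...
    tmp = ".tmp-" + name
    with open(tmp, "wb") as f:
        f.write(payload)
    # ...then atomically give it the name the archiver looks for.
    os.rename(tmp, "datafile-" + name)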

Anyway, those are specifics, which will be driven by what you're
actually doing. The point is that you still need to coordinate use of
the files suitably for your needs. Doing this in one long running
programme with Threads/mutexes or separate programmes sharing a data
directory just changes the mechanisms.

Cheers,
Cameron Simpson <cs@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
Queues are better than lists for concurrency. If you get the right kind,
they have implicit locking, making your code simpler and more robust at the
same time.

CPython threading is mediocre for software systems that have one or more
CPU-bound threads, and your FFT might be CPU-bound.

Rather than using threading directly, you probably should use
https://docs.python.org/3/library/concurrent.futures.html , which gives you
easy switching between threads and processes.
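
A sketch of that, with compute_fft standing in for the real transform:

import concurrent.futures

def compute_fft(samples):
    # Placeholder: in practice something like numpy.fft.rfft(samples).
    return sum(samples)

if __name__ == "__main__":  # required for process pools on some platforms
    # Swap in ThreadPoolExecutor here and nothing else changes.
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(compute_fft, list(range(8)))
        print(future.result())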

Or if you, like me, get inordinately joyous over programs that run on more
than one kind of Python, you could give up concurrent.futures and use
_thread. Sadly, that gives up easy flipping between threads and processes,
but gives you easy flipping between CPython and micropython. Better still,
micropython appears to have more scalable threading than CPython, so if you
decide you need 20 CPU-hungry threads someday, you are less likely to be in
a bind.

For reading from a socket, if you're not going the REST route, may I
suggest https://stromberg.dnsalias.org/~strombrg/bufsock.html ? It deals
with framing and lengths relatively smoothly. Otherwise, robust socket
code tends to need while loops and tedious arithmetic.

HTH

On Mon, Aug 8, 2022 at 10:59 AM Andreas Croci <andrea.croci@gmx.de> wrote:

> I would like to write a program, that reads from the network a fixed
> amount of bytes and appends them to a list. This should happen once a
> second.
>
--
https://mail.python.org/mailman/listinfo/python-list
RE: Parallel(?) programming with python [ In reply to ]
Why would this application *require* parallel programming? This could be done in one single-thread program. Call time to get the time and save it as start_time. Keep a count of the number of 6-hour intervals, initialized to 0.

Once a second, read data and append it to the list. At 6 hours after the start time, call a function that does an FFT (see the comment about scipy below) and increment the count of 6-hour intervals. Call time and save the new start time. Continue execution.

After 28 six-hour intervals, save the list and then slice it to shorten it as you want. Reset the count of six-hour intervals to zero.

The FFT might take a second, even if you use scipy, depending on how long the list is (if you don't know about numpy and scipy, look them up! You need them. Your list can be an array in numpy).
Saving and slicing the list should take less than a second.

This single-thread approach avoids thinking about multiprocessing, locking and unlocking data structures, and all that other stuff that does not contribute to the goal of the program.
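
That single-threaded loop, sketched with placeholder functions and an
invented KEEP constant:

import time

def read_from_network():
    return b"\x00" * 16  # placeholder 1-second read

def do_fft(samples):
    pass  # placeholder, e.g. scipy.fft.fft on a numpy array

def save_to_file(samples):
    pass  # placeholder weekly save

SIX_HOURS = 6 * 60 * 60
KEEP = 100000  # how many items to keep after saving

data = []
start_time = time.time()
intervals = 0

while True:
    data.append(read_from_network())
    if time.time() - start_time >= SIX_HOURS:
        do_fft(data)
        intervals += 1
        start_time = time.time()
        if intervals == 28:  # 28 six-hour intervals = one week
            save_to_file(data)
            data = data[-KEEP:]  # slice off the old front
            intervals = 0
    time.sleep(1)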

--- Joseph S.



-----Original Message-----
From: Andreas Croci <andrea.croci@gmx.de>
Sent: Monday, August 8, 2022 6:47 AM
To: python-list@python.org
Subject: Parallel(?) programming with python

I would like to write a program that reads from the network a fixed amount of bytes and appends them to a list. This should happen once a second.

Another part of the program should take the list, as it has been filled so far, every 6 hours or so, and do some computations on the data (a FFT).

Every so often (say once a week) the list should be saved to a file, shortened in the front by so many items, and filled further with the data coming from the network. After the first saving of the whole list, only the new part (the data that have come since the last saving) should be appended to the file. A timestamp is in the data, so it's easy to say what is new and what was already there.

I'm not sure how to do this properly: can I write a part of a program that keeps doing its job (appending data to the list once every second) while another part computes something on the data of the same list, ignoring the new data being written?

Basically the question boils down to whether it is possible to have parts of a program (could be functions) that keep doing their job while other parts do something else on the same data, and what is the best way to do this.
--
https://mail.python.org/mailman/listinfo/python-list
RE: Parallel(?) programming with python [ In reply to ]
Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
>Why would this application *require* parallel programming? This could be done in one single-thread program. Call time to get the time and save it as start_time. Keep a count of the number of 6-hour intervals, initialized to 0.

You could also use the `sched` module from Python's library.
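
For example (a sketch; sched actions are one-shot, so each must
re-enter itself):

import sched
import time

s = sched.scheduler(time.time, time.sleep)

def poll():
    ...  # read one second's worth of data here
    s.enter(1, 1, poll)  # reschedule for the next second

def fft_pass():
    ...  # run the FFT over the data gathered so far
    s.enter(6 * 60 * 60, 2, fft_pass)

s.enter(1, 1, poll)
s.enter(6 * 60 * 60, 2, fft_pass)
s.run()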
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de>
declaimed the following:

>Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
>>Why would this application *require* parallel programming? This could be done in one single-thread program. Call time to get the time and save it as start_time. Keep a count of the number of 6-hour intervals, initialized to 0.
>
>You could also use the `sched` module from Python's library.

<sigh> Time to really read the library reference manual again...

Though if I read this correctly, a long running action /will/ delay
others -- which could mean the (FFT) process could block collecting new
1-second readings while it is active. It also is "one-shot" on the
scheduled actions, meaning those actions still have to reschedule
themselves for the next time period.


--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On 2022-08-09 at 17:04:51 +0000,
"Schachner, Joseph (US)" <Joseph.Schachner@Teledyne.com> wrote:

> Why would this application *require* parallel programming? This could
> be done in one single-thread program. Call time to get the time and save
> it as start_time. Keep a count of the number of 6-hour intervals,
> initialized to 0.

In theory, you are correct.

In practice, [stuff] happens. What if your program crashes? Or the
computer crashes? Or there's a Python update? Or an OS update? Where
does all that pending data go, and how will you recover it after you've
addressed whatever happened? ¹

OTOH, once you start writing the pending data to a file, then it's an
extremely simple leap to multiple programs (rather than multiple
threads) for all kinds of good reasons.

¹ FWIW, I used to develop highly available systems, such as telephone
switches, which allow [stuff] to happen, and yet continue to function.
It's pretty cool to yank a board (yes, physically remove it, without
warning) from the system without [apparently] disrupting anything. Such
systems also allow for hardware, OS, and application upgrades, too
(IIRC, we were allowed a handful of seconds of downtime per year to meet
our availability requirements). That said, designing and building such
a system for the sake of simplicity and convenience of the application
we're talking about here would make a pretty good definition of
"overkill."
--
https://mail.python.org/mailman/listinfo/python-list
RE: Parallel(?) programming with python [ In reply to ]
There are many possible discussions we can have here and some are not really
about whether and how to use Python.

The user asked how to do what is a fairly standard task for some people and
arguably is not necessarily best done using a single application running
things in parallel.

So, yes, if you have full access to your machine and can schedule tasks,
then some obvious answers come to mind where one process listens and
receives data and stores it, and another process periodically wakes up and
grabs recent data and processes it and perhaps still another process comes
up even less often and does some re-arrangement of old data.

And, yes, for such large volumes of data it may be a poor design to hold all
the data in memory for many hours or even days, and various ways of using a
database or files/folders with a naming structure are a good idea.

But the original question remains, in my opinion, a not horrible one. All
kinds of applications can be written as sets of tasks run largely in
parallel, with some form of communication between tasks using shared data
structures like queues and perhaps locks, and with a requirement that any
tasks that take nontrivial time have a way to buffer their communications so
as not to block the others.

Also, for people who want to start ONE process and let it run, and perhaps
may not be able to easily schedule other processes on a system level, it can
be advantageous to know how to set up something along those lines within a
single Python session.

Of course, for efficiency reasons, any I/O to files slows things down, but
the situation described here seems somewhat easier and safer to handle in
many other ways. I think a main point is that there are good ways to keep
the data from being acted on by two parties that share memory. One is NOT
to share memory for this purpose. Another might be to have the 6-hour
process use a lock to move the data aside, or to send a message to the
receiving process to pause a moment, set the data aside, and begin
collecting anew while the old is processed, and so on.

There are many such choices, and the parts need not be in the same process or
all written in Python. But some solutions can be generalized more easily than
others. For example, can there become a need to collect data from multiple
sources, perhaps using multiple listeners?

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On
Behalf Of Dieter Maurer
Sent: Wednesday, August 10, 2022 1:33 PM
To: Schachner, Joseph (US) <Joseph.Schachner@Teledyne.com>
Cc: Andreas Croci <andrea.croci@gmx.de>; python-list@python.org
Subject: RE: Parallel(?) programming with python

Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
>Why would this application *require* parallel programming? This could be
done in one single-thread program. Call time to get the time and save it as
start_time. Keep a count of the number of 6-hour intervals, initialized
to 0.

You could also use the `sched` module from Python's library.
--
https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
On 2022-08-10 14:19:37 -0400, Dennis Lee Bieber wrote:
> On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de>
> declaimed the following:
> >Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
> >>Why would this application *require* parallel programming? This
> >>could be done in one single-thread program. Call time to get the time
> >>and save it as start_time. Keep a count of the number of 6-hour
> >>intervals, initialized to 0.
[...]
> Though if I read this correctly, a long running action /will/
> delay others -- which could mean the (FFT) process could block
> collecting new 1-second readings while it is active.

Certainly, but does it matter? Data is received from some network
connection and network connections often involve quite a bit of
buffering. If the consumer is blocked for 3 or 4 or maybe even 20
seconds, the producer might not even notice. (This of course depends
very much on the details which we know nothing about.)

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Re: Parallel(?) programming with python [ In reply to ]
Thanks again for the info.

On Wed, Aug 10, 2022 at 9:31 PM Peter J. Holzer <hjp-python@hjp.at> wrote:

> On 2022-08-10 14:19:37 -0400, Dennis Lee Bieber wrote:
> > On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de
> >
> > declaimed the following:
> > >Schachner, Joseph (US) wrote at 2022-8-9 17:04 +0000:
> > >>Why would this application *require* parallel programming? This
> > >>could be done in one single-thread program. Call time to get the time
> > >>and save it as start_time. Keep a count of the number of 6-hour
> > >>intervals, initialized to 0.
> [...]
> > Though if I read this correctly, a long running action /will/
> > delay others -- which could mean the (FFT) process could block
> > collecting new 1-second readings while it is active.
>
> Certainly, but does it matter? Data is received from some network
> connection and network connections often involve quite a bit of
> buffering. If the consumer is blocked for 3 or 4 or maybe even 20
> seconds, the producer might not even notice. (This of course depends
> very much on the details which we know nothing about.)
>
> hp
>
> --
> _ | Peter J. Holzer | Story must make more sense than reality.
> |_|_) | |
> | | | hjp@hjp.at | -- Charles Stross, "Creative writing
> __/ | http://www.hjp.at/ | challenge!"
> --
> https://mail.python.org/mailman/listinfo/python-list
>
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
Please let me know if that is okay.

On Wed, Aug 10, 2022 at 7:46 PM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:

> On 2022-08-09 at 17:04:51 +0000,
> "Schachner, Joseph (US)" <Joseph.Schachner@Teledyne.com> wrote:
>
> > Why would this application *require* parallel programming? This could
> > be done in one single-thread program. Call time to get the time and save
> > it as start_time. Keep a count of the number of 6-hour intervals,
> > initialized to 0.
>
> In theory, you are correct.
>
> In practice, [stuff] happens. What if your program crashes? Or the
> computer crashes? Or there's a Python update? Or an OS update? Where
> does all that pending data go, and how will you recover it after you've
> addressed whatever happened? ¹
>
> OTOH, once you start writing the pending data to a file, then it's an
> extremely simple leap to multiple programs (rather than multiple
> threads) for all kinds of good reasons.
>
> ¹ FWIW, I used to develop highly available systems, such as telephone
> switches, which allow [stuff] to happen, and yet continue to function.
> It's pretty cool to yank a board (yes, physically remove it, without
> warning) from the system without [apparently] disrupting anything. Such
> systems also allow for hardware, OS, and application upgrades, too
> (IIRC, we were allowed a handful of seconds of downtime per year to meet
> our availability requirements). That said, designing and building such
> a system for the sake of simplicity and convenience of the application
> we're talking about here would make a pretty good definition of
> "overkill."
> --
> https://mail.python.org/mailman/listinfo/python-list
>
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
Dennis Lee Bieber wrote at 2022-8-10 14:19 -0400:
>On Wed, 10 Aug 2022 19:33:04 +0200, "Dieter Maurer" <dieter@handshake.de>
> ...
>>You could also use the `sched` module from Python's library.
>
> <sigh> Time to really read the library reference manual again...
>
> Though if I read this correctly, a long running action /will/ delay
>others -- which could mean the (FFT) process could block collecting new
>1-second readings while it is active. It also is "one-shot" on the
>scheduled actions, meaning those actions still have to reschedule
>themselves for the next time period.

Both true.

With `multiprocessing`, you can delegate long running activity
to a separate process.
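
For instance (a sketch; do_fft stands in for the long computation):

import multiprocessing

def do_fft(samples):
    pass  # the long-running transform

if __name__ == "__main__":
    p = multiprocessing.Process(target=do_fft, args=([0.0] * 604800,))
    p.start()  # the parent keeps collecting while this runs
    p.join()   # or poll p.is_alive() instead of blocking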
--
https://mail.python.org/mailman/listinfo/python-list
Re: Parallel(?) programming with python [ In reply to ]
I would like to thank everybody who answered my question. The insights
were very informative. This seems to be one of the few newsgroups still
alive and kicking, with a lot of knowledgeable people taking the time to
help others. I like how quick and easy it is to post questions and
receive answers here as compared to web-based forums (although there are
some disadvantages too).

I'm implementing some of the ideas received here and I will surely have
other questions as I go. But the project will take a long time because
I'm doing this as a hobby during my vacation, which is unfortunately
about to end.

Thanks again, Community.

On 08.08.22 12:47, Andreas Croci wrote:
> I would like to write a program that reads from the network a fixed
> amount of bytes and appends them to a list. This should happen once a
> second.
>
> Another part of the program should take the list, as it has been filled
> so far, every 6 hours or so, and do some computations on the data (a FFT).
>
> Every so often (say once a week) the list should be saved to a file,
> shortened in the front by so many items, and filled further with the
> data coming from the network. After the first saving of the whole list,
> only the new part (the data that have come since the last saving) should
> be appended to the file. A timestamp is in the data, so it's easy to say
> what is new and what was already there.
>
> I'm not sure how to do this properly: can I write a part of a program
> that keeps doing its job (appending data to the list once every second)
> while another part computes something on the data of the same list,
> ignoring the new data being written?
>
> Basically the question boils down to whether it is possible to have parts
> of a program (could be functions) that keep doing their job while other
> parts do something else on the same data, and what is the best way to do
> this.

--
https://mail.python.org/mailman/listinfo/python-list