Mailing List Archive

Determine what the calling program is
I should state at the start that I have a solution to my problem. I am
writing to see if there is a better solution.

I have a program that runs via crontab every five minutes. It polls a
Box.com folder for files and, if any are found, it copies them locally and
performs a computation on them that can exceed five minutes. It pushes the
results back up to Box. (Box.com ensures that only complete files are
visible when I poll.) Files are dropped into this Box.com folder rarely,
but to ensure a good customer experience I do not want to set my crontab to
run less frequently. My hardware cannot support multiple simultaneous
computations.

I have written a piece of code to detect if more than 1 instance of my
program is running, and I put this code into a separate module (support.py)
so that other programs can use it.

support.py contains:
--------------------------------------------------------------------------------
import sys

def check(calling_program):
    import psutil
    # some logic here to count
    # count = N
    if count > 1:
        print(f"I was called by {calling_program}.")
        sys.exit()

if __name__ == "__main__":
    check(__file__)
--------------------------------------------------------------------------------

actual-program.py contains:
--------------------------------------------------------------------------------
import support
support.check(__file__)
# Poll, and if files download, perform expensive computations, push results
--------------------------------------------------------------------------------

To me it would be more elegant to be able to do something like this:

    def check():
        # Something here that tells me the name of the calling program
        import psutil
        # ...

And then the calling program just does:

    support.check()
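
One way such a check() might find out who called it, as a minimal sketch rather than a tested solution. It assumes "the calling program" means the script the interpreter was started with (sys.argv[0]), and it also shows inspect.stack() for the file of the immediate caller. The function names are hypothetical and the instance counting is left out:

```python
import inspect
import sys


def calling_program():
    """Return the script name the interpreter was started with."""
    # sys.argv[0] is '' in an interactive session and '-c' for "python -c".
    return sys.argv[0] if sys.argv else ""


def calling_file():
    """Return the file name of whoever called our caller."""
    # stack()[0] is this function, [1] is its caller (check below),
    # [2] is the code that called check().
    return inspect.stack()[2].filename


def check():
    # Hypothetical stand-in for support.check(); counting instances is omitted.
    return calling_program(), calling_file()
```

With this, actual-program.py would only need to call support.check() with no argument.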
--
https://mail.python.org/mailman/listinfo/python-list
Re: Determine what the calling program is [ In reply to ]
Jason Friedman wrote at 2021-4-18 07:46 -0600:
> ...
>I have a program that runs via crontab every five minutes. It polls a
>Box.com folder for files and, if any are found, it copies them locally and
>performs a computation on them that can exceed five minutes. It pushes the
>results back up to Box. (Box.com ensures that only complete files are
>visible when I poll.) Files are dropped into this Box.com folder rarely,
>but to ensure a good customer experience I do not want to set my crontab to
>run less frequently. My hardware cannot support multiple simultaneous
>computations.

Programs typically use some form of file locking to detect
attempts to run the same program multiple times.

The optimal form of locking depends on the operating system.
Under *nix, so-called advisory locks seem promising.
Re: Determine what the calling program is [ In reply to ]
On Sun, Apr 18 2021 at 07:46:53 AM, Jason Friedman <jsf80238@gmail.com> wrote:
> I should state at the start that I have a solution to my problem. I am
> writing to see if there is a better solution.
>
> I have a program that runs via crontab every five minutes. It polls a
> Box.com folder for files and, if any are found, it copies them locally and
> performs a computation on them that can exceed five minutes. It pushes the
> results back up to Box. (Box.com ensures that only complete files are
> visible when I poll.) Files are dropped into this Box.com folder rarely,
> but to ensure a good customer experience I do not want to set my crontab to
> run less frequently. My hardware cannot support multiple simultaneous
> computations.

The standard library provides locking primitives in the Unix-specific
fcntl module. You can use those to make sure only a single instance of
your process runs. Use the non-blocking forms of the lock to ensure
that if you are unable to get the lock, you exit rather than wait. If
your process waits for locks, the crontab will keep piling on waiters.
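
As a rough sketch of the non-blocking form, assuming a Unix system and using fcntl.flock (one of the locking calls in that module). try_lock and whatever lock path is passed to it are hypothetical names:

```python
import fcntl


def try_lock(path):
    """Return an open, locked file object, or None if the lock is already held."""
    lock_file = open(path, "a")  # "a" creates the file without truncating it
    try:
        # LOCK_NB makes flock raise instead of waiting for the current holder.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    # The lock lasts only while this file object stays open.
    return lock_file
```

A cron job would call try_lock() once at start-up, exit if it returns None, and otherwise keep the returned file object open until the work is done.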

There are libraries[1][2] on PyPI that wrap the platform-specific
locking primitives and provide terse APIs.

A simpler solution might be to use the flock(1) command, if you have it
available, directly in the crontab entry.

[1] https://pypi.org/project/fasteners/
[2] https://pypi.org/project/oslo.concurrency/

--
regards,
kushal
Re: Determine what the calling program is [ In reply to ]
On 19/04/2021 01.46, Jason Friedman wrote:
> I should state at the start that I have a solution to my problem. I am
> writing to see if there is a better solution.
>
> I have a program that runs via crontab every five minutes. It polls a
> Box.com folder for files and, if any are found, it copies them locally and
> performs a computation on them that can exceed five minutes. It pushes the
> results back up to Box. (Box.com ensures that only complete files are
> visible when I poll.) Files are dropped into this Box.com folder rarely,
> but to ensure a good customer experience I do not want to set my crontab to
> run less frequently. My hardware cannot support multiple simultaneous
> computations.
>
> I have written a piece of code to detect if more than 1 instance of my
> program is running, and I put this code into a separate module (support.py)
> so that other programs can use it.


In a similar situation, one of my teams used an (OpSys) environment
variable (available in both *nix and MS-Win):
- when the application starts, it checks for the variable
- if the variable exists, the application stops; otherwise it may proceed

During code review (when I noticed this tactic) I was slightly
surprised, because back when I was young (men were men, and knights were
bold, ...), we used file-stubs.


However, such systems face two complementary potential problems:
'single-instance' (which is being addressed), and 'blocking-instance'.

If there is a risk that the long-running computations may fall into a
never-ending loop, the system effectively dies (but silently!) and
source files (i.e. at Box.com) may accumulate without receiving attention.

Accordingly, the above-mentioned environment-variable was filled with a
time-stamp. Then a second step in the check-routine reviewed the time
since the 'blocking' instance started, in order to log or raise suitable
alerts if things went awry.
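
A minimal sketch of that second step, assuming the start time is kept in a small stamp file rather than in an environment variable (a swap made here purely for illustration). write_stamp, stamp_age and the path are hypothetical names:

```python
import time


def write_stamp(path):
    """Record the current time as the start of the blocking instance."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def stamp_age(path, now=None):
    """Return seconds since the stamp was written, or None if there is no stamp."""
    try:
        with open(path) as f:
            started = float(f.read())
    except FileNotFoundError:
        return None
    if now is None:
        now = time.time()
    return now - started
```

The check routine could then log or raise an alert whenever stamp_age() exceeds whatever run time is considered normal.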

YMMV!


An alternative, if the system already uses a database, is to keep a
local record in the DB of all the files lodged at Box.com. This can
include a note of whether each file has been processed (plus any other
stats or logging you may deem appropriate). A third state would be 'in
process'. Now, at start-up, the application can quickly check to see if
there is any file in that state...
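
A small sketch of that idea, assuming SQLite via the standard sqlite3 module. The table name, column names and state strings are made up for illustration:

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the tracking database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " name TEXT PRIMARY KEY,"
        " state TEXT NOT NULL DEFAULT 'pending')"  # pending / in process / done
    )
    return conn


def set_state(conn, name, state):
    """Record `state` for file `name`, inserting the row if it is new."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, state) VALUES (?, ?)",
            (name, state),
        )


def anything_in_process(conn):
    """Return True if some file is currently marked 'in process'."""
    row = conn.execute(
        "SELECT 1 FROM files WHERE state = 'in process' LIMIT 1"
    ).fetchone()
    return row is not None
```

A row left at 'in process' by a crashed run would still block later runs, so this would probably want a start time next to the state as well.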
--
Regards,
=dn
Re: Determine what the calling program is [ In reply to ]
On 18Apr2021 07:46, Jason Friedman <jsf80238@gmail.com> wrote:
>I should state at the start that I have a solution to my problem. I am
>writing to see if there is a better solution.
[...]
>I have written a piece of code to detect if more than 1 instance of my
>program is running, and I put this code into a separate module (support.py)
>so that other programs can use it.
[... sniff the process table ...]

Sniffing ps has always seemed unreliable to me.

It is usually better to use some kind of filesystem based lock, named to
represent your task.

My personal preference is lock directories. Shell version goes like
this:

    if mkdir /my/lock/directory/name-of-task
    then
        .. do task ..
        rmdir /my/lock/directory/name-of-task
    else
        echo "lock /my/lock/directory/name-of-task already taken"
    fi

Simple, reliable, even works over NFS if you care.

In Python this looks like (sketch, untested):

    import os
    import sys

    try:
        os.mkdir('/my/lock/directory/name-of-task')
    except FileExistsError:
        sys.exit("lock taken")
    else:
        try:
            ...  # do task
        finally:
            os.rmdir('/my/lock/directory/name-of-task')

You can even put a pid file in there for added richness, identifying the
pid of the competing process. Or whatever.

You can also make O_EXCL or O_CREAT/unwriteable files for locks:

    # untested, check spelling etc
    import os
    fd = os.open('/my/lock/directory/name-of-task', os.O_CREAT | os.O_WRONLY, 0o000)

On a UNIX system this opens an unwriteable file for write (you get to
open it for write because it is new, but its permissions are
unwriteable, preventing anyone else from opening it for write).

These (mkdir, os.open) have the benefit of making a nice direct
filesystem object rather than hoping to see your task in ps. And ps
sniffing is racy, in addition to its other issues.

Cheers,
Cameron Simpson <cs@cskk.id.au>
Re: Determine what the calling program is [ In reply to ]
I use this little program for shell-level locking.

It just checks for a pid file. If the pid file does not exist, or the pid
no longer exists, it'll start the process, and write the new process' pid
to the pid file.
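
A rough Python sketch of the pid-file idea just described. It is not the just-one source, only an illustration assuming a Unix system; pid_is_alive and claim_pid_file are hypothetical names:

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this pid exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks; no signal is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def claim_pid_file(path):
    """Write our pid to `path` unless it names a live process; return success."""
    try:
        with open(path) as f:
            if pid_is_alive(int(f.read().strip())):
                return False
    except (FileNotFoundError, ValueError):
        pass  # no pid file, or unreadable contents: treat as unclaimed
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return True
```

There is a window between reading the old pid and writing the new one, so two processes starting at the same instant could both succeed.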

It's at:
https://stromberg.dnsalias.org/svn/just-one/

...and usage looks like:
$ ./just-one -h
below cmd output started 2021 Sun Apr 18 04:44:03 PM PDT
Usage: ./just-one --command command --string string

The "string" part needs to be a unique identifier for each process you want
only one of. The command, naturally, is a shell command.

I just noticed that I don't have a web page describing it yet. I'll
probably set one up a little later.

It does not use locking: advisory or mandatory. Just a lock file
containing a pid.

HTH.


Re: Determine what the calling program is [ In reply to ]
> On 18 Apr 2021, at 14:46, Jason Friedman <jsf80238@gmail.com> wrote:
>
> I should state at the start that I have a solution to my problem. I am
> writing to see if there is a better solution.
>
> I have a program that runs via crontab every five minutes. It polls a
> Box.com folder for files and, if any are found, it copies them locally and
> performs a computation on them that can exceed five minutes. It pushes the
> results back up to Box. (Box.com ensures that only complete files are
> visible when I poll.) Files are dropped into this Box.com folder rarely,
> but to ensure a good customer experience I do not want to set my crontab to
> run less frequently. My hardware cannot support multiple simultaneous
> computations.
>
> I have written a piece of code to detect if more than 1 instance of my
> program is running, and I put this code into a separate module (support.py)
> so that other programs can use it.

The way to do this simply on a unix system is to use a lock file and code like this:

    import errno
    import fcntl
    import os
    import sys

    # lock_dir, ca_base_dir and log() come from the surrounding program
    lock_file = open(os.path.join(lock_dir, 'lockfile'), 'w')
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX|fcntl.LOCK_NB)
    except IOError as e:
        if e.errno == errno.EWOULDBLOCK:
            log('CA base directory "%s" already locked, exiting', ca_base_dir)
            sys.exit(0)
        else:
            log('Non-locking related IOError for file %s', lock_file)
            raise

The lock is granted only to the first copy of the program to ask for it,
and is held until that copy exits or closes lock_file.
That copy can then do the possibly long-running task.

When a second copy of the program runs from cron it will get the
EWOULDBLOCK error and you can just exit.

Barry



Re: Determine what the calling program is [ In reply to ]
On 2021-04-19 08:04:10 +1200, dn via Python-list wrote:
> In a similar situation, one of my teams used an (OpSys) environment
> variable (available in both *nix and MS-Win).
> - when the application starts, it checks for the variable
> - if exists, stops running, else may proceed

That doesn't work on Unix-like OSs. An environment variable can only be
passed to child processes, not to the parent or unrelated processes. So
it can't be used to lock out other processes - they wouldn't ever see
the variable.
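
A short demonstration of that point, as a sketch. The variable name is made up, and the child here is simply another Python interpreter started with subprocess:

```python
import os
import subprocess
import sys

MARKER = "DEMO_RUNNING_MARKER"  # hypothetical variable name


def marker_visible_after_child_sets_it():
    """Let a child process set MARKER, then report whether this process sees it."""
    os.environ.pop(MARKER, None)
    child_code = f"import os; os.environ[{MARKER!r}] = '1'"
    # The child changes only its own copy of the environment.
    subprocess.run([sys.executable, "-c", child_code], check=True)
    return MARKER in os.environ
```

The function returns False: what the child sets never reaches the parent, let alone an unrelated process started later by cron.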

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Re: Determine what the calling program is [ In reply to ]
On 2021-04-19 08:54:06 +1000, Cameron Simpson wrote:
> My personal preference is lock directories. Shell version goes like
> this:
>
> if mkdir /my/lock/directory/name-of-task
> then
> .. do task ..
> rmdir /my/lock/directory/name-of-task
> else
> echo "lock /my/lock/directory/name-of-task already taken"
> fi
>
> Simple, reliable, even works over NFS if you care.

Reliable only if "fail locked" is acceptable. If that process dies for
some reason the lock directory will stay behind, blocking other
processes until somebody notices the problem and removes it.

The fcntl method suggested by several people has the advantage that the
lock vanishes with the process which holds it.
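
A small demonstration of that property, as a sketch assuming a Unix system with fcntl.flock. The helper name and the child script are made up for illustration. A child process takes the lock and is killed without releasing it, after which the parent tries to take the same lock:

```python
import fcntl
import subprocess
import sys

# Child script: take the lock, report it, then hang as a stuck job would.
CHILD = """
import fcntl, sys, time
f = open(sys.argv[1], "a")
fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
print("locked", flush=True)
time.sleep(60)
"""


def lock_survives_kill(path):
    """Return True if the lock is still held after its holder was killed."""
    with subprocess.Popen([sys.executable, "-c", CHILD, path],
                          stdout=subprocess.PIPE, text=True) as child:
        child.stdout.readline()  # wait until the child holds the lock
        child.kill()             # simulate a crash
    # Leaving the with-block waits for the child to be gone.
    with open(path, "a") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        return False
```

The function returns False, because the kernel drops the lock as soon as the killed process's file descriptor is closed.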

hp

Re: Determine what the calling program is [ In reply to ]
On 19Apr2021 23:13, Peter J. Holzer <hjp-python@hjp.at> wrote:
>On 2021-04-19 08:54:06 +1000, Cameron Simpson wrote:
>> My personal preference is lock directories. Shell version goes like
>> this:
>>
>> if mkdir /my/lock/directory/name-of-task
>> then
>> .. do task ..
>> rmdir /my/lock/directory/name-of-task
>> else
>> echo "lock /my/lock/directory/name-of-task already taken"
>> fi
>>
>> Simple, reliable, even works over NFS if you care.
>
>Reliable only if "fail locked" is acceptable. If that process dies for
>some reason the lock directory will stay behind, blocking other
>processes until somebody notices the problem and removes it.

A Python context manager narrows the range of circumstances for this
failure quite a lot. But yes.
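
A minimal sketch of such a context manager, assuming the mkdir-based lock from earlier in the thread. The name lockdir is hypothetical:

```python
import contextlib
import os


@contextlib.contextmanager
def lockdir(path):
    """Hold `path` as a lock directory for the duration of the with-block."""
    os.mkdir(path)  # raises FileExistsError if the lock is already taken
    try:
        yield path
    finally:
        # Runs on normal exit and on exceptions, but not if the process
        # is killed outright (e.g. SIGKILL or a power failure).
        os.rmdir(path)
```

The task then goes inside "with lockdir('/my/lock/directory/name-of-task'):", and a FileExistsError from that line means another instance holds the lock.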

>The fcntl method suggested by several people has the advantage that the
>lock vanishes with the process which holds it.

This is very true. OTOH, mkdir's easy to debug if it hangs around.

Cheers,
Cameron Simpson <cs@cskk.id.au>
Re: Determine what the calling program is [ In reply to ]
> On 19 Apr 2021, at 22:49, Cameron Simpson <cs@cskk.id.au> wrote:
>
> On 19Apr2021 23:13, Peter J. Holzer <hjp-python@hjp.at> wrote:
>>> On 2021-04-19 08:54:06 +1000, Cameron Simpson wrote:
>>> My personal preference is lock directories. Shell version goes like
>>> this:
>>>
>>> if mkdir /my/lock/directory/name-of-task
>>> then
>>> .. do task ..
>>> rmdir /my/lock/directory/name-of-task
>>> else
>>> echo "lock /my/lock/directory/name-of-task already taken"
>>> fi
>>>
>>> Simple, reliable, even works over NFS if you care.
>>
>> Reliable only if "fail locked" is acceptable. If that process dies for
>> some reason the lock directory will stay behind, blocking other
>> processes until somebody notices the problem and removes it.
>
> A Python context manager narrows the range of circumstances for this
> failure quite a lot. But yes.
>
>> The fcntl method suggested by several people has the advantage that the
>> lock vanishes with the process which holds it.
>
> This is very true. OTOH, mkdir's easy to debug if it hangs around.

Only the fcntl method is robust. Your suggestion with mkdir is not
reliable in practice. If you need a lock in a shell script, there are
standard patterns using the flock command; see its man page
for examples.

Barry


