Mailing List Archive

Have exceeded the maximum number of attempts (1000) to open temp file/dir
I have an API where requests can include JSON. HTTP::Body saves those off
to temp files.

Yesterday we got a very large number of errors:

[ERROR] "Caught exception in engine "Error in tempfile() using
/tmp/XXXXXXXXXX: Have exceeded the maximum number of attempts (1000) to
open temp file/dir

The File::Temp docs say:

> If you are forking many processes in parallel that are all creating
> temporary files, you may need to reset the random number seed using
> srand(EXPR) in each child else all the children will attempt to walk
> through the same set of random file names and may well cause
> themselves to give up if they exceed the number of retry attempts.


We are running under mod_perl. Could it be as simple as the procs all
were in sync? I'm just surprised this has not happened before. Is there
another explanation?

Where would you suggest to call srand()?


Another problem, and one I've commented on before
(https://rt.cpan.org/Public/Bug/Display.html?id=84004), is that
HTTP::Body doesn't use File::Temp's unlink feature and depends on
Catalyst cleaning up. This results in orphaned files left on temp disk.





--
Bill Moseley
moseley@hank.org
Re: Have exceeded the maximum number of attempts (1000) to open temp file/dir [ In reply to ]
On Thu, Oct 31, 2013 at 2:44 PM, John Napiorkowski <jjn1056@yahoo.com> wrote:

>
> I am calling ->cleanup(1) when we create the HTTP::Body. Is that not enough
> to clean up tmp files?
>

I haven't looked at this in a while, but I think it's described here:

https://rt.cpan.org/Public/Bug/Display.html?id=84004

HTTP::Body assumes $self->{upload} exists before deleting, and that might
not be created yet.

I have my own version for handling 'multipart/form-data' that sets UNLINK => 1.
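
For what it's worth, that version boils down to something like this (a
sketch, not HTTP::Body's actual code; the DIR value stands in for
whatever tmpdir the body parser is configured with):

    use File::Temp ();

    # Let File::Temp delete the spool file when $fh goes out of scope,
    # instead of relying on Catalyst's cleanup pass to unlink it later.
    my $fh = File::Temp->new(
        DIR    => '/tmp',
        UNLINK => 1,
    );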


Now, the application/octet-stream handling is another issue. There
HTTP::Body uses File::Temp with its defaults (which include UNLINK => 1),
but I'm still finding a large number of those files left around.

In my dev environment I have not been able to make it leave files on /tmp.
On production I can run watch 'ls /tmp | wc -l' and see the counts
increase and decrease, so I know files are being deleted, but every once
in a while a file gets left behind. I don't see segfaults in the logs, and
I've tested with Apache's MaxRequestsPerChild set low (so child processes
are recycled often) and that doesn't leave files behind either.

I'm going to update our copy of HTTP::Body to put the process ID in the
temp file template, which essentially namespaces the files per process,
and use cron to keep /tmp cleaner. But I still have yet to figure out why
those files are left behind. With UNLINK => 1 they should not be left
there. File::Temp doesn't appear to check the return value from unlink.
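
The template change is roughly this (a sketch; the 'body_' prefix and the
cron pattern are just illustrative):

    use File::Temp ();

    # Embed the process ID in the template so each child works in its own
    # slice of the name space, and stale files can be matched by pattern.
    my $fh = File::Temp->new(
        TEMPLATE => 'body_' . $$ . '_XXXXXXXXXX',
        DIR      => '/tmp',
    );

The cron side is then just something like:

    find /tmp -name 'body_*' -mmin +60 -delete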

They come and go but some stick around:

$ for i in $(seq 10); do ls /tmp | wc -l; sleep 2; done
23861
23865
23863
23864
23862
23862
23865
23865
23864
23866

$ ls -lt /tmp | head -2
total 95492
-rw------- 1 tii-rest tii-rest 14 Oct 31 16:40 Nudjp9WDNy

$ ls -lt /tmp | tail -2
-rw------- 1 tii-rest tii-rest 16 Oct 28 13:36 NWwxOhwhRW
-rw------- 1 tii-rest tii-rest 16 Oct 28 13:35 Ll1Ze0TNPL





>
> regarding the tmp file thing, wow I have no idea, but I hope you find out
> and report it to us!
>
> Johnn
>
>
> On Friday, October 25, 2013 8:53 AM, Bill Moseley <moseley@hank.org>
> wrote:
> I have an API where requests can include JSON. HTTP::Body saves those
> off to temp files.
>
> Yesterday got a very large number of errors:
>
> [ERROR] "Caught exception in engine "Error in tempfile() using
> /tmp/XXXXXXXXXX: Have exceeded the maximum number of attempts (1000) to
> open temp file/dir
>
> The File::Temp docs say:
>
> If you are forking many processes in parallel that are all creating
> temporary files, you may need to reset the random number seed using
> srand(EXPR) in each child else all the children will attempt to walk
> through the same set of random file names and may well cause
> themselves to give up if they exceed the number of retry attempts.
>
>
> We are running under mod_perl. Could it be as simple as the procs all
> were in sync? I'm just surprised this has not happened before. Is there
> another explanation?
>
> Where would you suggest to call srand()?
>
>
> Another problem, and one I've commented on before
> (https://rt.cpan.org/Public/Bug/Display.html?id=84004), is that
> HTTP::Body doesn't use File::Temp's unlink feature and
> depends on Catalyst cleaning up. This results in orphaned files left on
> temp disk.
>
>
>
>
>
> --
> Bill Moseley
> moseley@hank.org
>


--
Bill Moseley
moseley@hank.org
Re: Have exceeded the maximum number of attempts (1000) to open temp file/dir [ In reply to ]
On Fri, Oct 25, 2013 at 6:51 AM, Bill Moseley <moseley@hank.org> wrote:

>
> [ERROR] "Caught exception in engine "Error in tempfile() using
> /tmp/XXXXXXXXXX: Have exceeded the maximum number of attempts (1000) to
> open temp file/dir
>


I don't really see how this can be a Catalyst issue, but I can't reproduce
it outside of Catalyst -- and outside of our production environment.

Can anyone think of anything else that might be going on here?


The template has 10 "X"s that are replaced by characters drawn from a set
of, I think, 63 ASCII characters. 63^10 is a huge number of possible
names. File::Temp only loops when sysopen fails with EEXIST -- that is,
when sysopen fails AND the error is that the file already exists.
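
As a quick sanity check on the size of that name space (assuming the
63-character set):

    $ perl -Mbignum -E 'say 63**10'
    984930291881790849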

Sure, there are 50 web processes, but the odds of them all being in exact
lock-step when calling rand() seem low. And even if they started out that
way, if two processes tried the exact same name at the same time, one
process would just try the next random name and be done.

I have something like 26K files in /tmp, so nothing compared to 63^10.
And each web server is only seeing about 10 requests/sec.

It's just not making sense.


Again, I'm unable to replicate the problem with a simple test script that
is designed to clash.

I fork 50 (or more) child processes to replicate the web server processes
and then in each one I do this:


    # Wait until the top of the second so each child process starts at
    # about the same time.  (time() and sleep() here are Time::HiRes.)
    my $t = time();
    sleep( int( $t ) + 1 - $t );

    for ( 1 .. 500 ) {
        my $fh = File::Temp->new(
            TEMPLATE => 'bill_XXXXX',
            DIR      => '/tmp',
        );
    }


And I never see any contention.
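
For reference, the whole harness is roughly this (a sketch of what I
described above, with the counts mentioned; nothing here comes from the
real application):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use File::Temp ();
    use Time::HiRes qw(time sleep);

    my $children = 50;    # stand-in for the pool of web server processes
    my @pids;

    for ( 1 .. $children ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ( $pid == 0 ) {
            # Child: wait until the top of the next second so all the
            # children start at roughly the same moment, then hammer
            # tempfile().
            my $t = time();
            sleep( int($t) + 1 - $t );

            for ( 1 .. 500 ) {
                my $fh = File::Temp->new(
                    TEMPLATE => 'bill_XXXXX',
                    DIR      => '/tmp',
                );
                # $fh (and its file) are discarded on each pass.
            }
            exit 0;
        }

        push @pids, $pid;
    }

    waitpid( $_, 0 ) for @pids;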



>
> The File::Temp docs say:
>
> If you are forking many processes in parallel that are all creating
>> temporary files, you may need to reset the random number seed using
>> srand(EXPR) in each child else all the children will attempt to walk
>> through the same set of random file names and may well cause
>> themselves to give up if they exceed the number of retry attempts.
>
>
> We are running under mod_perl. Could it be as simple as the procs all
> were in sync? I'm just surprised this has not happened before. Is there
> another explanation?
>
> Where would you suggest to call srand()?
>
>
> Another problem, and one I've commented on before
> (https://rt.cpan.org/Public/Bug/Display.html?id=84004), is that
> HTTP::Body doesn't use File::Temp's unlink feature and
> depends on Catalyst cleaning up. This results in orphaned files left on
> temp disk.
>
>
>
>
>
> --
> Bill Moseley
> moseley@hank.org
>



--
Bill Moseley
moseley@hank.org
Re: Re: Have exceeded the maximum number of attempts (1000) to open temp file/dir [ In reply to ]
On 5/11/2013 1:24 AM, Bill Moseley wrote:
>
> On Fri, Oct 25, 2013 at 6:51 AM, Bill Moseley <moseley@hank.org
> <mailto:moseley@hank.org>> wrote:
>
>
> [ERROR] "Caught exception in engine "Error in tempfile() using
> /tmp/XXXXXXXXXX: Have exceeded the maximum number of attempts
> (1000) to open temp file/dir
>
>
>
> I don't really see how this can be a Catalyst issue, but I can't
> reproduce it outside of Catalyst -- and outside of our production
> environment.
>
> Can anyone think of anything else that might be going on here?

I'd be thinking along the lines of mod_perl being evil. From a quick
Google of "mod_perl srand" there seem to be some similar cases, and
there's a suggestion of where to call srand in this post:
http://blogs.perl.org/users/brian_phillips/2010/06/when-rand-isnt-random.html

Give it a try.

>
>
> The template has 10 "X" that are replaced by I think 63 random ascii
> characters. 63^10 is a huge number of random strings. File::Temp
> only loops when sysopen returns EEXISTS -- that is, when sysopen fails
> AND the error is that the file already exists.
>
> Sure, there's 50 web processes but the odds of them all being in exact
> lock-step with calling rand() is unlikely. And even if they started
> out that way if two processes opened the exact same name at the same
> time one process would just try the next random name and be done.
>
> I have something like 26K files in /tmp, so nothing compared to 63^10.
> And each web server is only seeing about 10 request/sec.
>
> It's just not making sense.
>
>
> Again, I'm unable to replicate the problem with a simple test script
> that is designed to clash.
>
> I fork 50 (or more) child processes to replicate the web server
> processes and then in each one I do this:
>
>
> # Wait until top of the second so each child procss starts
> about the same time.
>
> my $t = time(); # Time::HiRes
> sleep( int( $t ) + 1 - $t );
>
>
> for ( 1 .. 500 ) {
>
> my $fh = File::Temp->new(
> TEMPLATE => 'bill_XXXXX',
> DIR => '/tmp',
> );
>
> }
>
>
> And never see any contention.
>
>
> The File::Temp docs say:
>
> If you are forking many processes in parallel that are all
> creating
> temporary files, you may need to reset the random number seed
> using
> srand(EXPR) in each child else all the children will attempt
> to walk
> through the same set of random file names and may well cause
> themselves to give up if they exceed the number of retry attempts.
>
>
> We are running under mod_perl. Could it be as simple as the
> procs all were in sync? I'm just surprised this has not happened
> before. Is there another explanation?
>
> Where would you suggest to call srand()?
>
>
> Another problem, and one I've commented
> <https://rt.cpan.org/Public/Bug/Display.html?id=84004> on before,
> is that HTTP::Body doesn't use File::Temp's unlink feature and
> depends on Catalyst cleaning up. This results in orphaned files
> left on temp disk.
>
>
>
>
>
> --
> Bill Moseley
> moseley@hank.org <mailto:moseley@hank.org>
>
>
>
>
> --
> Bill Moseley
> moseley@hank.org <mailto:moseley@hank.org>
>
>



Re: Re: Have exceeded the maximum number of attempts (1000) to open temp file/dir [ In reply to ]
On Mon, Nov 4, 2013 at 4:20 PM, neil.lunn <neil@mylunn.id.au> wrote:

>
> I'd be thinking along the lines of mod_perl is evil. From a quick google
> of "mod_perl srand" there seem to be some similar cases. And a where to
> call srand in this post:
> http://blogs.perl.org/users/brian_phillips/2010/06/when-rand-isnt-random.html
>
> Give it a try.
>

Thanks. That indeed was part of the problem. The other part was a leak.


First, our code to manage database replication was causing a leak of $c.
That's why I could not replicate the problem in dev, where I don't
normally run with a replicated database.

The result was that a temp file would be created but not destroyed at the
end of the request. It would be destroyed at the start of the *next*
request.

So, the temp files would be deleted, just not at the right time. The one
exception was the last request -- when Apache killed off a child process,
that final temp file would be left behind.


Here's where mod_perl comes in.

If I understand Perl correctly, a seed is generated the first time rand()
is called if one hasn't been set already. So, if you fork a bunch of
children and the parent never touched rand(), each child gets its own
seed the first time it calls rand(). But if the seed was already set in
the parent before forking -- by calling srand() or rand() -- then all the
children inherit the same seed, and all the children will generate the
same random sequence.
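
That is easy to demonstrate outside of Apache (a small sketch; the lone
rand() in the parent stands in for whatever the Apache parent does before
forking):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    rand();    # seed the generator in the parent, before any fork

    for my $child ( 1 .. 3 ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Every child prints the same two numbers, because they all
            # inherited the parent's seed.  Calling srand() here first
            # would give each child its own sequence.
            printf "child %d: %.6f %.6f\n", $child, rand(), rand();
            exit 0;
        }
    }
    wait() for 1 .. 3;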

When using Starman or even the forking test server this is not a problem,
as I believe they both call srand() in each child. With Apache the
children end up with the same seed, which likely means the parent process
called rand() or srand() before forking.


Over time, /tmp would fill, and because of the common seed it would fill
with the same predictable set of filenames, so eventually the first 1000
candidate names would all be in use and tempfile() would give up.



My fix is simply this in the app base class:

my $srand;
before handle_request => sub { srand() unless $srand++ };
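
In context, that sits in the application class roughly like this (a
sketch; "MyApp" and the surrounding boilerplate are illustrative, the
modifier is the actual fix):

    package MyApp;
    use Moose;
    use Catalyst;
    extends 'Catalyst';

    # Re-seed the random number generator once per child process, on its
    # first request, so each Apache child walks a different sequence of
    # File::Temp candidate names.
    my $srand;
    before handle_request => sub { srand() unless $srand++ };

    __PACKAGE__->setup;

    1;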



The question is whether this is something Catalyst should handle.


As for the leak: we use the same replication code in different apps, so
we need a way for the specific app to work with it.

The broken code was in an ACCEPT_CONTEXT that included:

$replication_object->callback( sub { $self->callback_method( $c ) } );


I'm not thrilled by $c getting passed here, but the callback needs quite
a bit from $c.

The "fix" that seems to work is simply this:

weaken( $c );
$replication_object->callback( sub { $self->callback_method( $c ) } );
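
Put together, the ACCEPT_CONTEXT ends up looking roughly like this (a
sketch; $replication_object and callback_method are our own code and are
named here only for illustration):

    use Scalar::Util qw(weaken);

    sub ACCEPT_CONTEXT {
        my ( $self, $c ) = @_;

        # Weaken the copy of $c captured by the closure so the callback
        # does not keep the context (and its temp files) alive past the
        # end of the request.
        weaken( $c );
        $replication_object->callback(
            sub { $self->callback_method( $c ) } );

        return $self;
    }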






--
Bill Moseley
moseley@hank.org