Mailing List Archive

FileNotFoundException: Too many open files
Hello,

I'm running into this exception quite often while using Lucene (the
situation is so bad with the latest RC that I had to revert to the last
com.lucene package). I'm sure I have my fair share of bugs in my app,
but nonetheless, how can I "control" Lucene's usage of RandomAccessFile?
The indexes are optimized and I try to keep a close eye on how many
IndexWriter/Reader instances exist at any point in time... Nevertheless,
I run into that exception much too often :-( Any help appreciated!

"04/26 00:07:11 (Warning) Finder.findObjectsWithSpecificationInStore:
java.io.FileNotFoundException: _la.f9 (Too many open files)"

Also, on a somewhat related note, how do I "shut down" Lucene properly?
E.g., do I need to do anything with the IndexWriter and so on?

Last, but not least, is there a way to turn off the file locking in the
latest RC, as it's really getting in the way :-(

Finally, I just wanted to make sure: Lucene is fully multi-threaded,
right? I can search *and* write concurrently in different threads at
the same time on the same index?

Any insight much appreciated!

Thanks.

PA.

BTW, should I post this kind of question to user or dev?


Re: FileNotFoundException: Too many open files
Hello,

> I'm running into this exception quite often while using Lucene (the
> situation is so bad with the latest RC that I had to revert to the
> last com.lucene package). I'm sure I have my fair share of bugs in my
> app, but nonetheless, how can I "control" Lucene's usage of
> RandomAccessFile? The indexes are optimized and I try to keep a close
> eye on how many IndexWriter/Reader instances exist at any point in
> time... Nevertheless, I run into that exception much too often :-(
> Any help appreciated!
>
> "04/26 00:07:11 (Warning) Finder.findObjectsWithSpecificationInStore:
> java.io.FileNotFoundException: _la.f9 (Too many open files)"

I have only looked at your application's screenshots, and based on
that my guess is that you have a fairly high number of index fields;
if I recall correctly, that can cause the above error.
This was mentioned on one of the lists fairly recently, I believe.

> Also, on a somewhat related note, how do I "shut down" Lucene
> properly? E.g., do I need to do anything with the IndexWriter and so
> on?

This was mentioned on the list once, too.
I suggested registering a shutdown hook via java.lang.Runtime, but then
somebody pointed out a drawback of that approach.
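
For reference, a minimal sketch of that idea (assuming the
org.apache.lucene 1.x API and JDK 1.3+; the class and method names here
are made up for illustration):

    import org.apache.lucene.index.IndexWriter;

    public class LuceneShutdown {
        // Hypothetical helper: close a shared IndexWriter when the VM exits.
        public static void install(final IndexWriter writer) {
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() {
                    try {
                        // Flushes buffered documents and releases the write lock.
                        writer.close();
                    } catch (java.io.IOException e) {
                        e.printStackTrace();
                    }
                }
            });
        }
    }

(Note that shutdown hooks don't run if the VM is killed abruptly, so
they complement, rather than replace, explicit close() calls.)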

> Last, but not least, is there a way to turn off the file locking in
> the latest RC, as it's really getting in the way :-(

Not that I know of. If locking is getting in the way, maybe you are not
using Lucene properly. I haven't downloaded your application yet, so I
haven't had a chance to peek at the source.

> Finally, I just wanted to make sure: Lucene is fully multi-threaded,
> right? I can search *and* write concurrently in different threads at
> the same time on the same index?

Yes, I believe so - I never encountered any problems with that.
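
To illustrate the pattern being asked about, a minimal sketch
(org.apache.lucene 1.x API assumed; the index path and field names are
made up): one thread adds a document while another searches the same
index. Note that a searcher only sees the segments that existed when it
was opened.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class ConcurrentDemo {
        public static void main(String[] args) throws Exception {
            final String path = "/tmp/demo-index"; // illustrative path
            // Create an empty index so the searcher thread can open it.
            new IndexWriter(path, new StandardAnalyzer(), true).close();

            Thread writerThread = new Thread() {
                public void run() {
                    try {
                        IndexWriter w = new IndexWriter(path, new StandardAnalyzer(), false);
                        Document doc = new Document();
                        doc.add(Field.Text("body", "hello lucene"));
                        w.addDocument(doc);
                        w.close();
                    } catch (java.io.IOException e) { e.printStackTrace(); }
                }
            };

            Thread searcherThread = new Thread() {
                public void run() {
                    try {
                        IndexSearcher s = new IndexSearcher(path);
                        Hits hits = s.search(new TermQuery(new Term("body", "lucene")));
                        // Prints 0 or 1 hit(s) depending on which thread ran first.
                        System.out.println(hits.length() + " hit(s)");
                        s.close();
                    } catch (java.io.IOException e) { e.printStackTrace(); }
                }
            };

            writerThread.start();
            searcherThread.start();
        }
    }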

> BTW, should I post this kind of question to user or dev?

I suggest -user until/unless we determine that there is something in
Lucene that we can fix or improve.

Otis



Re: FileNotFoundException: Too many open files
Hi Otis,

> I have only looked at your application's screenshots, and based on
> that my guess is that you have a fairly high number of index fields;
> if I recall correctly, that can cause the above error.

Well, I used to have an index per class, and I have around a dozen
classes that get indexed. When trying to switch to the latest RC (with
the exact same code base), I ran into so many problems with the now
infamous "FileNotFoundException" that I consolidated everything into
one index per object store, and switched back to the com.lucene
package, which, as far as I can personally tell, is *much* more stable.
I do not store the content of the objects in the index, just a uuid as
Field.Keyword and other attributes as Field.UnStored. On average, there
seem to be fewer than one hundred Lucene files per index.
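
For illustration, a sketch of that field layout (using the Lucene 1.x
Field factories; the class, method, and attribute names are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class ObjectIndexer {
        // The uuid is stored and indexed as a single untokenized term, so
        // hits can be mapped back to the object store; the other attributes
        // are indexed for search but not stored in the index.
        static void index(IndexWriter writer, String uuid, String title, String body)
                throws java.io.IOException {
            Document doc = new Document();
            doc.add(Field.Keyword("uuid", uuid));    // stored, indexed, not tokenized
            doc.add(Field.UnStored("title", title)); // indexed, tokenized, not stored
            doc.add(Field.UnStored("body", body));   // indexed, tokenized, not stored
            writer.addDocument(doc);
        }
    }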

> This was mentioned on the list once, too.
> I suggested registering a shutdown hook via java.lang.Runtime, but
> then somebody pointed out a drawback of that approach.

I have this one under control... Thanks.

> Not that I know of. If locking is getting in the way, maybe you are
> not using Lucene properly. I haven't downloaded your application yet,
> so I haven't had a chance to peek at the source.

Please feel free to do so... ;-)

> Yes, I believe so - I never encountered any problems with that.

Great. That was my assumption all along...

R.


Re: FileNotFoundException: Too many open files
PA,

> On average, there seem to be fewer than one hundred Lucene files per
> index.

You are probably past this point by now, but since I didn't see anyone
pick up on this, I wanted to respond.
"Fewer than one hundred" is definitely too many files for a Lucene
index, unless you have a very large number of stored fields!

An optimized index should have about a dozen files. So this means
either that you have many stored fields, or that you are not calling
optimize, or that, if you are, there are unclosed IndexReader instances
floating around that are still using segments that existed before the
optimization (which replaces all segments with one new one).
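
As a concrete sketch of that hygiene (org.apache.lucene 1.x API
assumed; the class and path are illustrative), optimize and close the
writer in a finally block:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class Optimizer {
        static void optimize(String indexPath) throws java.io.IOException {
            // false = open an existing index rather than creating a new one.
            IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
            try {
                writer.optimize(); // merges all segments into a single new segment
            } finally {
                writer.close();    // releases the write lock and file handles
            }
        }
    }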

About file names:
Here's the naming convention of the files in the index. This might help
you understand which kind of a situation you are facing.
The index directory has the following files:
  deletable  - one; lists segment ids that can be deleted once they are
               no longer held open by the filesystem
  segments   - one; lists the segment ids of the current set of segments
  _<n>.tii   - one per segment, "term index" file
  _<n>.tis   - one per segment, "term infos" file
  _<n>.frq   - one per segment, "term frequency" file
  _<n>.prx   - one per segment, "term positions" file
  _<n>.fdx   - one per segment, "field index" file
  _<n>.fdt   - one per segment, "field infos" file
  _<n>.fnm   - one per segment, "field infos" file
  _<n>.f<m>  - one per segment per stored field, "field data" file

<n> - is the segment number, encoded using numbers and letters
<m> - is the field number, which is a unique field id in that segment.
(I realize that this is still too vague, but I haven't looked through
that code in a while, so I can't do better than "term infos" and "field
infos" right now. However, this should give you an idea of what to
expect, I think.)
An index should have 2 + n * (7 + m) files, where n is the number of
segments and m is the number of stored fields. For an optimized index
with one stored field this gives 10 files (not 100!).

About garbage collection:
I believe that IndexReader instances will attempt to close themselves
upon finalization, but finalization may occur very differently between
VMs and OSs. So, unless IndexReaders are closed explicitly, this might
explain why an application runs fine under Windows but has problems
under OS X, or whatever.
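
For what it's worth, the deterministic pattern looks like this (a
sketch against the org.apache.lucene 1.x API; the helper is made up):

    import org.apache.lucene.index.IndexReader;

    public class ReaderUsage {
        // Close the reader explicitly instead of waiting for finalization,
        // so its file handles are freed at a predictable point on every VM.
        static int docCount(String indexPath) throws java.io.IOException {
            IndexReader reader = IndexReader.open(indexPath);
            try {
                return reader.numDocs();
            } finally {
                reader.close();
            }
        }
    }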

About the file handles:
I'm not familiar with BSD (which is the basis for OSX on which you are
having these problems, right?), so I don't know how the number of open
files is managed there. I know that on Solaris it is a per-process
setting with a "soft" limit, "hard" limit, both controlled by each user,
and a system-wide max to the "hard" limit which only a root can change.
I agree that a desktop application should not require changes to system
configuration, but it might resonably expect a default value to be
present and it might change the soft limit (which is usually set very
low) in the startup script.

On NT, as far as I know, there is no explicit setting for the number of
open files. Rather, it is limited by the amount of available memory in
a particular NT kernel memory pool (not just the free memory on the
system). The pool size can probably be controlled, but I've found that
it is usually generous enough - more so than the Solaris settings.

If BSD is like NT in this regard (at least to some degree), the number
of open files will be determined for the entire system, so depending on
what other applications are running, your tests may produce different
results.


Good luck.
Dmitry.



Re: FileNotFoundException: Too many open files
On Wednesday, May 1, 2002, at 10:16 PM, Dmitry Serebrennikov wrote:

> "Less then a hundred" is definetely too many files for a Lucene index,
> unless you have a very large number of stored fields!

Since changing my indexing strategy, I have between 12 and 20 files per
index (including deletable and segments).

> An optimized index should have about a dozen files.

That's what I see for small objects (e.g., with few fields).

> So this means either that you have many stored fields,

My "richest" object has around a dozen fields.

> or that you are not calling optimize, or that, if you are, there are
> unclosed IndexReader instances floating around that are still using
> segments that existed before the optimization (which replaces all
> segments with one new one).

I guess I have this part under control now.

> About file names:

Thanks for the explanation :-) I mostly have _<n>.f<m> type files, as
one might expect.

> An index should have 2 + n * (7 + m) files, where n is the number of
> segments and m is the number of stored fields. For an optimized index
> with one stored field this gives 10 files (not 100!).

It seems that I'm getting there... ;-)

> So, unless IndexReaders are closed explicitly, this might explain why
> an application runs fine under Windows but has problems under OS X,
> or whatever.

I decided to be much more "aggressive" with all the file handles... But
I still rely heavily on the garbage collector, as I'm using the
reference API (java.lang.ref) extensively... Seems to work fine so
far...

> I agree that a desktop application should not require changes to
> system configuration, but it might reasonably expect a default value
> to be present, and it might raise the soft limit (which is usually
> set very low) in its startup script.

So far, my app seems to be doing fine without having to mess around with
any system parameters... Also, it seems to be more responsive since I
have more indexes... Go figure ;-)

> Good luck.

Thanks.

PA.


RE: FileNotFoundException: Too many open files
Thanks, Dmitry. Here's a little more detail:

> From: Dmitry Serebrennikov
>
> The index directory has the following files:
>   deletable  - one; lists segment ids that can be deleted once they
>                are no longer held open by the filesystem
>   segments   - one; lists the segment ids of the current set of
>                segments
> _<n>.tii - one per segment, "term index" file

This is the term infos index file. It contains every 128th entry from the
"tis" file, along with its location in the "tis" file. This is read
entirely into memory and is used to provide random access to the "tis" file.

> _<n>.tis - one per segment, "term infos" file

This is the term infos file. Its logical format is <t,df,freqLoc,proxLoc>*,
where t is the term, df is the "document frequency", or count of documents
containing t, freqLoc is the location of t's data in the "frq" file, and
proxLoc is the location of t's data in the "prx" file.

> _<n>.frq - one per segment, "term frequency" file

This is the frequency file. It contains the frequency of each term in
each document. Its logical format is <<d,f>*>*, where d is a document
number, and f is the number of times the term occurred in that
document. The TermDocs interface is used to access this data.

> _<n>.prx - one per segment, "term positions" file

This is the proximity file. It contains the positions of each term in each
document. Its logical format is <<p>*>*, where p is an ordinal position of
a term. The TermPositions interface is used to access this data.
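
To make that concrete, a small sketch of reading both files through
those interfaces (org.apache.lucene 1.x API assumed; the field and term
are made up):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermPositions;

    public class PostingsDump {
        static void dump(IndexReader reader) throws java.io.IOException {
            Term term = new Term("body", "lucene");

            // Walks the <d,f> pairs stored in the "frq" file.
            TermDocs docs = reader.termDocs(term);
            while (docs.next()) {
                System.out.println("doc=" + docs.doc() + " freq=" + docs.freq());
            }
            docs.close();

            // Adds the per-occurrence positions stored in the "prx" file.
            TermPositions positions = reader.termPositions(term);
            while (positions.next()) {
                for (int i = 0; i < positions.freq(); i++) {
                    System.out.println("doc=" + positions.doc()
                            + " pos=" + positions.nextPosition());
                }
            }
            positions.close();
        }
    }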

> _<n>.fdx - one per segment, "field index" file

This is the field index file. It contains the location of each document's
stored fields in the "fdt" file. Its logical format is <docLoc>*, where
docLoc_i is the location in the "fdt" of document i. This is read entirely
into memory and is used to provide random access to a document's stored
fields.
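
At the API level, that random access surfaces as something like this
(sketch; Lucene 1.x API assumed, and the "uuid" field name is
illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;

    public class StoredFieldAccess {
        // One lookup in "fdx" locates the document's entry in "fdt".
        static String uuidOf(IndexReader reader, int docNum) throws java.io.IOException {
            Document doc = reader.document(docNum);
            return doc.get("uuid");
        }
    }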

> _<n>.fdt - one per segment, "field infos" file

This is the field data file. It contains each document's stored fields.
Its logical format is <<field,value>*>*.

> _<n>.fnm - one per segment, "field infos" file

This is the field info file. It contains the names of the fields.

> _<n>.f<m> - one per segment per stored field, "field data" file

These are the normalization files. They contain one byte for each field in
each document that is multiplied into the score of hits on that field of
that document.

> <n> - is the segment number, encoded using numbers and letters
> <m> - is the field number, which is a unique field id in that segment.

> An index should have 2 + n * (7 + m) files, where n is the number of
> segments and m is the number of stored fields. For an optimized index
> with one stored field this gives 10 files (not 100!).

The maximum number of segments an unoptimized index can have is:
(m-1) * (log_m(n) - 1)
where m is the mergeFactor (10 by default) and n is the number of
documents added since the index was last optimized. The average number
of segments is about half that. So a ~1M document index that is never
optimized can have, at most, 45 segments. If you optimize every 10k
documents, then you can limit things to 27 segments. Or you can manage
things more explicitly with tools like RAMDirectory and
IndexWriter.addIndexes().
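
For illustration, a hedged sketch of that explicit approach
(org.apache.lucene 1.x API assumed; class, path, and field names are
made up): build each batch in a RAMDirectory, then fold it into the
on-disk index in one step.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class BatchIndexer {
        static void indexBatch(String diskPath, String[] texts) throws java.io.IOException {
            // Build the batch entirely in memory: no per-document file handles.
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
            for (int i = 0; i < texts.length; i++) {
                Document doc = new Document();
                doc.add(Field.Text("body", texts[i]));
                ramWriter.addDocument(doc);
            }
            ramWriter.optimize();
            ramWriter.close();

            // Fold the in-memory batch into the persistent index in one step.
            Directory fsDir = FSDirectory.getDirectory(diskPath, false);
            IndexWriter diskWriter = new IndexWriter(fsDir, new StandardAnalyzer(), false);
            try {
                diskWriter.addIndexes(new Directory[] { ramDir });
            } finally {
                diskWriter.close();
            }
        }
    }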

Doug
