Mailing List Archive

Stress Testing Lucene
Hey people,

I'm running a Lucene (v1.2) servlet on resin and I must say compared to
Oracle Intermedia
it's working beautifully. BUT today, I started stress testing and I
downloaded a program called
Web Roller, witch simulates clients, requests , multi-threading .. the works
and I was testing
I was doing something like 50 simultaneous requests and I was repeating that
10 times in a row.

but then something happened and the index got corrupted, every time I try
opening the index
with the reader to search or open with the writer to optimize I get that
damned too-many files
open error. I can imagine that every application on the market has a
breaking point and these breaking
points have side effects, so is the corruption of the index a side effect
and if so is there a way that
I configure my web server to crash before the corruption occurs, I'd rather
re-start the web server and
throw some people off wack rather that have to re-build the index or revert
to an older version.

Do you know of any way to safeguard against this ?

General Info:
The index is about 45 MB with 60 000 XML files each containing 18-25 fields.


Nader S. Henein
Bayt.com , Dubai Internet City
Tel. +9714 3911900
Fax. +9714 3911915
GSM. +9715 05659557
www.bayt.com


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Stress Testing Lucene [ In reply to ]
1) Are you sure that the index is corrupted? Maybe the file handles just
haven't been released yet. Did you try to reboot and try again?

2) To avoid the too-many files problem: a) increase the system file handle
limits, b) make sure that you reuse IndexReaders as much as you can rather
across requests and client rather than opening and closing them.

> -----Original Message-----
> From: Nader S. Henein [mailto:nsh@bayt.net]
> Sent: Wednesday, June 26, 2002 10:11 AM
> To: lucene-user@jakarta.apache.org
> Subject: Stress Testing Lucene
> Importance: High
>
>
>
> Hey people,
>
> I'm running a Lucene (v1.2) servlet on resin and I must say
> compared to
> Oracle Intermedia
> it's working beautifully. BUT today, I started stress testing and I
> downloaded a program called
> Web Roller, witch simulates clients, requests ,
> multi-threading .. the works
> and I was testing
> I was doing something like 50 simultaneous requests and I was
> repeating that
> 10 times in a row.
>
> but then something happened and the index got corrupted,
> every time I try
> opening the index
> with the reader to search or open with the writer to optimize
> I get that
> damned too-many files
> open error. I can imagine that every application on the market has a
> breaking point and these breaking
> points have side effects, so is the corruption of the index a
> side effect
> and if so is there a way that
> I configure my web server to crash before the corruption
> occurs, I'd rather
> re-start the web server and
> throw some people off wack rather that have to re-build the
> index or revert
> to an older version.
>
> Do you know of any way to safeguard against this ?
>
> General Info:
> The index is about 45 MB with 60 000 XML files each
> containing 18-25 fields.
>
>
> Nader S. Henein
> Bayt.com , Dubai Internet City
> Tel. +9714 3911900
> Fax. +9714 3911915
> GSM. +9715 05659557
> www.bayt.com
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>
RE: Stress Testing Lucene [ In reply to ]
--- Scott Ganyo <scott.ganyo@eTapestry.com> wrote:
> 1) Are you sure that the index is corrupted? Maybe the file handles
> just
> haven't been released yet. Did you try to reboot and try again?

You can also do something like this:

# lsof | wc -l
8727

# lsof | grep -c java
5382

# lsof | grep java | head
mozilla-b 8428 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8453 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8454 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8455 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8457 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8471 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so

> 2) To avoid the too-many files problem: a) increase the system file
> handle
> limits, b) make sure that you reuse IndexReaders as much as you can
> rather
> across requests and client rather than opening and closing them.
>
> > -----Original Message-----
> > From: Nader S. Henein [mailto:nsh@bayt.net]
> > Sent: Wednesday, June 26, 2002 10:11 AM
> > To: lucene-user@jakarta.apache.org
> > Subject: Stress Testing Lucene
> > Importance: High
> >
> >
> >
> > Hey people,
> >
> > I'm running a Lucene (v1.2) servlet on resin and I must say
> > compared to
> > Oracle Intermedia
> > it's working beautifully. BUT today, I started stress testing and I
> > downloaded a program called
> > Web Roller, witch simulates clients, requests ,
> > multi-threading .. the works
> > and I was testing
> > I was doing something like 50 simultaneous requests and I was
> > repeating that
> > 10 times in a row.
> >
> > but then something happened and the index got corrupted,
> > every time I try
> > opening the index
> > with the reader to search or open with the writer to optimize
> > I get that
> > damned too-many files
> > open error. I can imagine that every application on the market has
> a
> > breaking point and these breaking
> > points have side effects, so is the corruption of the index a
> > side effect
> > and if so is there a way that
> > I configure my web server to crash before the corruption
> > occurs, I'd rather
> > re-start the web server and
> > throw some people off wack rather that have to re-build the
> > index or revert
> > to an older version.
> >
> > Do you know of any way to safeguard against this ?
> >
> > General Info:
> > The index is about 45 MB with 60 000 XML files each
> > containing 18-25 fields.
> >
> >
> > Nader S. Henein
> > Bayt.com , Dubai Internet City
> > Tel. +9714 3911900
> > Fax. +9714 3911915
> > GSM. +9715 05659557
> > www.bayt.com
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
>


__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Stress Testing Lucene [ In reply to ]
I rebooted my machine and still the same issue .. if I know
what caused that to happen, I would be able to solve it with
some source tweaking, and it's not the files handles on the machine I
got over that problem months ago. Let's consider worst case scenario and
that
corruption did occur what could be the reasons, I'm goig to need some
insider
help to get through this one.

N.

-----Original Message-----
From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
Sent: Wednesday, June 26, 2002 7:15 PM
To: 'Lucene Users List'
Subject: RE: Stress Testing Lucene


1) Are you sure that the index is corrupted? Maybe the file handles just
haven't been released yet. Did you try to reboot and try again?

2) To avoid the too-many files problem: a) increase the system file handle
limits, b) make sure that you reuse IndexReaders as much as you can rather
across requests and client rather than opening and closing them.

> -----Original Message-----
> From: Nader S. Henein [mailto:nsh@bayt.net]
> Sent: Wednesday, June 26, 2002 10:11 AM
> To: lucene-user@jakarta.apache.org
> Subject: Stress Testing Lucene
> Importance: High
>
>
>
> Hey people,
>
> I'm running a Lucene (v1.2) servlet on resin and I must say
> compared to
> Oracle Intermedia
> it's working beautifully. BUT today, I started stress testing and I
> downloaded a program called
> Web Roller, witch simulates clients, requests ,
> multi-threading .. the works
> and I was testing
> I was doing something like 50 simultaneous requests and I was
> repeating that
> 10 times in a row.
>
> but then something happened and the index got corrupted,
> every time I try
> opening the index
> with the reader to search or open with the writer to optimize
> I get that
> damned too-many files
> open error. I can imagine that every application on the market has a
> breaking point and these breaking
> points have side effects, so is the corruption of the index a
> side effect
> and if so is there a way that
> I configure my web server to crash before the corruption
> occurs, I'd rather
> re-start the web server and
> throw some people off wack rather that have to re-build the
> index or revert
> to an older version.
>
> Do you know of any way to safeguard against this ?
>
> General Info:
> The index is about 45 MB with 60 000 XML files each
> containing 18-25 fields.
>
>
> Nader S. Henein
> Bayt.com , Dubai Internet City
> Tel. +9714 3911900
> Fax. +9714 3911915
> GSM. +9715 05659557
> www.bayt.com
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Stress Testing Lucene [ In reply to ]
sorry .. but still the same problem, I've saved the index in a seperate
direcory
and I've re-indexed overnight so, testing (witch is currently underway) on
the
system can resume. Like I said in my previous email , worst case scenarios
and the index
is corrupted any ideas as to why, I'll gladly go into the source but some
guidance as
to a starting point would be nice


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Wednesday, June 26, 2002 7:33 PM
To: Lucene Users List
Subject: RE: Stress Testing Lucene



--- Scott Ganyo <scott.ganyo@eTapestry.com> wrote:
> 1) Are you sure that the index is corrupted? Maybe the file handles
> just
> haven't been released yet. Did you try to reboot and try again?

You can also do something like this:

# lsof | wc -l
8727

# lsof | grep -c java
5382

# lsof | grep java | head
mozilla-b 8428 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8453 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8454 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8455 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8457 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so
mozilla-b 8471 otis mem REG 3,5 1242726 1287892
/usr/local/.version/IBMJava2-13/jre/bin/libjavaplugin_oji.so

> 2) To avoid the too-many files problem: a) increase the system file
> handle
> limits, b) make sure that you reuse IndexReaders as much as you can
> rather
> across requests and client rather than opening and closing them.
>
> > -----Original Message-----
> > From: Nader S. Henein [mailto:nsh@bayt.net]
> > Sent: Wednesday, June 26, 2002 10:11 AM
> > To: lucene-user@jakarta.apache.org
> > Subject: Stress Testing Lucene
> > Importance: High
> >
> >
> >
> > Hey people,
> >
> > I'm running a Lucene (v1.2) servlet on resin and I must say
> > compared to
> > Oracle Intermedia
> > it's working beautifully. BUT today, I started stress testing and I
> > downloaded a program called
> > Web Roller, witch simulates clients, requests ,
> > multi-threading .. the works
> > and I was testing
> > I was doing something like 50 simultaneous requests and I was
> > repeating that
> > 10 times in a row.
> >
> > but then something happened and the index got corrupted,
> > every time I try
> > opening the index
> > with the reader to search or open with the writer to optimize
> > I get that
> > damned too-many files
> > open error. I can imagine that every application on the market has
> a
> > breaking point and these breaking
> > points have side effects, so is the corruption of the index a
> > side effect
> > and if so is there a way that
> > I configure my web server to crash before the corruption
> > occurs, I'd rather
> > re-start the web server and
> > throw some people off wack rather that have to re-build the
> > index or revert
> > to an older version.
> >
> > Do you know of any way to safeguard against this ?
> >
> > General Info:
> > The index is about 45 MB with 60 000 XML files each
> > containing 18-25 fields.
> >
> >
> > Nader S. Henein
> > Bayt.com , Dubai Internet City
> > Tel. +9714 3911900
> > Fax. +9714 3911915
> > GSM. +9715 05659557
> > www.bayt.com
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
>


__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com

--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>




--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Stress Testing Lucene [ In reply to ]
Which came first--the out of file handles error or the corruption? I
haven't looked, but I would guess that if you ran into the file handles
exception while writing, that might leave Lucene in a bad state. Lucene
isn't transactional and doesn't really have the ACID properties of a
database...

> -----Original Message-----
> From: Nader S. Henein [mailto:nsh@bayt.net]
> Sent: Wednesday, June 26, 2002 11:45 PM
> To: Lucene Users List
> Subject: RE: Stress Testing Lucene
>
>
> I rebooted my machine and still the same issue .. if I know
> what caused that to happen, I would be able to solve it with
> some source tweaking, and it's not the files handles on the machine I
> got over that problem months ago. Let's consider worst case
> scenario and
> that
> corruption did occur what could be the reasons, I'm goig to need some
> insider
> help to get through this one.
>
> N.
>
> -----Original Message-----
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
> Sent: Wednesday, June 26, 2002 7:15 PM
> To: 'Lucene Users List'
> Subject: RE: Stress Testing Lucene
>
>
> 1) Are you sure that the index is corrupted? Maybe the file
> handles just
> haven't been released yet. Did you try to reboot and try again?
>
> 2) To avoid the too-many files problem: a) increase the
> system file handle
> limits, b) make sure that you reuse IndexReaders as much as
> you can rather
> across requests and client rather than opening and closing them.
>
> > -----Original Message-----
> > From: Nader S. Henein [mailto:nsh@bayt.net]
> > Sent: Wednesday, June 26, 2002 10:11 AM
> > To: lucene-user@jakarta.apache.org
> > Subject: Stress Testing Lucene
> > Importance: High
> >
> >
> >
> > Hey people,
> >
> > I'm running a Lucene (v1.2) servlet on resin and I must say
> > compared to
> > Oracle Intermedia
> > it's working beautifully. BUT today, I started stress testing and I
> > downloaded a program called
> > Web Roller, witch simulates clients, requests ,
> > multi-threading .. the works
> > and I was testing
> > I was doing something like 50 simultaneous requests and I was
> > repeating that
> > 10 times in a row.
> >
> > but then something happened and the index got corrupted,
> > every time I try
> > opening the index
> > with the reader to search or open with the writer to optimize
> > I get that
> > damned too-many files
> > open error. I can imagine that every application on the market has a
> > breaking point and these breaking
> > points have side effects, so is the corruption of the index a
> > side effect
> > and if so is there a way that
> > I configure my web server to crash before the corruption
> > occurs, I'd rather
> > re-start the web server and
> > throw some people off wack rather that have to re-build the
> > index or revert
> > to an older version.
> >
> > Do you know of any way to safeguard against this ?
> >
> > General Info:
> > The index is about 45 MB with 60 000 XML files each
> > containing 18-25 fields.
> >
> >
> > Nader S. Henein
> > Bayt.com , Dubai Internet City
> > Tel. +9714 3911900
> > Fax. +9714 3911915
> > GSM. +9715 05659557
> > www.bayt.com
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>
RE: Stress Testing Lucene [ In reply to ]
> Which came first--the out of file handles error or the corruption? I
> haven't looked, but I would guess that if you ran into the file handles
> exception while writing, that might leave Lucene in a bad state. Lucene
> isn't transactional and doesn't really have the ACID properties of a
> database...

A little bit offtopic...

I have implemented JDSDirectory that stores Lucene indices in JDataStore
database from Borland. JDataStore implements FS-like storage, where SQL
tables are just "files" of type "table". They do have transaction
management, but I do not know if it covers any file type or only tables, I
haven't make any testing yet.

If you want, I can send this implementation to you. This might solve your
problem, because there's no file handles involved, if TxManager spans the
whole database, you get ACID properties there. The drawbacks are that
JDataStore is not for free (there's very funny licensing involved, sometimes
$100 USD, sometimes $1,000 or more) and I haven't tested it with big
databases (but Borland claims to support couple of terabytes :)). Speed was
similar to FSDirectory in my tests (difference of ~5-10% max.).

I was thinking of posting it to Lucene contributions, but recently got
completely different task and have no time to check if the code is clean
enough to make it public.

Best regards,
Roman Rokytskyy


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Stress Testing Lucene [ In reply to ]
It's very hard to leave an index in a bad state. Updating the
"segments" file atomically updates the index. So the only way to
corrupt things is to only partly update the segments file. But that too
is hard, since it's first written to a temporary file, which is then
renamed "segments". The only vulnerability I know if is that in Java on
Win32 you can't atomically rename a file to something that already
exists, so Lucene has to first remove the old version. So if you were
to crash between the time that the old version of "segments" is removed
and the new version is moved into place, then the index would be
corrupt, because it would have no "segments" file.

Doug

Scott Ganyo wrote:
> Which came first--the out of file handles error or the corruption? I
> haven't looked, but I would guess that if you ran into the file handles
> exception while writing, that might leave Lucene in a bad state. Lucene
> isn't transactional and doesn't really have the ACID properties of a
> database...
>
>
>>-----Original Message-----
>>From: Nader S. Henein [mailto:nsh@bayt.net]
>>Sent: Wednesday, June 26, 2002 11:45 PM
>>To: Lucene Users List
>>Subject: RE: Stress Testing Lucene
>>
>>
>>I rebooted my machine and still the same issue .. if I know
>>what caused that to happen, I would be able to solve it with
>>some source tweaking, and it's not the files handles on the machine I
>>got over that problem months ago. Let's consider worst case
>>scenario and
>>that
>>corruption did occur what could be the reasons, I'm goig to need some
>>insider
>>help to get through this one.
>>
>>N.
>>
>>-----Original Message-----
>>From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
>>Sent: Wednesday, June 26, 2002 7:15 PM
>>To: 'Lucene Users List'
>>Subject: RE: Stress Testing Lucene
>>
>>
>>1) Are you sure that the index is corrupted? Maybe the file
>>handles just
>>haven't been released yet. Did you try to reboot and try again?
>>
>>2) To avoid the too-many files problem: a) increase the
>>system file handle
>>limits, b) make sure that you reuse IndexReaders as much as
>>you can rather
>>across requests and client rather than opening and closing them.
>>
>>
>>>-----Original Message-----
>>>From: Nader S. Henein [mailto:nsh@bayt.net]
>>>Sent: Wednesday, June 26, 2002 10:11 AM
>>>To: lucene-user@jakarta.apache.org
>>>Subject: Stress Testing Lucene
>>>Importance: High
>>>
>>>
>>>
>>>Hey people,
>>>
>>>I'm running a Lucene (v1.2) servlet on resin and I must say
>>>compared to
>>>Oracle Intermedia
>>>it's working beautifully. BUT today, I started stress testing and I
>>>downloaded a program called
>>>Web Roller, witch simulates clients, requests ,
>>>multi-threading .. the works
>>>and I was testing
>>>I was doing something like 50 simultaneous requests and I was
>>>repeating that
>>>10 times in a row.
>>>
>>>but then something happened and the index got corrupted,
>>>every time I try
>>>opening the index
>>>with the reader to search or open with the writer to optimize
>>>I get that
>>>damned too-many files
>>>open error. I can imagine that every application on the market has a
>>>breaking point and these breaking
>>>points have side effects, so is the corruption of the index a
>>>side effect
>>>and if so is there a way that
>>>I configure my web server to crash before the corruption
>>>occurs, I'd rather
>>>re-start the web server and
>>>throw some people off wack rather that have to re-build the
>>>index or revert
>>>to an older version.
>>>
>>>Do you know of any way to safeguard against this ?
>>>
>>>General Info:
>>>The index is about 45 MB with 60 000 XML files each
>>>containing 18-25 fields.
>>>
>>>
>>>Nader S. Henein
>>>Bayt.com , Dubai Internet City
>>>Tel. +9714 3911900
>>>Fax. +9714 3911915
>>>GSM. +9715 05659557
>>>www.bayt.com
>>>
>>>
>>>--
>>>To unsubscribe, e-mail:
>>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>For additional commands, e-mail:
>>><mailto:lucene-user-help@jakarta.apache.org>
>>>
>>
>>--
>>To unsubscribe, e-mail:
>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>For additional commands, e-mail:
>><mailto:lucene-user-help@jakarta.apache.org>
>>
>



--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Stress Testing Lucene [ In reply to ]
I'm not sure witch one hit first, I kinda ran into both at the same time,
what I'm thinking of now is taking Otis's advice and going for an
IndexReader
pool using the POOL package in the Jakarta Commons, and that way I can limit
the number of concurrent searches and avoid that stress testing corruption
issue
until I figure it out. But coming back to your question, I wasn't writing at
the
time that's the strange thing I was strictly searching, I even has all my
schedulers
off to avoid updates or deletes during the stress testing (one step at time
approach)

Nader


-----Original Message-----
From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
Sent: Thursday, June 27, 2002 7:06 PM
To: 'Lucene Users List'
Subject: RE: Stress Testing Lucene


Which came first--the out of file handles error or the corruption? I
haven't looked, but I would guess that if you ran into the file handles
exception while writing, that might leave Lucene in a bad state. Lucene
isn't transactional and doesn't really have the ACID properties of a
database...

> -----Original Message-----
> From: Nader S. Henein [mailto:nsh@bayt.net]
> Sent: Wednesday, June 26, 2002 11:45 PM
> To: Lucene Users List
> Subject: RE: Stress Testing Lucene
>
>
> I rebooted my machine and still the same issue .. if I know
> what caused that to happen, I would be able to solve it with
> some source tweaking, and it's not the files handles on the machine I
> got over that problem months ago. Let's consider worst case
> scenario and
> that
> corruption did occur what could be the reasons, I'm goig to need some
> insider
> help to get through this one.
>
> N.
>
> -----Original Message-----
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
> Sent: Wednesday, June 26, 2002 7:15 PM
> To: 'Lucene Users List'
> Subject: RE: Stress Testing Lucene
>
>
> 1) Are you sure that the index is corrupted? Maybe the file
> handles just
> haven't been released yet. Did you try to reboot and try again?
>
> 2) To avoid the too-many files problem: a) increase the
> system file handle
> limits, b) make sure that you reuse IndexReaders as much as
> you can rather
> across requests and client rather than opening and closing them.
>
> > -----Original Message-----
> > From: Nader S. Henein [mailto:nsh@bayt.net]
> > Sent: Wednesday, June 26, 2002 10:11 AM
> > To: lucene-user@jakarta.apache.org
> > Subject: Stress Testing Lucene
> > Importance: High
> >
> >
> >
> > Hey people,
> >
> > I'm running a Lucene (v1.2) servlet on resin and I must say
> > compared to
> > Oracle Intermedia
> > it's working beautifully. BUT today, I started stress testing and I
> > downloaded a program called
> > Web Roller, witch simulates clients, requests ,
> > multi-threading .. the works
> > and I was testing
> > I was doing something like 50 simultaneous requests and I was
> > repeating that
> > 10 times in a row.
> >
> > but then something happened and the index got corrupted,
> > every time I try
> > opening the index
> > with the reader to search or open with the writer to optimize
> > I get that
> > damned too-many files
> > open error. I can imagine that every application on the market has a
> > breaking point and these breaking
> > points have side effects, so is the corruption of the index a
> > side effect
> > and if so is there a way that
> > I configure my web server to crash before the corruption
> > occurs, I'd rather
> > re-start the web server and
> > throw some people off wack rather that have to re-build the
> > index or revert
> > to an older version.
> >
> > Do you know of any way to safeguard against this ?
> >
> > General Info:
> > The index is about 45 MB with 60 000 XML files each
> > containing 18-25 fields.
> >
> >
> > Nader S. Henein
> > Bayt.com , Dubai Internet City
> > Tel. +9714 3911900
> > Fax. +9714 3911915
> > GSM. +9715 05659557
> > www.bayt.com
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Stress Testing Lucene [ In reply to ]
Yes could you please post the implementation of the Lucene-User board
I think you need to send it to one of the guys so they can post it as
an attachment to your message.

thanks

Nader

-----Original Message-----
From: Roman Rokytskyy [mailto:rrokytskyy@yahoo.co.uk]
Sent: Thursday, June 27, 2002 7:24 PM
To: Lucene Users List
Subject: RE: Stress Testing Lucene


> Which came first--the out of file handles error or the corruption? I
> haven't looked, but I would guess that if you ran into the file handles
> exception while writing, that might leave Lucene in a bad state. Lucene
> isn't transactional and doesn't really have the ACID properties of a
> database...

A little bit offtopic...

I have implemented JDSDirectory that stores Lucene indices in JDataStore
database from Borland. JDataStore implements FS-like storage, where SQL
tables are just "files" of type "table". They do have transaction
management, but I do not know if it covers any file type or only tables, I
haven't make any testing yet.

If you want, I can send this implementation to you. This might solve your
problem, because there's no file handles involved, if TxManager spans the
whole database, you get ACID properties there. The drawbacks are that
JDataStore is not for free (there's very funny licensing involved, sometimes
$100 USD, sometimes $1,000 or more) and I haven't tested it with big
databases (but Borland claims to support couple of terabytes :)). Speed was
similar to FSDirectory in my tests (difference of ~5-10% max.).

I was thinking of posting it to Lucene contributions, but recently got
completely different task and have no time to check if the code is clean
enough to make it public.

Best regards,
Roman Rokytskyy


_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>




--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Stress Testing Lucene [ In reply to ]
That's the weird thing I wasn't writing to the index at the
time I was searching (hardcore searching) 20 clients each one
issuing 20 simultaneous search request .. it was going fine until
it started throwing errors at me and when I looked at the logs, I found
a set of "Too many files open" error. Previously this only happened if
their was a crash on the server while indexing leaving an un-optimized index
with 800+ files.

-----Original Message-----
From: Doug Cutting [mailto:cutting@lucene.com]
Sent: Thursday, June 27, 2002 7:36 PM
To: Lucene Users List
Subject: Re: Stress Testing Lucene


It's very hard to leave an index in a bad state. Updating the
"segments" file atomically updates the index. So the only way to
corrupt things is to only partly update the segments file. But that too
is hard, since it's first written to a temporary file, which is then
renamed "segments". The only vulnerability I know if is that in Java on
Win32 you can't atomically rename a file to something that already
exists, so Lucene has to first remove the old version. So if you were
to crash between the time that the old version of "segments" is removed
and the new version is moved into place, then the index would be
corrupt, because it would have no "segments" file.

Doug

Scott Ganyo wrote:
> Which came first--the out of file handles error or the corruption? I
> haven't looked, but I would guess that if you ran into the file handles
> exception while writing, that might leave Lucene in a bad state. Lucene
> isn't transactional and doesn't really have the ACID properties of a
> database...
>
>
>>-----Original Message-----
>>From: Nader S. Henein [mailto:nsh@bayt.net]
>>Sent: Wednesday, June 26, 2002 11:45 PM
>>To: Lucene Users List
>>Subject: RE: Stress Testing Lucene
>>
>>
>>I rebooted my machine and still the same issue .. if I know
>>what caused that to happen, I would be able to solve it with
>>some source tweaking, and it's not the files handles on the machine I
>>got over that problem months ago. Let's consider worst case
>>scenario and
>>that
>>corruption did occur what could be the reasons, I'm goig to need some
>>insider
>>help to get through this one.
>>
>>N.
>>
>>-----Original Message-----
>>From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
>>Sent: Wednesday, June 26, 2002 7:15 PM
>>To: 'Lucene Users List'
>>Subject: RE: Stress Testing Lucene
>>
>>
>>1) Are you sure that the index is corrupted? Maybe the file
>>handles just
>>haven't been released yet. Did you try to reboot and try again?
>>
>>2) To avoid the too-many files problem: a) increase the
>>system file handle
>>limits, b) make sure that you reuse IndexReaders as much as
>>you can rather
>>across requests and client rather than opening and closing them.
>>
>>
>>>-----Original Message-----
>>>From: Nader S. Henein [mailto:nsh@bayt.net]
>>>Sent: Wednesday, June 26, 2002 10:11 AM
>>>To: lucene-user@jakarta.apache.org
>>>Subject: Stress Testing Lucene
>>>Importance: High
>>>
>>>
>>>
>>>Hey people,
>>>
>>>I'm running a Lucene (v1.2) servlet on resin and I must say
>>>compared to
>>>Oracle Intermedia
>>>it's working beautifully. BUT today, I started stress testing and I
>>>downloaded a program called
>>>Web Roller, witch simulates clients, requests ,
>>>multi-threading .. the works
>>>and I was testing
>>>I was doing something like 50 simultaneous requests and I was
>>>repeating that
>>>10 times in a row.
>>>
>>>but then something happened and the index got corrupted,
>>>every time I try
>>>opening the index
>>>with the reader to search or open with the writer to optimize
>>>I get that
>>>damned too-many files
>>>open error. I can imagine that every application on the market has a
>>>breaking point and these breaking
>>>points have side effects, so is the corruption of the index a
>>>side effect
>>>and if so is there a way that
>>>I configure my web server to crash before the corruption
>>>occurs, I'd rather
>>>re-start the web server and
>>>throw some people off wack rather that have to re-build the
>>>index or revert
>>>to an older version.
>>>
>>>Do you know of any way to safeguard against this ?
>>>
>>>General Info:
>>>The index is about 45 MB with 60 000 XML files each
>>>containing 18-25 fields.
>>>
>>>
>>>Nader S. Henein
>>>Bayt.com , Dubai Internet City
>>>Tel. +9714 3911900
>>>Fax. +9714 3911915
>>>GSM. +9715 05659557
>>>www.bayt.com
>>>
>>>
>>>--
>>>To unsubscribe, e-mail:
>>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>For additional commands, e-mail:
>>><mailto:lucene-user-help@jakarta.apache.org>
>>>
>>
>>--
>>To unsubscribe, e-mail:
>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>For additional commands, e-mail:
>><mailto:lucene-user-help@jakarta.apache.org>
>>
>



--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>




--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>