Mailing List Archive

Multiple instances of Lucene IndexWriter
We are currently evaluating Lucene for document indexing and a question came
up regarding multiple instances of IndexWriter possibly accessing the same
index (directory).

This would be a consequence of multiple instances of our application
possibly accessing the same index, where multiple instances are used for
load balancing and failover of the application.

The index could be on a local drive, when virtualization is used to run
multiple instances on a single box, or on a shared drive (Windows file
sharing) with multiple server instances trying to update it.

I have been looking around in the forums, and the advice is always against
having multiple instances of IndexWriter write to the same index, but I was
wondering whether the group has any suggestions for workarounds. Surely
there must be other load-balanced applications using Lucene?

Some of the workarounds I can think of off the top of my head:

1. have each instance write to a local index and merge these local indexes
periodically into a shared index where searching is performed

2. implement our own queuing algorithm by testing for write locks and
waiting until the locks are cleared

thank you,
David
--
View this message in context: http://www.nabble.com/Multiple-instances-of-Lucene-IndexWriter-tf4612568.html#a13172543
Sent from the Lucene - General mailing list archive at Nabble.com.
Re: Multiple instances of Lucene IndexWriter
David,

Have a look at Solr, http://lucene.apache.org/solr - it addresses
this issue and many others that you would likely encounter with using
pure Lucene.

Erik


On Oct 12, 2007, at 6:26 AM, David K wrote:

Re: Multiple instances of Lucene IndexWriter
Thank you for the quick response, but at the moment we are interested in our
own (small) usage of Lucene. It may turn out in the future that Solr is the
solution we need.

For now I was hoping for a more detailed workaround for the issue of using
multiple instances of IndexWriter on the same index.





Re: Multiple instances of Lucene IndexWriter
What you suggested is generally the most easygoing way to deal with it,
i.e. having a separate index per writer and one serial merging process. I
have dabbled with disabling (file system) locks and synchronizing the writing
processes by different means, but it's failure-prone unless you're very
familiar with the Lucene internals.
So, if it isn't a big hassle to create a serial merger (that depends mostly
on your hardware/communication setup, I guess), I would recommend that.

Re: Multiple instances of Lucene IndexWriter
I can't really say I'm "very familiar with the Lucene internals" :-)

What method would you recommend for checking for locked indexes? I have seen
mainly two methods and would be interested in the faster one with less
overhead:

Directory directory = FSDirectory.getDirectory(indexDir);
boolean locked = directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked();

or

Directory directory = FSDirectory.getDirectory(indexDir);
boolean locked = IndexReader.isLocked(directory);

many thanks,
David



Re: Multiple instances of Lucene IndexWriter
Hi,

You would probably want to use the Lock.obtain() method to get atomicity,
since IndexReader.isLocked doesn't actually acquire the lock. Another
process can swipe the lock between your IndexReader.isLocked call and your
actual writes. So something like:

    Lock lock = directory.makeLock(...);
    if (lock.obtain()) {
        try {
            // your writing stuff
        } finally {
            lock.release(); // release the same Lock instance we obtained
        }
    } else {
        // wait for the lock
    }

Best to test this; it has been many major versions since I fiddled with
locks, but it should work.
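
The difference between checking a lock and obtaining it can be shown without Lucene at all. The following is only a sketch under stated assumptions: `LockFileSketch` and its methods are hypothetical names, not Lucene's Lock API, and `java.nio.file.Files.createFile` stands in for obtain() because it creates the file and fails if it already exists in one atomic step. How well that atomicity holds on a network share is not something this sketch verifies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a file-based write lock; NOT Lucene's Lock class.
final class LockFileSketch {
    private final Path lockFile;

    LockFileSketch(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Racy check: only reports the state at the instant of the call.
    boolean isLocked() {
        return Files.exists(lockFile);
    }

    // Atomic obtain: existence check and creation are one file-system operation.
    boolean obtain() {
        try {
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // somebody else holds the lock
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void release() {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the action only if the lock was obtained; always releases afterwards.
    boolean runIfObtained(Runnable action) {
        if (!obtain()) {
            return false;
        }
        try {
            action.run();
            return true;
        } finally {
            release();
        }
    }

    public static void main(String[] args) throws IOException {
        Path lock = Files.createTempDirectory("lock-sketch").resolve("write.lock");
        LockFileSketch writerA = new LockFileSketch(lock);
        LockFileSketch writerB = new LockFileSketch(lock);
        boolean aGotIt = writerA.obtain();
        boolean bGotIt = writerB.obtain(); // attempted while A still holds the lock
        System.out.println("A obtained: " + aGotIt + ", B obtained: " + bGotIt);
        writerA.release();
    }
}
```

A caller that used isLocked() followed by a separate write would leave a window between the two calls; obtain() closes that window because the test and the acquisition cannot be separated.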

Re: Multiple instances of Lucene IndexWriter
I think you can just rely on IndexWriter's locking: it will acquire
the write lock and throw a LockObtainFailedException if it fails to
acquire it. You can simply catch this exception, wait some amount of
time, and retry.
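
A catch, wait, and retry loop of that kind might look roughly like the sketch below. It deliberately avoids the Lucene API so that it runs on its own: `LockBusyException` is a hypothetical, unchecked stand-in for the lock failure, `RetrySketch.openWithRetry` is an invented helper, and the `Supplier` passed in is assumed to be whatever code opens the writer. The attempt count and wait time are arbitrary illustration values.

```java
import java.util.function.Supplier;

// Hypothetical, unchecked stand-in for the "index is locked" failure.
final class LockBusyException extends RuntimeException {
    LockBusyException(String message) {
        super(message);
    }
}

final class RetrySketch {
    // Calls openWriter until it succeeds or maxAttempts is exhausted,
    // sleeping waitMillis between attempts. Only LockBusyException is retried;
    // any other exception propagates immediately.
    static <T> T openWithRetry(Supplier<T> openWriter, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        LockBusyException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return openWriter.get();
            } catch (LockBusyException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(waitMillis);
                }
            }
        }
        throw lastFailure; // every attempt found the lock busy
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for the lock", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated opener: fails twice with a busy lock, then succeeds.
        String writer = openWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new LockBusyException("index is locked");
            }
            return "writer";
        }, 5, 50L);
        System.out.println(writer + " opened after " + calls[0] + " attempts");
    }
}
```

Bounding the number of attempts matters here: if the lock is stale rather than merely busy, an unbounded loop would wait forever.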

Beware, though, that if a writer dies ungracefully (the JVM crashed, or
the writer was not actually closed before the JVM exited), the current
default LockFactory (SimpleFSLockFactory) will leave the lock acquired,
and you must manually release it or delete the lock file. You can
switch to NativeFSLockFactory to avoid that; however, that locking
implementation has issues over NFS (which you won't hit if your app is
all Windows).
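
The stale-lock problem can be illustrated with plain Java, independent of Lucene. This is a sketch of the two underlying ideas, not the code of either LockFactory: `StaleLockSketch` and its method names are hypothetical. One strategy treats the mere existence of a file as the lock, so a file left behind by a crashed process blocks everyone; the other takes an OS-level lock through `java.nio.channels.FileChannel.tryLock`, which the operating system drops when the owning process ends, so a leftover file does no harm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class StaleLockSketch {
    // File-existence strategy: the lock is held if and only if the file exists.
    static boolean obtainByCreatingFile(Path lockFile) {
        try {
            Files.createFile(lockFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // held, or left behind by a process that died
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // OS-level strategy: the file may already exist; the lock lives in the OS.
    // Returns null if another process currently holds the lock.
    static FileLock obtainNative(Path lockFile) {
        try {
            FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            try {
                FileLock lock = channel.tryLock();
                if (lock == null) {
                    channel.close();
                }
                return lock;
            } catch (IOException | RuntimeException e) {
                channel.close(); // do not leak the channel on failure
                throw e;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void releaseNative(FileLock lock) {
        try {
            lock.release();
            lock.channel().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempDirectory("stale-lock").resolve("write.lock");
        obtainByCreatingFile(lockFile); // pretend this writer then crashed
        System.out.println("file strategy after crash: " + obtainByCreatingFile(lockFile));
        FileLock lock = obtainNative(lockFile);
        System.out.println("native strategy after crash: " + (lock != null));
        if (lock != null) {
            releaseNative(lock);
        }
    }
}
```

The sketch only simulates the crash by never deleting the file; it says nothing about behaviour on NFS or other network file systems, where OS-level locks are known to be less dependable.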

Also, for future reference, this kind of question really should be
asked on java-user instead (general is for broader questions that span
all of the Lucene projects).

Mike