Mailing List Archive

Searcher/Reader/Writer Management
It seems that a lot of people run into the same set of problems with
maintaining readers, writers, and searchers in Lucene. Part of the problem
comes from the fact that delete() is on a reader and part of it comes from
the need to keep a reader or searcher open as long as possible to avoid
open/close overhead. Add to that the need for some applications (like mine)
to maintain multiple indexes and you have a pretty complex problem space.
Bottom line: Use of Lucene (especially in an online, interactive
application) doesn't seem as easy as it could be.

Anyway... I've attempted to address these issues in my application with a
control class. And, I'm submitting what I have to lucene dev for review...
and consideration for possible inclusion in the core lucene framework. I
tried to be fairly efficient and to handle all the cases that I know about, so
let me know if it falls short on either count and what you think overall.

Thanks,
Scott

-------

import java.io.*;
import java.util.*;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.search.*;
import org.apache.lucene.analysis.*;

/** Rules:
 * Once created, searchers are valid until closed.
 * If doing an update, searchers must not be created between close of the reader
 *   that deletes and close of the writer that adds.
 * Writers may be reused until a Reader or Searcher is needed.
 * Readers may be reused until a Writer or Searcher is needed.
 * Searchers may be reused until the index is changed.
 * There may be only one Reader or Writer at a time (never both).
 * If you get it, release it.
 */
public class IndexAccessControl
{
    public static final Analyzer LUCENE_ANALYZER = new LuceneAnalyzer();

    private static final Map WRITER_PATHS = new HashMap();   // path -> CheckoutInfo
    private static final Map SEARCHER_PATHS = new HashMap(); // path -> CheckoutInfo
    private static final Map OLD_SEARCHERS = new HashMap();  // Searcher -> CheckoutInfo

    /** Get for adding documents.
     * Blocks readers until released.
     */
    public static IndexWriter getWriter(File path) throws IOException
    {
        IndexWriter writer = null;
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            do
            {
                CheckoutInfo info = (CheckoutInfo)WRITER_PATHS.get(path);
                if (info != null) // may already have a writer, use it
                {
                    if (info.writer != null) // yup, have a writer
                    {
                        info.checkoutCount++;
                        writer = info.writer;
                    }
                    else // not a writer, it must be a reader; wait for it to finish, then try again
                    {
                        try
                        {
                            info.wait(); // wait for info to be released
                        }
                        catch (InterruptedException e)
                        {
                            // TODO: Will this ever happen?
                            e.printStackTrace();
                            return null;
                        }
                    }
                }
                else // no writer, create one
                {
                    boolean missing = !path.exists();
                    if (missing) path.mkdir();
                    writer = new IndexWriter(path, LUCENE_ANALYZER, /*create*/missing);
                    writer.mergeFactor = 2;
                    info = new CheckoutInfo(writer);
                }
            }
            while (writer == null);
        }
        return writer;
    }

    public static void releaseWriter(File path, IndexWriter writer) throws IOException
    {
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            CheckoutInfo info = (CheckoutInfo)WRITER_PATHS.get(path);
            if (info != null && writer == info.writer) // writer was checked out
            {
                if (info.checkoutCount > 1) // writer has other references
                {
                    info.checkoutCount--;
                    writer = null; // avoid close()
                }
                else // last reference to writer
                {
                    WRITER_PATHS.remove(path);
                    writer = info.writer;
                    info.notify(); // notify waiters to try again
                }
            }
        }
        // close the writer (unless it still has checkouts)
        if (writer != null) writer.close();
    }

    /** Get for searching. */
    public static Searcher getSearcher(File path) throws IOException
    {
        IndexSearcher is;
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            CheckoutInfo info = (CheckoutInfo)SEARCHER_PATHS.get(path);
            if (info == null || IndexReader.lastModified(path) > info.creationTime)
            {
                // need new searcher
                is = new IndexSearcher(IndexReader.open(path));
                info = (CheckoutInfo)SEARCHER_PATHS.put(path, new CheckoutInfo(is));
                if (info != null)
                {
                    if (info.checkoutCount > 1) // searcher has other references
                    {
                        info.checkoutCount--;
                        OLD_SEARCHERS.put(info.searcher, info);
                    }
                    else // last reference to searcher
                    {
                        info.searcher.close();
                    }
                }
            }
            else
            {
                // use existing searcher
                is = info.searcher;
                info.checkoutCount++;
            }
        }
        return is;
    }

    public static void releaseSearcher(File path, Searcher searcher) throws IOException
    {
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            CheckoutInfo info = (CheckoutInfo)SEARCHER_PATHS.get(path);
            if (info == null || searcher != info.searcher) // this isn't the info we're looking for
            {
                info = (CheckoutInfo)OLD_SEARCHERS.get(searcher);
            }
            if (info != null) // found a searcher
            {
                if (info.checkoutCount > 1) // searcher has other references
                {
                    info.checkoutCount--;
                }
                else // last reference to searcher
                {
                    info.searcher.close();
                }
            }
            else // can't find searcher, just close it
            {
                searcher.close();
            }
        }
    }

    /** Get for deleting documents.
     * Blocks writers until released.
     */
    public static IndexReader getReader(File path) throws IOException
    {
        IndexReader reader = null;
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            do
            {
                CheckoutInfo info = (CheckoutInfo)WRITER_PATHS.get(path);
                if (info != null) // may already have a reader, use it
                {
                    if (info.reader != null) // yup, have a reader
                    {
                        info.checkoutCount++;
                        reader = info.reader;
                    }
                    else // not a reader, it must be a writer; wait for it to finish, then try again
                    {
                        try
                        {
                            info.wait(); // wait for info to be released
                        }
                        catch (InterruptedException e)
                        {
                            // TODO: Will this ever happen?
                            e.printStackTrace();
                            return null;
                        }
                    }
                }
                else // no reader, create one
                {
                    reader = IndexReader.open(path);
                    info = new CheckoutInfo(reader);
                }
            }
            while (reader == null);
        }
        return reader;
    }

    public static void releaseReader(File path, IndexReader reader) throws IOException
    {
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            CheckoutInfo info = (CheckoutInfo)WRITER_PATHS.get(path);
            if (info != null && reader == info.reader) // reader was checked out
            {
                if (info.checkoutCount > 1) // reader has other references
                {
                    info.checkoutCount--;
                    reader = null; // avoid close()
                }
                else // last reference to reader
                {
                    WRITER_PATHS.remove(path);
                    reader = info.reader;
                    info.notify(); // notify waiters to try again
                }
            }
        }
        // close the reader (unless it still has checkouts)
        if (reader != null) reader.close();
    }

    /** Used for updates, to make sure nobody else grabs a writer or reader
     * between the release and get operations.
     */
    public static IndexWriter releaseReaderAndGetWriter(File path, IndexReader reader) throws IOException
    {
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            releaseReader(path, reader);
            return getWriter(path);
        }
    }

    private static class CheckoutInfo
    {
        CheckoutInfo(IndexWriter writer)
        {
            this.writer = writer;
        }
        CheckoutInfo(IndexReader reader)
        {
            this.reader = reader;
        }
        CheckoutInfo(IndexSearcher searcher)
        {
            this.searcher = searcher;
            this.creationTime = System.currentTimeMillis();
        }

        public IndexReader reader;
        public IndexWriter writer;
        public IndexSearcher searcher;
        public int checkoutCount = 1;
        public long creationTime;
    }

    /** no instances */
    private IndexAccessControl() { }
}
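
To make the checkout/release discipline concrete, here is a rough, untested sketch of a caller. The index path, field name, document text, and query string are all made up; LUCENE_ANALYZER is the constant from the class above.

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class IndexAccessControlUsage
{
    public static void main(String[] args) throws Exception
    {
        File index = new File("/tmp/example-index"); // hypothetical index location

        // Add a document: check the writer out, use it, release it.
        IndexWriter writer = IndexAccessControl.getWriter(index);
        try
        {
            Document doc = new Document();
            doc.add(Field.Text("body", "an example document"));
            writer.addDocument(doc);
        }
        finally
        {
            IndexAccessControl.releaseWriter(index, writer);
        }

        // Search: check the shared searcher out, use it, release it.
        Searcher searcher = IndexAccessControl.getSearcher(index);
        try
        {
            Query query = QueryParser.parse("example", "body",
                IndexAccessControl.LUCENE_ANALYZER);
            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " hit(s)");
        }
        finally
        {
            IndexAccessControl.releaseSearcher(index, searcher);
        }
    }
}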
RE: Searcher/Reader/Writer Management [ In reply to ]
Hello,
> -----Original Message-----
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
> Sent: Tuesday, April 02, 2002 10:29 PM
> To: Lucene-Dev (E-mail)
> Subject: Searcher/Reader/Writer Management
>
>
> It seems that a lot of people run into the same set of problems with
> maintaining readers, writers, and searchers in Lucene. Part
> of the problem
> comes from the fact that delete() is on a reader and part of
> it comes from
> the need to keep a reader or search open as long as possible to avoid
> open/close overhead. Add to that the need for some
> applications (like mine)
> to maintain multiple indexes and you have a pretty complex
> problem space.
> Bottom line: Use of Lucene (especially in an online, interactive
> application) doesn't seem as easy as it could be.
>
I agree that it's very important to solve this problem and provide a standard solution.

> And, I'm submitting what I have to lucene dev
> for review...
> and consideration for possible inclusion in the core lucene
> framework. I
> tried to be fairly efficient and to handle all cases that I
> know about so
> let me know if it fails in either case and what you think overall.

1. Why don't we have as many control/manager objects as indexes we want to use?
2. Why is it a static class with only static methods? I think:
IndexAccessControl control = new IndexAccessControl(); // or IndexAccessControl.newInstance();
control.setAnalyzer(...);
control.setMergeFactor(...);

Searcher searcher = control.getSearcher();
// do something here
control.release(searcher);

3. Couldn't we wrap the release logic into a ControledSearcher? (Yesterday I wrote a CachedSearcher.)
IndexAccessControl control = new IndexAccessControl(); // or IndexAccessControl.newInstance();
control.setAnalyzer(...);
control.setMergeFactor(...);

Searcher searcher = control.getSearcher();
// do something here
searcher.close();

The ControledSearcher is something like this (it could be an inner class of IndexAccessControl so we don't have to save a reference to the control):
private class ControledSearcher {
    Searcher m_searcher;
    IndexAccessControl m_control;

    ControledSearcher(IndexAccessControl control, Searcher searcher) {
        m_control = control;
        m_searcher = searcher;
    }

    public void close() throws IOException {
        m_control.release(m_searcher);
    }

    public Hits search(Query query) throws IOException {
        // delegate the work
        return m_searcher.search(query);
    }
    ...
}

4. If I understand your code, the index can't be written while there is an open searcher, right? I don't understand the connection between Reader and Writer instances on the same index.

5. Can someone imagine a situation where more than one Analyzer is used in an application?

6. Shouldn't we only manage the IndexReader and create a new Searcher instance on every request?

7. Finally, I think IndexAccessControl should have the following interface:
public interface IndexAccessControl {
    public Searcher getSearcher();
    public IndexReader getReader();
    public IndexWriter getWriter();
}

and we could have more than one implementation (using JAXP-style factories, Avalon, or something similar).
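
Just to illustrate the JAXP-style idea, a sketch could look like this; the factory class name, system property, and default implementation class are all invented here:

import java.io.File;

public abstract class IndexAccessControlFactory
{
    /** JAXP-style lookup: read the implementation class name from a system
     * property and fall back to a default implementation. */
    public static IndexAccessControlFactory newInstance()
    {
        String impl = System.getProperty(
            "org.apache.lucene.index.IndexAccessControlFactory",
            "org.apache.lucene.index.SimpleIndexAccessControlFactory");
        try
        {
            return (IndexAccessControlFactory)Class.forName(impl).newInstance();
        }
        catch (Exception e)
        {
            throw new RuntimeException("Cannot instantiate factory: " + impl);
        }
    }

    /** Returns an IndexAccessControl implementation bound to one index directory. */
    public abstract IndexAccessControl newIndexAccessControl(File path);
}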


>
> Thanks,
> Scott
>

peter

> [original code listing clipped -- see the full posting above]

RE: Searcher/Reader/Writer Management [ In reply to ]
(quotes clipped for brevity)

> 1. Why don't we have as many control/manager object as index
> we want to use?
> 2. Why is it a static class with only static methods?

That would destroy what this class is attempting to accomplish: Guarding
the index resources so that you don't have to think about concurrency. If
you could have multiple controllers, they would then have to somehow
coordinate between themselves... making the class even more complex.

> 3. Couldn't we wrap the release logic into a
> ControledSearcher (Yesterday I've writter CachedSearcher)?

Nope. For example, in my application I need to hold onto a single Searcher
during the course of a transaction... if I was forced to use a new Searcher
for each query, I couldn't be guaranteed consistent results throughout the
transaction.

> 4. If I understand your code index can't be written while
> there is opened searcher? Is it right? I don't understand
> what is the connection between Reader and Writer instances on
> the same index.

You should be able to get a writer while searchers are in use, but not while
a reader is in use. Did you see a place where this is not the case? I
should have mentioned: In this scenario, Readers are assumed to be used for
delete (since they can be), so a Writer cannot be retrieved while a Reader
is open and vice versa.

> 5. Can someone imagine situation when more than one Analyzers
> are used in an application?

I could imagine one, but you're right: I certainly didn't design for it.

> 6. Shouldn't we only manage IndexReader and create new
> instance of Searcher on every request?

Hmm... potentially... haven't really thought about that...

> 7. Finally, I think IndexAccesControl should have the next interface:
> public interface IndexAccessControl {
> public Searcher getSearcher();
> public IndexReader getReader();
> public IndexWriter getWriter();
> }
>
> and we could have more than one implementation (using JAXP
> style factories or Avalon or something similar)

To what end? The rules of index access must be obeyed for proper operation.
If you need specialized behavior, you could just wrap this class with your
specialized class and delegate. (Which is exactly what I do in my
framework.)

Scott
RE: Searcher/Reader/Writer Management [ In reply to ]
> -----Original Message-----
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
> Sent: Wednesday, April 03, 2002 4:44 PM
> To: 'Lucene Developers List'
> Subject: RE: Searcher/Reader/Writer Management
>
>
> (quotes clipped for brevity)
>
> > 1. Why don't we have as many control/manager object as index
> > we want to use?
> > 2. Why is it a static class with only static methods?
>
> That would destroy what this class is attempting to
> accomplish: Guarding
> the index resources so that you don't have to think about
> concurrency. If
> you could have multiple controllers, they would then have to somehow
> coordinate between themselves... making the class even more complex.
>
I mean we could have as many instances as indexes (paths or directories) we want to manage. For example:
IndexAccessControl iac = new IndexAccessControl(myPath);
Searcher searcher = iac.getSearcher();


> > 3. Couldn't we wrap the release logic into a
> > ControledSearcher (Yesterday I've writter CachedSearcher)?
>
> Nope. For example, in my application I need to hold onto a
> single Searcher
> during the course of a transaction... if I was forced to use
> a new Searcher
> for each query, I couldn't be guaranteed consistent results
> throughout the
> transaction.

You are not forced to. You can hold a reference to the ControledSearcher as long as you want. I only suggest that the developer should call
searcher.close()
instead of
IndexAccessControl.releaseSearcher(searcher)


>
> > 4. If I understand your code index can't be written while
> > there is opened searcher? Is it right? I don't understand
> > what is the connection between Reader and Writer instances on
> > the same index.
>
> You should be able to get a writer while searchers are in
> use, but not while
> a reader is in use. Did you see a place where this is not
> the case? I
> should have mentioned: In this scenario, Readers are assumed
> to be used for
> delete (since they can be), so a Writer cannot be retrieved
> while a Reader
> is open and vice versa.
>

Ok, I see.
>
> Scott
>

peter

RE: Searcher/Reader/Writer Management [ In reply to ]
> > > 1. Why don't we have as many control/manager object as index
> > > we want to use?
> > > 2. Why is it a static class with only static methods?
> >
> > That would destroy what this class is attempting to
> > accomplish: Guarding
> > the index resources so that you don't have to think about
> > concurrency. If
> > you could have multiple controllers, they would then have to somehow
> > coordinate between themselves... making the class even more complex.
> >
> I mean we could have as many instance as many index (path or
> directory) we want to manage. For example
> IndexAccessControl iac = new IndexAccessControl(myPath);
> Searcher searcher = iac.getSearcher();

Yes, but then you rely on the user to make sure that they don't create two
on the same path, right? If you really don't like the static style for some
reason (why?), I suppose we could have a static accessor like:
"IndexAccessControl getController(path)" and maintain an internal map of
controllers to avoid conflicts...
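
Something along these lines, just as an untested sketch; the per-path constructor is hypothetical, and the existing get/release methods would become instance methods operating on the stored path:

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class IndexAccessControl
{
    // one controller per canonical index path, so two callers can never end
    // up with competing controllers on the same directory
    private static final Map CONTROLLERS = new HashMap(); // canonical path -> controller

    private final File path;

    private IndexAccessControl(File path) // callers must go through getController()
    {
        this.path = path;
    }

    public static synchronized IndexAccessControl getController(File path) throws IOException
    {
        String key = path.getCanonicalPath();
        IndexAccessControl controller = (IndexAccessControl)CONTROLLERS.get(key);
        if (controller == null)
        {
            controller = new IndexAccessControl(path);
            CONTROLLERS.put(key, controller);
        }
        return controller;
    }

    // getSearcher(), getReader(), getWriter() and the release methods would
    // move here as instance methods working against this.path
}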

> > > 3. Couldn't we wrap the release logic into a
> > > ControledSearcher (Yesterday I've writter CachedSearcher)?
> >
> > Nope. For example, in my application I need to hold onto a
> > single Searcher
> > during the course of a transaction... if I was forced to use
> > a new Searcher
> > for each query, I couldn't be guaranteed consistent results
> > throughout the
> > transaction.
>
> You are not forced. You can hold reference to
> ControledSearcher as long you want. I only suggest that
> developer should call
> searcher.close()
> instead of
> IndexAccessControl.releaseSearcher(searcher)

Ah. Yes, good point. You're right, we should wrap all 3 (IndexReader,
IndexWriter, and Searcher) classes for control purposes. (I guess I just
hadn't thought about this because I already wrap and consolidate these 3
classes into 2 classes: A Searcher and a Writer where the writer handles
add(), update(), and delete(). Maybe folks would be interested in that kind
of thing as well...?)
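
Roughly, such a consolidated Writer could sit on top of the IndexAccessControl posted earlier. This is only an illustrative sketch; the ManagedWriter name and the id Term are placeholders:

import java.io.File;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class ManagedWriter
{
    private final File path;

    public ManagedWriter(File path)
    {
        this.path = path;
    }

    public void add(Document doc) throws IOException
    {
        IndexWriter writer = IndexAccessControl.getWriter(path);
        try
        {
            writer.addDocument(doc);
        }
        finally
        {
            IndexAccessControl.releaseWriter(path, writer);
        }
    }

    public void delete(Term idTerm) throws IOException
    {
        IndexReader reader = IndexAccessControl.getReader(path);
        try
        {
            reader.delete(idTerm); // deletes every document matching the term
        }
        finally
        {
            IndexAccessControl.releaseReader(path, reader);
        }
    }

    public void update(Term idTerm, Document doc) throws IOException
    {
        // delete the old copy, then add the new one without letting anyone
        // else grab the index in between (see releaseReaderAndGetWriter)
        IndexReader reader = IndexAccessControl.getReader(path);
        reader.delete(idTerm);
        IndexWriter writer = IndexAccessControl.releaseReaderAndGetWriter(path, reader);
        try
        {
            writer.addDocument(doc);
        }
        finally
        {
            IndexAccessControl.releaseWriter(path, writer);
        }
    }
}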

Scott
RE: Searcher/Reader/Writer Management [ In reply to ]
> Yes, but then you rely on the user to make sure that they
> don't create two
> on the same path, right? If you really don't like the static
> style for some
> reason (why?), I suppose we could have a static accessor like:
> "IndexAccessControl getController(path)" and maintain an
> internal map of
> controllers to avoid conflicts...

Oh, never mind. I kept missing the fact that the Analyzer is hard-coded
into the class. You're right, if we wanted to have a different Analyzer
per path, we'd probably want to break it up...

Scott
RE: Searcher/Reader/Writer Management [ In reply to ]
> -----Original Message-----
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
> Sent: Wednesday, April 03, 2002 5:30 PM
> To: 'Lucene Developers List'
> Subject: RE: Searcher/Reader/Writer Management
>
>
> > Yes, but then you rely on the user to make sure that they
> > don't create two
> > on the same path, right? If you really don't like the static
> > style for some
> > reason (why?), I suppose we could have a static accessor like:
> > "IndexAccessControl getController(path)" and maintain an
> > internal map of
> > controllers to avoid conflicts...
>
> Oh, never mind. I kept missing the fact that the Analyzer is
> hard-coded
> into the class. You're right, if we wanted to have a
> difference Analyzer
> per path, we'd probably want to break it up...
>
> Scott
>

That's why I wrote that, finally, the interface should be:
public interface IndexAccessControl {
    public Searcher getSearcher();
    public IndexWriter getWriter(boolean create);
    public IndexReader getReader();
}

I'd like to use two implementations:
a. an Avalon component: the analyzer class, mergeFactor, maxDocs, etc. can be configured by XML elements
b. a simple implementation where these properties can be set by setter methods

This is very similar to the DataSource interface, which has only one method, getConnection().
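
A skeleton of the simple setter-based implementation could look like this. It is only a sketch: the caching and checkout logic is omitted, and it assumes the interface methods are also declared to throw IOException.

import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Searcher;

public class SimpleIndexAccessControl implements IndexAccessControl
{
    private final File path;
    private Analyzer analyzer;
    private int mergeFactor = 10;

    public SimpleIndexAccessControl(File path)
    {
        this.path = path;
    }

    public void setAnalyzer(Analyzer analyzer) { this.analyzer = analyzer; }
    public void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }

    public Searcher getSearcher() throws IOException
    {
        // no caching or checkout tracking here; that logic would be layered on top
        return new IndexSearcher(IndexReader.open(path));
    }

    public IndexReader getReader() throws IOException
    {
        return IndexReader.open(path);
    }

    public IndexWriter getWriter(boolean create) throws IOException
    {
        IndexWriter writer = new IndexWriter(path, analyzer, create);
        writer.mergeFactor = mergeFactor;
        return writer;
    }
}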

> > I suppose we could have a static accessor like:
> > "IndexAccessControl getController(path)" and maintain an
> > internal map of
> > controllers to avoid conflicts...
Yes, this is a good way as far as the non-Avalon implementation is concerned.

peter



Re: Searcher/Reader/Writer Management [ In reply to ]
Hi!

I'd like to respond to this point:

> 5. Can someone imagine a situation where more than one Analyzer is used in an application?

Not only can I imagine such a situation, but I'd also strongly recommend it
for any high-quality application! If you are just targeting speed and light
CPU usage, sure, a single analyzer is enough. But then your application will
only get the precision/recall it deserves. A nice search engine should be
flexible enough to use several analyzers and combine their results to
retrieve the best possible recall/precision. For example, say you are
looking for something related to "selling toothbrushes". The application
should retrieve all the occurrences matching exactly "selling toothbrushes"
(using a strict analyzer), but it may also retrieve "sell toothbrush" (using
a stemming normalizer). Why not retrieve "buy toothbrush" or "sell dental
tools" as well (a kind of semantic normalizer/analyzer)? One could also
imagine retrieving "Selin Toothbrushies" (phonetic normalizer).

OK, so this improves the results, but unfortunately it also drastically
increases the noise, right? Wrong: all these analyzers should be ordered,
and the final result should be a calculation over the results of all those
indexes. For instance, the results from the strict-analyzer index should be
weighted more heavily than the stemming ones, which should be weighted more
heavily than the phonetic ones, etc. The very simple reason is that the more
aggressive the normalization process is, the less likely the match is to be
exactly what the user is looking for. Sure, it's CPU intensive, but here is
the dilemma of search engines: be fast or be smart. My belief is that
Lucene, as a search engine, should allow both kinds of application (and I
personally prefer smart search engines to fast ones).
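
Just to give a feel for the weighting, a sketch at the query level could look like the class below. In a real setup each analyzer would feed its own index and the merged scores would carry the weights; the field name and the stemmed terms here are invented.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;

public class WeightedAnalyzersExample
{
    public static Query buildQuery()
    {
        // exact phrase as produced by the strict analyzer: weight it heavily
        PhraseQuery strict = new PhraseQuery();
        strict.add(new Term("body", "selling"));
        strict.add(new Term("body", "toothbrushes"));
        strict.setBoost(4.0f);

        // stemmed form as produced by a stemming analyzer: weight it less
        PhraseQuery stemmed = new PhraseQuery();
        stemmed.add(new Term("body", "sell"));
        stemmed.add(new Term("body", "toothbrush"));
        stemmed.setBoost(1.0f);

        // neither clause is required; documents matching the strict form
        // simply score higher than documents matching only the loose form
        BooleanQuery combined = new BooleanQuery();
        combined.add(strict, false, false);
        combined.add(stemmed, false, false);
        return combined;
    }
}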

Rodrigo



RE: Searcher/Reader/Writer Management [ In reply to ]
Hello Scott,
I've attached a new version of IndexAccessControl and a TEST file. First of all, I think I've found a failure in your code:
1. get a searcher
2. use it
3. release it
4. get another searcher
5. use it

In step 3 -- since there is only one reference to the searcher -- the real searcher is closed:
<original-code>
else // last reference to searcher
{
info.searcher.close();
}
</original-code>
The info isn't deleted.

In step 4, in the getSearcher method, the info is found and the closed Searcher object is returned.

Other:
a. I changed IndexAccessControl so it's not static any more; you can make a new instance for every index.
b. I created a ManagedSearcher class that is returned by getSearcher. If you call the close() method of this class, it notifies the IndexAccessControl to release the searcher.
So the usage:
IndexAccessControl iac = IndexAccessControl.getInstance("/pathToIndex");
Searcher searcher = iac.getSearcher();
// use the searcher
searcher.close();

If you like this architecture, we could improve the code:
1. decouple the factory method and the access control logic
2. decouple searcher management and reader/writer management
3. solve the problem of the last used searcher: when the last reference to a searcher is released, the searcher is closed, and getting a searcher again takes a lot of time. We should have a pool of searchers where the size of the pool is 1.

Unfortunately, to compile and use the ManagedSearcher class you have to modify the Lucene source: Searcher's abstract methods are not public, so you can't make a subclass of Searcher in a package other than org.apache.lucene.search.

To use the class, change all abstract methods of org.apache.lucene.search.Searcher to public or protected.

peter


> -----Original Message-----
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
> Sent: Wednesday, April 03, 2002 5:24 PM
> To: 'Lucene Developers List'
> Subject: RE: Searcher/Reader/Writer Management
>
>
> > > > 1. Why don't we have as many control/manager object as index
> > > > we want to use?
> > > > 2. Why is it a static class with only static methods?
> > >
> > > That would destroy what this class is attempting to
> > > accomplish: Guarding
> > > the index resources so that you don't have to think about
> > > concurrency. If
> > > you could have multiple controllers, they would then have
> to somehow
> > > coordinate between themselves... making the class even
> more complex.
> > >
> > I mean we could have as many instance as many index (path or
> > directory) we want to manage. For example
> > IndexAccessControl iac = new IndexAccessControl(myPath);
> > Searcher searcher = iac.getSearcher();
>
> Yes, but then you rely on the user to make sure that they
> don't create two
> on the same path, right? If you really don't like the static
> style for some
> reason (why?), I suppose we could have a static accessor like:
> "IndexAccessControl getController(path)" and maintain an
> internal map of
> controllers to avoid conflicts...
>
> > > > 3. Couldn't we wrap the release logic into a
> > > > ControledSearcher (Yesterday I've writter CachedSearcher)?
> > >
> > > Nope. For example, in my application I need to hold onto a
> > > single Searcher
> > > during the course of a transaction... if I was forced to use
> > > a new Searcher
> > > for each query, I couldn't be guaranteed consistent results
> > > throughout the
> > > transaction.
> >
> > You are not forced. You can hold reference to
> > ControledSearcher as long you want. I only suggest that
> > developer should call
> > searcher.close()
> > instead of
> > IndexAccessControl.releaseSearcher(searcher)
>
> Ah. Yes, good point. You're right, we should wrap all 3
> (IndexReader,
> IndexWriter, and Searcher) classes for control purposes. (I
> guess I just
> hadn't thought about this because I already wrap and
> consolidate these 3
> classes into 2 classes: A Searcher and a Writer where the
> writer handles
> add(), update(), and delete(). Maybe folks would be
> interested in that kind
> of thing as well...?)
>
> Scott
>
RE: Searcher/Reader/Writer Management [ In reply to ]
Peter wrote:
> In step 3. -- since there is only one reference to the
> searhcer -- the real searcher is closed:
> <original-code>
> else // last reference to searcher
> {
> info.searcher.close();
> }
> </original-code>
> The info isn't deleted.
>
> In step 4. in method getSearcher the info is found and the
> closed Searcher object is returned.

Yes, thanks for the bug report! I had already fixed that bug in my code,
but I figured it was just an example anyway, so I didn't mention it. :) On
the other hand, the way it was intended to work is (I think) slightly
different from what you did. In step 3, the searcher should *not* be closed
on release unless it is an old one that is no longer in use. Here is the
diff of the fix:

--- IndexAccessControl.java 2002/02/11 20:28:39 1.2
+++ IndexAccessControl.java 2002/04/02 22:40:55 1.3
@@ -137,10 +137,12 @@
         String sync = path.getAbsolutePath().intern();
         synchronized (sync) // sync on specific index
         {
+            boolean old = false;
             CheckoutInfo info = (CheckoutInfo)SEARCHER_PATHS.get(path);
             if (info == null || searcher != info.searcher) // this isn't the info we're looking for
             {
                 info = (CheckoutInfo)OLD_SEARCHERS.get(searcher);
+                old = true;
             }
             if (info != null) // found a searcher
             {
@@ -148,7 +150,7 @@
                 {
                     info.checkoutCount--;
                 }
-                else // last reference to searcher
+                else if (old) // last reference to old searcher
                 {
                     info.searcher.close();
                 }
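
For reference, releaseSearcher() reads like this with the patch applied (reassembled by hand from the posting above, so treat it as illustrative rather than a verified drop-in):

    public static void releaseSearcher(File path, Searcher searcher) throws IOException
    {
        String sync = path.getAbsolutePath().intern();
        synchronized (sync) // sync on specific index
        {
            boolean old = false;
            CheckoutInfo info = (CheckoutInfo)SEARCHER_PATHS.get(path);
            if (info == null || searcher != info.searcher) // this isn't the info we're looking for
            {
                info = (CheckoutInfo)OLD_SEARCHERS.get(searcher);
                old = true;
            }
            if (info != null) // found a searcher
            {
                if (info.checkoutCount > 1) // searcher has other references
                {
                    info.checkoutCount--;
                }
                else if (old) // last reference to an old searcher
                {
                    info.searcher.close();
                }
            }
            else // can't find searcher, just close it
            {
                searcher.close();
            }
        }
    }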

> Other:
> a. I changed the IndexAccessControl it's not static any more.
> You can make a new instance for every index.
> b. I created a ManagedSearcher class that is returned by
> getSearcher. If you call close method of this class it
> notifies the IndexAccessControl to release the searcher.
> So the usage:
> IndexAccessControl iac =
> IndexAccessControl.getInstance("/pathToIndex");
> Searcher searcher = iac.getSearcher();
> // use the searcher
> searcher.close();

Yes, that is good.

> If you like this architecture we could improve the code:
> 1. decoupling factory method and access control logic
> 2. decoupling searcher managment and reader/writer managment

Yes, there is still a lot of room for improvement on this. :)

> 3. solve the problem of last used searcher: if the last used
> searcher is released the searcher is closed. Getting searcher
> again takes a lot of time. We should have a pool of searcher
> where the size of pool is 1.

This should not be a problem with the corrected releaseSearcher() code above.

> Unfortunatla to use compile and use ManagedSearcher class you
> have to modify lucene source: the Searcher has not public
> abstract methods --> you can't maka subclass of Searcher in
> other package that org.apache.lucene.search.

You're right, Lucene was designed with performance, not extensibility as a
primary goal. So... some modifications to the core will no doubt need to be
made...

Scott