Mailing List Archive

lucene & avalon (was: Proposal for Lucene / new component)
Hello,
More than a month ago I promised to write an Avalon example application.
My project now needs some Avalon components, so I "avalonized" Lucene. I
published the package as a zip file:
www.extra.hu/halacsyp/lucelon.zip

The main idea is to have two manager components, one for Searchers and one
for Writers. The relationship is similar to that between
DataSource/DriverManager and Connection.
The interfaces of the two main components:
public interface SearcherManager extends Component {
    public Searcher getSearcher();
}

public interface IndexWriterManager extends Component {
    public IndexWriter getWriter(boolean create);
}

You can configure:
1. exactly which implementing Manager class to use (I implemented two
SearcherManagers, IndexSearcher and MultiSearcherManager, and one
IndexWriterManager);
2. in my implementation, which directory to use and, for the writer,
mergeFactor, maxDocs ...

I rewrote two demo files, SearchFiles and IndexFiles, to use my components.
You can compile them and try them out.

In my project I have two indexes in two different (filesystem) directories.
I have three Searchers:
1. one for directory I
2. one for directory II
3. a MultiSearcher over both

To configure this I have to write a config file:
<components logger="core">
<directories>
<filesystem name="topics"><path>c://temp/index/topics</path></filesystem>
<filesystem name="messages"><path>c://temp/index/messages</path></filesystem>
</directories>
<analyzers>
<!-- standard analyzer of lucene -->
<standard name="standard"> <stopwords>
<w>the</w>
<w>a</w>
<w>this</w>
<w>that</w>
<w>an</w>
<w>or</w>
</stopwords></standard>
</analyzers>
<searchers>
<directory-searcher name="topics">
<directory>topics</directory>
</directory-searcher>
<directory-searcher name="messages">
<directory>messages</directory>
</directory-searcher>
<multi-searcher name="multi">
<searcher>topics</searcher>
<searcher>messages</searcher>
</multi-searcher>
</searchers>

<writers>
<directory-writer name="topics">
<directory>topics</directory>
<analyzer>standard</analyzer>
</directory-writer>
<directory-writer name="messages">
<mergeFactor>20</mergeFactor>
<directory>messages</directory>
<analyzer>standard</analyzer>
</directory-writer>

</writers>

</components>
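The <multi-searcher> entry refers to the other searchers by name only. As a
rough illustration of how such references can be resolved, here is a minimal
sketch that uses the plain JDK DOM parser, not Avalon's own configuration
API. The class name SearcherConfig and the method searchersOf are
hypothetical and are not part of lucelon.zip.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the searcher names a <multi-searcher> refers to.
class SearcherConfig {
    static List<String> searchersOf(String xml, String multiName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList multis = doc.getElementsByTagName("multi-searcher");
            for (int i = 0; i < multis.getLength(); i++) {
                Element multi = (Element) multis.item(i);
                if (!multiName.equals(multi.getAttribute("name"))) {
                    continue;
                }
                // Collect the text of every <searcher> child in document order.
                List<String> names = new ArrayList<>();
                NodeList refs = multi.getElementsByTagName("searcher");
                for (int j = 0; j < refs.getLength(); j++) {
                    names.add(refs.item(j).getTextContent().trim());
                }
                return names;
            }
        } catch (Exception e) {
            throw new IllegalStateException("cannot read config", e);
        }
        throw new IllegalArgumentException("no multi-searcher named " + multiName);
    }
}

class SearcherConfigDemo {
    public static void main(String[] args) {
        String xml = "<components><searchers>"
                + "<directory-searcher name=\"topics\"><directory>topics</directory></directory-searcher>"
                + "<directory-searcher name=\"messages\"><directory>messages</directory></directory-searcher>"
                + "<multi-searcher name=\"multi\">"
                + "<searcher>topics</searcher><searcher>messages</searcher>"
                + "</multi-searcher></searchers></components>";
        System.out.println(SearcherConfig.searchersOf(xml, "multi")); // prints [topics, messages]
    }
}
```

In the real package Avalon hands each component its own configuration
subtree, so a lookup like this would happen inside the manager itself.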

Why this is good for me:
1. I can hide the implementation details from the application developer
2. I can configure the system via config files
3. my logging system is ready to use (provided by Apache LogKit)
4. I can change a component's implementation without modifying the code
(I'll change the analyzer because the standard Lucene analyzer can't work
with ISO-8859-2 characters [I'll check it tomorrow])

I have to work on a better SearcherManager. We know that several threads
can reuse the same IndexReader, but it should be closed and reopened when
the directory is modified. My problem is this:
Thread-1 gets a Searcher and Thread-2 gets another Searcher; the two
Searchers use the same IndexReader. Thread-1 finishes its work and closes
its Searcher. That Searcher closes the IndexReader, which Thread-2 is still
using. I think I have to implement something similar to an (SQL)
connection cache.

Thread-1 uses a Searcher that uses an instance of CachedIndexReader. If
Thread-1 closes the CachedIndexReader, it doesn't close the physical
IndexReader; it only notifies the cache that its close method was called.
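One way to picture this is reference counting. The following is only a
minimal sketch under my own assumptions: SharedReader stands in for
Lucene's IndexReader, and ReaderCache, CachedReader, PhysicalReader and the
retire() method are hypothetical names, not existing Lucene or Avalon
classes. The cache closes the physical reader only after it has been
retired (for example because the directory changed) and the last borrower
has called close.

```java
// Stand-in for Lucene's IndexReader; only close() matters in this sketch.
interface SharedReader {
    void close();
}

// Hypothetical physical reader that records whether it was really closed.
class PhysicalReader implements SharedReader {
    volatile boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hands out lightweight handles that all share one physical reader.
class ReaderCache {
    private final PhysicalReader physical;
    private int borrowers = 0;
    private boolean retired = false;

    ReaderCache(PhysicalReader physical) {
        this.physical = physical;
    }

    synchronized SharedReader acquire() {
        if (retired) {
            throw new IllegalStateException("reader retired; open a new one");
        }
        borrowers++;
        return new CachedReader(this);
    }

    // Called by a handle when its owner is done with it.
    synchronized void release() {
        borrowers--;
        closeIfUnused();
    }

    // Called by the manager when the directory was modified.
    synchronized void retire() {
        retired = true;
        closeIfUnused();
    }

    private void closeIfUnused() {
        if (retired && borrowers == 0 && !physical.closed) {
            physical.close();
        }
    }
}

// The handle a Searcher would hold; close() only notifies the cache.
class CachedReader implements SharedReader {
    private final ReaderCache cache;
    private boolean released = false;

    CachedReader(ReaderCache cache) {
        this.cache = cache;
    }

    @Override
    public synchronized void close() {
        if (released) {
            return; // a second close on the same handle is ignored
        }
        released = true;
        cache.release();
    }
}

class ReaderCacheDemo {
    public static void main(String[] args) {
        PhysicalReader physical = new PhysicalReader();
        ReaderCache cache = new ReaderCache(physical);
        SharedReader t1 = cache.acquire();
        SharedReader t2 = cache.acquire();
        t1.close();                          // Thread-1 is done
        System.out.println(physical.closed); // false: Thread-2 still reads
        cache.retire();                      // directory was modified
        t2.close();                          // last borrower leaves
        System.out.println(physical.closed); // true
    }
}
```

After retire() the manager would open a fresh reader for new requests,
while threads that already hold a handle finish on the old one.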

Notice that we don't need to change the SearcherManager interface, so I can
plug in a new implementation. (To be honest, this kind of Manager class
could be used without Avalon: it is simply an application of the abstract
factory design pattern.)
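To illustrate the point about doing this without Avalon, here is a small
sketch of the same factory idea in plain Java. Searcher here is a stand-in
that only returns a label, SearcherManager has the same shape as my
interface minus Avalon's Component, and SingleIndexManager,
MultiIndexManager and ManagerRegistry are hypothetical names chosen for
this example.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Lucene's Searcher; returns a label instead of real hits.
interface Searcher {
    String describe();
}

// Same shape as the interface in this message, minus Avalon's Component.
interface SearcherManager {
    Searcher getSearcher();
}

// Produces searchers over a single index directory.
class SingleIndexManager implements SearcherManager {
    private final String directory;

    SingleIndexManager(String directory) {
        this.directory = directory;
    }

    @Override
    public Searcher getSearcher() {
        return () -> "index:" + directory;
    }
}

// Produces searchers that combine the searchers of other managers.
class MultiIndexManager implements SearcherManager {
    private final SearcherManager[] parts;

    MultiIndexManager(SearcherManager... parts) {
        this.parts = parts;
    }

    @Override
    public Searcher getSearcher() {
        StringBuilder sb = new StringBuilder("multi[");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts[i].getSearcher().describe());
        }
        String label = sb.append(']').toString();
        return () -> label;
    }
}

// Maps configured names to managers; callers never see the concrete class.
class ManagerRegistry {
    private final Map<String, SearcherManager> managers = new HashMap<>();

    void register(String name, SearcherManager manager) {
        managers.put(name, manager);
    }

    SearcherManager lookup(String name) {
        SearcherManager manager = managers.get(name);
        if (manager == null) {
            throw new IllegalArgumentException("no searcher named " + name);
        }
        return manager;
    }
}

class ManagerRegistryDemo {
    public static void main(String[] args) {
        ManagerRegistry registry = new ManagerRegistry();
        SearcherManager topics = new SingleIndexManager("topics");
        SearcherManager messages = new SingleIndexManager("messages");
        registry.register("topics", topics);
        registry.register("messages", messages);
        registry.register("multi", new MultiIndexManager(topics, messages));
        // prints multi[index:topics,index:messages]
        System.out.println(registry.lookup("multi").getSearcher().describe());
    }
}
```

The application code only calls lookup(...).getSearcher(), so swapping the
implementation registered under a name does not touch the callers.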

Something else:
how about an IndexWriter called BatchIndexWriter that uses a RAMDirectory
to buffer the documents to be added to the index:
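// sketch: ramDirectory, ramWriter, realWriter, analyzer, count and aLimit
// are assumed to be fields of BatchIndexWriter
public void addDocument(Document d) throws IOException {
    count++;
    ramWriter.addDocument(d);
    if (count > aLimit) {
        // close the RAM writer so that its in-memory index is complete,
        // then merge that index into the real one
        ramWriter.close();
        realWriter.addIndexes(new Directory[] { ramDirectory });
        ramDirectory = new RAMDirectory();
        ramWriter = new IndexWriter(ramDirectory, analyzer, true);
        count = 0;
    }
}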

Of course, the value of the limit could be configured.
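The buffering logic itself can be tried out without Lucene. This is a
minimal model under my own assumptions: a plain List stands in for the real
index, another List for the RAMDirectory, and BatchBuffer with its flush()
and close() methods is a hypothetical name. Unlike the sketch above it
flushes as soon as the buffer reaches the limit, and close() writes out
whatever is still buffered, which a real BatchIndexWriter would also need.

```java
import java.util.ArrayList;
import java.util.List;

// Model of the batching idea: buffer in memory, write through in chunks.
class BatchBuffer<T> {
    private final List<T> target;               // stands in for the on-disk index
    private List<T> buffer = new ArrayList<>(); // stands in for the RAMDirectory
    private final int limit;
    int flushes = 0;                            // number of merges into the target

    BatchBuffer(List<T> target, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.target = target;
        this.limit = limit;
    }

    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= limit) {
            flush();
        }
    }

    // Moves all buffered documents to the target and starts a new buffer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        target.addAll(buffer);
        buffer = new ArrayList<>();
        flushes++;
    }

    // Documents below the limit would be lost without a final flush.
    void close() {
        flush();
    }
}

class BatchBufferDemo {
    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        BatchBuffer<String> writer = new BatchBuffer<>(index, 3);
        for (int i = 1; i <= 7; i++) {
            writer.add("doc" + i);
        }
        System.out.println(index.size()); // prints 6: one document is still buffered
        writer.close();
        System.out.println(index.size()); // prints 7
    }
}
```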

peter

ps: a good tutorial:
http://jakarta.apache.org/avalon/developing/introduction.html

-------------------------------------------

RE: Proposal for Lucene / new component
From: Andrew C. Oliver
Subject: RE: Proposal for Lucene / new component
Date: Sun, 03 Mar 2002 11:48:27 -0800

> I think if you need logging, configuring, threading, pooling (for the
> crawler) and want to be component based you need a framework something
> like avalon. It took one day to understand Avalon and write the first
> Hello world application but you can save a lot of time while coding.

Great! Can you post your work to get the Hello Avalon App somewhere?
If you could document along those lines as well then I'll be happy to go
and write a "getting started" guide for Avalon.

I'm not objecting to using Avalon, provided I can actually understand
it. I'm really close thanks to the fine work of Ken Barrozzi
(http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/poi/cocoon-poi/), but
I'm one step away from actually being able to start using Avalon. It's
not an "I won't" issue, it's an "I can't" issue.

