Mailing List Archive

FW: Manually creating a RAMDirectory from pre-existing index files.
**Apologies for sending this to the wrong list (for those of you on both). I
entered the mail archive from the back end and didn't realize there was
a user list as well as a developer list. Again, sorry :)

-----Original Message-----
From: Josh Guice [mailto:Joshua.Guice@nssl.noaa.gov]
Sent: Friday, April 05, 2002 1:01 PM
To: 'lucene-dev@jakarta.apache.org'
Subject: Manually creating a RAMDirectory from pre-existing index files.

(Using version 1.2rc4 of canned Lucene)

I am working on a document navigation/browsing applet and want to
implement a client-side search that is completely independent of any
server mechanism, yet is still fast (i.e. not a "crawler"). It will be
LAN-based, so moving large files is not an issue.

Currently I create an index of the files to be searched in the
traditional Lucene manner and store them on a web server. From the
applet I open each of these pre-existing files and do a byte->byte copy
to a RAMDirectory (RD) in the client applet's memory. I am then able to
access the files from the RD in the applet, list their sizes (which
match up with the originals) and read back their contents (which also
match up with the origs, byte-for-byte). However, when I attempt to
create an IndexSearcher by passing my RAMDirectory, I get the following
Exception:

Error creating IndexSearcher!

java.lang.ArrayIndexOutOfBoundsException: 116 >= 7
at java.util.Vector.elementAt(Unknown Source)
at org.apache.lucene.index.FieldInfos.fieldInfo(Unknown Source)
at org.apache.lucene.index.FieldInfos.fieldName(Unknown Source)
at org.apache.lucene.index.SegmentTermEnum.readTerm(Unknown Source)
at org.apache.lucene.index.SegmentTermEnum.next(Unknown Source)
at org.apache.lucene.index.TermInfosReader.readIndex(Unknown Source)
at org.apache.lucene.index.TermInfosReader.<init>(Unknown Source)
at org.apache.lucene.index.SegmentReader.<init>(Unknown Source)
at org.apache.lucene.index.SegmentReader.<init>(Unknown Source)
at org.apache.lucene.index.IndexReader$1.doBody(Unknown Source)
at org.apache.lucene.store.Lock$With.run(Unknown Source)
at org.apache.lucene.index.IndexReader.open(Unknown Source)
at org.apache.lucene.search.IndexSearcher.<init>(Unknown Source)
at Search.finishedLoad(Search.java:200)
at fileLoader.run(fileLoader.java:98)

I can successfully pass the physical index directory to an IndexReader
and everything is fine, so I know the index files are ok. Here is the
code I use to get and copy each file to the client applet's memory:

public void addItem(String dataLine) // dataLine is the real filename of an index component
{
    // Get file
    try
    {
        InputStreamReader fileIn;
        org.apache.lucene.store.OutputStream outFile;

        URL url;
        URLConnection urlConn;
        int EOF;

        // Process file
        url = new URL(_applet.getCodeBase().toString() + "index/" + dataLine);
        urlConn = url.openConnection();
        urlConn.setDoInput(true);
        urlConn.setUseCaches(false);

        fileIn = new InputStreamReader(urlConn.getInputStream());

        System.out.println("File: " + dataLine + ", Length: "
            + urlConn.getContentLength());

        // indices is a pre-defined RAMDirectory
        outFile = indices.createFile(dataLine);

        EOF = fileIn.read();

        while (EOF != -1)
        {
            outFile.writeByte((byte)EOF);
            EOF = fileIn.read();
        }

        outFile.close();
    }
    catch (MalformedURLException mue)
    {
        System.out.println("!!Error Bad data file link "
            + _applet.getCodeBase().toString() + "/" + dataLine);
    }
    catch (IOException ioe)
    {
        System.out.println("!!Error Reading "
            + _applet.getCodeBase().toString() + "/" + dataLine);
    }
}

And here is the code I use to create the IndexSearcher:

indices.close();

try
{
    searcher = new IndexSearcher(indices);
}
catch (Exception ioe)
{
    System.out.println("Error creating IndexSearcher!");
    ioe.printStackTrace();
}

This is when the exception gets thrown...

Any ideas on why this wouldn't be working?

Thanks,
Josh


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Manually creating a RAMDirectory from pre-existing index files.
I'll be answering my own question here...sort of:

Apparently all my back-tracking through code has put me in a mode to
approach *other* things backwards as well. After just discovering the
archive search engine, I located this:

______________________________________________________________________

// Variable to store the names of the index files
Stack stackFileNames = new Stack();

// Open the file with the filenames.
// You need this because you cannot list the files in a directory via a URL.
URL source = new URL(getCodeBase(), "filenames.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(source.openStream()));
while (true)
{
    String s = in.readLine();
    if (s == null)
        break;
    stackFileNames.push(s);
}
in.close();

// Open the index
while (!stackFileNames.empty())
{
    String sFileName = (String) stackFileNames.pop();
    // Stream the file into the RAMDirectory
    org.apache.lucene.store.OutputStream os = ramDir.createFile(sFileName);
    java.io.InputStream is = new URL(getCodeBase(), "repository/" + sFileName).openStream();

    byte[] baBuffer = new byte[1024];
    while (true)
    {
        synchronized (baBuffer)
        {
            int iBytesRead = is.read(baBuffer);
            if (iBytesRead == -1)
                break;
            os.writeBytes(baBuffer, iBytesRead);
        }
    }
    is.close();
    os.close();
}

// IndexReader
IndexReader ir = IndexReader.open(ramDir);
With this I can perform searches in the applet. Works fine.


Have fun,

Christoph Breidert
www.sitewaerts.de

______________________________________________________________________
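The buffered copy loop in the quoted message can be sketched with plain java.io streams. This is only an illustration under stated assumptions: java.io.OutputStream stands in for Lucene's org.apache.lucene.store.OutputStream, the class name BufferedCopy is hypothetical, and the CRC32 is an optional extra for checking that a copy matches its source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Lucene.
class BufferedCopy {
    // Copies every byte from in to out in 1024-byte chunks and
    // returns the CRC32 of the bytes that were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);   // write only the bytes actually read
            crc.update(buffer, 0, n);
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // Sample data covering every possible byte value several times.
        byte[] original = new byte[3000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copiedCrc = copy(new ByteArrayInputStream(original), sink);

        CRC32 expected = new CRC32();
        expected.update(original, 0, original.length);
        System.out.println("checksums match: " + (copiedCrc == expected.getValue()));
    }
}
```

Comparing a checksum of the copy against one computed from the source is a stricter check than comparing file sizes alone.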

This code works perfectly and appears to differ from mine only in that
it uses an InputStream instead of an InputStreamReader, and a byte
buffer instead of single bytes. Offhand I don't see why it works and my
attempt failed, but chasing rabbits is a leisure-time activity :)
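
One likely explanation, offered as an assumption rather than something confirmed in this thread: InputStreamReader decodes bytes into characters using a charset, so a binary index file read through it can come out with different byte values, while a plain InputStream hands the bytes back unchanged. The sketch below uses only java.io; the class name ReaderVsStream is hypothetical, and UTF-8 is passed explicitly for reproducibility where the original code relied on the platform default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene.
class ReaderVsStream {
    // Copies through a Reader, one char at a time, casting each char to a byte
    // (the same pattern as the failing addItem method).
    static byte[] copyViaReader(byte[] data, Charset cs) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(data), cs);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1) {
            out.write((byte) c); // c is a decoded character, not a raw byte
        }
        return out.toByteArray();
    }

    // Copies through a plain InputStream, one raw byte at a time.
    static byte[] copyViaStream(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Binary data with values that are not valid UTF-8 on their own.
        byte[] data = {0x00, 0x7F, (byte) 0x80, (byte) 0xFF};

        System.out.println("stream copy identical: "
            + Arrays.equals(data, copyViaStream(data)));
        System.out.println("reader copy identical: "
            + Arrays.equals(data, copyViaReader(data, StandardCharsets.UTF_8)));
    }
}
```

With a single-byte default charset the copied file can keep the same length as the original while some bytes still change, which would be consistent with the matching file sizes reported earlier.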

Thanks to Christoph and anyone else who was considering my problem...

Josh
