Mailing List Archive

Efficient document spooling and indexing
Hello,

This is from a thread about two weeks ago.
What is the answer to this question?
If data is written to disk only when IndexWriter's close() is called,
wouldn't the sample code below be as efficient as the sample code that
uses RAMDirectory, further down?

Thanks,
Otis


----
When using the FSWriter, the actual file I/O doesn't occur until I close
the writer, right? So wouldn't it be just as efficient to do the
following:

IndexWriter fsWriter = new IndexWriter(new File(...), analyzer, false);
while (... more docs to index...) {
    ... add 100,000 docs to fsWriter ...
}
fsWriter.optimize();
fsWriter.close();

-----Original Message-----
From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
Sent: Friday, November 02, 2001 10:47 AM
To: 'Lucene Users List'
Subject: RE: Indexing problem

Well, I don't know if there's an archive of the list, so this is what Doug
wrote: "
A more efficient and slightly more complex approach would be to build large
indexes in RAM, and copy them to disk with IndexWriter.addIndexes:

IndexWriter fsWriter = new IndexWriter(new File(...), analyzer, true);
while (... more docs to index...) {
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
    ... add 100,000 docs to ramWriter ...
    ramWriter.optimize();
    ramWriter.close();
    fsWriter.addIndexes(new Directory[] { ramDir });
}
fsWriter.optimize();
fsWriter.close();
"

Scott


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Efficient document spooling and indexing
Data may not be committed to disk, buffers flushed, files
closed, etc. until IndexWriter.close() is called, but file
IO does happen before then. So I would expect the answer
to your question to be no.
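
The same effect can be seen outside Lucene with a plain buffered file stream. The sketch below is only an analogy and assumes nothing about how IndexWriter is implemented: the class name FlushBeforeCloseDemo, the 8 KB buffer and the temp file are all made up for illustration. It writes through a java.io.BufferedOutputStream and checks how much has reached the file before close() is called.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo class; not part of Lucene.
final class FlushBeforeCloseDemo {

    /** Returns { bytes in the file before close(), bytes in the file after close() }. */
    static long[] measure(int totalBytes) throws IOException {
        File file = File.createTempFile("flush-demo", ".bin");
        file.deleteOnExit();

        FileOutputStream raw = new FileOutputStream(file);
        OutputStream out = new BufferedOutputStream(raw, 8192);
        long beforeClose;
        try {
            for (int i = 0; i < totalBytes; i++) {
                out.write('x'); // goes to the buffer; a full buffer is written to the file
            }
            // Size as seen through the open file handle, before any close().
            beforeClose = raw.getChannel().size();
        } finally {
            out.close(); // writes whatever is still sitting in the buffer
        }
        long afterClose = file.length();
        file.delete();
        return new long[] { beforeClose, afterClose };
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = measure(100000);
        System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```

For inputs larger than the buffer, the first number is already non-zero but smaller than the second: most of the file I/O has happened before close(), and close() only finishes the job.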


--
Ian.
ian.lea@blackwell.co.uk



Re: Efficient document spooling and indexing
Yes, I think that you are correct, since I see my index directory
growing as I add documents to the index, even though I don't call
close() until I'm finished adding all documents.
Hm, I wonder what exactly gets written to the disk between add and
close.
I shall rewrite my stuff to use RAMDirectory then. I like efficient
code and hate wasting any kind of resources, not just computing ones.
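
For what it's worth, the general "collect a batch in RAM, then hand it over in one go" idea can be sketched with plain java.io. This is only an illustration of batching and makes no claim about what RAMDirectory or addIndexes do internally: BatchedSpoolDemo, CountingSink and the "doc-N" record format are hypothetical names invented for the example. The sink simply counts how many write calls reach it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene.
final class BatchedSpoolDemo {

    /** Stand-in for the slow destination; counts the write calls that reach it. */
    static final class CountingSink extends OutputStream {
        final ByteArrayOutputStream data = new ByteArrayOutputStream();
        int writeCalls = 0;

        @Override public void write(int b) { writeCalls++; data.write(b); }
        @Override public void write(byte[] b, int off, int len) { writeCalls++; data.write(b, off, len); }
    }

    /** Made-up record format, one line per document. */
    static byte[] record(int i) {
        return ("doc-" + i + "\n").getBytes(StandardCharsets.UTF_8);
    }

    /** Sends every record straight to the sink: one write call per record. */
    static CountingSink direct(int records) throws IOException {
        CountingSink sink = new CountingSink();
        for (int i = 0; i < records; i++) {
            sink.write(record(i));
        }
        return sink;
    }

    /** Collects batchSize records in memory, then passes each batch on in a single call. */
    static CountingSink batched(int records, int batchSize) throws IOException {
        CountingSink sink = new CountingSink();
        ByteArrayOutputStream ram = new ByteArrayOutputStream();
        for (int i = 0; i < records; i++) {
            ram.write(record(i));
            if ((i + 1) % batchSize == 0) {
                ram.writeTo(sink); // one call for the whole batch
                ram.reset();
            }
        }
        if (ram.size() > 0) {
            ram.writeTo(sink); // leftover partial batch
        }
        return sink;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("direct:  " + direct(1000).writeCalls + " write calls");
        System.out.println("batched: " + batched(1000, 100).writeCalls + " write calls");
    }
}
```

Both variants deliver the same bytes; the batched one just reaches the sink far less often. Whether that translates into a real gain for a given index is something to measure rather than assume.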

Thanks,
Otis


--- Ian Lea <ian.lea@blackwell.co.uk> wrote:
> Data may not be committed to disk, buffers flushed, files
> closed, etc. until IndexWriter.close() is called, but file
> IO does happen before then. So I would expect the answer
> to your question to be no.

