Mailing List Archive

RE: Segments not merging on delete
I'm not sure if this is the cause of your problems, but when you're doing
deletions you need to close the reader before you open a writer, otherwise
deletions can be lost. You're claiming that additions are lost, but could
it really be that it is the deletions which have been lost? Try closing the
reader before you open the writer and tell me if that helps.

Doug
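In code, the ordering Doug recommends looks roughly like this (a sketch
against the Lucene 1.x-era API used in this thread; MyAnalyzer and the
"original_path" field are taken from Matt's snippets below, and the
helper method name is illustrative):

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

void replaceDocument(String indexPath, Document doc, String path)
        throws IOException {
    // 1. Open a reader, delete the old copy, and close the reader
    //    FIRST: closing it is what commits the deletions to disk.
    IndexReader reader = IndexReader.open(indexPath);
    reader.delete(new Term("original_path", path));
    reader.close();

    // 2. Only then open a writer and add the replacement document.
    IndexWriter writer = new IndexWriter(indexPath, new MyAnalyzer(), false);
    writer.addDocument(doc);
    writer.close();
}
```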

> -----Original Message-----
> From: Matt Read [mailto:mread@dircon.co.uk]
> Sent: Wednesday, October 24, 2001 10:35 AM
> To: lucene-user@jakarta.apache.org
> Subject: FW: Segments not merging on delete
>
>
> Hi, I'm having some problems with segments failing to merge when
> deleting documents from the index. I've followed the recommendations
> in the FAQ for avoiding a complete index rebuild when adding a new
> document, i.e. I'm deleting the document from the index and then
> re-adding it. The index, however, keeps growing even if I just
> replace the same document repeatedly. And although reader.numDocs()
> returns the correct value, reader.docFreq() for each term keeps
> increasing.
>
> I'd appreciate any help. Thanks.
>
> When I index a document, I have this code:
>
> // check if the file is already indexed and, if so, delete it
> FSDirectory fsd = FSDirectory.getDirectory(indexPath, false);
> IndexReader reader = IndexReader.open(fsd);
> Term t = new Term("original_path", f.getPath());
> if (reader.delete(t) > 0) {
>     out.println("This file was already in the index, replacing it.<br>");
> }
>
> // add the document to the index
> IndexWriter writer = new IndexWriter(indexPath, new MyAnalyzer(), false);
> Document doc = new Document();
> doc.add(Field.Keyword("original_path", f.getPath()));
> doc.add(Field.Text("filename", f.getName()));
> doc.add(Field.Text("description", "a description"));
> writer.addDocument(doc);
>
> // clean up
> fsd.close();
> reader.close();
> writer.close();
>
> I then have a class to examine the documents in the index. When I
> first add a document it appears correctly. However, as I re-add
> documents (delete then add, as above), the reader.isDeleted() flag
> remains set for each of the re-added documents, which would be OK if
> segments had simply not been merged yet, but the re-added documents
> do not appear anywhere via reader.document(i). The code to examine
> the index is as follows:
>
> fsd = FSDirectory.getDirectory(indexPath, false);
> reader = IndexReader.open(fsd);
>
> // show document details
> for (int i = 0; i < reader.numDocs(); i++) {
>     if (!reader.isDeleted(i)) {
>         Document d = reader.document(i);
>         for (Enumeration e = d.fields(); e.hasMoreElements();) {
>             Field f = (Field) e.nextElement();
>             out.println(f.toString() + ": " + f.name() + "=" +
>                         f.stringValue() + "<br>");
>         }
>     }
> }
>
> // clean up
> fsd.close();
> reader.close();
>
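One pitfall worth flagging in the examination loop above: numDocs()
counts only the undeleted documents, while document numbers themselves
run up to maxDoc() - 1 and are not renumbered until segments merge.
Looping up to numDocs() on an index that contains deletions will
therefore silently skip the highest-numbered documents, which may be
exactly why the re-added documents never show up. A sketch of the
safer loop, using the same era's API:

```java
// Document ids run from 0 to maxDoc() - 1; numDocs() is smaller
// whenever deletions are pending, so it is the wrong loop bound.
for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i)) {
        continue; // this slot still holds a deletion marker
    }
    Document d = reader.document(i);
    // ... inspect d.fields() as in the snippet above ...
}
```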
RE: Segments not merging on delete [ In reply to ]
Thanks for your response.

I tried closing the reader before opening the writer, with no change;
the index still grows and doesn't merge.

I think an important point here might be that I'm adding/re-adding
documents in bulk, i.e. I iterate through the files in a directory,
adding each document in turn using the code in my previous e-mail. I
tried calling writer.optimize() after writer.addDocument() with no
effect; however, if I open a new writer after all the documents have
been added and then call optimize() as below, the index merges
correctly. I.e. the following code is called after all documents have
been added. Why does optimize() work in this location but not when I
optimize after adding each document?

// optimize the index
FSDirectory fsd = FSDirectory.getDirectory(indexPath, false);
IndexWriter writer = new IndexWriter(fsd, new MyAnalyzer(), false);
writer.optimize();
writer.close();

Thanks,
Matt.

-----Original Message-----
From: Doug Cutting [mailto:DCutting@grandcentral.com]
Sent: 25 October 2001 15:45
To: 'Matt Read'; lucene-user@jakarta.apache.org
Subject: RE: Segments not merging on delete


RE: Segments not merging on delete [ In reply to ]
Matt,
I'm also having this problem. In my application I may get an update
for a document that's already in the index. I delete the existing
document and add the new document with the changes, but if you then
search for it, you can't find it. *Sometimes*, if you get another
document that needs changing and add it, the previously changed
document will then show up, but not the one you just added. This made
me think it was an open-file problem, but I've checked all of my code
and that doesn't appear to be the case. I also tried the "open and
optimize" trick, and it makes all updates available for me as well.
However, I may get a group of twenty or so updates at once (although
my program can't see that they arrive as a group; it only sees the
individual "add document" events), which means I must optimize after
each addition to the index. That gets expensive. I'm continuing to
investigate this, and would appreciate you passing on anything you
find.
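If the updates can be buffered even briefly, one way to avoid paying
for an optimize() per document is to batch them: run all the deletions
through a single reader, close it, then run all the additions through
a single writer and optimize once at the end. A rough sketch (Lucene
1.x-era API; the Update class, the batch list, and the method name are
purely illustrative):

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

// Hypothetical holder for one pending update.
class Update {
    String path;   // value of the "original_path" keyword field
    Document doc;  // the replacement document
}

void applyBatch(String indexPath, List updates) throws IOException {
    // Phase 1: delete every document being replaced, then close the
    // reader so the deletions are committed before any writing starts.
    IndexReader reader = IndexReader.open(indexPath);
    for (Iterator it = updates.iterator(); it.hasNext();) {
        Update u = (Update) it.next();
        reader.delete(new Term("original_path", u.path));
    }
    reader.close();

    // Phase 2: add every replacement with a single writer, then
    // optimize once: one full merge for the whole batch.
    IndexWriter writer = new IndexWriter(indexPath, new MyAnalyzer(), false);
    for (Iterator it = updates.iterator(); it.hasNext();) {
        writer.addDocument(((Update) it.next()).doc);
    }
    writer.optimize();
    writer.close();
}
```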


Cheers,

Tom Eskridge


> -----Original Message-----
> From: Matt Read [mailto:mread@dircon.co.uk]
> Sent: Thursday, October 25, 2001 12:04 PM
> To: lucene-user@jakarta.apache.org
> Subject: RE: Segments not merging on delete
RE: Segments not merging on delete [ In reply to ]
Sounds very similar to my problem. I've found adequate workarounds for
my system, as follows; I don't know whether they'll help you at all.
Obviously, a better solution would be to have more control over, or at
least information about, how Lucene decides to schedule segment merges
and how deleted documents behave. Is anyone able to shed light on
this?

When I do bulk delete/adds (i.e. when I'm iterating through files in a
filesystem directory and replacing existing index entries with updated
ones), at the end of all the delete/adds I ensure all readers and
writers are closed, then open a new writer and call writer.optimize().
This always merges all segments of the index. It didn't work when I
placed it after each addDocument() call, possibly because I would need
to close and reopen the writer, but I never tried that.
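The untried variant mentioned above would look something like the
sketch below. Whether closing and reopening the writer around each
optimize() actually forces the merge is exactly the open question in
this thread, so treat this as an experiment rather than a fix:

```java
// Per-document variant (untested, per the text above): close the
// writer to flush the new segment, then reopen and optimize.
writer.addDocument(doc);
writer.close();                     // flush the in-memory segment
writer = new IndexWriter(indexPath, new MyAnalyzer(), false);
writer.optimize();                  // merge against what is on disk
writer.close();
```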

When I'm updating individual entries one at a time, my experience
differs from yours. I call writer.optimize() immediately after the
call to writer.addDocument() and it appears to merge segments
correctly every time. I have no idea why this works for individual
delete/adds but not when doing them in bulk, and I'd imagine I'll have
problems in a multi-user environment.

Matt.

-----Original Message-----
From: Tom Eskridge [mailto:teskridge@ai.uwf.edu]
Sent: 26 October 2001 21:33
To: Matt Read; lucene-user@jakarta.apache.org
Subject: RE: Segments not merging on delete

