Mailing List Archive

Request For Comments : Deleting docs & efficency
Hello All,
I'm currently having problems designing my application
with respect to synchronising the deletion of documents from
an index which is also being indexing and queried.

I am using the suggested procedure for safely indexing and
searching concurrently of recreating the IndexSearcher when the
index has been modified by the IndexWriter (new segments having
been flushed to the filesystem).

Now I am considering to use separate documents (one for each
deleted document) with two fields key=<some common key> and
deleted="true"> for marking deleted documents. As this can be
done using only an IndexWriter it relieves the application of
closing/opening the IndexWriter/IndexSearchers on every delete.

The index is then searched twice, once for documents with
matching data fields (A) and a second time for deleted documents
with "deleted:true" (B). My result set would then be documents
A minus B. As the index has a limited lifetime the extra
storage required is not a major issue. Also "out-of-hours"
maintainance could be performed if necessary.

A *possible* optimization would be in-memory cache of all deleted
documents ID's. The question is how persistant are these document
ID's? If I never delete from the index, can I assume that are
persistant, even across app. restarts? If so then the BitSet
representing the ID's of the deleted document could be used by a
HitCollector to knock out the deleted documents from a query. The
deleted ID's would then need to be saved somewhere to persist
across application shutdowns.

Any comments/ideas/shoot-downs welcome!

Nick

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>