Mailing List Archive

What exactly returns IndexReader.numDeletedDocs()
Hi

I am using Lucene 9.4.2 vector search and everything seems to work fine,
except that when I delete some documents from the index, the method

https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexReader.html#numDeletedDocs()

always returns 0, whereas I would have expected that it would return the
number of documents which I deleted from the index.

IndexReader.numDocs() returns the correct number though.

I guess I misunderstand the javadoc and in particular the note "*NOTE*:
This operation may run in O(maxDoc)."

Could somebody explain in more detail what this method is doing?

Thanks

Michael
Re: What exactly returns IndexReader.numDeletedDocs()
Did you call this method before or after the commit method?
My wild guess would be that you can count deleted documents inside
a transaction only.

On Thu, Dec 8, 2022 at 12:10 AM Michael Wechner <michael.wechner@wyona.com>
wrote:

> always returns 0, whereas I would have expected that it would return the
> number of documents which I deleted from the index.
Re: What exactly returns IndexReader.numDeletedDocs()
IIRC, it's the number of documents marked with a "deleted" bit. They are
obliterated during merges, as segments written during the merge operation no
longer include deleted documents. So, e.g., if you call forceMerge(1), no
previous segment is preserved and the deleted count will drop to 0 as a
result.

Regards,
András

On Thu, Dec 8, 2022 at 10:33 AM Hrvoje Lončar <horvoje@gmail.com> wrote:

> Did you call this method before or after the commit method?
> My wild guess would be that you can count deleted documents inside
> a transaction only.
Re: What exactly returns IndexReader.numDeletedDocs()
You have to reopen the index reader to see deletes from the IndexWriter.
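
The reason is that a reader is a point-in-time view of the index. A minimal sketch of that behaviour in plain Java follows. It is a toy model, not Lucene code: SnapshotReaderModel, ToyWriter and ToyReader are hypothetical names for illustration only. In real Lucene code, reopening would typically mean calling DirectoryReader.open(...) again or DirectoryReader.openIfChanged(oldReader).

```java
import java.util.BitSet;

/** Toy model of point-in-time readers. This is NOT Lucene code. */
class SnapshotReaderModel {

    /** Writer side: owns the current live bits of a single segment. */
    static final class ToyWriter {
        private final int maxDoc;
        private final BitSet live;

        ToyWriter(int maxDoc) {
            this.maxDoc = maxDoc;
            this.live = new BitSet(maxDoc);
            this.live.set(0, maxDoc); // every document starts out live
        }

        void deleteDocument(int docId) {
            live.clear(docId);
        }

        /** Opens a reader that sees the index exactly as it is right now. */
        ToyReader openReader() {
            return new ToyReader(maxDoc, (BitSet) live.clone());
        }
    }

    /** Reader side: a snapshot taken at the moment the reader was opened. */
    static final class ToyReader {
        private final int maxDoc;
        private final BitSet live;

        ToyReader(int maxDoc, BitSet live) {
            this.maxDoc = maxDoc;
            this.live = live;
        }

        int numDocs() {
            return live.cardinality();
        }

        int numDeletedDocs() {
            return maxDoc - numDocs();
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter(10);
        ToyReader before = writer.openReader();
        writer.deleteDocument(3);
        ToyReader after = writer.openReader();
        System.out.println("old reader sees deleted: " + before.numDeletedDocs());
        System.out.println("new reader sees deleted: " + after.numDeletedDocs());
    }
}
```

The reader opened before the delete keeps reporting 0 deleted documents, and only the reader opened afterwards reports 1.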

On 08.12.2022 at 10:32, Hrvoje Lončar wrote:
> Did you call this method before or after the commit method?
> My wild guess would be that you can count deleted documents inside
> a transaction only.
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: What exactly returns IndexReader.numDeletedDocs()
My code at the moment is as follows:

// Imports assumed: java.nio.file.Paths, org.apache.lucene.index.*,
// org.apache.lucene.store.Directory, org.apache.lucene.store.FSDirectory
Directory dir = FSDirectory.open(Paths.get(vectorIndexPath));

// Count the documents before deleting.
IndexReader reader = DirectoryReader.open(dir);
int numberOfDocsBeforeDeleting = reader.numDocs();
log.info("Number of documents: " + numberOfDocsBeforeDeleting);
log.info("Number of deleted documents: " + reader.numDeletedDocs());
reader.close();

log.info("Delete document with path '" + uuid + "' from index '" + vectorIndexPath + "' ...");
IndexWriterConfig iwc = new IndexWriterConfig();
IndexWriter writer = new IndexWriter(dir, iwc);
Term term = new Term(PATH_FIELD, uuid);
writer.deleteDocuments(term);
writer.close(); // closing the writer commits the delete

// Open a fresh reader so that it sees the committed delete.
reader = DirectoryReader.open(dir);
int numberOfDocsAfterDeleting = reader.numDocs();
log.info("Number of documents: " + numberOfDocsAfterDeleting);
log.info("Number of deleted documents: " + (numberOfDocsBeforeDeleting - numberOfDocsAfterDeleting));
// TODO: Not sure whether the method numDeletedDocs() makes sense here
log.info("Number of deleted documents: " + reader.numDeletedDocs());
reader.close();


The final reader.numDeletedDocs() call always returns 0, whereas

numberOfDocsBeforeDeleting - numberOfDocsAfterDeleting

produces the correct result.

Should I open the reader before closing the writer?

Thanks

Michael



On 08.12.22 at 11:36, Uwe Schindler wrote:
> You have to reopen the index reader to see deletes from the indexwriter.
Re: What exactly returns IndexReader.numDeletedDocs()
If this is a reader with only a few documents, the likelihood of all
deletes being applied while closing is high.

Uwe

On 08.12.2022 at 11:44, Michael Wechner wrote:
> Should I open the reader before closing the writer?
Re: What exactly returns IndexReader.numDeletedDocs()
So, IIUC, the information about the number of deleted documents is only
visible temporarily, and only when there are many documents, right?

Thanks

Michael

On 08.12.22 at 14:21, Uwe Schindler wrote:
> If this is a reader with only a few documents, the likelihood of all
> deletes being applied while closing is high.
>
> Uwe