Mailing List Archive

Merging indexes: delete_docs_by_term() first?
A question I'm not 100% sure about:

When merging a temp index into a master index with add_invindexes(), must
any existing documents in the master (i.e., docs which match docs in the
temp index) first be marked as deleted using delete_docs_by_term() prior
to the merge, or does the logic in add_invindexes() take care of duplicate
documents?

When indexing directly into a main index without temp indexes, the answer
is yes: delete before adding. But what about a merge with add_invindexes()?

The doc at
http://www.rectangular.com/kinosearch/docs/stable/KinoSearch/InvIndexer.html
isn't clear on this.

Sorry to beat this poor horse again - I just want to be sure.

Regards
Merging indexes: delete_docs_by_term() first?
On Aug 23, 2006, at 1:57 AM, henka@cityweb.co.za wrote:

> When merging a temp index into a master index with add_invindexes(), must
> any existing documents in the master (i.e., docs which match docs in the
> temp index) first be marked as deleted using delete_docs_by_term() prior
> to the merge, or does the logic in add_invindexes() take care of duplicate
> documents?

You must use delete_docs_by_term(). KS knows nothing about whether
you've defined a field that you're using as a unique id. It will
happily index duplicates and return them with equal scores at
search-time.
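
To illustrate the point, here is a minimal toy model in Python. It is not KinoSearch code: ToyIndex, add_index() and merge_replacing() are hypothetical names invented for this sketch, and the "url" field is only an assumed unique id. It shows why a merge that knows nothing about unique ids keeps both copies of a document unless the old copy is deleted by term first.

```python
class ToyIndex:
    """Toy in-memory index with no notion of a unique id field."""

    def __init__(self):
        self.docs = []  # a slot set to None marks a deleted document

    def add_doc(self, doc):
        self.docs.append(dict(doc))

    def delete_docs_by_term(self, field, value):
        # Mark every live doc whose field equals value as deleted.
        for i, doc in enumerate(self.docs):
            if doc is not None and doc.get(field) == value:
                self.docs[i] = None

    def live_docs(self):
        return [d for d in self.docs if d is not None]

    def add_index(self, other):
        # Append every live doc from the other index; no duplicate detection.
        for doc in other.live_docs():
            self.add_doc(doc)

    def search(self, field, value):
        return [d for d in self.live_docs() if d.get(field) == value]


def merge_replacing(master, temp, id_field):
    # Delete the master's copies of every doc in temp, then merge.
    for doc in temp.live_docs():
        master.delete_docs_by_term(id_field, doc[id_field])
    master.add_index(temp)


if __name__ == "__main__":
    master = ToyIndex()
    master.add_doc({"url": "a", "body": "old"})
    temp = ToyIndex()
    temp.add_doc({"url": "a", "body": "new"})
    merge_replacing(master, temp, "url")
    print(master.search("url", "a"))
```

In this sketch, calling add_index() alone would leave both the "old" and the "new" document for url "a" in the master; merge_replacing() leaves only the new one.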


Marvin Humphrey

--
I'm looking for a part time job.
Merging indexes: delete_docs_by_term() first?
>> When merging a temp index into a master index with add_invindexes(), must
>> any existing documents in the master (i.e., docs which match docs in the
>> temp index) first be marked as deleted using delete_docs_by_term() prior
>> to the merge, or does the logic in add_invindexes() take care of duplicate
>> documents?
>
> You must use delete_docs_by_term(). KS knows nothing about whether
> you've defined a field that you're using as a unique id. It will
> happily index duplicates and return them with equal scores at
> search-time.

Excellent, thanks. I suspected as much, but needed to be sure in order to
prevent duplicates.