Mailing List Archive

Updates documents using queries
Hi folks,
Currently the only way to update a block of documents is to identify
them with a term and update those documents. However, we have a case where
the child documents do not share the same identifier as the parent documents,
so to identify the whole block we need at least a disjunction query like:
(parentId: xx OR id: xx).
I wonder whether we could add a new API to IndexWriter that supports this? It
seems to me we just need to create a new DeleteQueue node with queries
instead of terms and pass it into the internal update method. Or am I missing
something that makes update-by-query less straightforward than it looks?
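
For illustration, here is a toy Java model of the situation described above. It is not Lucene code: documents are plain maps, and the class name, helper methods and field values are made up for this sketch. It only shows why a single term cannot select the whole block when children carry parentId and the parent carries id, while the disjunction can.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: documents are plain maps, not Lucene Documents.
final class BlockMatchSketch {

    // One block: child documents first, parent document last.
    static List<Map<String, String>> block(String key) {
        return List.of(
            Map.of("parentId", key, "body", "child 1"),
            Map.of("parentId", key, "body", "child 2"),
            Map.of("id", key, "body", "parent"));
    }

    // Stand-in for a single-term match such as id:xx.
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    // Number of documents selected by the given predicate.
    static long count(List<Map<String, String>> docs,
                      Predicate<Map<String, String>> query) {
        return docs.stream().filter(query).count();
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = block("xx");
        // Either term alone selects only part of the block.
        System.out.println(count(docs, term("id", "xx")));        // prints 1
        System.out.println(count(docs, term("parentId", "xx")));  // prints 2
        // The disjunction (parentId:xx OR id:xx) selects all three documents.
        System.out.println(
            count(docs, term("parentId", "xx").or(term("id", "xx")))); // prints 3
    }
}
```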

Thanks
Patrick
Re: Updates documents using queries
You are talking about "updateDocuments(Term delTerm, Iterable docs)"?

We could add another method that takes a Query, like
[https://lucene.apache.org/core/9_6_0/core/org/apache/lucene/index/IndexWriter.html#deleteDocuments(org.apache.lucene.search.Query...)].
The implementation behind it could be the same. Basically it would do the
same thing, but use a delQuery (via the DocIdSetIterator of the query)
together with the Iterable for the new block.

Uwe

On 30.05.2023 at 23:52, Patrick Zhai wrote:
> Hi folks,
> Currently the only way to update a block of documents is to
> identify them with a term and update those documents. However, we
> have a case where the child documents do not share the same identifier
> as the parent documents, so to identify the whole block we need at
> least a disjunction query like: (parentId: xx OR id: xx).
> I wonder whether we could add a new API to IndexWriter
> that supports this? It seems to me we just need to create a new
> DeleteQueue node with queries instead of terms and pass it into
> the internal update method. Or am I missing something that makes
> update-by-query less straightforward than it looks?
>
> Thanks
> Patrick

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Updates documents using queries
+1 to add a new IW.updateDocuments(Query, Iterable) -- it should be simple
to implement using the existing (complex!) underlying machinery in IW.

It would atomically delete all documents matching Query and index all
documents from Iterable, where "atomically" is with respect to commit,
refresh, and exceptions during indexing (i.e. anything partially indexed
is fully rolled back).
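
As a rough sketch of those semantics, the following toy in-memory model deletes by predicate and adds a new block in one all-or-nothing step. It is not Lucene code and says nothing about how IndexWriter implements this internally; the class, the map-based documents and the "empty document" failure are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy in-memory model of an atomic update-by-query; this is not Lucene code.
final class AtomicUpdateSketch {
    private List<Map<String, String>> live = new ArrayList<>();

    // Delete everything matching delQuery and add newDocs, all or nothing.
    synchronized void updateDocuments(Predicate<Map<String, String>> delQuery,
                                      Iterable<Map<String, String>> newDocs) {
        // Work on a private copy so a reader never sees a half-applied update.
        List<Map<String, String>> next = new ArrayList<>(live);
        // The delete only affects documents that existed before this call.
        next.removeIf(delQuery);
        for (Map<String, String> doc : newDocs) {
            if (doc.isEmpty()) {
                // Stand-in for any failure while indexing one document.
                throw new IllegalArgumentException("empty document");
            }
            next.add(doc);
        }
        // Publish only after the whole block succeeded. An exception above
        // leaves 'live' untouched, which models the full rollback.
        live = next;
    }

    // Immutable view of the currently visible documents.
    synchronized List<Map<String, String>> snapshot() {
        return List.copyOf(live);
    }

    public static void main(String[] args) {
        AtomicUpdateSketch writer = new AtomicUpdateSketch();
        Predicate<Map<String, String>> wholeBlock =
            d -> "xx".equals(d.get("parentId")) || "xx".equals(d.get("id"));

        // Index the initial block, then replace it with a new version.
        writer.updateDocuments(wholeBlock, List.of(
            Map.of("parentId", "xx", "v", "1"),
            Map.of("id", "xx", "v", "1")));
        writer.updateDocuments(wholeBlock, List.of(
            Map.of("parentId", "xx", "v", "2"),
            Map.of("id", "xx", "v", "2")));
        System.out.println(writer.snapshot().size()); // prints 2

        // A failing update changes nothing: version 2 stays visible.
        Map<String, String> broken = Map.of();
        try {
            writer.updateDocuments(wholeBlock,
                List.of(Map.of("id", "xx", "v", "3"), broken));
        } catch (IllegalArgumentException expected) {
            System.out.println(writer.snapshot().get(0).get("v")); // prints 2
        }
    }
}
```

The copy-then-publish approach is chosen here only because it makes the rollback trivial to see in a small example.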

Mike


On Wed, May 31, 2023, 7:12 AM Uwe Schindler <uwe@thetaphi.de> wrote:

> You are talking about "updateDocuments(Term delTerm, Iterable docs)"?
>
> We could add another method that takes a Query, like
> [https://lucene.apache.org/core/9_6_0/core/org/apache/lucene/index/IndexWriter.html#deleteDocuments(org.apache.lucene.search.Query...)].
> The implementation behind it could be the same. Basically it would do the
> same thing, but use a delQuery (via the DocIdSetIterator of the query)
> together with the Iterable for the new block.
>
> Uwe
Re: Updates documents using queries
Thanks Uwe and Mike, I created: https://github.com/apache/lucene/pull/12341

On Wed, May 31, 2023 at 4:43 AM Michael McCandless <lucene@mikemccandless.com> wrote:

> +1 to add a new IW.updateDocuments(Query, Iterable) -- it should be simple
> to implement using the existing (complex!) underlying machinery in IW.
>
> It would atomically delete all documents matching Query and index all
> documents from Iterable, where "atomically" is with respect to commit,
> refresh, and exceptions during indexing (i.e. anything partially indexed
> is fully rolled back).
>
> Mike