Mailing List Archive

ConjunctionDISI nextDoc can return immediately when NO_MORE_DOCS
Hi
I am reading the code of class ConjunctionDISI .and about the method
nextDoc , Suppose that the sub-disi is emtpy in the lead1.lead2,should
there be that it can return immediately when the input doc==NO_MORE_DOCS?


private int doNext(int doc) throws IOException {
advanceHead:
for (; ; ) {
assert doc == lead1.docID();
//assumpt doc==NO_MORE_DOCS ,it return
if(doc==NO_MORE_DOCS){
return NO_MORE_DOCS;
}
// find agreement between the two iterators with the lower costs
// we special case them because they do not need the
// 'other.docID() < doc' check that the 'others' iterators need
final int next2 = lead2.advance(doc);
if (next2 != doc) {
doc = lead1.advance(next2);
if(doc==NO_MORE_DOCS){
return NO_MORE_DOCS;
}
if (next2 != doc) {
continue;
}
}
..left omited...
}
Re: ConjunctionDISI nextDoc can return immediately when NO_MORE_DOCS [ In reply to ]
Hello,

This change would be correct, but it would only save work when the
conjunction is exhausted, and add overhead otherwise?

Le sam. 30 sept. 2023, 16:20, YouPeng Yang <yypvsxf19870706@gmail.com> a
écrit :

> Hi
> I am reading the code of class ConjunctionDISI .and about the method
> nextDoc , Suppose that the sub-disi is emtpy in the lead1.lead2,should
> there be that it can return immediately when the input doc==NO_MORE_DOCS?
>
>
> private int doNext(int doc) throws IOException {
> advanceHead:
> for (; ; ) {
> assert doc == lead1.docID();
> //assumpt doc==NO_MORE_DOCS ,it return
> if(doc==NO_MORE_DOCS){
> return NO_MORE_DOCS;
> }
> // find agreement between the two iterators with the lower costs
> // we special case them because they do not need the
> // 'other.docID() < doc' check that the 'others' iterators need
> final int next2 = lead2.advance(doc);
> if (next2 != doc) {
> doc = lead1.advance(next2);
> if(doc==NO_MORE_DOCS){
> return NO_MORE_DOCS;
> }
> if (next2 != doc) {
> continue;
> }
> }
> ..left omited...
> }
>
Re: ConjunctionDISI nextDoc can return immediately when NO_MORE_DOCS [ In reply to ]
Hi Adrien
suppose that conjunction query like (term1 AND term2 AND term3) ,if
the term1 does not exist ,and then the loop execution may cause
unnecessary overhead.(sorry I have not yet find out whether there is any
filter work before the doNext()..

Best Regard

Adrien Grand <jpountz@gmail.com> ?2023?10?1??? 22:30???

> Hello,
>
> This change would be correct, but it would only save work when the
> conjunction is exhausted, and add overhead otherwise?
>
> Le sam. 30 sept. 2023, 16:20, YouPeng Yang <yypvsxf19870706@gmail.com> a
> écrit :
>
>> Hi
>> I am reading the code of class ConjunctionDISI .and about the method
>> nextDoc , Suppose that the sub-disi is emtpy in the lead1.lead2,should
>> there be that it can return immediately when the input doc==NO_MORE_DOCS?
>>
>>
>> private int doNext(int doc) throws IOException {
>> advanceHead:
>> for (; ; ) {
>> assert doc == lead1.docID();
>> //assumpt doc==NO_MORE_DOCS ,it return
>> if(doc==NO_MORE_DOCS){
>> return NO_MORE_DOCS;
>> }
>> // find agreement between the two iterators with the lower costs
>> // we special case them because they do not need the
>> // 'other.docID() < doc' check that the 'others' iterators need
>> final int next2 = lead2.advance(doc);
>> if (next2 != doc) {
>> doc = lead1.advance(next2);
>> if(doc==NO_MORE_DOCS){
>> return NO_MORE_DOCS;
>> }
>> if (next2 != doc) {
>> continue;
>> }
>> }
>> ..left omited...
>> }
>>
>
Re: ConjunctionDISI nextDoc can return immediately when NO_MORE_DOCS [ In reply to ]
I agree that it would save work in that case, but this query should be very
fast anyway.

On the other hand, if term1, term2 and term3 have 10M matches each, the
conjunction will need to check if the current candidate match is
NO_MORE_DOCS millions of times even though this would only happen once.

In general it's better to have less overhead for expensive queries and more
overhead for cheap queries than the other way around.

Le dim. 1 oct. 2023, 17:35, YouPeng Yang <yypvsxf19870706@gmail.com> a
écrit :

> Hi Adrien
> suppose that conjunction query like (term1 AND term2 AND term3) ,if
> the term1 does not exist ,and then the loop execution may cause
> unnecessary overhead.(sorry I have not yet find out whether there is any
> filter work before the doNext()..
>
> Best Regard
>
> Adrien Grand <jpountz@gmail.com> ?2023?10?1??? 22:30???
>
>> Hello,
>>
>> This change would be correct, but it would only save work when the
>> conjunction is exhausted, and add overhead otherwise?
>>
>> Le sam. 30 sept. 2023, 16:20, YouPeng Yang <yypvsxf19870706@gmail.com> a
>> écrit :
>>
>>> Hi
>>> I am reading the code of class ConjunctionDISI .and about the method
>>> nextDoc , Suppose that the sub-disi is emtpy in the lead1.lead2,should
>>> there be that it can return immediately when the input doc==NO_MORE_DOCS?
>>>
>>>
>>> private int doNext(int doc) throws IOException {
>>> advanceHead:
>>> for (; ; ) {
>>> assert doc == lead1.docID();
>>> //assumpt doc==NO_MORE_DOCS ,it return
>>> if(doc==NO_MORE_DOCS){
>>> return NO_MORE_DOCS;
>>> }
>>> // find agreement between the two iterators with the lower costs
>>> // we special case them because they do not need the
>>> // 'other.docID() < doc' check that the 'others' iterators need
>>> final int next2 = lead2.advance(doc);
>>> if (next2 != doc) {
>>> doc = lead1.advance(next2);
>>> if(doc==NO_MORE_DOCS){
>>> return NO_MORE_DOCS;
>>> }
>>> if (next2 != doc) {
>>> continue;
>>> }
>>> }
>>> ..left omited...
>>> }
>>>
>>
Re: ConjunctionDISI nextDoc can return immediately when NO_MORE_DOCS [ In reply to ]
At Infoseek, the engine checked the terms in frequency order, with the most rare term first. If the conjunction reached zero matches at any point, it stopped checking.

This might be a related but more general approach.

That was almost 30 years ago, so any patents are long-expired.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 1, 2023, at 10:12 AM, Adrien Grand <jpountz@gmail.com> wrote:
>
> I agree that it would save work in that case, but this query should be very fast anyway.
>
> On the other hand, if term1, term2 and term3 have 10M matches each, the conjunction will need to check if the current candidate match is NO_MORE_DOCS millions of times even though this would only happen once.
>
> In general it's better to have less overhead for expensive queries and more overhead for cheap queries than the other way around.
>
> Le dim. 1 oct. 2023, 17:35, YouPeng Yang <yypvsxf19870706@gmail.com <mailto:yypvsxf19870706@gmail.com>> a écrit :
>> Hi Adrien
>> suppose that conjunction query like (term1 AND term2 AND term3) ,if the term1 does not exist ,and then the loop execution may cause unnecessary overhead.(sorry I have not yet find out whether there is any filter work before the doNext()..
>>
>> Best Regard
>>
>> Adrien Grand <jpountz@gmail.com <mailto:jpountz@gmail.com>> ?2023?10?1??? 22:30???
>>> Hello,
>>>
>>> This change would be correct, but it would only save work when the conjunction is exhausted, and add overhead otherwise?
>>>
>>> Le sam. 30 sept. 2023, 16:20, YouPeng Yang <yypvsxf19870706@gmail.com <mailto:yypvsxf19870706@gmail.com>> a écrit :
>>>> Hi
>>>> I am reading the code of class ConjunctionDISI .and about the method nextDoc , Suppose that the sub-disi is emtpy in the lead1.lead2,should there be that it can return immediately when the input doc==NO_MORE_DOCS?
>>>>
>>>>
>>>> private int doNext(int doc) throws IOException {
>>>> advanceHead:
>>>> for (; ; ) {
>>>> assert doc == lead1.docID();
>>>> //assumpt doc==NO_MORE_DOCS ,it return
>>>> if(doc==NO_MORE_DOCS){
>>>> return NO_MORE_DOCS;
>>>> }
>>>> // find agreement between the two iterators with the lower costs
>>>> // we special case them because they do not need the
>>>> // 'other.docID() < doc' check that the 'others' iterators need
>>>> final int next2 = lead2.advance(doc);
>>>> if (next2 != doc) {
>>>> doc = lead1.advance(next2);
>>>> if(doc==NO_MORE_DOCS){
>>>> return NO_MORE_DOCS;
>>>> }
>>>> if (next2 != doc) {
>>>> continue;
>>>> }
>>>> }
>>>> ..left omited...
>>>> }
Re: ConjunctionDISI nextDoc can return immediately when NO_MORE_DOCS [ In reply to ]
This is a good approach indeed, Lucene does this too.

Le dim. 1 oct. 2023, 19:33, Walter Underwood <wunder@wunderwood.org> a
écrit :

> At Infoseek, the engine checked the terms in frequency order, with the
> most rare term first. If the conjunction reached zero matches at any point,
> it stopped checking.
>
> This might be a related but more general approach.
>
> That was almost 30 years ago, so any patents are long-expired.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> On Oct 1, 2023, at 10:12 AM, Adrien Grand <jpountz@gmail.com> wrote:
>
> I agree that it would save work in that case, but this query should be
> very fast anyway.
>
> On the other hand, if term1, term2 and term3 have 10M matches each, the
> conjunction will need to check if the current candidate match is
> NO_MORE_DOCS millions of times even though this would only happen once.
>
> In general it's better to have less overhead for expensive queries and
> more overhead for cheap queries than the other way around.
>
> Le dim. 1 oct. 2023, 17:35, YouPeng Yang <yypvsxf19870706@gmail.com> a
> écrit :
>
>> Hi Adrien
>> suppose that conjunction query like (term1 AND term2 AND term3) ,if
>> the term1 does not exist ,and then the loop execution may cause
>> unnecessary overhead.(sorry I have not yet find out whether there is any
>> filter work before the doNext()..
>>
>> Best Regard
>>
>> Adrien Grand <jpountz@gmail.com> ?2023?10?1??? 22:30???
>>
>>> Hello,
>>>
>>> This change would be correct, but it would only save work when the
>>> conjunction is exhausted, and add overhead otherwise?
>>>
>>> Le sam. 30 sept. 2023, 16:20, YouPeng Yang <yypvsxf19870706@gmail.com>
>>> a écrit :
>>>
>>>> Hi
>>>> I am reading the code of class ConjunctionDISI .and about the method
>>>> nextDoc , Suppose that the sub-disi is emtpy in the lead1.lead2,should
>>>> there be that it can return immediately when the input doc==NO_MORE_DOCS?
>>>>
>>>>
>>>> private int doNext(int doc) throws IOException {
>>>> advanceHead:
>>>> for (; ; ) {
>>>> assert doc == lead1.docID();
>>>> //assumpt doc==NO_MORE_DOCS ,it return
>>>> if(doc==NO_MORE_DOCS){
>>>> return NO_MORE_DOCS;
>>>> }
>>>> // find agreement between the two iterators with the lower costs
>>>> // we special case them because they do not need the
>>>> // 'other.docID() < doc' check that the 'others' iterators need
>>>> final int next2 = lead2.advance(doc);
>>>> if (next2 != doc) {
>>>> doc = lead1.advance(next2);
>>>> if(doc==NO_MORE_DOCS){
>>>> return NO_MORE_DOCS;
>>>> }
>>>> if (next2 != doc) {
>>>> continue;
>>>> }
>>>> }
>>>> ..left omited...
>>>> }
>>>>
>>>
>