Mailing List Archive: Can I simplify this bit of query boosting?

May 11, 2023, 6:43 AM

Post #1 of 4 (215 views)

Hi, I've hit a wall here.

In brief, users search a library of documents. Every indexed document has a
version number field, which is always populated for release notes and
sometimes for other docs. Every document also has a category field, which
identifies release notes as well as other content types.

The requirement is to make sure that release notes are boosted relative to
other content, and that release notes with higher versions are boosted more
than those with lower versions.

I've currently implemented a crude method to achieve this, and the crucial
part of the process is here:

    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.queryparser.classic.QueryParserBase;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.BoostQuery;

    // have IndexReader reader, IndexSearcher searcher, Analyzer analyzer,
    // String userQuery; parser.parse() may throw ParseException

    QueryParser parser = new QueryParser( "content", analyzer );
    // AND default: "category:relnotes version:9*" requires both parts to match
    parser.setDefaultOperator( QueryParserBase.AND_OPERATOR );

    BooleanQuery query = new BooleanQuery.Builder()
        // the user's own query must match
        .add( parser.parse( userQuery ), Occur.MUST )
        // optional clauses: boost release notes by the leading version digit
        .add( new BoostQuery( parser.parse( "category:relnotes version:9*" ), 90.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:8*" ), 80.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:7*" ), 70.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:6*" ), 60.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:5*" ), 50.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:4*" ), 40.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:3*" ), 30.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:2*" ), 20.0f ), Occur.SHOULD )
        .add( new BoostQuery( parser.parse( "category:relnotes version:1*" ), 10.0f ), Occur.SHOULD )
        .build();

I found through experimentation that the boost factors are not
multiplicative (as most of the explanations on the web implied) but are
simply added to the score. If I've misunderstood how boosting works, please
enlighten me!
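For what it's worth, the behaviour described above is consistent with how Lucene combines scores: a BoostQuery multiplies the score of the query it wraps, and a BooleanQuery then adds up the scores of all matching clauses (the MUST clause plus any matching SHOULD clauses). So each boost is multiplicative on its own clause, but from the point of view of the whole document it shows up as an added amount. The toy model below is plain Java, not Lucene's scorer, and the clause scores are invented numbers chosen only to show the arithmetic.

```java
/**
 * Toy model of how a BooleanQuery combines clause scores.
 * Not Lucene code: the clause scores in main() are invented for illustration.
 */
class BoostArithmetic {

    /** A BoostQuery multiplies the wrapped query's score by its boost. */
    static double boosted(double clauseScore, double boost) {
        return clauseScore * boost;
    }

    /** A BooleanQuery sums the scores of every clause that matches. */
    static double booleanScore(double mustScore, double... matchingShouldScores) {
        double total = mustScore;
        for (double s : matchingShouldScores) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        double userQueryScore = 3.25; // invented score of the user's MUST clause
        double relnotesClause = 1.5;  // invented score of "category:relnotes version:9*"

        // Release note for version 9: MUST clause plus one boosted SHOULD clause.
        double v9 = booleanScore(userQueryScore, boosted(relnotesClause, 90.0));
        // Ordinary document: only the MUST clause matches.
        double other = booleanScore(userQueryScore);

        System.out.println(v9);    // 138.25 (3.25 + 1.5 * 90)
        System.out.println(other); // 3.25
    }
}
```

The boost of 90 is applied to the release-notes clause only, so the gap between the two documents depends on that clause's own score, not on the user's query score.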

The versions and boost factors above are arbitrary just to keep the example
simple; in reality the versions cover a much wider range and the boost
values do too.

This is working to a degree, but it's not granular enough. I really want the
boost factor to be calculated directly from the version value, if that is
possible.
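Computing the boost from the version needs the version as a number rather than as strings matched by prefix. String prefixes also stop lining up with numeric order once versions widen: if the field holds values such as "1", "10" and "100", the clause version:1* matches all three. The sketch below is one hypothetical way to pack a dotted version into a single long that sorts numerically, assuming versions of the form major.minor.patch with each part between 0 and 999; VersionEncoder and encode are illustrative names, not Lucene API. Such a value could then be indexed in a numeric doc-values field (NumericDocValuesField) for sorting or value-based scoring.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: packs "major.minor.patch" into one long so versions
 * compare numerically (10.0.0 > 9.9.9), which string prefixes cannot do.
 * Assumes at most three numeric parts, each in the range 0..999.
 */
class VersionEncoder {

    static long encode(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length == 0 || parts.length > 3) {
            throw new IllegalArgumentException("Unsupported version: " + version);
        }
        long encoded = 0;
        for (int i = 0; i < 3; i++) {
            // Missing minor/patch parts count as 0, so "9" encodes like "9.0.0".
            long part = i < parts.length ? Long.parseLong(parts[i]) : 0;
            if (part < 0 || part > 999) {
                throw new IllegalArgumentException("Part out of range: " + version);
            }
            encoded = encoded * 1000 + part;
        }
        return encoded;
    }

    public static void main(String[] args) {
        String[] versions = {"9.2", "10.0.1", "1.10", "1.9.3"};
        long[] codes = Arrays.stream(versions).mapToLong(VersionEncoder::encode).toArray();
        System.out.println(Arrays.toString(codes));
    }
}
```

The fixed width per part is the design choice that makes numeric order match version order; if any part could exceed 999, the multiplier would need to grow.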

I also imagine doing it this way makes searches quite expensive.

How could I improve this?

cheers

T

Re: Can I simplify this bit of query boosting?

horvoje at gmail

May 11, 2023, 10:12 AM

Post #2 of 4

I had a situation where I wanted to sort a list of articles based on the
amount of data entered. For example, an article with a photo, description and
ingredients should rank higher than one with only a name and a photo.
For that purpose I created a numeric field, named completeness, that holds a
calculated value. Later, when executing a query, this number is used as a
sort modifier, in my case in reverse order.
My project is based on Hibernate Search, so I don't have a plain Lucene code
snippet to share. This numeric value does not have to be the first sort
criterion: you put the main sort rule first and then refine the sort with
this numeric value.
I hope it helps, at least to give you an idea which way to go.
BR,
Hrvoje
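Hrvoje's idea corresponds to a multi-field sort in Lucene (for example a Sort built from SortField.FIELD_SCORE followed by a reversed numeric SortField on the version). The snippet below does not use Lucene; it only models that ordering in plain Java with a Comparator over a hypothetical Hit class. One caveat worth keeping in mind: a secondary key only breaks exact ties in the primary key, and relevance scores rarely tie exactly, so the version may seldom come into play unless the primary sort is coarser than the raw score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of "sort by score, then by a numeric field, descending". */
class SecondarySort {

    /** Hypothetical search hit: a relevance score plus a numeric version. */
    static final class Hit {
        final String id;
        final float score;
        final long version;

        Hit(String id, float score, long version) {
            this.id = id;
            this.score = score;
            this.version = version;
        }
    }

    /** Highest score first; among equal scores, highest version first. */
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingLong((Hit h) -> h.version).reversed());

    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("notes-v7", 2.0f, 7),
                new Hit("notes-v9", 2.0f, 9),
                new Hit("guide", 3.5f, 0));
        System.out.println(rank(hits)); // [guide, notes-v9, notes-v7]
    }
}
```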

On Thu, 11 May 2023, 15:44 Trevor Nicholls, <trevor@castingthevoid.com>
wrote:

Re: Can I simplify this bit of query boosting?

msokolov at gmail

May 11, 2023, 1:00 PM

Post #3 of 4

You might also want to have a look at FeatureField. This can be used
to associate a score with a particular term.
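FeatureField stores a positive per-document value and scores it with one of a few fixed functions. According to its javadoc, newSaturationQuery scores weight * S / (S + pivot), newLogQuery scores weight * log(scalingFactor + S), and newSigmoidQuery scores weight * S^a / (S^a + pivot^a), where S is the feature value. The plain-Java sketch below only evaluates those three formulas for a few version numbers so their shapes can be compared; the weight, pivot, scalingFactor and exponent values are placeholders, not recommendations. Used as a single SHOULD clause, such a query would replace the per-version clauses while keeping the same additive combination. The javadoc also notes that feature values are stored with reduced precision, so very close versions may end up scoring identically.

```java
/**
 * Evaluates the scoring formulas documented for Lucene's FeatureField
 * queries, in plain Java. S is the feature value (here: a version number).
 * All constants in main() are placeholders for illustration.
 */
class FeatureFunctions {

    static double saturation(double s, double weight, double pivot) {
        return weight * s / (s + pivot);
    }

    static double log(double s, double weight, double scalingFactor) {
        return weight * Math.log(scalingFactor + s);
    }

    static double sigmoid(double s, double weight, double pivot, double a) {
        double sa = Math.pow(s, a);
        return weight * sa / (sa + Math.pow(pivot, a));
    }

    public static void main(String[] args) {
        double weight = 10, pivot = 5, scalingFactor = 1, exponent = 2;
        for (int version = 1; version <= 9; version += 2) {
            System.out.printf("v%d  saturation=%.2f  log=%.2f  sigmoid=%.2f%n",
                    version,
                    saturation(version, weight, pivot),
                    log(version, weight, scalingFactor),
                    sigmoid(version, weight, pivot, exponent));
        }
    }
}
```

Saturation and sigmoid are bounded by the weight, which caps how far a release note can climb; log is unbounded but grows slowly.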

On Thu, May 11, 2023 at 1:13 PM Hrvoje Lončar <horvoje@gmail.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Can I simplify this bit of query boosting?

trevor at castingthevoid

May 14, 2023, 10:24 PM

Post #4 of 4

Thanks for this tip, it looks like it might do the biz (although I'm finding choosing good values for the constants a bit of a marathon exercise).

cheers
T
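On choosing the constants: for the saturation function, FeatureField's javadoc describes the default pivot as an approximation of the geometric mean of the feature values in the index, and FeatureField.computePivotFeatureValue computes that estimate from an IndexSearcher. As a rough offline starting point, the same idea can be computed from a sample of version numbers; the sketch below does that in plain Java with hypothetical sample values. A handy check while tuning: at S = pivot the saturation score is exactly half the weight, so the weight bounds the most a release note can gain and the pivot sets where half of that is reached.

```java
import java.util.List;

/**
 * Offline estimate of a saturation pivot as the geometric mean of sample
 * feature values, mirroring the default described in FeatureField's javadoc.
 * The sample values in main() are hypothetical.
 */
class PivotEstimate {

    static double geometricMean(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("need at least one value");
        }
        double logSum = 0;
        for (double v : values) {
            if (v <= 0) {
                throw new IllegalArgumentException("feature values must be positive: " + v);
            }
            logSum += Math.log(v); // summing logs avoids overflowing a raw product
        }
        return Math.exp(logSum / values.size());
    }

    public static void main(String[] args) {
        List<Double> versions = List.of(1.0, 2.0, 4.0, 8.0);
        double pivot = geometricMean(versions);
        double weight = 10;
        // Saturation score weight * S / (S + pivot) at S == pivot is weight / 2.
        System.out.println(pivot + " -> " + weight * pivot / (pivot + pivot));
    }
}
```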

-----Original Message-----
From: Michael Sokolov <msokolov@gmail.com>
Sent: Friday, 12 May 2023 08:01
To: java-user@lucene.apache.org
Subject: Re: Can I simplify this bit of query boosting?

You might also want to have a look at FeatureField. This can be used to associate a score with a particular term.

On Thu, May 11, 2023 at 1:13?PM Hrvoje Lon?ar <horvoje@gmail.com> wrote:
>
> I had a situation when i wanted to sort a list of articles based on
> the amount of data entered. For example, article having a photo,
> description, ingredients should perform better comparing to one having
> only name and photo.
> For that purpose I created a numeric field that holds calculated value
> named completeness. Later when executing a query, this number is used
> as a sort modifier - in my case by using reverse order.
> My project is based on Hibernate Search, so I guess it's not that I
> can put here a code snippet. This numeric value does not have to be
> 1st sort modifier. First you put the main sort rule and then you can
> refine sort with this numeric value.
> I hope it helps - at least to give you an idea which way to go.
> BR,
> Hrvoje
>
> On Thu, 11 May 2023, 15:44 Trevor Nicholls,
> <trevor@castingthevoid.com>
> wrote:
>
> > Hi, I've hit a wall here.
> >
> >
> >
> > In brief, users search a library of documents. Every indexed
> > document has a version number field which is always populated for
> > release notes, sometimes for other docs. Every document also has a
> > category field which is how release notes are identified, among other content types.
> >
> >
> >
> > The requirement is to make sure that release notes are boosted
> > relative to other content, and that release notes with higher
> > versions are boosted more than those with lower versions.
> >
> >
> >
> > I've currently implemented a crude method to achieve this, and the
> > crucial part of the process is here:
> >
> >
> >
> > // have IndexReader reader, IndexSearcher searcher, Analyzer
> > analyzer, String userQuery
> >
> > QueryParser parser = new QueryParser( "content", analyzer );
> >
> > parser.setDefaultOperator( QueryParserBase.AND_OPERATOR );
> >
> > BooleanQuery query = new BooleanQuery.Builder()
> >
> > .add( parser.parse( userQuery ), Occur.MUST )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:9*" ), 90.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:8*" ), 80.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:7*" ), 70.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:6*" ), 60.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:5*" ), 50.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:4*" ), 40.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:3*" ), 30.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:2*" ), 20.0f ), Occur.SHOULD )
> >
> > .add( new BoostQuery( parser.parse( "category:relnotes
> > version:1*" ), 10.0f ), Occur.SHOULD )
> >
> > .build();
> >
> >
> >
> > I found through experimentation that the boost factors are not
> > multiplicative (as most of the explanations on the web implied) but
> > are simply added to the score. If I've misunderstood how boosting
> > works, please enlighten me!
> >
> > The versions and boost factors above are arbitrary just to keep the
> > example simple; in reality the versions cover a much wider range and
> > the boost values do too.
> >
> >
> >
> > This is working to a degree. But it's not granular enough, I really
> > want the boost factor to be calculated directly from the version
> > value, if that is possible.
> >
> > I also imagine doing it this way makes searches quite expensive.
> >
> >
> >
> > How could I improve this?
> >
> >
> >
> > cheers
> >
> > T
> >
> >
> >
> >
> >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org