
Changing Term Vectors for Query
Hello,
For some queries I need to calculate the score mostly like the normal score, but for some documents, certain terms are assigned a frequency that I provide, and the score should be calculated with these new term frequencies. After some research, it seems I have to implement a custom Query, custom Weight and custom Scorer for this. I wanted to ask whether I'm overlooking a simpler solution or whether this is the way to go.
Thanks,
Marcel
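
For reference, the effect described above can be modeled outside Lucene. The sketch below is a plain-Java model (not Lucene API) of the BM25 formula used by Lucene 8's default `BM25Similarity` (k1 = 1.2, b = 0.75, no constant (k1 + 1) factor), where a hypothetical per-query `overrides` map replaces the indexed frequency of a term. It assumes exact document lengths, whereas Lucene stores them lossily in a one-byte norm, so absolute scores would differ slightly from what Lucene reports.

```java
import java.util.Map;

// Plain-Java model of Lucene 8's default BM25 scoring with query-time frequency overrides.
// Assumption: exact document lengths (Lucene encodes them lossily in a one-byte norm).
class Bm25Model {
    static final double K1 = 1.2; // BM25Similarity default k1
    static final double B = 0.75; // BM25Similarity default b

    // idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Per-term score: idf * freq / (freq + k1 * (1 - b + b * docLen / avgDocLen)).
    static double termScore(double freq, double docLen, double avgDocLen,
                            long docFreq, long docCount) {
        double lengthNorm = K1 * (1 - B + B * docLen / avgDocLen);
        return idf(docFreq, docCount) * freq / (freq + lengthNorm);
    }

    // Uses the caller-supplied frequency for 'term' when present, else the indexed one.
    // 'overrides' is a hypothetical per-query map; it never touches the index.
    static double score(String term, int indexedFreq, Map<String, Integer> overrides,
                        double docLen, double avgDocLen, long docFreq, long docCount) {
        int freq = overrides.getOrDefault(term, indexedFreq);
        return termScore(freq, docLen, avgDocLen, docFreq, docCount);
    }
}
```

Only the tf part of the formula changes here; idf and length normalization stay as indexed, which is roughly what a custom Scorer fed with different frequencies would compute.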
Re: Changing Term Vectors for Query
Hi Marcel,

You can make Lucene index custom frequencies using something like
DelimitedTermFrequencyTokenFilter
<https://lucene.apache.org/core/8_8_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/DelimitedTermFrequencyTokenFilter.html>,
which would be easier than writing a custom Query/Weight/Scorer. Would it
work for you?
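
A minimal model of what `DelimitedTermFrequencyTokenFilter` does at index time, assuming a whitespace tokenizer in front of it and its default `|` delimiter: a token such as `lucene|5` is indexed as the term `lucene` with frequency 5, and a token without a delimiter keeps frequency 1. Summing repeated occurrences reflects my understanding of Lucene's indexing chain, not something stated in this thread. As far as I know, Lucene also requires such a field to be indexed without positions (for example with `IndexOptions.DOCS_AND_FREQS`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Model of DelimitedTermFrequencyTokenFilter applied after whitespace tokenization.
// Assumptions: '|' delimiter (the filter's default) and summed frequencies for repeated terms.
class DelimitedFreqModel {
    static Map<String, Integer> termFreqs(String text, char delimiter) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // blank input yields a single empty token
            int idx = token.indexOf(delimiter); // split at the first delimiter
            String term = idx < 0 ? token : token.substring(0, idx);
            int freq = idx < 0 ? 1 : Integer.parseInt(token.substring(idx + 1));
            freqs.merge(term, freq, Integer::sum); // repeated terms add up
        }
        return freqs;
    }
}
```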

On Sun, Jun 6, 2021 at 10:24 PM Hannes Lohr
<trueBaum32@protonmail.com.invalid> wrote:

> Hello,
> For some queries I need to calculate the score mostly like the normal
> score, but for some documents, certain terms are assigned a frequency that
> I provide, and the score should be calculated with these new term
> frequencies. After some research, it seems I have to implement a custom
> Query, custom Weight and custom Scorer for this. I wanted to ask whether I'm
> overlooking a simpler solution or whether this is the way to go.
> Thanks,
> Marcel



--
Adrien
Re: Changing Term Vectors for Query
Hi Adrien,
I forgot to mention that I also need the original frequencies: some of my queries must run with the original frequencies and others with the custom ones. Since I only have a small index and a few queries, your approach would work, but a solution where I don't have to change the index for those queries would be better for me.
Marcel



------- Original Message -------
On Monday, June 7, 2021 9:11 AM, Adrien Grand <jpountz@gmail.com> wrote:

> Hi Marcel,
>
> You can make Lucene index custom frequencies using something like
> DelimitedTermFrequencyTokenFilter
> https://lucene.apache.org/core/8_8_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/DelimitedTermFrequencyTokenFilter.html,
> which would be easier than writing a custom Query/Weight/Scorer. Would it
> work for you?
>
> --
>
> Adrien



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: Changing Term Vectors for Query
Hi,

the only way to make this efficient performance-wise would be Adrien's approach.

What you generally do is index the same information into two different fields ("copy_to" in Elasticsearch, "copyField" in Solr) with different analyzers. At query time, you choose the applicable field.
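
A toy in-memory sketch of the two-field pattern: one document is indexed into a field with ordinary counts and into a second field whose "analyzer" applies custom frequencies, and the query picks the field. The field names `body` and `body_custom` are hypothetical, and the custom analyzer is reduced to an override map; in Lucene, per-field analyzers would typically be wired up with something like `PerFieldAnalyzerWrapper`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: same content, two fields, field chosen at query time.
// Field names "body" and "body_custom" are hypothetical.
class TwoFieldModel {
    // field name -> (term -> frequency) for a single document
    private final Map<String, Map<String, Integer>> fields = new HashMap<>();

    // "Analyzer" for the original field: whitespace tokens, each occurrence counts once.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    // Index the text into both fields; custom values replace the counted ones in "body_custom".
    void index(String text, Map<String, Integer> customFreqs) {
        Map<String, Integer> original = countTerms(text);
        fields.put("body", original);
        Map<String, Integer> custom = new HashMap<>(original);
        custom.putAll(customFreqs);
        fields.put("body_custom", custom);
    }

    // Query time: pick the field, then read the term frequency from it.
    int freq(String term, boolean useCustomFreqs) {
        String field = useCustomFreqs ? "body_custom" : "body";
        return fields.getOrDefault(field, Map.of()).getOrDefault(term, 0);
    }
}
```

The trade-off is that the custom values must be known when the document is indexed, which is the constraint Marcel raises later in the thread.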

If you want per-document scoring factors (not per-term ones), you can also store those factors in additional DocValues fields and use a function query (e.g. via the expressions module) to modify the score.
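
To illustrate the per-document variant: below, `factors[docId]` stands in for a numeric DocValues field and the combination mirrors an expression such as `_score * factor`. In Lucene this would likely be done with something like `FunctionScoreQuery` and a `DoubleValuesSource`; the class and method names here are hypothetical and only model the arithmetic.

```java
import java.util.stream.IntStream;

// Toy model of per-document score factors; 'factors' stands in for a numeric DocValues field.
class PerDocFactorModel {
    // Equivalent of "_score * factor" for every document.
    static double[] applyFactors(double[] scores, double[] factors) {
        if (scores.length != factors.length) {
            throw new IllegalArgumentException("one factor per document is required");
        }
        double[] out = new double[scores.length];
        for (int doc = 0; doc < scores.length; doc++) {
            out[doc] = scores[doc] * factors[doc];
        }
        return out;
    }

    // Doc ids ordered by descending score.
    static int[] rank(double[] scores) {
        return IntStream.range(0, scores.length).boxed()
                .sorted((a, b) -> Double.compare(scores[b], scores[a]))
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

Because the factor applies to the whole document, it cannot raise the weight of one specific term, which is why this is contrasted with per-term frequencies.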

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Marcel D. <trueBaum32@protonmail.com.INVALID>
> Sent: Monday, June 7, 2021 9:53 AM
> To: java-user@lucene.apache.org
> Subject: Re: Changing Term Vectors for Query
>
> Hi Adrien,
> I forgot to mention that I also need the original frequencies: some of my
> queries must run with the original frequencies and others with the custom
> ones. Since I only have a small index and a few queries, your approach would
> work, but a solution where I don't have to change the index for those
> queries would be better for me.
> Marcel


RE: Changing Term Vectors for Query
Hi,
First, I think I didn't describe my problem precisely. What I want to do is run a normal query on my index. After that, I want to change the frequencies of some important terms to other values, and at index time I know neither the new frequencies nor which terms will be changed. As far as I can tell, I could use what you suggested and update the documents, but for other queries I also need the original index, with the frequencies taken directly from the documents, and updating a lot of documents for this task doesn't seem to be the intended use.
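
One way to read this requirement is a two-phase approach: run the normal query, then rescore the top hits with frequencies supplied at query time, leaving the index untouched. The sketch assumes each hit carries its indexed per-term frequencies (in Lucene these could come from term vectors, e.g. via `IndexReader.getTermVector`, if the field stores them) and that collection statistics (`docFreqs`, `docCount`, `avgDocLen`) are known. The `overrides` map and all names are hypothetical; Lucene's `Rescorer` abstraction is one possible place to hook such logic in, but this is not code from the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Two-phase sketch: rescore first-pass hits with query-time frequency overrides (BM25, Lucene 8 defaults).
// Assumption: doc length and docFreq stay as indexed; only term frequencies are overridden.
class RescoreSketch {
    static final double K1 = 1.2, B = 0.75;

    static final class Hit {
        final int docId;
        final Map<String, Integer> termFreqs; // indexed frequencies, e.g. read from term vectors
        final double docLen;
        Hit(int docId, Map<String, Integer> termFreqs, double docLen) {
            this.docId = docId;
            this.termFreqs = termFreqs;
            this.docLen = docLen;
        }
    }

    static double bm25(double freq, double docLen, double avgDocLen, long docFreq, long docCount) {
        if (freq <= 0) return 0;
        double idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        return idf * freq / (freq + K1 * (1 - B + B * docLen / avgDocLen));
    }

    // overrides: docId -> (term -> replacement frequency), valid for this query only.
    static List<Integer> rescore(List<Hit> hits, List<String> queryTerms,
                                 Map<Integer, Map<String, Integer>> overrides,
                                 Map<String, Long> docFreqs, long docCount, double avgDocLen) {
        Map<Integer, Double> scores = new HashMap<>();
        for (Hit hit : hits) {
            Map<String, Integer> docOverrides = overrides.getOrDefault(hit.docId, Map.of());
            double score = 0;
            for (String term : queryTerms) {
                int freq = docOverrides.getOrDefault(term, hit.termFreqs.getOrDefault(term, 0));
                score += bm25(freq, hit.docLen, avgDocLen, docFreqs.getOrDefault(term, 0L), docCount);
            }
            scores.put(hit.docId, score);
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((Integer d) -> scores.get(d)).reversed());
        return ranked;
    }
}
```

Since only the first-pass hits are rescored, a document that missed the first-pass cutoff cannot be promoted by an override; whether a changed frequency should also change document length is a modeling choice left open here.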

Marcel
------- Original Message -------
On Monday, June 7, 2021 11:23 AM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> the only way to make this efficient performance-wise would be Adrien's approach.
>
> What you generally do is index the same information into two different fields ("copy_to" in Elasticsearch, "copyField" in Solr) with different analyzers. At query time, you choose the applicable field.
>
> If you want per-document scoring factors (not per-term ones), you can also store those factors in additional DocValues fields and use a function query (e.g. via the expressions module) to modify the score.
>
> Uwe
>
> -----
>
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: uwe@thetaphi.de