Hi,

Sorry about the missing parts in my previous post; please accept my
apologies for that. I need to add a few questions, corrections, and
additions to it.

The main question was: if the boost is a single constant value, do we
need the Javascript part below?
=== Indexing code snippet for Lucene version 6.6.0 and before ===
Document doc = new Document();
Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);
=== end of indexing code snippet for Lucene version 6.6.0 and before ===
With the newer API this turns into the following, where the _boost1
field is associated with field1 and the _boost2 field is associated
with field2.

In the indexing code:
=== beginning of indexing code snippet ===
Document doc = new Document();
Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
// The doc values field is named "_boost1" so that the query-time
// binding of the same name can find it.
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… (I will post this soon)
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… (I will post this soon)
=== end of indexing code snippet ===
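
A side note on the long boost values used at indexing time:
NumericDocValuesField only takes a long, so a fractional boost such as
2.5 cannot be passed directly. Below is a minimal plain-Java sketch of
one way around this, assuming the boost is packed into the long as raw
IEEE 754 bits and unpacked again when it is read. The class and method
names are hypothetical and no Lucene classes are used. I believe
DoubleDocValuesField does this kind of packing on the indexing side, but
please check that against the Javadocs.

```java
public class BoostBitsSketch {

    /** Packs a fractional boost into a long so it fits a long-valued field. */
    static long encodeBoost(double boost) {
        return Double.doubleToRawLongBits(boost);
    }

    /** Recovers the boost from the long that was stored. */
    static double decodeBoost(long storedBits) {
        return Double.longBitsToDouble(storedBits);
    }

    public static void main(String[] args) {
        long stored = encodeBoost(2.5);
        // The raw long is not numerically meaningful by itself...
        System.out.println(stored);
        // ...but decoding gives back exactly the boost that was encoded.
        System.out.println(decodeBoost(stored)); // prints 2.5
    }
}
```

The catch is that a value written this way has to be read back as a
double. Reading the raw long as a number gives a huge, meaningless
value, so the type used on the reading side has to match how the value
was written.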
Now, in the searching code (i.e., at query time), do I still need the
FunctionScoreQuery, given that in this case the boost is just a constant
value and not a function? Then again, a constant can be argued to be a
function that returns the same value all the time.
=== beginning of query time code snippet ===
Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
// These have to be LONG type, I think, since NumericDocValuesField
// accepts "long" values only. Am I right? Can this be DOUBLE type?
bindings.add(new SortField("_boost1", SortField.Type.LONG));
// Same question here.
bindings.add(new SortField("_boost2", SortField.Type.LONG));
// Create a query that matches on field1:term_to_look_for but
// scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));
searcher.search(query, 10);
=== end of code snippet ===
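
To make the "constant as a function" point concrete, here is a small
plain-Java sketch. It uses no Lucene classes; the class name, the sample
scores, and the choice to multiply the boost into a base score are all
assumptions for illustration, not what the Lucene snippet does. It
suggests that a boost which is the same for every document leaves the
ranking unchanged, while a per-document boost can reorder the hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class ConstantBoostSketch {

    /** Returns document ids ordered by baseScore * boost, best first. */
    static List<Integer> rank(double[] baseScores, IntToDoubleFunction boost) {
        List<Integer> docIds = new ArrayList<>();
        for (int doc = 0; doc < baseScores.length; doc++) {
            docIds.add(doc);
        }
        // Sort descending by the boosted score of each document.
        docIds.sort(Comparator.comparingDouble(
                (Integer doc) -> baseScores[doc] * boost.applyAsDouble(doc)).reversed());
        return docIds;
    }

    public static void main(String[] args) {
        double[] baseScores = {0.4, 1.3, 0.9};   // made-up scores for docs 0, 1, 2
        double[] perDocBoost = {5.0, 1.0, 1.0};  // made-up per-document boosts

        // No boost at all.
        System.out.println(rank(baseScores, doc -> 1.0));              // [1, 2, 0]
        // The same constant boost for every document: same order.
        System.out.println(rank(baseScores, doc -> 2.0));              // [1, 2, 0]
        // A boost that differs per document: the order changes.
        System.out.println(rank(baseScores, doc -> perDocBoost[doc])); // [0, 1, 2]
    }
}
```

If that holds for the real scoring as well, a boost that is constant
across all documents could just as well be applied at query time (I was
using BoostQuery for that before), and the doc values field would only
be needed when the value differs per document.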
Best regards
On 10/21/19 11:05 AM, baris.kazar@oracle.com wrote:
> Hi,-
>
> i would like to ask the following to make it clearer (for me at least):
>
> Document doc = new Document();
> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> doc.add(f1);
> f1.setBoost(2.0f);
> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> doc.add(f2);
> f2.setBoost(1.0f);
>
>
> This turns into this where _boost1 field is associated with field1 and
> _boost2 field is associated with field2 field:
>
> In Indexing code:
>
> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> Field _boost1 = new NumericDocValuesField("field1", 2L);
> doc.add(_boost1);
> // If this boost value needs to be stored, a separate storedField
> // instance needs to be added as well
> … ( i will post this soon)
> Field _boost2 = new NumericDocValuesField("field2", 1L);
> doc.add(_boost2);
> // If this boost value needs to be stored, a separate storedField
> // instance needs to be added as well
> … ( i will post this soon)
>
>
> Now, in the searching code (i.e., at query time) should i need the
> FunctionScoreQuery because in this case the boost is just a constant
> value but not a function? However, constant value can be argued to be
> a function with the same value all the time, too.
>
>
> Expression expr = JavascriptCompiler.compile("_boost");
> // SimpleBindings just maps variables to SortField instances
> SimpleBindings bindings = new SimpleBindings();
> bindings.add(new SortField("_boost1", SortField.Type.SCORE));
> // create a query that matches based on body:contents but
> // scores using expr
> Query query = new FunctionScoreQuery(
>     new TermQuery(new Term("field1", "term_to_look_for")),
>     expr.getDoubleValuesSource(bindings));
> searcher.search(query, 10);
>
>
> So, if boost is a single constant value, do we need the Javascript
> part above?
>
> Best regards
>
>
> On 10/18/19 4:07 PM, baris.kazar@oracle.com wrote:
>> Uwe,-
>>
>> Can the doc example you gave at
>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>> be extended with the NumericDocValuesField part that needs to be done
>> at indexing time, too?
>>
>> I see now why you said that this is a mixed type of boosting (i.e.,
>> both indexing time and search time).
>>
>> I then need to include the query mentioned in this example on the
>> _score field (I would call it the _boost field in my case) in my
>> overall BooleanQuery.
>>
>> I will now try to combine these and post the result here for future
>> help.
>>
>> Best regards
>>
>>
>> On 10/18/19 3:18 PM, Uwe Schindler wrote:
>>> Hi,
>>>
>>> Read my original email! The index time values are written using
>>> NumericDocValuesField. The expressions docs also refer to that when
>>> the bindings are documented.
>>>
>>> It's separate from the indexed data (TextField). Think of it like an
>>> additional numeric field in your database table with a factor in
>>> each row.
>>>
>>> Uwe
>>>
>>> On October 18, 2019 7:14:03 PM UTC, baris.kazar@oracle.com wrote:
>>>> Uwe,-
>>>>
>>>> Two questions there:
>>>>
>>>> I guess this is applicable to TextField, too.
>>>>
>>>> And I was expecting an index writer object in the example for
>>>> index-time boosting.
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
>>>>> Sorry, I was imprecise. It's a mix of both. The factors are stored
>>>>> per document in the index (this is why I called it index time).
>>>>> During query time the expressions use the index-time values to fold
>>>>> them into the query boost.
>>>>>
>>>>> What's your problem with that approach?
>>>>>
>>>>> Uwe
>>>>>
>>>>> On October 18, 2019 6:50:40 PM UTC, baris.kazar@oracle.com wrote:
>>>>>> Uwe,-
>>>>>>
>>>>>> Thanks. If possible, I am looking for a pure Java methodology to do
>>>>>> the index-time boosting.
>>>>>>
>>>>>> This example looks like a search-time boosting example:
>>>>>>
>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>> Is there a working example for this? Is this mentioned in the
>>>>>>>> Lucene Javadocs or any other docs so that I can look it up?
>>>>>>>
>>>>>>> To index the docvalues, see NumericDocValuesField (it can be added
>>>>>>> to documents like indexed or stored fields). You may have used them
>>>>>>> for sorting already.
>>>>>>>
>>>>>>>> This methodology seems to sort of discourage using index-time
>>>>>>>> boosting.
>>>>>>>
>>>>>>> Not really. Many use this all the time. It's one of the killer
>>>>>>> features of both Solr and Elasticsearch. The problem was how
>>>>>>> Document.setBoost() worked (it did not work correctly, see below).
>>>>>>>
>>>>>>>> Previous setBoost method call was fine and easy to use.
>>>>>>>> Did it have some performance issues and then is that why it was
>>>>>>>> deprecated?
>>>>>>>
>>>>>>> No, it was deprecated for several reasons. setBoost was not doing
>>>>>>> what the user expected. Internally the boost value was just
>>>>>>> multiplied into the document norm factor (which is internally also
>>>>>>> a docvalues field). The norm factors are only very imprecise floats
>>>>>>> stored in a byte, so precision is poor. If you put some values into
>>>>>>> it and the length norm was already consuming all bits, the boosting
>>>>>>> was very coarse. Also, the boost was only ever multiplied in, while
>>>>>>> most users want to do things like record click counts in the index
>>>>>>> and then boost, for example, with the logarithm or some other
>>>>>>> function. If the boost is just multiplied into the length norm you
>>>>>>> have no flexibility at all.
>>>>>>>
>>>>>>> In addition you can have several docvalues fields and use their
>>>>>>> values in a function (e.g. one field with click count and another
>>>>>>> one with product price). After that you can combine click count and
>>>>>>> price (which can be modified independently during index updates)
>>>>>>> and change the boost so that lower price and higher click count
>>>>>>> rank higher.
>>>>>>>
>>>>>>> This is what you can do with the expressions module. You just give
>>>>>>> it a function.
>>>>>>>
>>>>>>> Here is an example; the second example uses a FunctionScoreQuery
>>>>>>> that modifies the score based on the function and the given
>>>>>>> docvalues:
>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>>
>>>>>>>> FunctionScoreQuery usage with MultiFieldQueryParser would also be
>>>>>>>> nice, where MultiFieldQueryParser already has a boosts field to do
>>>>>>>> this in its constructor.
>>>>>>>
>>>>>>> The boosts in the query parser are applied for fields during query
>>>>>>> time (to have a different weight per field). Index time boosting is
>>>>>>> per document. So you can combine both.
>>>>>>>
>>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
>>>>>>>
>>>>>>> You use MultiFieldQueryParser to adjust weights of the fields (e.g.
>>>>>>> title versus body). The parsed query is then wrapped with an
>>>>>>> expression that modifies the score per document according to the
>>>>>>> docvalues.
>>>>>>>
>>>>>>> Uwe
>>>>>>>
>>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> that's not true. You can do index time boosting, but you need to
>>>>>>>>> do that using a separate field. You just index a numeric docvalues
>>>>>>>>> field (which may contain a long or float value per document).
>>>>>>>>> Later you wrap your query with some FunctionScoreQuery (e.g., use
>>>>>>>>> the Javascript function query syntax in the expressions module).
>>>>>>>>> This allows you to compile a Javascript function that calculates
>>>>>>>>> the final score based on the score returned by the inner query and
>>>>>>>>> combines it with docvalues that were indexed per document.
>>>>>>>>>
>>>>>>>>> Uwe
>>>>>>>>>
>>>>>>>>> -----
>>>>>>>>> Uwe Schindler
>>>>>>>>> Achterdiek 19, D-28357 Bremen
>>>>>>>>> https://www.thetaphi.de
>>>>>>>>> eMail: uwe@thetaphi.de
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: baris.kazar@oracle.com <baris.kazar@oracle.com>
>>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
>>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>>> Cc: baris.kazar@oracle.com
>>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>>>>>>
>>>>>>>>>> It looks like index-time boosting (field) is not possible since
>>>>>>>>>> Lucene version 7.7.2. For another case I was previously using
>>>>>>>>>> BoostQuery at search time for boosting, and this seems to be the
>>>>>>>>>> only boosting option now in Lucene.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/18/19 10:01 AM, baris.kazar@oracle.com wrote:
>>>>>>>>>>> Hi,-
>>>>>>>>>>>
>>>>>>>>>>> I saw this in the Field class docs and I am trying to figure out
>>>>>>>>>>> the following note in the docs:
>>>>>>>>>>>
>>>>>>>>>>> setBoost(float boost)
>>>>>>>>>>> Deprecated.
>>>>>>>>>>> Index-time boosts are deprecated, please index index-time scoring
>>>>>>>>>>> factors into a doc value field and combine them with the score at
>>>>>>>>>>> query time using eg. FunctionScoreQuery.
>>>>>>>>>>>
>>>>>>>>>>> I appreciate this note. Is there an example about this? I wish the
>>>>>>>>>>> docs would give a simple example to further help.
>>>>>>>>>>>
>>>>>>>>>>> https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html
>>>>>>>>>>>
>>>>>>>>>>> vs
>>>>>>>>>>>
>>>>>>>>>>> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org