Mailing List Archive

Need suggestion for a Lucene upgrade scenario
Hi,

We are migrating our project from Lucene 2.4.0 to 8.11.2 and need your suggestion on one case.
With Lucene 2.4.0, we used code like the first snippet below.
With Lucene 8.11.2 (second snippet below), we need to extract the startOffset and endOffset values for some further calculation, as we did with Lucene 2.4.0.
Is there an easy way or API to extract these values from the TokenStream?

//Lucene 2.4.0
===========================================================================
Token token;
TokenStream valueStream = analyzer.tokenStream(new StringReader(fieldValue), false, true);
while ((token = valueStream.next()) != null) {
    int startOffset = token.startOffset();
    int endOffset = token.endOffset();

    // Do some calculation based on startOffset & endOffset
}
============================================================================


//Lucene 8.11.2
========================================================================
TokenStream valueStream = analyzer.tokenStream(field, new StringReader(fieldValue));
CharTermAttribute charValueTermAttribute = valueStream.addAttribute(CharTermAttribute.class);
while (valueStream.incrementToken()) {
    String termValueText = charValueTermAttribute.toString();

    // How do we get startOffset & endOffset here, as in Lucene 2.4?

    // Do some calculation based on startOffset & endOffset
}
========================================================================

Please let me know if any further information is required from my side.

Regards
Rajib
Re: Need suggestion for a Lucene upgrade scenario
Hello Rajib,
Check OffsetAttribute.
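
A minimal sketch of how OffsetAttribute could be used with Lucene 8.x. It assumes StandardAnalyzer from lucene-core and a hypothetical field name "body"; the class name OffsetExample and the helper tokenOffsets are made up for illustration and are not from the original post. Note that the stream is reset() before the first incrementToken() call and end() is called after the last one, as the TokenStream workflow requires.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class OffsetExample {

    /** Returns one "term[start,end]" string per token (illustrative helper). */
    static List<String> tokenOffsets(Analyzer analyzer, String field, String fieldValue)
            throws IOException {
        List<String> result = new ArrayList<>();
        // TokenStream is Closeable, so try-with-resources calls close() for us.
        try (TokenStream valueStream = analyzer.tokenStream(field, new StringReader(fieldValue))) {
            // Register the attributes once, before consuming the stream.
            CharTermAttribute termAtt = valueStream.addAttribute(CharTermAttribute.class);
            OffsetAttribute offsetAtt = valueStream.addAttribute(OffsetAttribute.class);

            valueStream.reset(); // mandatory before the first incrementToken()
            while (valueStream.incrementToken()) {
                // The attribute instances are updated in place for each token.
                int startOffset = offsetAtt.startOffset();
                int endOffset = offsetAtt.endOffset();

                // Do some calculation based on startOffset & endOffset
                result.add(termAtt.toString() + "[" + startOffset + "," + endOffset + "]");
            }
            valueStream.end(); // signals end of stream and sets the final offset
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(tokenOffsets(analyzer, "body", "Hello Lucene world"));
        }
    }
}
```

With StandardAnalyzer, which lowercases terms, this should print [hello[0,5], lucene[6,12], world[13,18]]. A different analyzer may produce different terms and offsets for the same input.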


On Tue, Jan 30, 2024 at 1:31 PM Saha, Rajib <rajib.saha@sap.com.invalid>
wrote:

> Is there any easy way/API to extract the values from tokenStream?


--
Sincerely yours
Mikhail Khludnev
Re: Need suggestion for a Lucene upgrade scenario
Hi,

Please read the documentation, where this is explained in detail:
https://lucene.apache.org/core/8_11_0/core/org/apache/lucene/analysis/package-summary.html#package.description

There are also many blog posts about this change (which was made almost
15 years ago)!

Uwe

On 30.01.2024 at 11:30, Saha, Rajib wrote:
> Is there any easy way/API to extract the values from tokenStream?

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

