Unclear on what position means
Hi,

I'm trying to figure out if I should be learning to use Lucene. I
imagine wanting to provide a user with a way to search for something and
present that found thing, in some way. If what is ultimately searched is
text files, then position would be an offset into the text file, I
think. But, that seems like a pretty unlikely scenario.

If I have stored structured data into a database of some sort, does
Lucene provide some way to associate a position with an entry in a
database? Or is that left to the programmer to implement, outside of
Lucene?

Kendall


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Unclear on what position means
Hello, Kendall.

You can read about Token Position Increments at
https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/analysis/package-summary.html#package.description
Usually, position is a word (token) number and offset is a character number.
Modeling database entries via positions would be awkward, I suppose. Nowadays we
either denormalize, copying values from the children into a single parent
document, or use the more relational options in the join module:
https://lucene.apache.org/core/9_2_0/join/org/apache/lucene/search/join/package-summary.html


On Fri, Jul 22, 2022 at 7:02 AM Kendall Shaw <kshaw@kendallshaw.com> wrote:

--
Sincerely yours
Mikhail Khludnev
Re: Unclear on what position means
Hi Kendall,

"Position" and "Offset" are often confused in Lucene ;)

Lucene uses offset to track what you described: a character (not byte)
offset into a text file, or into an indexed string.

Lucene uses position to track the Nth token: position 0 is the first token,
position 1 is the second token, etc. Since tokens are usually longer than
one character, the offsets grow faster than the positions. Tokens need not
form only a linear sequence: they can form a graph structure when
multi-token synonyms are applied.
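
As a rough illustration of the difference, here is a minimal sketch in plain Java. It is not Lucene's analyzer code: the class PositionOffsetDemo, the Token holder, and the whitespace-only splitting are all assumptions made for the example, and it only covers the simple linear case (no synonyms, no token graph).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: a toy whitespace tokenizer, not part of Lucene.
class PositionOffsetDemo {

    /** One token with its position (token index) and its character offsets. */
    static final class Token {
        final String text;
        final int position;    // 0 for the first token, 1 for the second, ...
        final int startOffset; // index of the token's first character
        final int endOffset;   // index one past the token's last character

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace, recording a position and offsets for each token. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < input.length()) {
            // Skip whitespace between tokens.
            while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token's characters.
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(input.substring(start, i), position++, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("the quick brown fox")) {
            System.out.println(t.position + " " + t.text
                    + " [" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

For the input "the quick brown fox", the token "fox" is at position 3 but starts at character offset 16, which shows how offsets grow faster than positions.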

Lucene can index both of these, and you can turn them on or off per field
if you want.

Finally, you might be interested in Lucene's highlighters module -- this
contains tooling to do hit highlighting, to solve the "final inch" problem
of showing your users precisely which words/excerpts matched inside each
matched hit. Here's an example
<https://jirasearch.mikemccandless.com/search.py?chg=new&text=python&a1=&a2=&page=0&searcher=24390&sort=recentlyUpdated&format=list&id=jvmz29ec86du&dd=project%3ALucene&newText=python>
(searching Lucene's issues for the word "python").
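
To show why character offsets matter for hit highlighting, here is a small sketch in plain Java. It does not use Lucene's highlighter classes: the class OffsetHighlightDemo, the highlight method, and the "**" markers are hypothetical names chosen for the example, and matching is a simple case-insensitive comparison of whitespace-delimited tokens.

```java
import java.util.Locale;

// Hypothetical demo class: illustrates offset-based highlighting, not Lucene's API.
class OffsetHighlightDemo {

    /** Wraps every whitespace-delimited token equal to term (ignoring case) in ** markers. */
    static String highlight(String text, String term) {
        StringBuilder out = new StringBuilder();
        String needle = term.toLowerCase(Locale.ROOT);
        int copiedUpTo = 0; // offset up to which the original text has been copied
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i; // start offset of the current token
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int end = i; // end offset (exclusive) of the current token
            boolean matches = end > start
                    && text.substring(start, end).toLowerCase(Locale.ROOT).equals(needle);
            if (matches) {
                // Copy the untouched text before the hit, then the marked-up hit.
                out.append(text, copiedUpTo, start);
                out.append("**").append(text, start, end).append("**");
                copiedUpTo = end;
            }
        }
        // Copy whatever follows the last hit.
        out.append(text, copiedUpTo, text.length());
        return out.toString();
    }
}
```

The start and end offsets of each matching token are what let the original text be marked up in place; the token's position alone would not say where in the string to insert the markers.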

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jul 22, 2022 at 12:51 AM Mikhail Khludnev <mkhl@apache.org> wrote:
