Mailing List Archive

Index smarts
Does the Lucene index keep track of where in the original document each occurrence of a term was found?

For example: Lucene is indexing a file, one of the terms found is "banana", and "banana" occurs in the file 3 times. Does Lucene save in the index where it found each occurrence of "banana"? That way I could, for example, go to offset 100 in the file and find "banana" there. For that matter, does Lucene know that the term was found three times, or just that it was found?

The reason I ask is that when a user searches for something, I might like to just display snippets of the original file where the term was found, instead of the whole thing, because some of the files are quite large.
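
To make the question concrete, here is a minimal sketch of the kind of bookkeeping being asked about: a toy index that remembers, for each term, how many times it occurred and the character offset of each occurrence. It is plain Java and does not use or imitate Lucene's API; the class name PositionalIndexSketch and its methods are invented for illustration. As far as I know, Lucene's own postings record a per-document term frequency and token positions (ordinal positions in the token stream, used for phrase queries) rather than character offsets into the original file, so treat the offsets here as an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy positional index. NOT Lucene's API; names are hypothetical.
class PositionalIndexSketch {
    // term -> character offsets at which the term starts in the text
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Split the text into runs of letters/digits and record each start offset.
    void index(String text) {
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(start);
        }
    }

    // Start offsets of every occurrence, in document order (empty if unseen).
    List<Integer> offsets(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new ArrayList<>());
    }

    // How many times the term occurred (0 if never seen).
    int frequency(String term) {
        return offsets(term).size();
    }

    public static void main(String[] args) {
        PositionalIndexSketch idx = new PositionalIndexSketch();
        idx.index("A banana, a ripe banana, and one more Banana.");
        System.out.println(idx.frequency("banana") + " occurrences at " + idx.offsets("banana"));
    }
}
```

With offsets like these, showing a snippet is a matter of reading a small window of the original file around each offset instead of loading the whole thing.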
Re: Index smarts
If you need this to highlight matches in the document the way Google or AltaVista do, take a look at the contributors section: a contributor named Mark had the original idea and also implemented it.
I have a better version that I may send to the mailing list, but it is written in C (actually C++) and has only been tested on Windows NT.
It is a Java-compatible library (DLL).
Bye.
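
For readers who only want the general shape of the highlighting idea, the following is a rough sketch under stated assumptions. It is not Mark's contributed highlighter and not the C++ library mentioned above; it simply rescans the stored text for the first case-insensitive match and returns a short window around it with the match in square brackets. The class name SnippetSketch, the method names, and the bracket markers are all made up for this example.

```java
// Toy snippet extractor. NOT the contributed Lucene highlighter; names are hypothetical.
class SnippetSketch {
    // Index of the first case-insensitive occurrence of term in text, or -1.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Up to `context` characters on each side of the first match,
    // with the match wrapped in [ ] and "..." marking truncated ends.
    static String snippet(String text, String term, int context) {
        if (term.isEmpty()) {
            return "";
        }
        int hit = indexOfIgnoreCase(text, term);
        if (hit < 0) {
            return "";
        }
        int end = hit + term.length();
        int from = Math.max(0, hit - context);
        int to = Math.min(text.length(), end + context);
        return (from > 0 ? "..." : "")
                + text.substring(from, hit)
                + "[" + text.substring(hit, end) + "]"
                + text.substring(end, to)
                + (to < text.length() ? "..." : "");
    }

    public static void main(String[] args) {
        System.out.println(snippet("I ate a ripe banana for lunch today.", "BANANA", 7));
    }
}
```

Rescanning the text is the simplest approach and needs nothing from the index, at the cost of reading the document again at display time; for very large files one would want to limit how much text is scanned.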

--

On Mon, 8 Jul 2002 18:23:17
Chris Sibert wrote:
>Does the Lucene index keep track of where in the original document each occurrence of a term was found?
>
>For example: Lucene is indexing a file, one of the terms found is "banana", and "banana" occurs in the file 3 times. Does Lucene save in the index where it found each occurrence of "banana"? That way I could, for example, go to offset 100 in the file and find "banana" there. For that matter, does Lucene know that the term was found three times, or just that it was found?
>
>The reason I ask is that when a user searches for something, I might like to just display snippets of the original file where the term was found, instead of the whole thing, because some of the files are quite large.