Mailing List Archive

Highlighting, Keywords and Summarizing
Hi,

I'm looking for tools (code) that provide the following:

- Highlighting (of search results)
- extracting of keywords (in different languages)
- and summarizing of text (giving a short description of a long text)

Does anyone know of any code or solutions (based on Lucene) that I
could use?

Regards

Stefan
Re: Highlighting, Keywords and Summarizing
On May 2, 2005, at 8:25 AM, Schuh, Stefan wrote:

> Hi,
>
> I'm looking for tools (code) that provide the following:
>
> - Highlighting (of search results)

Lucene includes a highlighter in its contrib area. You can see an
example of it here: http://www.lucenebook.com/search?query=highlighter

Highlighter is currently in a build-it-yourself state in Lucene's
Subversion repository; however, it will be released in official binary
form with Lucene 1.9 in the near future. In the meantime, you can get a
binary of it from the Lucene in Action source code download.
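To make the idea concrete, here is a minimal sketch of what highlighting does: it wraps each occurrence of a query term in markup. This is plain Java and deliberately does not use the contrib Highlighter's API; the class name SimpleHighlighter and the choice of <b> tags are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: this is NOT the Lucene contrib Highlighter API.
class SimpleHighlighter {

    // Wraps every whole-word, case-insensitive occurrence of `term` in <b>...</b>.
    static String highlight(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any '$' or '\' in the matched text literal.
            m.appendReplacement(out,
                    Matcher.quoteReplacement("<b>" + m.group() + "</b>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Lucene ships a <b>highlighter</b> in contrib.
        System.out.println(
                highlight("Lucene ships a highlighter in contrib.", "highlighter"));
    }
}
```

A real highlighter would also work from the analyzed query terms and pick the best-matching fragments of a long document, which this sketch does not attempt.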

> - extracting of keywords (in different languages)

Please elaborate on what you're after here.

> - and summarizing of text (giving a short description of a long text)

Classifier4j has a text summarizer:
http://classifier4j.sourceforge.net/
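For a rough feel of what an extractive summarizer does, here is a small sketch that scores each sentence by how frequent its words are in the whole text and keeps the best ones. It is not Classifier4j's code or API; the class SentenceSummarizer, the sentence-splitting rule, and the "longer than 3 letters" filter (a stand-in for a stop-word list) are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a frequency-based extractive summarizer sketch.
class SentenceSummarizer {

    // Splits after '.', '!' or '?' followed by whitespace; crude but enough here.
    static String[] sentences(String text) {
        return text.trim().split("(?<=[.!?])\\s+");
    }

    static String[] words(String s) {
        return s.toLowerCase().split("[^\\p{L}]+");
    }

    // Returns the n highest-scoring sentences, in their original order.
    static String summarize(String text, int n) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words(text)) {
            // Length filter stands in for a proper stop-word list.
            if (w.length() > 3) freq.merge(w, 1, Integer::sum);
        }
        String[] sents = sentences(text);
        int[] score = new int[sents.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sents.length; i++) {
            order.add(i);
            for (String w : words(sents[i])) score[i] += freq.getOrDefault(w, 0);
        }
        // Stable sort: highest score first, ties keep document order.
        order.sort((a, b) -> score[b] - score[a]);
        List<Integer> keep = new ArrayList<>(order.subList(0, Math.min(n, order.size())));
        Collections.sort(keep); // restore original sentence order
        StringBuilder out = new StringBuilder();
        for (int i : keep) {
            if (out.length() > 0) out.append(' ');
            out.append(sents[i]);
        }
        return out.toString();
    }
}
```

Note that summing word frequencies favors long sentences; dividing each score by the sentence's word count is one way to counter that.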

Erik
AW: Highlighting, Keywords and Summarizing
Hi,

Thanks for the info.

By keywords I mean the most important words in an article. Say you have an article of 3 to 5 pages: the keywords are the words that matter most in it, excluding stop words.
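The simplest reading of that definition can be sketched as counting word frequencies and skipping stop words. The class KeywordExtractor and its tiny English stop-word list are hypothetical; for other languages the list (and probably the tokenizing) would have to be swapped out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: keywords as the most frequent non-stop words.
class KeywordExtractor {

    // A tiny, hypothetical English stop-word list; a real one is per language.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "is", "to", "for", "it"));

    // Returns up to `limit` non-stop words, most frequent first.
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            counts.merge(token, 1, Integer::sum);
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Descending count, then alphabetical so the result is deterministic.
        words.sort((x, y) -> {
            int byCount = counts.get(y) - counts.get(x);
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return new ArrayList<>(words.subList(0, Math.min(limit, words.size())));
    }
}
```

For example, topKeywords("The index is the heart of the search engine. The index stores terms, and the search reads the index.", 2) returns [index, search].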

Regards

Stefan

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Monday, May 2, 2005 16:41
To: general@lucene.apache.org
Subject: Re: Highlighting, Keywords and Summarizing



Re: AW: Highlighting, Keywords and Summarizing
On May 3, 2005, at 4:49 AM, Schuh, Stefan wrote:

> Hi,
>
> Thanks for the info.
>
> Keywords are the most important words in articles. Let's say you
> have an article with 3 or 5 pages, the keywords are the most
> important words, but no stop words.


Have a look at the Similarity (MoreLikeThis) code in Lucene's
Subversion repository under contrib/similarity. It does a very nice
job of extracting "important" terms and has a fair bit of flexibility.
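As a hedged illustration of what "important" can mean here: a common approach, and as far as I know the one MoreLikeThis follows in spirit, is to weight a term by how often it occurs in the document (tf) and how rare it is across the collection (idf). The sketch below is plain Java, not the contrib code; the class TfIdfTerms and its methods are hypothetical, and the idf formula is the textbook log(N / df) rather than Lucene's exact one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: textbook tf-idf, not Lucene's MoreLikeThis implementation.
class TfIdfTerms {

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Scores each term of docs.get(target) as tf * log(N / df).
    static Map<String, Double> scores(List<String> docs, int target) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String d : docs) {
            // HashSet: count each term once per document.
            for (String t : new HashSet<>(tokens(d))) docFreq.merge(t, 1, Integer::sum);
        }
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens(docs.get(target))) tf.merge(t, 1, Integer::sum);
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
            result.put(e.getKey(), e.getValue() * idf);
        }
        return result;
    }

    // Highest-scoring term; ties go to the alphabetically first term.
    static String bestTerm(List<String> docs, int target) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Double> e : new TreeMap<>(scores(docs, target)).entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }
}
```

A nice side effect is that words appearing in every document, which is where most stop words end up, score zero without needing an explicit stop list.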

Erik


