Mailing List Archive

Lucene and LSA
Hi

I want to build a search engine based on LSA (latent semantic analysis).

How much of Lucene's functionality could be reused? Could I use Lucene's
index to build up the "term by document" matrix? And of course, why?

TIA, Sebastian
Re: Lucene and LSA
Lucene has term vector capability, which facilitates LSA-type
analysis. For a field, you can get back all the terms in it, their
frequencies, and their positions. Enabling this requires setting the
term vector flag appropriately on the field during indexing.
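
To make the "term by document" matrix concrete, here is a minimal sketch
of the step that would follow reading the term vectors. It assumes the
per-document term frequencies have already been pulled out of the index
into plain maps; the Lucene calls themselves are left out on purpose,
since they differ between versions. The class name TermDocumentMatrix and
the sample frequencies are made up for illustration and are not part of
Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: builds a dense term-by-document
// matrix from per-document term frequencies.
final class TermDocumentMatrix {
    final List<String> terms;  // row labels, in sorted order
    final double[][] counts;   // counts[row][col] = frequency of terms.get(row) in document col

    TermDocumentMatrix(List<Map<String, Integer>> docs) {
        // Collect the vocabulary across all documents in a stable, sorted order.
        TreeSet<String> vocabulary = new TreeSet<>();
        for (Map<String, Integer> doc : docs) {
            vocabulary.addAll(doc.keySet());
        }
        terms = new ArrayList<>(vocabulary);

        // One row per term, one column per document; absent terms count as 0.
        counts = new double[terms.size()][docs.size()];
        for (int col = 0; col < docs.size(); col++) {
            for (int row = 0; row < terms.size(); row++) {
                counts[row][col] = docs.get(col).getOrDefault(terms.get(row), 0);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up frequencies standing in for what term vectors would return.
        List<Map<String, Integer>> docs = List.of(
                Map.of("lucene", 2, "index", 1),
                Map.of("index", 3, "matrix", 1));
        TermDocumentMatrix m = new TermDocumentMatrix(docs);
        for (int row = 0; row < m.terms.size(); row++) {
            System.out.println(m.terms.get(row) + " " + Arrays.toString(m.counts[row]));
        }
    }
}
```

LSA would then apply a truncated singular value decomposition to this
matrix, which needs a linear algebra library and is not shown here. A
dense array is only reasonable for small collections; a real index would
most likely need a sparse representation.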

Hope that helps.

Erik


On Aug 18, 2005, at 10:42 AM, Sebastian Menge wrote:

> Hi
>
> I want to build a search-engine based on LSA (latent semantic
> analysis).
>
> How much of lucene's functionality could be reused? Could I use
> lucene's
> index to build up the "term by document" matrix? And of course, why?
>
> TIA, Sebastian
>
Re: Lucene and LSA
Ok, thanks. Are you aware of any open source tools doing something
similar?

I wonder why combining these two quite popular things (the Lucene
package and the LSA approach) is not more common. I expected about 10
tools/packages doing such things.

Regards, Seb.


On Thursday, 18.08.2005, at 10:57 -0400, Erik Hatcher wrote:
> Lucene has term vector capability, which facilitates LSA types of
> things. For a field you can get back all the terms in it, their
> frequency, and their positions. Enabling this requires setting the
> flag appropriately on the field during indexing.
>
> Hope that helps.
>
> Erik