Mailing List Archive

N-gram
At what point do I add n-grams? Does the order in which I add n-grams
affect exact phrase queries later? My questions are:

(1) Should I add all the 1-grams, followed by the 2-grams, followed by the
3-grams, etc., sentence by sentence, OR
(2) Should I add all the 1-grams of the entire document first, before
starting on the 2-grams for the entire document?

What is the generally accepted way of adding the n-grams of a document?

thanks,

Rajesh
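
To make the two orderings in the question concrete, here is a minimal sketch in plain Java. It uses no Lucene classes. The class and method names (NGramOrder, sentenceBySentence, sizeBySize) are made up for illustration, and it assumes word n-grams that do not cross sentence boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shows only the order in which n-grams would be emitted.
class NGramOrder {

    // Word n-grams of size n over one token sequence, left to right.
    static List<String> ngrams(String[] tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return out;
    }

    // Option (1): per sentence, emit the 1-grams, then the 2-grams, up to maxN.
    static List<String> sentenceBySentence(List<String[]> sentences, int maxN) {
        List<String> out = new ArrayList<>();
        for (String[] sentence : sentences) {
            for (int n = 1; n <= maxN; n++) {
                out.addAll(ngrams(sentence, n));
            }
        }
        return out;
    }

    // Option (2): emit every 1-gram of the document, then every 2-gram, up to maxN.
    static List<String> sizeBySize(List<String[]> sentences, int maxN) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            for (String[] sentence : sentences) {
                out.addAll(ngrams(sentence, n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> doc = new ArrayList<>();
        doc.add("the quick fox".split(" "));
        doc.add("it ran away".split(" "));
        System.out.println(sentenceBySentence(doc, 2));
        System.out.println(sizeBySize(doc, 2));
    }
}
```

Both options produce the same set of n-grams; only the sequence differs. Whether that sequence matters depends on whether the n-grams end up as positioned tokens in the same field, since phrase matching works on token positions.
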
Re: N-gram
Rajesh,
I am not sure what your eventual goal is, but it looks like you are using
Lucene in some sort of Natural Language Processing environment. I am doing
something similar with dotLucene. Possibly SpanQuery is what you want: it
lets you specify the span, and hence 1-gram, 2-gram, etc. Email me if
you want samples (C#).
Madhu
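
The idea behind the span suggestion can be sketched without any Lucene code: index only the 1-grams together with their positions, and answer an n-gram (exact phrase) lookup by checking for consecutive positions. The toy class below is an assumption-laden illustration, not Lucene's or dotLucene's implementation; PositionalIndex and containsPhrase are hypothetical names, and it handles a single document only. In Lucene itself, a SpanNearQuery built from SpanTermQuery clauses with zero slop and in-order matching plays a comparable role.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy single-document positional index (illustrative, not a Lucene API).
class PositionalIndex {

    // term -> ascending list of token positions at which it occurs
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Index only 1-grams, recording each token's position in the stream.
    void add(String[] tokens) {
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
    }

    // True if the terms occur consecutively and in order, i.e. as an exact n-gram.
    boolean containsPhrase(String... terms) {
        if (terms.length == 0) {
            return false;
        }
        List<Integer> starts = postings.getOrDefault(terms[0], List.of());
        for (int start : starts) {
            boolean match = true;
            for (int i = 1; i < terms.length && match; i++) {
                // The i-th term must sit exactly i positions after the first.
                match = postings.getOrDefault(terms[i], List.of()).contains(start + i);
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.add("the quick brown fox".split(" "));
        System.out.println(index.containsPhrase("quick", "brown")); // true
        System.out.println(index.containsPhrase("brown", "quick")); // false
        System.out.println(index.containsPhrase("quick", "fox"));   // false
    }
}
```

With this approach the n-grams never have to be added to the index at all, so the ordering question does not come up; the trade-off is that every phrase lookup does position arithmetic at query time.
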


On 7/18/05, Rajesh Munavalli <rajeshm@dessci.com> wrote:
>
> At what point do I add n-grams? Does the order in which I add n-grams
> affect exact phrase queries later? My questions are:
>
> (1) Should I add all the 1-grams, followed by the 2-grams, followed by the
> 3-grams, etc., sentence by sentence, OR
> (2) Should I add all the 1-grams of the entire document first, before
> starting on the 2-grams for the entire document?
>
> What is the generally accepted way of adding the n-grams of a document?
>
> thanks,
>
> Rajesh