Mailing List Archive

Interesting idea
Adding support to Lucene for Nilsimsa seems like a cool idea...

http://ixazon.dynip.com/~cmeclax/nilsimsa.html

The index would be the hash and one could use Lucene to rank searches based
on the Nilsimsa rating of the results...
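
A rough sketch of what ranking "based on the Nilsimsa rating" could look like, assuming the commonly described comparison in which two 256-bit digests score 128 minus the number of bits in which they differ (so scores run from -128 to 128). The function names and the hex-string digest format are illustrative assumptions, not part of Lucene or of the Nilsimsa distribution, and computing the digest itself is not shown.

```python
def nilsimsa_score(hex_a: str, hex_b: str) -> int:
    """Compare two 256-bit digests given as 64-character hex strings.

    Assumed scoring: 128 minus the number of differing bits.
    """
    if len(hex_a) != 64 or len(hex_b) != 64:
        raise ValueError("expected 64 hex characters (256 bits) per digest")
    differing_bits = bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")
    return 128 - differing_bits


def rank_by_similarity(query_digest: str, digests_by_doc: dict) -> list:
    """Score every stored digest against the query and sort best-first.

    Note that this is a linear scan over all documents; nothing here
    uses an index to skip non-matching documents.
    """
    scored = [
        (nilsimsa_score(query_digest, digest), doc_id)
        for doc_id, digest in digests_by_doc.items()
    ]
    return sorted(scored, reverse=True)
```

Every query in this sketch has to visit every stored digest, which is worth keeping in mind when comparing it with a term-based index.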

-jon


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: Interesting idea
Jon Scott Stevens wrote:
> Adding support to Lucene for Nilsimsa seems like a cool idea...
>
> http://ixazon.dynip.com/~cmeclax/nilsimsa.html
>
> The index would be the hash and one could use Lucene to rank searches based
> on the Nilsimsa rating of the results...

Nilsimsa employs a very different model from Lucene's, so supporting it
would require rewriting the indexing and search portions of Lucene, which
is most of the code.

Nilsimsa appears to use what the literature calls a "signature file"
approach, while Lucene uses an "inverted file". A Google search for
"signature file versus inverted index" turns up a paper by Zobel et al.,
which concludes:

Our conclusions are unequivocal. For typical document indexing
applications, current signature file techniques do not perform well
compared to current implementations of inverted file indexes.

See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf
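
To make the two terms concrete, here is a toy sketch of each approach. It is only an illustration under stated assumptions: the 64-bit signature width, the one-bit-per-term CRC32 hashing, and all function names are invented for this example, and it is neither Lucene's implementation nor Nilsimsa's algorithm.

```python
import zlib
from collections import defaultdict

SIGNATURE_BITS = 64  # illustrative width, chosen arbitrarily


def tokenize(text):
    return text.lower().split()


def build_inverted_file(docs):
    """Inverted file: map each term to the set of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return postings


def term_mask(term):
    """Pick one signature bit per term using a stable hash."""
    return 1 << (zlib.crc32(term.encode("utf-8")) % SIGNATURE_BITS)


def build_signature_file(docs):
    """Signature file: one fixed-width bit string per document."""
    signatures = {}
    for doc_id, text in docs.items():
        sig = 0
        for term in tokenize(text):
            sig |= term_mask(term)
        signatures[doc_id] = sig
    return signatures


def search_inverted(postings, term):
    """Look the term up directly; only matching documents are touched."""
    return set(postings.get(term, set()))


def search_signatures(signatures, docs, term):
    """Scan every signature, then re-check candidates to drop false matches."""
    mask = term_mask(term)
    candidates = [d for d, sig in signatures.items() if (sig & mask) == mask]
    return {d for d in candidates if term in tokenize(docs[d])}
```

Both searches return the same documents. The difference is in the work done: the inverted file goes straight to the posting list for a term, while the signature scan tests every document's signature and must then verify each candidate because different terms can hash to the same bit.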

Doug


Re: Interesting idea
on 7/10/02 9:35 AM, "Doug Cutting" <cutting@lucene.com> wrote:

> Nilsimsa appears to use what is called a "signature file" approach in
> the literature, while Lucene uses an "inverted file". A search on
> Google for "signature file versus inverted index" turns up a paper by
> Zobel et al., which concludes:
>
> Our conclusions are unequivocal. For typical document indexing
> applications, current signature file techniques do not perform well
> compared to current implementations of inverted file indexes.
>
> See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf
>
> Doug

Wow! Great response Doug. =) Learn something new every day!

-jon


Re: Interesting idea
+1 -- Doug is a great source of information on all things indexing-related.
Reading Doug's emails and articles is very educational.

Jon Scott Stevens wrote:

>on 7/10/02 9:35 AM, "Doug Cutting" <cutting@lucene.com> wrote:
>
>
>
>>Nilsimsa appears to use what is called a "signature file" approach in
>>the literature, while Lucene uses an "inverted file". A search on
>>Google for "signature file versus inverted index" turns up a paper by
>>Zobel et al., which concludes:
>>
>> Our conclusions are unequivocal. For typical document indexing
>> applications, current signature file techniques do not perform well
>> compared to current implementations of inverted file indexes.
>>
>>See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf
>>
>>Doug
>>
>>
>
>Wow! Great response Doug. =) Learn something new every day!
>
>-jon