Hello,
I was wondering if there is already a method to return results when the
query contains typos.
For example, if the user searches for "nokopol" and we have a document that
has "nikopol" in the title and in the body, I'd like to be able to return
it.
I'd like to match typos only on titles and possibly a few other
fields (for performance reasons I don't think matching typos in body text
is appropriate).
Google has such a feature and I think they do it with their
statistical data (users usually correct their own typos, and Google
recognizes this).
However, as I don't have statistical data and I'd like to provide this out
of the box, I was thinking I might go with the Levenshtein distance
(http://en.wikipedia.org/wiki/Levenshtein_distance), and fortunately there
is a Perl module that does this (thanks, CPAN).
So my question is: is there already something included in KinoSearch that
does this and that I missed (stemming is quite different, as far as I can
tell)?
And if not, does using the Levenshtein distance calculation to return
a "corrected" result set make sense at first sight?
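
To make the idea concrete, here is a rough sketch of what I have in mind.
It is written in Python purely for illustration, not in Perl and not
against any KinoSearch API. The function names, the sample title terms and
the maximum distance of 1 are all assumptions of mine. It compares the
query term against every term taken from the title field, which is
probably too slow for a large index, so please read it as a starting
point only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suggest_terms(query_term, indexed_terms, max_distance=1):
    """Return the indexed terms within max_distance, closest first.

    indexed_terms stands in for the terms of the title field; how to get
    that list out of the index is not covered here.
    """
    scored = [(levenshtein(query_term, term), term) for term in indexed_terms]
    return [term for distance, term in sorted(scored) if distance <= max_distance]


# Hypothetical title terms, only for the example.
title_terms = ["nikopol", "trilogy", "bilal", "immortal"]
print(suggest_terms("nokopol", title_terms))  # ['nikopol']
```

The query would then be re-run with the suggested term(s) in place of the
misspelled one, restricted to the title field.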
Thanks
Eriam
_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch