Hello!
I love using KinoSearch. So far it does everything we need for our
project. Could you suggest a way to retrieve similar and duplicate
documents? We index a few websites, and sometimes the same document is
posted under different URLs. How can we solve this?
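
To make the duplicate question concrete, here is a rough sketch of one
common technique: compare documents by the overlap (Jaccard similarity)
of their word shingles. It is written in Python, is independent of
KinoSearch, and is only an illustration. The function names, the shingle
size k=3, and the 0.8 threshold are all assumptions, not anything
KinoSearch provides.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs, threshold=0.8, k=3):
    """docs maps URL -> text. Returns (url_a, url_b, similarity) for similar pairs."""
    sets = {url: shingles(text, k) for url, text in docs.items()}
    urls = sorted(sets)
    pairs = []
    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            sim = jaccard(sets[a], sets[b])
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs
```

This compares every pair of documents, so it only suits small
collections. For a larger index, a fingerprinting scheme such as MinHash
or SimHash would probably be needed instead.
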
We also have an issue that is not related to KinoSearch. We would
like to remove the parts of a page that are repeated across pages (for
example, a navigation menu shared by all pages). Removing the
content is quite easy, but how would you detect which parts are
repeated across pages? A diff algorithm? What kind of approach would you
suggest?
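
As an illustration of the kind of detection meant here, one simple
alternative to a pairwise diff is to count how many pages each line of
extracted text appears on, and to treat lines seen on most pages as
shared boilerplate. A minimal Python sketch, assuming the pages are
already converted to plain text; the function names and the 0.8 cutoff
are hypothetical:

```python
from collections import Counter


def find_repeated_lines(pages, min_fraction=0.8):
    """Return the lines that occur on at least min_fraction of the pages."""
    counts = Counter()
    for text in pages:
        # Count each distinct non-empty line once per page.
        lines = {line.strip() for line in text.splitlines() if line.strip()}
        counts.update(lines)
    cutoff = min_fraction * len(pages)
    return {line for line, n in counts.items() if n >= cutoff}


def strip_repeated_lines(text, repeated):
    """Drop every line of text whose stripped form is in repeated."""
    return "\n".join(
        line for line in text.splitlines() if line.strip() not in repeated
    )
```

Line-level counting misses menus that differ slightly from page to page
(for example, a highlighted current item), so it is a starting point
rather than a complete answer.
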
Thank you,
Vlad
_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch