Hi all,
I have a configuration file that lists multiple queries, of all different types,
and that lists words to be ignored.
Each of these lists is user-configured and varies in length and content.
I know that, in general, a query won’t match an ignore word unless that word appears in the query,
but I need to be able to handle wildcard, fuzzy, and regex queries, which might match.
What I need to be able to do is ignore the words in the ignore list,
but only when they match terms the query would match.
For example, if the query is ‘free*’ and ‘freedom’ should be ignored,
I could modify the query to be ‘free* and not freedom’.
But if ‘liberty’ is also to be ignored, I don’t want to add ‘and not liberty’ to that query,
because that could produce false negatives for documents containing both ‘free’ and ‘liberty’.
I think what I need to do is:
    for each query
        for each ignore word
            if the query would match the ignore word,
                add ‘and not ignore word’ to the query
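
A rough sketch of that loop, in Python for brevity. It assumes each query is a single term whose type is already known, and it only covers plain, wildcard, and regex terms (fuzzy is left out). The names `term_matches` and `add_exclusions` are made up for illustration and are not part of any search library; note also that `fnmatchcase` treats `[...]` as a character class, which may differ from the wildcard syntax of the actual query parser.

```python
import re
from fnmatch import fnmatchcase


def term_matches(kind, pattern, word):
    """Return True if a single query term of the given kind would match word.

    Hypothetical helper: kind is "wildcard", "regex", or anything else
    (treated as a plain term that only matches itself).
    """
    if kind == "wildcard":
        # fnmatchcase supports * and ? (and [...] character classes).
        return fnmatchcase(word, pattern)
    if kind == "regex":
        # The whole word must match, not just a substring of it.
        return re.fullmatch(pattern, word) is not None
    return word == pattern


def add_exclusions(kind, pattern, ignore_words):
    """Append 'and not <word>' only for ignore words the term would match."""
    query = pattern
    for word in ignore_words:
        if term_matches(kind, pattern, word):
            query += " and not " + word
    return query


ignore_words = ["freedom", "liberty"]
modified = add_exclusions("wildcard", "free*", ignore_words)
```

With the example from above, `modified` ends up as `free* and not freedom`; ‘liberty’ is left alone because ‘free*’ would never match it.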
How can I test if a query would match an ignore word without putting the ignore words into an index
and searching the index?
Building an index just for that seems like overkill.
To make matters worse, a query like ‘A and B and C’
won’t match an index of ignore words that contains C but not A or B, even though the clause C on its own would match.
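
To illustrate that last problem, here is a small Python sketch. It assumes a conjunctive query can be represented as a list of wildcard-style terms, and that an ignore word is treated as a one-word document. The function names are hypothetical, and `fnmatchcase` only stands in for whatever term matching the real query types perform.

```python
from fnmatch import fnmatchcase


def whole_query_matches(terms, document_words):
    """Conjunctive match: every term must match some word in the document."""
    return all(
        any(fnmatchcase(word, term) for word in document_words)
        for term in terms
    )


def clauses_matching(terms, word):
    """Return the individual terms that would match the given word."""
    return [term for term in terms if fnmatchcase(word, term)]


# Stand-in for a query of the form 'A and B and C'.
terms = ["alpha", "beta", "free*"]

# The ignore word as a one-word "document": the query as a whole fails,
# because 'alpha' and 'beta' are absent.
matches_as_document = whole_query_matches(terms, ["freedom"])

# Checked clause by clause, the third term does match the ignore word.
matching_clauses = clauses_matching(terms, "freedom")
```

Here `matches_as_document` is `False` while `matching_clauses` is `["free*"]`, which is the mismatch described above: the test seems to need to run per clause rather than against the query as a whole.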
Thanks in advance for any suggestions or advice,
David Shifflett