By parsing the search string, we get the benefit of full boolean
search, which is definitely cool. But our current regime has one
severe downside: any search which has a single short word or stopword
or non-existent word in it will automatically fail since we assume an
implicit "AND" between all search terms. The unadorned MySQL "MATCH"
operator does not have this downside: if you search for "the chinese
wall" for example, "the" will be silently ignored and you get the
expected hit.
I am wondering if we can combine the best of those worlds. This should
decrease the number of complaints about short search terms
dramatically, maybe even to the point that we can keep the current
index size.
How about this: every subsequence of the query string which doesn't
contain any +/- boolean operators (see below) is passed to the MATCH
operator as is, which assumes an implicit "give me the best matches
you can find for these terms, the more matching terms the better".
Then we could have two additional operators: + and -. If a word is
preceded by +, it *must* be present; if a word is preceded by -, it
*cannot* be present. That would let us express any complicated query we
can right now, but should result in far fewer failed searches. Does
that seem feasible?
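
To make the idea concrete, here is a rough Python sketch of just the
splitting step. The function name split_query and the return shape are
made up for illustration; nothing here is existing code, and how the
three groups would then be turned into SQL is left open.

```python
def split_query(query):
    """Split a search string into required terms, excluded terms, and
    runs of unadorned words (hypothetical helper, illustration only)."""
    required, excluded, free_runs = [], [], []
    run = []  # current run of words without a +/- prefix
    for token in query.split():
        if len(token) > 1 and token[0] in "+-":
            # An operator ends the current unadorned run.
            if run:
                free_runs.append(" ".join(run))
                run = []
            target = required if token[0] == "+" else excluded
            target.append(token[1:])
        else:
            # A bare "+" or "-" is treated as an ordinary word here.
            run.append(token)
    if run:
        free_runs.append(" ".join(run))
    return required, excluded, free_runs


print(split_query("the chinese wall +great -berlin"))
# (['great'], ['berlin'], ['the chinese wall'])
```

Each string in the third list would be handed to MATCH unchanged, so
short words and stopwords in it are ignored rather than failing the
whole search.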
Axel