Mailing List Archive

Enforcing total matches
Hey guys!

I'd like to know if there's a way to force a query to return only exact
hits, not partials/subsets. For instance:
A query "foo bar" will match on a field with "foo bar zoo". This I would
like to avoid, as I need to remove duplicates on certain fields. Two
options considered so far, neither of which is very pretty:

1) Save the hash value of the field you're deduping to an extra field and
dedup on that field
2) Check the score of the returned hits (a total hit scores higher than
subset hits)

Ideas, anyone?
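
For reference, option 1 above could be sketched roughly as follows. This is plain Java with no Lucene calls; the class name FieldHashDedup and its methods are made up for illustration, and SHA-256 is just one possible choice of hash. The fingerprint is what would be stored in the extra field.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: dedups on a hash of the whole field value.
final class FieldHashDedup {

    // Hex-encoded SHA-256 of the complete field value. This is the string
    // that would go into the extra field used for deduping.
    static String fingerprint(String fieldValue) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(fieldValue.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected: SHA-256 is a standard MessageDigest algorithm.
            throw new IllegalStateException(e);
        }
    }

    // Keeps the first occurrence of each distinct field value, in input order.
    static List<String> dedup(List<String> fieldValues) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String value : fieldValues) {
            // add() returns false when the fingerprint was already seen.
            if (seen.add(fingerprint(value))) {
                unique.add(value);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Prints [foo bar, foo bar zoo]
        System.out.println(dedup(Arrays.asList("foo bar", "foo bar zoo", "foo bar")));
    }
}
```

Here "foo bar" and "foo bar zoo" get different fingerprints, so only the repeated "foo bar" is dropped.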
Re: Enforcing total matches
: I'd like to know if there's a way to force a query to return only exact
: hits, not partials/subsets. For instance:
: A query "foo bar" will match on a field with "foo bar zoo". This I would
: like to avoid as I'm need of removing duplicates on certain fields. Two
: options considered this far, neither are very pretty:

the example you give is an "exact match" on a sequence of terms in that
field -- it sounds like what you want is to say that for certain fields,
you want the entire field to be treated as one term.

In Java Lucene this is possible using the KeywordAnalyzer both when you
index the field, and when you use the QueryParser -- I assume
it's present in other versions of Lucene as well.
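
To illustrate the difference, here is a toy model of the two kinds of analysis. It is not Lucene code: AnalysisSketch and its methods are invented for this example, and the whitespace split only approximates a real tokenizing analyzer. It shows why "foo bar" matches "foo bar zoo" when the field is tokenized but not when the whole field is one term.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of analysis, not Lucene code.
final class AnalysisSketch {

    // Approximates a tokenizing analyzer: one term per whitespace-separated word.
    static List<String> tokenized(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Approximates KeywordAnalyzer: the entire value becomes a single term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Phrase-style match: the query terms occur as a contiguous run
    // somewhere in the field terms.
    static boolean phraseMatch(List<String> fieldTerms, List<String> queryTerms) {
        return Collections.indexOfSubList(fieldTerms, queryTerms) >= 0;
    }

    public static void main(String[] args) {
        String field = "foo bar zoo";
        String query = "foo bar";
        // Tokenized: [foo, bar] is a sub-sequence of [foo, bar, zoo]. Prints true.
        System.out.println(phraseMatch(tokenized(field), tokenized(query)));
        // Keyword: the single term "foo bar" is not the term "foo bar zoo". Prints false.
        System.out.println(phraseMatch(keyword(field), keyword(query)));
        // Keyword: identical values produce the same single term. Prints true.
        System.out.println(phraseMatch(keyword(query), keyword(query)));
    }
}
```

Note that the same analysis has to be applied on both sides (index and query), otherwise the terms never line up.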

If you also need to support "tokenized" term matching on the same data,
then you'll need two separate fields: one using KeywordAnalyzer, and one
using whatever other analyzer you want.
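
The two-field setup can be pictured with a small in-memory index. Again this is only a sketch in plain Java, not the Lucene API: TwoFieldIndexSketch, searchExact and searchTokens are hypothetical names, and the token search is a simple AND over words rather than a real phrase query.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy index with two views of the same data, not Lucene code.
final class TwoFieldIndexSketch {
    // term -> ids of the documents containing it, one map per field
    private final Map<String, Set<Integer>> exactField = new HashMap<>();
    private final Map<String, Set<Integer>> tokenField = new HashMap<>();
    private int nextId = 0;

    // Indexes the value twice: once as a single term, once word by word.
    int add(String value) {
        int id = nextId++;
        exactField.computeIfAbsent(value, k -> new TreeSet<>()).add(id);
        for (String token : value.trim().split("\\s+")) {
            tokenField.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Matches only documents whose whole field value equals the query.
    Set<Integer> searchExact(String value) {
        Set<Integer> docs = exactField.get(value);
        return docs == null ? Collections.<Integer>emptySet() : new TreeSet<>(docs);
    }

    // Matches documents containing every word of the query, in any order.
    Set<Integer> searchTokens(String query) {
        Set<Integer> result = null;
        for (String token : query.trim().split("\\s+")) {
            Set<Integer> docs = tokenField.get(token);
            if (docs == null) {
                return Collections.<Integer>emptySet();
            }
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch index = new TwoFieldIndexSketch();
        index.add("foo bar");      // document 0
        index.add("foo bar zoo");  // document 1
        System.out.println(index.searchExact("foo bar"));   // prints [0]
        System.out.println(index.searchTokens("foo bar"));  // prints [0, 1]
    }
}
```

For deduping, the exact field is the one to check before adding a document: a non-empty searchExact result means the same value is already indexed.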


-Hoss
Re: Enforcing total matches
Yeah, I meant field matches of course. Well OK, I'll check out the
KeywordAnalyzer and give it a go, thanks!

On 5/30/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
> the example you give is an "exact match" on a sequence of terms in that
> field -- it sounds like what you want is to say that for certain fields,
> you want the entire field to be treated as one term.
>
> In Java Lucene this is possible using the KeywordAnalyzer both when you
> index the field, and when you use the QueryParser -- I assume
> it's present in other versions of Lucene as well.
>
> If you also need to support "tokenized" term matching on the same data,
> then you'll need two separate fields: one using KeywordAnalyzer, and one
> using whatever other analyzer you want.
>
>
> -Hoss