Mailing List Archive

Searching by bit masks
Hello,

I am currently evaluating Lucene to see whether it would be an appropriate
replacement for my company's current search software. So far everything has
looked great; however, there is one requirement that I am not too certain
about.

What we need is to store, for each document in the index, a bit mask of
filter flags, and then search this field with another bit mask of desired
filters, returning documents that have any of the specified flags set. In
other words, we take the bitwise AND of the stored bit mask and the specified
bit mask, and if the result is non-zero, we want to return the document.
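The "any of the specified flags" test can be sketched as a bitwise AND with a non-zero result. An OR, by contrast, is non-zero whenever either mask has any flag set, so it would match nearly every document. The class and method names below are hypothetical and are not part of Lucene:

```java
// Hypothetical helper illustrating the requested match test on plain longs.
final class FlagMatch {
    private FlagMatch() {}

    // AND keeps only the flags set in both masks; non-zero means "at least one flag in common".
    static boolean matchesAny(long storedMask, long queryMask) {
        return (storedMask & queryMask) != 0;
    }

    // OR is non-zero whenever either mask has any flag set, so it is not a useful filter.
    static boolean orIsNonZero(long storedMask, long queryMask) {
        return (storedMask | queryMask) != 0;
    }
}
```

For example, a stored mask of `0b101` and a query mask of `0b010` share no flags. `matchesAny` rejects this pair, but the OR of the two masks is still non-zero.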

Before I start toying around with various options myself, I wanted to see
if any of you good folks in the Lucene community had suggestions for an
efficient way to implement this.

We currently need to index ~8,000,000 documents. We have several filter
flag fields, the most important of which currently has 7 possible flags with
any combination of the flags being valid. The number of flags is expected
to increase rather rapidly in the near future.
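Seven flags fit easily in a Java `long` (64 bits), but the number of flags is expected to grow. If it ever exceeds 64, `java.util.BitSet` can hold any number of flags, and its `intersects` method performs the same "any flag in common" test. This is a minimal sketch with a hypothetical helper class:

```java
import java.util.BitSet;

// Hypothetical helper: flag masks that are not limited to 64 bits.
final class FlagSet {
    private FlagSet() {}

    // Build a BitSet with the given flag positions turned on.
    static BitSet of(int... flagPositions) {
        BitSet bits = new BitSet();
        for (int pos : flagPositions) {
            bits.set(pos);
        }
        return bits;
    }

    // BitSet.intersects returns true if any bit is set in both sets,
    // which is equivalent to (stored & query) != 0 for long masks.
    static boolean matchesAny(BitSet stored, BitSet query) {
        return stored.intersects(query);
    }
}
```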

My preemptive thanks for your suggestions.

Lawrence Taylor
Senior Software Engineer
Employon
Re: Searching by bit masks
On 11/9/06, ltaylor.employon <ltaylor@employon.com> wrote:
> I am currently evaluating Lucene to see whether it would be an appropriate
> replacement for my company's current search software. So far everything has
> looked great; however, there is one requirement that I am not too certain
> about.
>
> What we need is to store, for each document in the index, a bit mask of
> filter flags, and then search this field with another bit mask of desired
> filters, returning documents that have any of the specified flags set. In
> other words, we take the bitwise AND of the stored bit mask and the specified
> bit mask, and if the result is non-zero, we want to return the document.

Lucene maintains an inverted index, so you don't need a bit mask: you can
index each flag as a symbolic value (a separate term) in a multi-valued field.

doc {
id = 1
tags = tag1 tag3 tag7
}

doc {
id = 2
tags = tag1 tag2 tag5 tag9
}
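Existing bit masks can be converted into token lists like the `tags` values above by emitting one token per set bit. This is a sketch under assumptions: the `flag<N>` naming scheme and the helper class are hypothetical, and a whitespace-tokenized field is assumed to be on the receiving end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bridge from the bit-mask model to the symbolic-token model.
final class MaskTokens {
    private MaskTokens() {}

    // One token per set bit, e.g. 0b1001 -> ["flag0", "flag3"].
    static List<String> toTokens(long mask) {
        List<String> tokens = new ArrayList<>();
        for (int bit = 0; bit < Long.SIZE; bit++) {
            if ((mask & (1L << bit)) != 0) {
                tokens.add("flag" + bit);
            }
        }
        return tokens;
    }

    // Space-separated value, suitable for a whitespace-tokenized field.
    static String toFieldValue(long mask) {
        return String.join(" ", toTokens(mask));
    }
}
```

A query mask can be converted the same way, and its tokens are then joined with `OR` in the query.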

Then you can search with a BooleanQuery, written in query parser syntax as:

tags:(tag1 OR tag2 OR tag7)
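To show why this replaces the bitwise test, here is a toy inverted index in plain Java. It is not Lucene's implementation or API. Each tag maps to the set of document ids that carry it, and an OR query is the union of those sets, so a document matches if it has at least one requested tag.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: tag -> sorted set of document ids (its "postings").
final class ToyTagIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Record the document id under every tag it carries.
    void add(int docId, String... tags) {
        for (String tag : tags) {
            postings.computeIfAbsent(tag, t -> new TreeSet<>()).add(docId);
        }
    }

    // Evaluate tags:(a OR b OR ...) as the union of each tag's postings.
    Set<Integer> searchAny(String... tags) {
        Set<Integer> hits = new TreeSet<>();
        for (String tag : tags) {
            hits.addAll(postings.getOrDefault(tag, Collections.emptySet()));
        }
        return hits;
    }
}
```

If the two example documents are indexed along with a third that has only `tag4`, `searchAny("tag1", "tag2", "tag7")` returns documents 1 and 2. Lucene's query parser turns the query above into a BooleanQuery of optional (SHOULD) clauses, which has the same match semantics. Scoring and postings compression are left out here.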

If you are new to Lucene, you might check out Solr first. If nothing
else, it would be a gentle introduction to Lucene, and you could build
a custom Lucene implementation later if it doesn't meet your needs.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server