Mailing List Archive

Implementation of 'NOT' in the Lucene.
Hi guys,

I'm pretty new to Lucene (so forgive my naivety) and I am having problems with the use of NOT expressions in Lucene 1.2 final.
I know this question is probably more directed to the user list but in trying to debug it I have come across some questions about the architecture and design of the search engine.

The basic scenario is this: I have two files (single line files) and am running them through the demo application.

file1: abc def

file2: def ghi

I then run the query:

-(def)

and get the result:

Searching for: -def
0 total matching documents

All other queries seem to work just fine.
This seems to imply that "NOT" doesn't work as I expect it to. My questions are:

- Should my query return anything at all?
- By the time it reaches BucketTable.collectHits(), has there been any index reading at all? That is the place where the prohibited clause is actually checked, right?

Any answers or pointers to the right direction would be greatly appreciated!
Thanks in advance!

Regards,

Minh Kama Yie

RE: Implementation of 'NOT' in the Lucene.
Hello,
[...]
> The basic scenario is this: I have two files (single line
> files) and am running them through the demo application.
>
> file1: abc def
>
> file2: def ghi
>
> I then run the query:
>
> -(def)
> and get the result:
>
> Searching for: -def
> 0 total matching documents
[...]
Sorry, this will not work, because "NOT" is evaluated against the set of documents selected by the other parts of the query. Your query contains only this exclusion filter, so no document is selected and the result remains empty.
See also the last note about "NOT" at http://jakarta.apache.org/lucene/docs/queryparsersyntax.html.
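
To illustrate the point, here is a small toy model in plain Java. It is not Lucene code and uses no Lucene classes; the class and method names (ProhibitedClauseSketch, docsContaining, search) are made up for this sketch, and positive terms are simply OR-ed together. It only shows the idea that prohibited terms subtract from a candidate set built by the positive terms, so a query without positive terms starts from an empty set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names, no Lucene classes involved.
class ProhibitedClauseSketch {

    /** Returns the ids (list positions) of documents containing the term. */
    static Set<Integer> docsContaining(List<String> docs, String term) {
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < docs.size(); id++) {
            if (Arrays.asList(docs.get(id).split("\\s+")).contains(term)) {
                hits.add(id);
            }
        }
        return hits;
    }

    /**
     * Candidates come only from the positive terms (union of their hits).
     * Prohibited terms can only remove candidates, never add any.
     */
    static Set<Integer> search(List<String> docs,
                               List<String> positive,
                               List<String> prohibited) {
        Set<Integer> result = new TreeSet<>();
        for (String term : positive) {
            result.addAll(docsContaining(docs, term));
        }
        for (String term : prohibited) {
            result.removeAll(docsContaining(docs, term));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("abc def", "def ghi");

        // "-def": no positive term, so there is nothing to subtract from.
        System.out.println(
            search(docs, Collections.<String>emptyList(), Arrays.asList("def"))); // prints []

        // "def -abc": both documents are selected, then document 0 is removed.
        System.out.println(
            search(docs, Arrays.asList("def"), Arrays.asList("abc"))); // prints [1]
    }
}
```

With the two demo files from the question, the purely negative query yields an empty set, which matches the "0 total matching documents" output.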

To make it work, you have to write your own query parser and add a dummy field, "dummy", that contains the same value for all documents, e.g. "empty". Your parser has to change the original query "-def" into "dummy:empty -def", but only if the query contains nothing else.
But this solution will probably be too slow for real applications. There is no good way to deal with such queries on large databases.
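
The rewrite step could look roughly like the sketch below. It is plain Java string handling, not a real query parser: the class and method names are hypothetical, it only recognizes clauses that start with "-" and are separated by whitespace, and it ignores phrases, nested parentheses and the "NOT" keyword. It also assumes that every document was indexed with the field "dummy" set to "empty".

```java
// Hypothetical helper, not part of Lucene: prepends a match-all clause
// to queries that consist solely of prohibited ("-") clauses.
class NegativeOnlyRewriteSketch {

    // Assumption: every indexed document carries the field dummy=empty.
    static final String MATCH_ALL_CLAUSE = "dummy:empty";

    /** True if the query is non-empty and every clause starts with "-". */
    static boolean hasOnlyProhibitedClauses(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        for (String clause : trimmed.split("\\s+")) {
            if (!clause.startsWith("-")) {
                return false;
            }
        }
        return true;
    }

    /** Leaves queries with at least one non-prohibited clause untouched. */
    static String rewrite(String query) {
        if (hasOnlyProhibitedClauses(query)) {
            return MATCH_ALL_CLAUSE + " " + query.trim();
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("-def"));     // prints dummy:empty -def
        System.out.println(rewrite("abc -def")); // prints abc -def
    }
}
```

The rewritten string would then be handed to the normal query parser. The dummy clause matches every document, so the whole index has to be walked for each such query, which is where the performance concern comes from.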
Regards,
Wolf-Dietrich Materna

--
Wolf-Dietrich Materna
Development

empolis GmbH
Bertelsmann MOHN Media Group
Kekuléstr. 7
12489 Berlin, Germany

phone : +49-30-6780-6510
fax : +49-30-6780-6549

<mailto:Wolf-Dietrich.Materna@empolis.com> <http://www.empolis.com>
