Mailing List Archive

Highlighting terms, new white paper
Hi, I just wanted to let you know that I've set up a new white paper
along with complete source code. The new code also takes care of the new
Query subclasses PrefixQuery, RangeQuery and MultiTermQuery.

You can get the white paper (in PDF format) and source at
http://www.iq-computing.de/lucene/lucene-highlight.tar.gz
If you need US Letter format instead of A4, try
http://www.iq-computing.de/lucene/lucene-highlight-us.tar.gz

--
Maik Schreiber
IQ Computing - http://www.iq-computing.de
mailto: info@iq-computing.de
Re: Highlighting terms, new white paper
Dear Maik,

> Hi, I just wanted to let you know that I've set up a new white paper
> along with complete source code. The new code also takes care of the new
> Query subclasses PrefixQuery, RangeQuery and MultiTermQuery.

This looks fantastic - a great piece of documentation, thanks a lot!

However, when I enter a wildcard search ("wo?d" or "woo*" both cause the
error with my index), I get a NullPointerException with this trace:

java.lang.NullPointerException
    at org.apache.lucene.search.MultiTermQuery.getQuery(MultiTermQuery.java:131)
    at misc.LuceneTools.getTermsFromMultiTermQuery(LuceneTools.java:190)
    at misc.LuceneTools.getTerms(LuceneTools.java:131)
    at misc.LuceneTools.getTermsFromBooleanQuery(LuceneTools.java:150)
    at misc.LuceneTools.getTerms(LuceneTools.java:121)
    at com.grantadesign.asm.servlet.LuceneSearchResults.doGet(LuceneSearchResults.java:129)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:740)

Have you ever seen anything like this? I'm a little baffled as to the
cause...

I made a tiny modification to check for enum == null in the finally block of
MultiTermQuery.getQuery(), but I get the feeling 'enum' should never be null
and that I'm doing the wrong thing...

Do you have any advice on how to get this to work?

Kind regards,

Lee Mallabone
Re: Highlighting terms, new white paper
>However, when I enter a wildcard search ("wo?d" or "woo*" both
>cause the error with my index), I get a NullPointerException
>with this trace:
>
>java.lang.NullPointerException at
>org.apache.lucene.search.MultiTermQuery.getQuery(MultiTermQuery.java:131)

I tested by directly creating a WildcardQuery, and it worked okay for
me.

As you can see from the trace, the exception occurs in MultiTermQuery.
From what I can see in the source, it looks like WildcardQuery.setEnum()
has not been called beforehand. I have to admit that I don't know much about
the internals of MultiTermQuery and its subclasses, except for a basic
understanding of what they do (that is, constructing a BooleanQuery).

--
Maik Schreiber
IQ Computing - http://www.iq-computing.de
mailto: info@iq-computing.de
Re: Highlighting terms, new white paper
> I tested by directly creating a WildcardQuery, and it worked okay for
> me.

Ah. I'm using the queryParser, so it was slightly less obvious which queries
were created for which input.

> As you can see from the trace, the exception occurs in MultiTermQuery.
> From what I can see in the source, it looks like WildcardQuery.setEnum()
> has not been called beforehand. I have to admit that I don't know much about
> the internals of MultiTermQuery and its subclasses, except for a basic
> understanding of what they do (that is, constructing a BooleanQuery).

I've just done a whole bunch of code browsing/digging, and it's as you
suspected - the WildcardQuery.prepare() method must be executed so that the
enum is set correctly. It turns out that this gets done when
Searcher.search() is called.

So I think the summary of this is that LuceneTools.getTerms() will crash if
a query containing wildcards has been parsed but not executed?

This certainly isn't the end of the world, but maybe it should be noted in
the docs?

I'd offer a code fix, but I suspect WildcardQuery.prepare() isn't public for
a good reason...

Thanks again,

Lee Mallabone.
Re: Highlighting terms, new white paper
--- Lee Mallabone <lee@grantadesign.com> wrote:
> I've just done a whole bunch of code browsing/digging, and it's as you
> suspected - the WildcardQuery.prepare() method must be executed so that the
> enum is set correctly. It turns out that this gets done when
> Searcher.search() is called.
>
> So I think the summary of this is that LuceneTools.getTerms() will crash if
> a query containing wildcards has been parsed but not executed?

Yes, that's because PrefixQuery, WildcardQuery and
FuzzyQuery all need to look through the index to
construct the enum that lists the terms that match.

MultiTermQuery comes in after that to convert that
enum of terms into a BooleanQuery, which happens in
getQuery(). Thus I don't think a NullPointerException will be
thrown if getQuery() is only called after search().
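
As a rough illustration of the two steps just described, here is a minimal
sketch that does not use Lucene at all. PrefixExpansion, matchingTerms() and
toBooleanQuery() are hypothetical names, the TreeSet stands in for the index's
sorted term dictionary, and the "BooleanQuery" is only rendered as a string of
OR'ed terms.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of multi-term expansion; not Lucene's actual classes.
final class PrefixExpansion {

    // Step 1: walk the sorted terms of the "index" and collect those that
    // start with the prefix. This is the part that needs index access.
    static Iterator<String> matchingTerms(TreeSet<String> indexTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // In sorted order, all terms sharing a prefix form one contiguous
        // run beginning at the first term >= prefix.
        for (String term : indexTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the run
            }
            matches.add(term);
        }
        return matches.iterator();
    }

    // Step 2: turn the enumerated terms into one OR clause per term.
    static String toBooleanQuery(Iterator<String> termEnum) {
        StringBuilder clauses = new StringBuilder();
        while (termEnum.hasNext()) {
            if (clauses.length() > 0) {
                clauses.append(" OR ");
            }
            clauses.append(termEnum.next());
        }
        return clauses.toString();
    }
}
```

With the terms apple, woo, wood, word, work and zebra and the prefix "woo",
step 1 yields woo and wood, and step 2 renders them as "woo OR wood". Step 2
has nothing to work with until step 1 has run.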

In the meantime, I'll put in additional safeguards to
ensure that a NullPointerException isn't thrown if search()
hasn't been called.
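
One possible shape for such a safeguard, as a minimal sketch only: fail with a
descriptive exception instead of a bare NullPointerException when the enum has
not been set yet. GuardedMultiTermQuery, termEnum and isPrepared() are
hypothetical names and not part of Lucene; the actual fix may look quite
different.

```java
import java.util.Iterator;

// Hypothetical stand-in for a multi-term query; not Lucene's real class.
final class GuardedMultiTermQuery {

    // Called "termEnum" here because "enum" is a reserved word in Java 5+.
    // Stays null until the query has been prepared against an index.
    private Iterator<String> termEnum;

    void setEnum(Iterator<String> termEnum) {
        this.termEnum = termEnum;
    }

    // Lets callers (e.g. a term highlighter) check before extracting terms.
    boolean isPrepared() {
        return termEnum != null;
    }

    // Builds the OR'ed term list. Consumes the iterator, so this sketch
    // supports a single call per setEnum().
    String getQuery() {
        if (termEnum == null) {
            throw new IllegalStateException(
                "term enum not set: run the search before calling getQuery()");
        }
        StringBuilder clauses = new StringBuilder();
        while (termEnum.hasNext()) {
            if (clauses.length() > 0) {
                clauses.append(" OR ");
            }
            clauses.append(termEnum.next());
        }
        return clauses.toString();
    }
}
```

A caller that only has a parsed query can then test isPrepared() and skip
highlighting, or catch the IllegalStateException, rather than crash.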


> This certainly isn't the end of the world, but maybe it should be noted in
> the docs?

It's already documented. Look at the javadoc for the
MultiTermQuery class.




Re: Highlighting terms, new white paper
>Yes, that's because PrefixQuery, WildcardQuery and
>FuzzyQuery all need to look through the index to
>construct the enum that lists the terms that match.
>
>MultiTermQuery comes in after that to convert that
>enum of terms into a BooleanQuery, which happens in
>getQuery(). Thus I don't think a NullPointerException will be
>thrown if getQuery() is only called after search().

I will put a revised version of my white paper on my server that covers
Query objects which use multiple internal Query objects to represent
themselves to the outside world. This will also include RangeQuery, which
you didn't mention above.

In addition, I've put together a new "Projekte" (= projects) area
on my site so that you can find out the white paper's current version
more easily.

--
Maik Schreiber
IQ Computing - http://www.iq-computing.de
mailto: info@iq-computing.de