Mailing List Archive: Diffs for enabling query rewriting

Diffs for enabling query rewriting

Oct 23, 2002, 2:38 PM

Post #1 of 6

Enclosed you will find the diffs I promised for enabling query rewriting.

This also enables tools such as the HTML term highlighter
(http://www.iq-computing.de/lucene/highlight.jsp). There's one difference from
the white paper there: I didn't want to make arrays public, so getClauses()
in BooleanQuery only returns an iterator. The same goes for getTerms() in
PhraseQuery. I have included my version of LuceneTools.java as presented on
the website I mentioned.
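
As a rough illustration of the iterator-only accessor described above, here is
a minimal sketch. It is not the code from the attached diffs: the class
ClauseHolderSketch and its String clauses are hypothetical stand-ins, and it
uses generics, which the Java of the time did not have. The point is only
that callers can walk the clauses without getting a reference to the internal
storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a query that holds clauses; not Lucene's BooleanQuery.
final class ClauseHolderSketch {
    private final List<String> clauses = new ArrayList<>();

    void add(String clause) {
        clauses.add(clause);
    }

    // Read-only access: the iterator of an unmodifiable view rejects remove(),
    // so callers cannot change the internal list through it.
    Iterator<String> getClauses() {
        return Collections.unmodifiableList(clauses).iterator();
    }

    public static void main(String[] args) {
        ClauseHolderSketch query = new ClauseHolderSketch();
        query.add("+title:lucene");
        query.add("-body:draft");
        for (Iterator<String> it = query.getClauses(); it.hasNext();) {
            System.out.println(it.next());
        }
    }
}
```

Returning an array would either expose the internal storage or force a copy
on every call; an iterator over a read-only view avoids both.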

I've also got an example for query rewriting, but since it uses an external
library, I've left it out here.

Regards,

Clemens

--------------------------------------
http://www.cmarschner.net

Re: Diffs for enabling query rewriting

Oct 29, 2002, 8:11 AM

Post #2 of 6

Could any committer please have a look at the diffs I posted a week ago?

--Clemens


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

Re: Diffs for enabling query rewriting

otis_gospodnetic at yahoo

Nov 6, 2002, 11:01 PM

Post #3 of 6

Have any Lucene developers looked at this yet?
+1s anyone? -1s anyone?

Clemens - I may have to ask you for a new set of diffs. Sorry, I just
touched a bunch of classes before I realized that this email contains
so many diffs.

Otis

--- Clemens Marschner <cmad@lanlab.de> wrote:
> /*
>
> Lucene-Highlighting – Lucene utilities to highlight terms in texts
> Copyright (C) 2001 Maik Schreiber
>
> This library is free software; you can redistribute it and/or modify
> it
> under the terms of the GNU Lesser General Public License as published
> by
> the Free Software Foundation; either version 2.1 of the License, or
> (at your option) any later version.
>
> This library is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY
> or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General
> Public
> License for more details.
>
> You should have received a copy of the GNU Lesser General Public
> License along with this library; if not, write to the Free Software
> Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
> USA
>
> */
>
> package de.iqcomputing.lucene;
>
> import java.io.*;
> import java.util.*;
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.index.*;
> import org.apache.lucene.search.*;
>
>
> /**
> * Contains miscellaneous utility methods for use with Lucene.
> *
> * @version $Id: LuceneTools.java,v 1.5 2001/10/16 07:25:55 mickey
> Exp $
> * @author Maik Schreiber (mailto: bZ@iq-computing.de)
> */
> public final class LuceneTools
> {
> /** LuceneTools must not be instantiated directly. */
> private LuceneTools() {}
>
>
> /**
> * Highlights a text in accordance to a given query.
> *
> * @param text text to highlight terms in
> * @param highlighter TermHighlighter to use to highlight terms in
> the text
> * @param query Query which contains the terms to be
> highlighted in the text
> * @param analyzer Analyzer used to construct the Query
> *
> * @return highlighted text
> */
> public static final String highlightTerms(String text,
> TermHighlighter highlighter, Query query,
> Analyzer analyzer) throws IOException
> {
> StringBuffer newText = new StringBuffer();
> TokenStream stream = null;
>
> try
> {
> HashSet terms = new HashSet();
> org.apache.lucene.analysis.Token token;
> String tokenText;
> int startOffset;
> int endOffset;
> int lastEndOffset = 0;
>
> // get terms in query
> getTerms(query, terms, false);
>
> stream = analyzer.tokenStream(new StringReader(text));
> while ((token = stream.next()) != null)
> {
> startOffset = token.startOffset();
> endOffset = token.endOffset();
> tokenText = text.substring(startOffset, endOffset);
>
> // append text between end of last token (or beginning of text)
> // and start of current token
> if (startOffset > lastEndOffset)
> newText.append(text.substring(lastEndOffset, startOffset));
>
> // does query contain current token?
> if (terms.contains(token.termText()))
> newText.append(highlighter.highlightTerm(tokenText));
> else
> newText.append(tokenText);
>
> lastEndOffset = endOffset;
> }
>
> // append text after end of last token
> if (lastEndOffset < text.length())
> newText.append(text.substring(lastEndOffset));
>
> return newText.toString();
> }
> finally
> {
> if (stream != null)
> {
> try
> {
> stream.close();
> }
> catch (Exception e) {}
> }
> }
> }
>
> /**
> * Extracts all term texts of a given Query. Term texts will be
> returned in lower-case.
> *
> * @param query Query to extract term texts from
> * @param terms HashSet where extracted term texts should be
> put into (Elements: String)
> * @param prohibited <code>true</code> to extract "prohibited"
> terms, too
> */
> public static final void getTerms(Query query, HashSet terms,
> boolean prohibited)
> throws IOException
> {
> if (query instanceof BooleanQuery)
> getTermsFromBooleanQuery((BooleanQuery) query, terms,
> prohibited);
> else if (query instanceof PhraseQuery)
> getTermsFromPhraseQuery((PhraseQuery) query, terms);
> else if (query instanceof TermQuery)
> getTermsFromTermQuery((TermQuery) query, terms);
> else if (query instanceof PrefixQuery)
> getTermsFromPrefixQuery((PrefixQuery) query, terms,
> prohibited);
> else if (query instanceof RangeQuery)
> getTermsFromRangeQuery((RangeQuery) query, terms, prohibited);
> else if (query instanceof MultiTermQuery)
> getTermsFromMultiTermQuery((MultiTermQuery) query, terms,
> prohibited);
> }
>
> /**
> * Extracts all term texts of a given BooleanQuery. Term texts will
> be returned in lower-case.
> *
> * @param query BooleanQuery to extract term texts from
> * @param terms HashSet where extracted term texts should be
> put into (Elements: String)
> * @param prohibited <code>true</code> to extract "prohibited"
> terms, too
> */
> private static final void getTermsFromBooleanQuery(BooleanQuery
> query, HashSet terms,
> boolean prohibited) throws IOException
> {
> Iterator queryClauses = query.getClauses();
> while(queryClauses.hasNext())
> {
> BooleanClause cl = (BooleanClause)queryClauses.next();
> if (prohibited || !cl.prohibited)
> getTerms(cl.query, terms, prohibited);
> }
> }
>
> /**
> * Extracts all term texts of a given PhraseQuery. Term texts will
> be returned in lower-case.
> *
> * @param query PhraseQuery to extract term texts from
> * @param terms HashSet where extracted term texts should be put
> into (Elements: String)
> */
> private static final void getTermsFromPhraseQuery(PhraseQuery
> query, HashSet terms)
> {
> Iterator queryTerms = query.getTerms();
>
> while(queryTerms.hasNext())
> terms.add(getTermsFromTerm((Term)queryTerms.next()));
> }
>
> /**
> * Extracts all term texts of a given TermQuery. Term texts will be
> returned in lower-case.
> *
> * @param query TermQuery to extract term texts from
> * @param terms HashSet where extracted term texts should be put
> into (Elements: String)
> */
> private static final void getTermsFromTermQuery(TermQuery query,
> HashSet terms)
> {
> terms.add(getTermsFromTerm(query.getTerm()));
> }
>
> /**
> * Extracts all term texts of a given MultiTermQuery. Term texts
> will be returned in lower-case.
> *
> * @param query MultiTermQuery to extract term texts from
> * @param terms HashSet where extracted term texts should be
> put into (Elements: String)
> * @param prohibited <code>true</code> to extract "prohibited"
> terms, too
> */
> private static final void getTermsFromMultiTermQuery(MultiTermQuery
> query, HashSet terms,
> boolean prohibited) throws IOException
> {
> getTerms(query.getQuery(), terms, prohibited);
> }
>
> /**
> * Extracts all term texts of a given PrefixQuery. Term texts will
> be returned in lower-case.
> *
> * @param query PrefixQuery to extract term texts from
> * @param terms HashSet where extracted term texts should be
> put into (Elements: String)
> * @param prohibited <code>true</code> to extract "prohibited"
> terms, too
> */
> private static final void getTermsFromPrefixQuery(PrefixQuery
> query, HashSet terms,
> boolean prohibited) throws IOException
> {
> getTerms(query.getQuery(), terms, prohibited);
> }
>
> /**
> * Extracts all term texts of a given RangeQuery. Term texts will
> be returned in lower-case.
> *
> * @param query RangeQuery to extract term texts from
> * @param terms HashSet where extracted term texts should be
> put into (Elements: String)
> * @param prohibited <code>true</code> to extract "prohibited"
> terms, too
> */
> private static final void getTermsFromRangeQuery(RangeQuery query,
> HashSet terms,
> boolean prohibited) throws IOException
> {
> getTerms(query.getQuery(), terms, prohibited);
> }
>
> /**
> * Extracts the term of a given Term. The term will be returned in
> lower-case.
> *
> * @param term Term to extract term from
> *
> * @return the Term's term text
> */
> private static final String getTermsFromTerm(Term term)
> {
> return term.text().toLowerCase();
> }
> }
>

> ATTACHMENT part 3 application/octet-stream name=BooleanClause.diff

> ATTACHMENT part 4 application/octet-stream name=BooleanQuery.diff

> ATTACHMENT part 5 application/octet-stream name=FuzzyQuery.diff

> ATTACHMENT part 6 application/octet-stream
name=PhrasePrefixQuery.diff

> ATTACHMENT part 7 application/octet-stream name=PhraseQuery.diff

> ATTACHMENT part 8 application/octet-stream name=PrefixQuery.diff

> ATTACHMENT part 9 application/octet-stream name=Query.diff

> ATTACHMENT part 10 application/octet-stream name=RangeQuery.diff

> ATTACHMENT part 11 application/octet-stream name=TermQuery.diff

> ATTACHMENT part 12 application/octet-stream name=WildcardQuery.diff

> ATTACHMENT part 13 application/octet-stream name=Term.diff

Re: Diffs for enabling query rewriting

otis_gospodnetic at yahoo

Nov 10, 2002, 11:33 AM

Post #4 of 6

Hm, developers are not responding to this 3-week-old email. :(
Clemens, could you also provide some unit tests with this?

Thanks,
Otis


Re: Diffs for enabling query rewriting

Nov 10, 2002, 1:12 PM

Post #5 of 6

I'll see what I can do.

--C.


Re: Diffs for enabling query rewriting

cutting at lucene

Nov 12, 2002, 11:59 AM

Post #6 of 6

Sorry it's taken me so long to look at these.

Clemens Marschner wrote:
> Enclosed you find the diffs I promised for enabling query rewriting.
>
> This also enables tools such as the HTML term highlighter
> (http://www.iq-computing.de/lucene/highlight.jsp). There's one difference to
> the white paper there: I didn't want to make arrays public, so getClauses()
> in BooleanQuery only returns an iterator. The same with getTerms() in
> PhraseQuery. I have included my version of LuceneTools.java as presented on
> the website I mentioned.

A few issues I spotted:

You don't need to clone Terms. In the public API they're read-only,
like Strings. Term should also not expose setField() or setText()
methods. (Also, you ignore the result of String.intern() in these
methods, but that doesn't matter, since you shouldn't implement them
anyway...)
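
To illustrate the String.intern() remark, here is a minimal standalone
sketch. It is not taken from the diffs; the class and method names are
hypothetical. Strings are immutable, so intern() cannot change the string it
is called on. It returns the canonical instance, and a caller that discards
the return value gains nothing. The usual reason to intern at all is so that
equal strings can later be compared by reference; whether the patched setters
relied on that is an assumption here.

```java
final class InternSketch {
    // Mirrors the mistake: the interned reference is thrown away,
    // so the caller still holds the original object.
    static String internIgnored(String s) {
        s.intern();
        return s;
    }

    // Keeps the canonical instance that intern() returns.
    static String internUsed(String s) {
        return s.intern();
    }

    public static void main(String[] args) {
        String literal = "contents";            // literals are interned by the JVM
        String built = new String("contents");  // equal text, distinct object

        System.out.println(internIgnored(built) == literal); // prints false
        System.out.println(internUsed(built) == literal);    // prints true
    }
}
```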

You should not expose PhraseQuery.setField() either. This field is set
implicitly when you add terms to a phrase query and should never be set
otherwise. The same goes for the other classes where you've added setter
methods.
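
A minimal sketch of what "set implicitly when you add terms" can look like.
This is an illustration under assumptions, not Lucene's PhraseQuery source:
the class PhraseFieldSketch, its String-based add() and the exception message
are all hypothetical. The field is fixed by the first term added, later terms
must agree with it, and there is a getter but no setter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical phrase holder; not Lucene's PhraseQuery.
final class PhraseFieldSketch {
    private String field;                       // determined by the first term added
    private final List<String> terms = new ArrayList<>();

    void add(String termField, String termText) {
        if (terms.isEmpty()) {
            field = termField;                  // implicit: no setField() needed
        } else if (!field.equals(termField)) {
            throw new IllegalArgumentException(
                "all phrase terms must be in the same field: " + termField);
        }
        terms.add(termText);
    }

    String getField() {
        return field;
    }

    int size() {
        return terms.size();
    }

    public static void main(String[] args) {
        PhraseFieldSketch phrase = new PhraseFieldSketch();
        phrase.add("body", "query");
        phrase.add("body", "rewriting");
        System.out.println(phrase.getField() + " / " + phrase.size() + " terms");
    }
}
```

With a public setter, a caller could change the field after terms were added
and leave the object in a state that add() itself would never produce.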

In general, I would be more comfortable with these changes if they only
added get methods, not set methods, as the set methods create lots of
room for abuse. Re-writing can be done by copying, so you shouldn't
need the setters. Does that make sense?
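
A rough sketch of re-writing by copying, using only getters. The node types
below (TermNode, AndNode) are hypothetical stand-ins and not Lucene classes,
and lower-casing the term text is just an example transformation. The rewrite
walks the original tree through its get methods and builds a new tree, so the
original query is never modified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class RewriteByCopySketch {
    interface Node { }

    // Immutable leaf: getters only.
    static final class TermNode implements Node {
        private final String field;
        private final String text;
        TermNode(String field, String text) { this.field = field; this.text = text; }
        String getField() { return field; }
        String getText() { return text; }
    }

    // Immutable inner node: defensive copy in, read-only view out.
    static final class AndNode implements Node {
        private final List<Node> children;
        AndNode(List<Node> children) {
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }
        List<Node> getChildren() { return children; }
    }

    // Builds a new tree with every term text lower-cased; the input is untouched.
    static Node rewrite(Node node) {
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            return new TermNode(t.getField(), t.getText().toLowerCase(Locale.ROOT));
        }
        List<Node> copied = new ArrayList<>();
        for (Node child : ((AndNode) node).getChildren()) {
            copied.add(rewrite(child));
        }
        return new AndNode(copied);
    }

    static String render(Node node) {
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            return t.getField() + ":" + t.getText();
        }
        StringBuilder sb = new StringBuilder("(");
        List<Node> children = ((AndNode) node).getChildren();
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(" AND ");
            sb.append(render(children.get(i)));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        List<Node> parts = new ArrayList<>();
        parts.add(new TermNode("title", "Lucene"));
        parts.add(new TermNode("body", "Rewrite"));
        Node original = new AndNode(parts);
        Node rewritten = rewrite(original);
        System.out.println(render(original));   // prints (title:Lucene AND body:Rewrite)
        System.out.println(render(rewritten));  // prints (title:lucene AND body:rewrite)
    }
}
```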

Doug
