Mailing List Archive: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis Token.java

cutting 2002/08/05 10:39:03

Modified: . CHANGES.txt
src/java/org/apache/lucene/analysis Token.java
Log:
Improved documentation.

Revision Changes Path
1.29 +16 -1 jakarta-lucene/CHANGES.txt

Index: CHANGES.txt
===================================================================
RCS file: /home/cvs/jakarta-lucene/CHANGES.txt,v
retrieving revision 1.28
retrieving revision 1.29
diff -u -r1.28 -r1.29
--- CHANGES.txt 29 Jul 2002 19:11:14 -0000 1.28
+++ CHANGES.txt 5 Aug 2002 17:39:03 -0000 1.29
@@ -58,6 +58,21 @@
for longer fields. Once the index is re-created, scores will be
as before. (cutting)

+ 13. Added new method Token.setPositionIncrement().
+
+ This permits, for the purpose of phrase searching, placing
+ multiple terms in a single position. This is useful with
+ stemmers that produce multiple possible stems for a word.
+
+ This also permits the introduction of gaps between terms, so that
+ terms which are adjacent in a token stream will not be matched by
an exact phrase query. This makes it possible, e.g., to build
+ an analyzer where phrases are not matched over stop words which
+ have been removed.
+
+ Finally, repeating a token with an increment of zero can also be
+ used to boost scores of matches on that token.
+

1.2 RC6
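
As a rough illustration of the CHANGES entry above, the sketch below shows how per-token increments could translate into absolute positions. It is not Lucene code: `PositionSketch` and `positions` are hypothetical names, and starting the counter at -1 so that the first token lands at position 0 is an assumption made for the example.

```java
import java.util.Arrays;

public class PositionSketch {
    /** Hypothetical helper: turns per-token position increments into absolute positions. */
    public static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int position = -1; // assumption: a first token with increment 1 lands at position 0
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        // Suppose a stemmer yields two candidate stems for "saw": "saw" and "see".
        // The second stem gets increment 0, so it shares the first stem's position.
        String[] terms = {"i", "saw", "see", "it"};
        int[] increments = {1, 1, 0, 1};
        int[] pos = positions(increments);
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + " @ " + pos[i]);
        }
        System.out.println(Arrays.toString(pos));
    }
}
```

In this sketch both stems sit at position 1 and "it" follows at position 2, so a phrase built from either stem would line up with the same positions.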

1.3 +13 -9 jakarta-lucene/src/java/org/apache/lucene/analysis/Token.java

Index: Token.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/analysis/Token.java,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- Token.java 5 Aug 2002 17:14:59 -0000 1.2
+++ Token.java 5 Aug 2002 17:39:03 -0000 1.3
@@ -54,6 +54,8 @@
* <http://www.apache.org/>.
*/

+import org.apache.lucene.index.TermPositions;
+
/** A Token is an occurrence of a term from the text of a field. It consists of
a term's text, the start and end offset of the term in the text of the field,
and a type string.
@@ -98,19 +100,21 @@
*
* <p>The default value is one.
*
- * <p>Two common uses for this are:<ul>
+ * <p>Some common uses for this are:<ul>
*
* <li>Set it to zero to put multiple terms in the same position. This is
- * useful if, e.g., when a word has multiple stems. This way searches for
- * phrases including either stem will match this occurence. In this case,
- * all but the first stem's increment should be set to zero: the increment of
- * the first instance should be one.
+ * useful if, e.g., a word has multiple stems. Searches for phrases
+ * including either stem will match. In this case, all but the first stem's
+ * increment should be set to zero: the increment of the first instance
+ * should be one. Repeating a token with an increment of zero can also be
+ * used to boost the scores of matches on that token.
*
* <li>Set it to values greater than one to inhibit exact phrase matches.
- * If, for example, one does not want phrases to match across stop words,
- * then one could build a stop word filter that removes stop words and also
- * sets the increment to the number of stop words removed before each
- * non-stop word.
+ * If, for example, one does not want phrases to match across removed stop
+ * words, then one could build a stop word filter that removes stop words and
+ * also sets the increment to the number of stop words removed before each
+ * non-stop word. Then exact phrase queries will only match when the terms
+ * occur with no intervening stop words.
*
* </ul>
* @see TermPositions
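
The stop word case described in the javadoc above could be sketched roughly as follows. This is a standalone illustration, not Lucene's filter API: `StopGapSketch`, `Tok`, and `removeStopWords` are hypothetical names. The sketch also assumes the increment is one plus the number of stop words removed, so that a token with nothing removed before it keeps the default of one; that is one reading of the comment, not something the commit confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopGapSketch {
    /** Hypothetical stand-in for a token: just text plus a position increment. */
    public static final class Tok {
        public final String text;
        public final int increment;

        public Tok(String text, int increment) {
            this.text = text;
            this.increment = increment;
        }

        @Override
        public String toString() {
            return text + "(+" + increment + ")";
        }
    }

    /** Drops stop words and records the gap they leave in the next token's increment. */
    public static List<Tok> removeStopWords(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // stop words removed since the last kept token
        for (String word : words) {
            if (stopWords.contains(word)) {
                skipped++;
                continue;
            }
            // Assumption: increment = 1 (default) + number of removed stop words.
            out.add(new Tok(word, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of"));
        List<Tok> tokens = removeStopWords(Arrays.asList("king", "of", "the", "hill"), stop);
        System.out.println(tokens); // prints [king(+1), hill(+3)]
    }
}
```

With these increments, "king" and "hill" end up two positions further apart than adjacent terms would be, so an exact phrase query for "king hill" would not treat them as neighbours.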
