Mailing List Archive

cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html SnowballAnalyzer.java SnowballFilter.java
cutting 2002/12/20 15:05:20

Modified: contributions/snowball build.xml default.properties
contributions/snowball/src/java/org/apache/lucene/analysis/snowball
SnowballAnalyzer.java SnowballFilter.java
Added: contributions/snowball/src/java overview.html
contributions/snowball/src/java/net/sf/snowball package.html
contributions/snowball/src/java/net/sf/snowball/ext
package.html
contributions/snowball/src/java/org/apache/lucene/analysis/snowball
package.html
Log:
Improved javadoc for Snowball stemmer code.

Revision Changes Path
1.2 +1 -3 jakarta-lucene-sandbox/contributions/snowball/build.xml

Index: build.xml
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/snowball/build.xml,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- build.xml 20 Dec 2002 22:39:43 -0000 1.1
+++ build.xml 20 Dec 2002 23:05:19 -0000 1.2
@@ -27,8 +27,6 @@
<!-- Stuff needed by all targets -->
<!-- ====================================================== -->
<target name="init">
- <mkdir dir="${bin.dir}"/>
-
<mkdir dir="${build.dir}"/>
<mkdir dir="${build.classes}"/>

@@ -134,7 +132,7 @@
<javadoc
sourcepath="${src.dir}"
overview="${src.dir}/overview.html"
- packagenames="${javadoc.packages}"
+ packagenames="*"
destdir="${build.javadoc}"
author="true"
version="true"



1.2 +0 -1 jakarta-lucene-sandbox/contributions/snowball/default.properties

Index: default.properties
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/snowball/default.properties,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- default.properties 20 Dec 2002 22:39:43 -0000 1.1
+++ default.properties 20 Dec 2002 23:05:19 -0000 1.2
@@ -14,7 +14,6 @@

javadoc.link.java=http://java.sun.com/j2se/1.4.1/docs/api/
javadoc.link.lucene=http://jakarta.apache.org/lucene/docs/api/
-javadoc.packages=org.apache.lucene.analysis.snowball.*

snowball.cvsroot=:pserver:cvsuser@cvs.tartarus.org:/home/cvs
snowball.root=snowball/website



1.1 jakarta-lucene-sandbox/contributions/snowball/src/java/overview.html

Index: overview.html
===================================================================
<html>
<body>
Snowball stemmers for Lucene
</body>
</html>



1.1 jakarta-lucene-sandbox/contributions/snowball/src/java/net/sf/snowball/package.html

Index: package.html
===================================================================
<html>
<body>
Snowball system classes.
</body>
</html>



1.1 jakarta-lucene-sandbox/contributions/snowball/src/java/net/sf/snowball/ext/package.html

Index: package.html
===================================================================
<html>
<body>
Snowball generated stemmer classes.
</body>
</html>



1.2 +16 -9 jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball/SnowballAnalyzer.java

Index: SnowballAnalyzer.java
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball/SnowballAnalyzer.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- SnowballAnalyzer.java 20 Dec 2002 22:39:44 -0000 1.1
+++ SnowballAnalyzer.java 20 Dec 2002 23:05:19 -0000 1.2
@@ -57,23 +57,30 @@
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.*;

+import net.sf.snowball.ext.*;
+
import java.io.Reader;
import java.util.Hashtable;

/** Filters {@link StandardTokenizer} with {@link StandardFilter}, {@link
- * LowerCaseFilter}, {@link SnowballFilter} and {@link StopFilter}. */
+ * LowerCaseFilter}, {@link StopFilter} and {@link SnowballFilter}.
+ *
+ * Available stemmers are listed in {@link net.sf.snowball.ext}. The name of a
+ * stemmer is the part of the class name before "Stemmer", e.g., the stemmer in
+ * {@link EnglishStemmer} is named "English".
+ */
public class SnowballAnalyzer extends Analyzer {
- private String language;
+ private String name;
private Hashtable stopTable;

- /** Builds an analyzer with the given stop words. */
- public SnowballAnalyzer(String language) {
- this.language = language;
+ /** Builds the named analyzer with no stop words. */
+ public SnowballAnalyzer(String name) {
+ this.name = name;
}

- /** Builds an analyzer with the given stop words. */
- public SnowballAnalyzer(String language, String[] stopWords) {
- this(language);
+ /** Builds the named analyzer with the given stop words. */
+ public SnowballAnalyzer(String name, String[] stopWords) {
+ this(name);
stopTable = StopFilter.makeStopTable(stopWords);
}

@@ -85,7 +92,7 @@
result = new LowerCaseFilter(result);
if (stopTable != null)
result = new StopFilter(result, stopTable);
- result = new SnowballFilter(result, language);
+ result = new SnowballFilter(result, name);
return result;
}
}
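
The filter order named in the corrected javadoc above (lower-casing, then stop-word removal, then stemming) means stop words are matched in their unstemmed form. A minimal sketch of why that order matters, in plain Java with no Lucene classes: the whitespace split stands in for StandardTokenizer, and `toyStem` is a hypothetical suffix stripper, not a Snowball algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FilterOrderSketch {

    /** Toy stand-in for a stemmer: strips one trailing "s". Hypothetical. */
    static String toyStem(String word) {
        return (word.length() > 1 && word.endsWith("s"))
            ? word.substring(0, word.length() - 1)
            : word;
    }

    /** Lower-case each token, drop stop words, then stem what is left. */
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            // The stop list is consulted BEFORE stemming, so "this" is
            // removed here; stemming first would turn it into "thi",
            // which the stop list would no longer match.
            if (token.isEmpty() || stopWords.contains(token)) {
                continue;
            }
            out.add(toyStem(token));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("this", "the"));
        System.out.println(analyze("This indexes the documents", stops));
    }
}
```

With stemming last, a stop list can be written in ordinary word forms rather than in whatever form a given stemmer happens to produce.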



1.2 +14 -4 jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball/SnowballFilter.java

Index: SnowballFilter.java
===================================================================
RCS file: /home/cvs/jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball/SnowballFilter.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- SnowballFilter.java 20 Dec 2002 22:39:44 -0000 1.1
+++ SnowballFilter.java 20 Dec 2002 23:05:19 -0000 1.2
@@ -59,13 +59,18 @@
import java.lang.reflect.Method;

import net.sf.snowball.SnowballProgram;
+import net.sf.snowball.ext.*;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

-/**
-*/
+/** A filter that stems words using a Snowball-generated stemmer.
+ *
+ * Available stemmers are listed in {@link net.sf.snowball.ext}. The name of a
+ * stemmer is the part of the class name before "Stemmer", e.g., the stemmer in
+ * {@link EnglishStemmer} is named "English".
+ */

public class SnowballFilter extends TokenFilter {
private static final Object [] EMPTY_ARGS = new Object[0];
@@ -73,11 +78,16 @@
private SnowballProgram stemmer;
private Method stemMethod;

- public SnowballFilter(TokenStream in, String language) {
+ /** Construct the named stemming filter.
+ *
+ * @param in the input tokens to stem
+ * @param name the name of a stemmer
+ */
+ public SnowballFilter(TokenStream in, String name) {
this.input = in;
try {
Class stemClass =
- Class.forName("net.sf.snowball.ext." + language + "Stemmer");
+ Class.forName("net.sf.snowball.ext." + name + "Stemmer");
stemmer = (SnowballProgram) stemClass.newInstance();
stemMethod = stemClass.getMethod("stem", new Class[0]);
} catch (Exception e) {



1.1 jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball/package.html

Index: package.html
===================================================================
<html>
<body>
Lucene analyzer that uses Snowball stemmers.
</body>
</html>




--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html SnowballAnalyzer.java SnowballFilter.java
We had this in Lucene Sandbox? I never saw it committed, weird.

I can't get it from the repository, any idea why?

cvs server: failed to create lock directory for
`/home/cvs/jakarta-lucene-sandbox/contributions/snowball'
(/home/cvs/jakarta-lucene-sandbox/contributions/snowball/#cvs.lock):
Permission denied
cvs server: failed to obtain dir lock in repository
`/home/cvs/jakarta-lucene-sandbox/contributions/snowball'
cvs [server aborted]: read lock failed - giving up

Otis







Re: cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html SnowballAnalyzer.java SnowballFilter.java
Otis Gospodnetic wrote:
> We had this in Lucene Sandbox? I never saw it committed, weird.

I just committed it today. The commit message bounced because it was
too big.

> I can't get it from the repository, any idea why?

Some protections were wrong. I think I fixed it. Try now.

Doug



Re: cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html SnowballAnalyzer.java SnowballFilter.java
Yeah, works now. I wonder about the SnowballAnalyzer and SnowballFilter
classes. The ctor of the latter uses introspection to instantiate the
appropriate Stemmer. In most use cases that will be the same Stemmer from
call to call, which seems like redundant work and object creation.
Wouldn't it be better to have SnowballFilter 'cache' instances of
previously instantiated Stemmers? I guess that would require that
Snowball's Stemmers are thread safe... are they?

Otis




Re: cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball package.html SnowballAnalyzer.java SnowballFilter.java
Otis Gospodnetic wrote:
> I wonder about SnowballAnalyzer and SnowballFilter
> classes.
> The ctor of the latter uses introspection to instantiate the appropriate
> Stemmer.
> In most use cases that will be the same Stemmer from call to call.
> Seems like redundant work and objects created.
> Wouldn't it be better to have SnowballFilter 'cache' instances of
> previously instantiated Stemmers?
> I guess that would require that Snowball's Stemmers are thread
> safe....are they?

Compared to all of the tokens & strings that will be allocated when it
is used, the allocation of the stemmer should not be significant. And
the stemmers are not thread safe anyway.
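
Because the stemmers are not thread safe, a cache of the kind asked about would have to hand each thread its own instance rather than share one. A rough sketch of that idea follows; `StatefulStemmer` and `PerThreadStemmerCache` are hypothetical stand-ins written for illustration, not classes from Snowball or from this project, and as noted above the saving from such a cache is probably small.

```java
import java.util.HashMap;
import java.util.Map;

public class PerThreadStemmerCache {

    /** Hypothetical stemmer that mutates an internal buffer while it works,
     *  which is what makes sharing one instance across threads unsafe. */
    public static class StatefulStemmer {
        private final StringBuilder buffer = new StringBuilder();

        public String stem(String word) {
            buffer.setLength(0);
            buffer.append(word);
            int last = buffer.length() - 1;
            if (last > 0 && buffer.charAt(last) == 's') {
                buffer.setLength(last); // toy rule: drop one trailing "s"
            }
            return buffer.toString();
        }
    }

    // One name-to-stemmer map per thread, created lazily on first use.
    private static final ThreadLocal<Map<String, StatefulStemmer>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    /** Returns the calling thread's stemmer for the given name. */
    public static StatefulStemmer forName(String name) {
        return CACHE.get().computeIfAbsent(name, n -> new StatefulStemmer());
    }

    /** Looks the stemmer up from a freshly started thread, for comparison. */
    public static StatefulStemmer forNameOnNewThread(String name) {
        StatefulStemmer[] holder = new StatefulStemmer[1];
        Thread t = new Thread(() -> holder[0] = forName(name));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder[0];
    }

    public static void main(String[] args) {
        // Same thread: the instance is reused.
        System.out.println(forName("English") == forName("English"));
        // Different thread: a separate instance is created.
        System.out.println(forName("English") == forNameOnNewThread("English"));
    }
}
```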

I don't particularly like the use of introspection either. I copied it
from Snowball's sample code. Unfortunately there's no other way to do
this without modifying the Snowball code, which I'd rather not do.
Currently this project incorporates the Snowball code as-is, so that
if/when the Snowball project updates things it should be very easy to
integrate those updates.
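
For readers unfamiliar with the pattern, the name-based lookup in the SnowballFilter constructor can be sketched in isolation. The version below is self-contained and only illustrative: `Program` is a hypothetical stand-in for net.sf.snowball.SnowballProgram (its `setCurrent`/`getCurrent` accessors are assumed for the sketch), and `ToyStemmer` is a made-up class, not a generated Snowball stemmer.

```java
import java.lang.reflect.Method;

public class ReflectiveStemmerSketch {

    /** Hypothetical stand-in for the common stemmer base class. */
    public abstract static class Program {
        private String current = "";
        public void setCurrent(String value) { current = value; }
        public String getCurrent() { return current; }
    }

    /** Made-up stemmer: strips one trailing "s". Note that stem() is
     *  declared here, not on the base class, hence the reflective call. */
    public static class ToyStemmer extends Program {
        public boolean stem() {
            String word = getCurrent();
            if (word.length() > 1 && word.endsWith("s")) {
                setCurrent(word.substring(0, word.length() - 1));
            }
            return true;
        }
    }

    /** Finds "<name>Stemmer" by name, instantiates it and invokes stem(). */
    public static String stem(String name, String word) {
        try {
            // Nested classes have binary names of the form Outer$Inner.
            String className =
                ReflectiveStemmerSketch.class.getName() + "$" + name + "Stemmer";
            Class<?> stemClass = Class.forName(className);
            Program stemmer =
                (Program) stemClass.getDeclaredConstructor().newInstance();
            Method stemMethod = stemClass.getMethod("stem");
            stemmer.setCurrent(word);
            stemMethod.invoke(stemmer);
            return stemmer.getCurrent();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No usable stemmer named " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stem("Toy", "stemmers"));
    }
}
```

The reflective call is needed in this sketch because `stem()` lives on each concrete class rather than on the shared base type, so the base type alone cannot be used to invoke it.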

This project is still a work in progress. I want to do some
benchmarking, more testing and add better documentation before I make a
release and announce its availability. If the benchmarking shows major
performance problems, then I may have to look at optimizing the Snowball
code, but I hope to avoid that.

Doug

