Mailing List Archive

RE: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis/standard StandardAnalyzer.java
> From: otis@apache.org [mailto:otis@apache.org]
>
> - 'De-finalized' the class per Doug's suggestion to make it easy to use
> different lists of stop words.

Thanks!

> - Added a few more words to the stop word list (MS'
> contribution via Alan).

I don't think we should do that here. This could break any application
which is already using this stop list when it upgrades Lucene, since it will
no longer be possible to search for these words. What we need is a facility
to load stop lists from file-based resources, and to include a new such
resource that contains this MS stop list. But I don't think we should
change the default stop lists. What do others think? Is that too
conservative?
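
To illustrate the compatibility concern, here is a minimal sketch, not Lucene code. The class name, the `filter` helper, and the word "also" are hypothetical stand-ins; the actual words in the contributed list are not given in this thread. It only shows that a term added to a default stop list is dropped and so can no longer be matched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopListChangeDemo {
    // Hypothetical helper: drops every token found in the stop set,
    // roughly what a stop filter does to both indexed text and queries.
    public static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed "old" default list, shortened for illustration.
        Set<String> oldList = new HashSet<>(Arrays.asList("the", "a", "of"));
        // Assumed "new" default list with one hypothetical extra word.
        Set<String> newList = new HashSet<>(oldList);
        newList.add("also");

        System.out.println(filter("also", oldList)); // prints [also]
        System.out.println(filter("also", newList)); // prints []
    }
}
```

An application that upgraded would silently lose the ability to query for the added word, which is the breakage described above.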

> - Re-indented the whole class.

I don't think we should re-indent whole files. It makes it hard to figure
out what's changed over time. We should try to use a similar indenting
style, but if someone has written the code, they have the right to indent
it. I generally only re-indent code if I'm committing it for the first
time, or for those parts of the code that I change.

Doug

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis/standard StandardAnalyzer.java
Doug Cutting wrote:

>> - Added a few more words to the stop word list (MS'
>>contribution via Alan).
>>
>
>I don't think we should do that here. This could break any application
>which is already using this stop list when it upgrades Lucene, since it will
>no longer be possible to search for these words. What we need is a facility
>to load stop lists from file-based resources, and to include a new such
>resource that contains this MS stop list. But I don't think we should
>change the default stop lists. What do others think? Is that too
>conservative?
>
Sounds good. The real issue is making it easy to use custom stop lists,
not making Lucene compatible with today's flavor of some other search
engine. Besides, isn't there already a constructor that takes an array
of stop words? File reading stuff would be good, but I don't think we
have to have it before the release. Is that too conservative? :)

Dmitry



RE: cvs commit: jakarta-lucene/src/java/org/apache/lucene/analysis/standard StandardAnalyzer.java
Hello,

> > - Added a few more words to the stop word list (MS'
> > contribution via Alan).
>
> I don't think we should do that here. This could break any application
> which is already using this stop list when it upgrades Lucene, since it
> will no longer be possible to search for these words.

Ah, didn't think of that. I'll revert to the previous revision.

> What we need is a facility to load stop lists from file-based resources,
> and to include a new such resource that contains this MS stop list. But
> I don't think we should change the default stop lists. What do others
> think? Is that too conservative?

Couldn't applications already use this:

    /** Builds an analyzer with the given stop words. */
    public StandardAnalyzer(String[] stopWords) {
      stopTable = StopFilter.makeStopTable(stopWords);
    }

And we leave it up to them to figure out how they get the list of stop
words to this constructor.
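
As a rough sketch of what an application could do on its own side, the following reads stop words from any `Reader` and produces the `String[]` that the constructor quoted above accepts. The class name `StopWordLoader`, the one-word-per-line format, and the `#` comment convention are assumptions for illustration, not anything that exists in Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StopWordLoader {
    /**
     * Reads one stop word per line (assumed format). Blank lines and lines
     * starting with '#' are skipped; words are trimmed and lower-cased.
     */
    public static String[] load(Reader source) {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim().toLowerCase();
                if (!word.isEmpty() && !word.startsWith("#")) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // A StringReader stands in for a file or classpath resource here.
        String[] stopWords = load(new StringReader("the\n# comment\n\nAnd \n"));
        System.out.println(String.join(",", stopWords)); // prints the,and
        // The array could then be passed to the constructor quoted above:
        // new StandardAnalyzer(stopWords)
    }
}
```

Taking a `Reader` rather than a file name keeps the loader indifferent to where the list comes from, which fits the idea of leaving that choice to the application.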

> > - Re-indented the whole class.
>
> I don't think we should re-indent whole files. It makes it hard to
> figure out what's changed over time. We should try to use a similar
> indenting style, but if someone has written the code, they have the
> right to indent it. I generally only re-indent code if I'm committing
> it for the first time, or for those parts of the code that I change.

I knew I was walking on thin ice when I did this. :)
I was thinking more along the lines of making all code in the project
uniform, which may be different from the original.
I prefer the former, but either one is fine with me, as long as it's
clear what we're sticking to. I think the former would not be hard to
achieve since we have a fairly small number of developers with commit
privileges.

Otis

