Mailing List Archive

stop-words
Hi list,

I'm planning to compile a list of stop-words for Polish Wikipedia.
I wanted to base it on the English list, which surely exists somewhere,
as it is incorporated in English Wikipedia. I browsed through CVS and
searched the Wikipedia namespaces, but found nothing.
Is it copyrighted material?
If not, can somebody give me a hint where I can find it,
or place it somewhere, e.g. in the Wikipedia: namespace or on Meta?
It will be useful for all languages, I'm sure!
For example, it could be part of the help page on searching.

And if Polish stop words are collected, can they be incorporated
into the software of Polish Wikipedia before the upgrade to Phase III?

Regards
User:Youandme


Re: stop-words
Hello,

youandme@poczta.fm writes:

> I'm planning to compile a list of stop words for Polish Wikipedia.
> I wanted to base on English list which exists somewhere for sure,
> as it is incorporated in English Wikipedia. I browsed through CVS,
> searched through Wikipedia namespaces and found none.

Here is the stop word list. It should also be in CVS.

greetings,
elian
Re: stop-words
On 15 Nov 2002 at 21:23, elian wrote:
> Hello,
>
> youandme@poczta.fm writes:
>
> > I'm planning to compile a list of stop words for Polish Wikipedia. I
> > wanted to base on English list which exists somewhere for sure, as
> > it is incorporated in English Wikipedia. I browsed through CVS,
> > searched through Wikipedia namespaces and found none.
>
> Here is the stopwordlist. Should be also in the cvs.
>
> greetings,
> elian

Thank you, elian.
Shame on me... I know I should start wearing contact lenses :-)
Hmmm... I grepped only for "stopwords" and its variations, but here I see "Fullstop".
Anyway, looking at the syntax, I see that it is not localized:
there are no En suffixes. So should non-English Wikipedias replace that file
with their own copy, or is some "switch" based on the language setup planned?
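To make the "switch" idea concrete, here is a minimal Python sketch, assuming each wiki knows its own language code. The `STOP_WORDS` table, its short sample contents, and `stop_words_for` are hypothetical and not part of the Wikipedia software. A wiki without its own list falls back to the English one, which is effectively what every wiki gets while the file is not localized.

```python
# Hypothetical per-language stop word lists; these short sets are
# illustrative samples, not the real English or a compiled Polish list.
STOP_WORDS = {
    "en": frozenset({"a", "an", "and", "in", "of", "the", "to"}),
    "pl": frozenset({"i", "w", "z", "na", "do", "nie", "się"}),
}


def stop_words_for(lang: str, default: str = "en") -> frozenset:
    """Return the stop word set for a wiki's language code.

    Languages without their own list fall back to the default (English) list.
    """
    return STOP_WORDS.get(lang, STOP_WORDS[default])
```

With this layout, adding a language means adding one entry to the table rather than replacing a shared file.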
Greetings
User:Youandme



Re: stop-words
<youandme@poczta.fm> wrote:

> I'm planning to compile a list of stop words for Polish Wikipedia. [...]
> And if Polish stop words are collected, can they be incorporated
> into software of Polish Wikipedia before upgrade to Phase III?

If I remember correctly, the stop words are used
internally by the database and cannot simply be changed.

The list is duplicated in the PHP software so that
stop words are not searched for (searching for them
would not give any results). So changing the list in
the software (for example, by translating it) gives
no advantage, and can instead give unexpected results
when searching.
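A minimal Python sketch of the filtering Paul describes, assuming the query is split on word characters and compared case-insensitively. The function name `searchable_terms` and the sample list are hypothetical, not taken from the Wikipedia PHP code.

```python
import re

# Hypothetical sample list. In the setup Paul describes, it would have to be
# identical to the list the database uses internally.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def searchable_terms(query: str, stop_words: frozenset = STOP_WORDS) -> list:
    """Return the lowercased words of `query` that are not stop words.

    The database skips stop words when searching, so passing them on
    would never produce matches; they are dropped before the query is built.
    """
    words = re.findall(r"\w+", query.lower())
    return [word for word in words if word not in stop_words]


if __name__ == "__main__":
    print(searchable_terms("The history of Poland"))  # ['history', 'poland']
    print(searchable_terms("the AND of"))  # []
```

A query made only of stop words comes back empty, so the caller can tell the user there is nothing to search for. The sketch also shows why translating only this copy backfires: English stop words would then reach the database and never match, while Polish words that the database does index would be silently dropped.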

If anyone knows better, please correct me.

Paul