Mailing List Archive

Question - Why stopwords.txt provided by smartcn contains blank lines?
Hi all,

The following file contains two blank lines, at lines 56 and 58:
https://github.com/apache/lucene/blob/main/lucene/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn/smart/stopwords.txt

As a result, SmartChineseAnalyzer.getDefaultStopSet() will produce an empty
string as a stop word, but it makes no sense to have an empty string as a
stop word, right?
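
To make the symptom concrete, here is a minimal sketch of how a line-by-line word-list loader can end up with an empty-string entry. It assumes the loader trims each line and skips only comment lines; NaiveStopwordLoader, its load method, and the sample text are hypothetical stand-ins, not Lucene's actual code or the real contents of stopwords.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a word-list loader; this is NOT Lucene's code.
final class NaiveStopwordLoader {

    // Reads one entry per line, skipping only lines that start with commentPrefix.
    static Set<String> load(String text, String commentPrefix) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(commentPrefix)) {
                    continue; // comment line
                }
                words.add(line.trim()); // a blank line is added as ""
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Made-up sample: two entries, two blank lines, one comment.
        String sample = "foo\n\n// placeholder comment\n\nbar\n";
        Set<String> words = load(sample, "//");
        System.out.println(words.size());       // prints 3
        System.out.println(words.contains("")); // prints true
    }
}
```

With this sample, the two blank lines collapse into a single "" entry, which sits in the set alongside the real words.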

Your help is much appreciated!




*Regards, Jerry Chin.*
Re: Question - Why stopwords.txt provided by smartcn contains blank lines?
Hi Jerry,

I agree, that makes no sense! Maybe the stopword loader should ignore
truly blank lines?
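
One way the suggestion above could look is sketched here. This is a standalone illustration under assumptions, not a patch to Lucene: BlankSkippingStopwordLoader and its load method are hypothetical names, and the sample text is made up rather than taken from stopwords.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader that ignores blank lines; this is NOT Lucene's code.
final class BlankSkippingStopwordLoader {

    // Reads one entry per line, skipping comment lines and blank lines.
    static Set<String> load(String text, String commentPrefix) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(commentPrefix)) {
                    continue; // comment line
                }
                String word = line.trim();
                if (word.isEmpty()) {
                    continue; // blank or whitespace-only line: nothing to add
                }
                words.add(word);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Made-up sample: two entries, two blank lines, one comment.
        String sample = "foo\n\n// placeholder comment\n\nbar\n";
        Set<String> words = load(sample, "//");
        System.out.println(words.size());       // prints 2
        System.out.println(words.contains("")); // prints false
    }
}
```

The sketch treats whitespace-only lines as blank. A stricter reading of "truly blank" would test line.isEmpty() before trimming, so that only zero-length lines are dropped.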

Also, the comments on lines 57 and 59 are confusing -- there are no
(default) English and Chinese stopwords in the file. I guess they are
placeholders.

Could you open an issue in Lucene's GitHub issue tracker (
https://github.com/apache/lucene/issues ) and let's iterate from there?

Thanks!

Mike McCandless

http://blog.mikemccandless.com


On Mon, May 15, 2023 at 5:25 AM Jerry Chin <metrxqin@gmail.com> wrote:

> [...]
Re: Question - Why stopwords.txt provided by smartcn contains blank lines?
Hi Michael,

Thanks for clarifying. I have created an issue
<https://github.com/apache/lucene/issues/12291> to follow up on GitHub.

Much appreciated!

On Monday, May 15, 2023, Michael McCandless <lucene@mikemccandless.com> wrote:

> [...]