
Lucene Hunspell Spell checker
Hello,
I'm trying to build a Java wrapper library that detects the language of a text and then spell-checks it in the detected language. I'm currently using Apache Tika for language detection, and I'm trying to use the lucene.analysis.hunspell package for spell checking, since I've seen it supports many languages.

My issue is that I can't get good accuracy for some languages that have "special" characters. For example, in Swedish I'm checking the word bästa, which is classified as misspelled, and the word basta is suggested instead. The word bästa does exist in the dictionary, so I suspect an encoding issue.
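One way to narrow this down is to print the exact code points of the word being checked. The JDK-only sketch below (the class and method names are hypothetical) shows two common ways a "bästa" can stop matching a dictionary entry even though it looks identical on screen: a decomposed "ä" (an "a" followed by a combining diaeresis), and UTF-8 bytes decoded with a Latin-1/windows-1252 charset. Which of these, if either, applies here is an assumption to check, not a diagnosis.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Locale;
import java.util.stream.Collectors;

public class WordDiagnostics {

    // Lists the code points of a word, e.g. "U+0062 U+00E4 U+0073 U+0074 U+0061",
    // so that invisible differences (mojibake, combining marks) become visible.
    static String codePoints(String word) {
        return word.codePoints()
                .mapToObj(cp -> String.format(Locale.ROOT, "U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this example independent of the source-file encoding.
        String precomposed = "b\u00e4sta"; // a-umlaut as one code point (NFC)
        String decomposed = "ba\u0308sta"; // a + combining diaeresis (NFD)

        // Both render as the same word, but an exact lookup sees two different strings.
        System.out.println(codePoints(precomposed));
        System.out.println(codePoints(decomposed));
        System.out.println(precomposed.equals(decomposed)); // false

        // Normalizing to NFC makes them equal again.
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(precomposed.equals(nfc)); // true

        // UTF-8 bytes decoded as ISO-8859-1 (windows-1252 behaves the same for
        // these two bytes) turn the a-umlaut into two characters.
        String mojibake = new String(precomposed.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(codePoints(mojibake)); // U+0062 U+00C3 U+00A4 U+0073 U+0074 U+0061
    }
}
```

Printing codePoints(word) for the word right before it is passed to the spell checker should show which form actually reaches it.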

I'm on Windows, with Lucene 8.11.2.
I'm using lucene.analysis.hunspell.Hunspell as the spell checker
and lucene.analysis.hunspell.Dictionary to create the dictionaries.
I'm using .dic and .aff files from here.
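Hunspell .aff files normally declare their character encoding with a SET line (for example SET UTF-8), so it may be worth checking what the Swedish files declare. As far as I understand, Lucene's Dictionary uses that line to decode the files itself, which would point the suspicion at how the checked word is read rather than at the dictionary. A small JDK-only sketch for reading the declaration (the AffixEncoding name is hypothetical):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

public class AffixEncoding {

    // Returns the encoding named by the first "SET <encoding>" line of a Hunspell .aff file.
    // The file is decoded as ISO-8859-1: every byte maps to a character, so decoding
    // never fails, and the SET directive itself is plain ASCII.
    static Optional<String> declaredEncoding(Path affFile) throws IOException {
        for (String line : Files.readAllLines(affFile, StandardCharsets.ISO_8859_1)) {
            // A UTF-8 byte order mark (EF BB BF) appears as three Latin-1 characters; drop it.
            String cleaned = line.replace("\u00EF\u00BB\u00BF", "").trim();
            String[] parts = cleaned.split("\\s+");
            if (parts.length >= 2 && parts[0].equals("SET")) {
                return Optional.of(parts[1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java AffixEncoding path/to/file.aff
        Path aff = Path.of(args[0]);
        System.out.println(declaredEncoding(aff).orElse("(no SET line found)"));
    }
}
```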


Any guidance on where I should look, or on how I should implement the spell checking, would be welcome, as I've hardly found anything :)
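A rough sketch of the surrounding pipeline, using only the JDK: read the input with an explicit charset, normalize it to NFC, split it into words, and hand each word to the checker. The checker is modeled as a plain Predicate<String>; with Lucene it would presumably be something like hunspell::spell, assuming a Hunspell instance built from your Dictionary. The class and method names are hypothetical, and the word splitting is deliberately naive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class SpellCheckPipeline {

    // Splits the text into words and returns those the checker rejects.
    // Each line is normalized to NFC first, so a decomposed a-umlaut
    // (a + combining mark) matches a precomposed dictionary entry.
    // The caller owns (and closes) the reader.
    static List<String> misspelled(Reader input, Predicate<String> isCorrect) throws IOException {
        List<String> result = new ArrayList<>();
        BufferedReader reader = new BufferedReader(input);
        String line;
        while ((line = reader.readLine()) != null) {
            String nfc = Normalizer.normalize(line, Normalizer.Form.NFC);
            // Letters and combining marks form words; anything else separates them.
            for (String word : nfc.split("[^\\p{L}\\p{M}]+")) {
                if (!word.isEmpty() && !isCorrect.test(word)) {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the real checker; with Lucene this could be
        // something like hunspell::spell, given a Hunspell instance named hunspell.
        Predicate<String> isCorrect = word -> word.equals("b\u00e4sta");

        // Decode the input explicitly as UTF-8 instead of relying on the platform
        // default, which on Windows before JDK 18 is usually a legacy code page
        // such as windows-1252. Invalid UTF-8 then fails loudly instead of
        // silently turning into different characters.
        try (Reader in = Files.newBufferedReader(Path.of(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(misspelled(in, isCorrect));
        }
    }
}
```

Keeping the checker behind a Predicate also makes it easy to swap in a different Hunspell instance per language once Tika has picked one.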

Thanks a lot in advance,
Thanos