Hi,
I'd like to contribute to the support of Hunspell in Lucene, specifically:
* support the flags necessary for English, German, French, Spanish and
Russian dictionaries, possibly more languages later
* provide a public API to check if a word is misspelled
* mirror Hunspell's suggestion algorithm in Lucene, probably in the
"src/suggest" module
For context: I work on natural language support for IntelliJ-based IDEs.
We'd like to use Hunspell dictionaries there, but interfacing with native
binaries proved to be slow and unreliable. So we'd prefer a JVM-only
reimplementation of the Hunspell spellchecker and suggester. Lucene's
Hunspell-related code currently seems closest to that goal, so we thought
we could enhance it further.
Is there anything non-obvious that I should know before diving into the
implementation?
The contribution will likely consist of many commits, each dedicated to a
specific subtask or small refactoring. Should I file a separate JIRA issue
for each of them, or is a single big one (e.g. "Hunspell improvements")
enough?
Peter Gromov