Hi Marvin,
I was googling some more for compound word splitting, and maybe there's
a solution which could work for KinoSearch too.
There's a program called TSearch V2
(http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/), a PostgreSQL
extension which adds inverted full-text search indexes and new functions
to PostgreSQL. One function is 'lexize': you pass it the encoding and a
word, and it returns the component words of the compound, provided you
have a dictionary which is tagged for compound words. There are some
dictionaries for Swedish, German and other languages, although I don't
know if the other dictionaries are tagged too.
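To illustrate the idea, here is a rough sketch in Python of what a
dictionary-based splitter might do. This is not TSearch2's actual code:
the function name split_compound, the min_part parameter and the tiny
word set are all made up for the example, and a real ispell dictionary
would also need its compound flags and affix rules taken into account.

```python
def split_compound(word, dictionary, min_part=3):
    """Split `word` into known dictionary words, longest part first.

    Returns a list of parts whose concatenation equals the lowercased
    word, or None if no such split exists. Hypothetical helper, not
    part of TSearch2 or KinoSearch.
    """
    word = word.lower()

    def helper(rest):
        if not rest:
            return []  # consumed the whole word: successful split
        # Try the longest candidate prefix first, then shorter ones.
        for end in range(len(rest), min_part - 1, -1):
            head = rest[:end]
            if head in dictionary:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no prefix leads to a complete split

    return helper(word)


# Assumed toy dictionary; a real one would be loaded from ispell files.
words = {"fuss", "ball", "spiel"}
parts = split_compound("Fussballspiel", words)
```

With that toy dictionary, parts ends up holding the three component
words, while a word that cannot be fully covered yields None.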
The extension is written in C and the code is not too big. I wonder if
you could take a look at it and decide whether it would be possible to
create a new Analyzer, maybe Analyzer::CompoundSplitter, for KinoSearch.
This would be really great and would work for all available dictionaries
with compound tagging. There's also a helper script which can create
ispell dictionaries out of myspell dictionaries (they are from
OpenOffice), and another helper script which can tag dictionaries for
compound words.
Cheers,
Marc