Currently, dev-libs/link-grammar fails its test suite on uClibc.
tests/test-suite.log contains the message "link-grammar: Error: Affix
dictionary: QUOTES: Invalid utf8 character". By searching the
link-grammar source code for "Invalid utf8 character", I found out
that it is the call to mbsrtowcs that fails. I checked the uClibc
sources to see how that function can be configured, and from the
documents inside them I learned that ctype and wchar support is a mess.
Because link-grammar loads its data from UTF-8, and UTF-8 can be
translated to wide character strings using only bit masks and bit
shifts (no big fat table needed), I wrote my own implementation of
mbsrtowcs for UTF-8, based on the manual pages for mbsrtowcs and
mbrtowc and the Wikipedia article on UTF-8.
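
To illustrate what "bit masks and bit shifts" means here, this is a
minimal sketch of decoding a single UTF-8 sequence. It is not the
attached code: the name utf8_decode_one, its signature and its
return convention are hypothetical and chosen only for this example,
and it does not try to follow the mbsrtowcs interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of link-grammar or uClibc).
 * Decodes one UTF-8 sequence from s, reading at most n bytes.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * invalid or truncated. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code point that legitimately needs 1..4 bytes. */
    static const uint32_t min_value[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t cp;

    assert(s != NULL && out != NULL);
    if (n == 0)
        return 0;

    /* The high bits of the lead byte give the sequence length;
     * the remaining low bits are the top bits of the code point. */
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or bad lead byte */

    if (len > n)
        return 0;                   /* truncated sequence */

    /* Each continuation byte is 10xxxxxx and adds six bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < min_value[len])
        return 0;                   /* overlong encoding */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                   /* out of range or surrogate */

    *out = cp;
    return len;
}
```

A replacement for mbsrtowcs could call a helper like this in a loop
until it reaches the terminating null byte or fills the destination
buffer. Note that on platforms where wchar_t is only 16 bits wide,
code points above U+FFFF would need separate handling.
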
But before integrating it into link-grammar or somewhere else, I would
like a code review of it. The attached source code is MIT-licensed, so
I can put it in any open source project I want without worrying about
license issues, and so can you.
--
René Rhéaume