Mailing List Archive

dev-libs/link-grammar and uclibc mbsrtowcs
Currently, dev-libs/link-grammar fails its test suite on uClibc.
tests/test-suite.log contains the error "link-grammar: Error: Affix
dictionary: QUOTES: Invalid utf8 character". Searching the link-grammar
source code for the "Invalid utf8 character" message showed that it is
the call to mbsrtowcs that fails. I tried to check in the uClibc
sources how that function can be configured, and from the documents
inside the uClibc sources I learned that ctype and wchar support is a
mess.
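
For readers who have not used mbsrtowcs, here is a minimal sketch of how
such a call is typically made and how it reports failure. This is not
the link-grammar code; the helper name to_wide is hypothetical, and the
sketch assumes the caller has already selected a UTF-8 locale with
setlocale(LC_CTYPE, ...).

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from link-grammar): convert the multibyte
 * string src into at most cap wide characters stored in dst.
 * Returns the number of wide characters written, not counting the
 * terminating L'\0'. On an invalid multibyte sequence, mbsrtowcs
 * returns (size_t)-1 and sets errno to EILSEQ; that value is passed
 * straight back to the caller. */
static size_t to_wide(const char *src, wchar_t *dst, size_t cap)
{
    mbstate_t state;
    const char *p = src;   /* mbsrtowcs advances this pointer */

    memset(&state, 0, sizeof state);   /* initial conversion state */
    return mbsrtowcs(dst, &p, cap, &state);
}
```

A caller would compare the result against (size_t)-1 to detect the kind
of failure that the test suite reports.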

Because link-grammar reads UTF-8, and UTF-8 can be translated to wide
character strings using bit masks and bit shifts (no big fat table
needed), I wrote my own implementation of mbsrtowcs for UTF-8, working
from the manual pages for mbsrtowcs and mbrtowc and the Wikipedia
article on UTF-8.
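
To illustrate the bit-mask and bit-shift idea, here is a minimal sketch
that decodes a single UTF-8 sequence. It is not the attached
implementation and does not follow the mbsrtowcs interface; the
function name utf8_decode_one and its return convention are
assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (at most n
 * bytes available) into *out. Returns the number of bytes consumed
 * (1 to 4), or 0 if the input is empty, truncated, or invalid. */
static size_t utf8_decode_one(const unsigned char *s, size_t n,
                              uint32_t *out)
{
    /* Smallest code point that legitimately needs a given length. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    size_t len, i;

    if (n == 0)
        return 0;

    /* The lead byte gives the sequence length and the top payload bits. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else
        return 0;   /* stray continuation byte or invalid lead byte */

    if (len > n)
        return 0;   /* truncated sequence */

    /* Each continuation byte is 10xxxxxx and carries 6 payload bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min_for_len[len])
        return 0;   /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;   /* UTF-16 surrogates are not valid in UTF-8 */
    if (cp > 0x10FFFF)
        return 0;   /* beyond the Unicode range */

    *out = cp;
    return len;
}
```

A full mbsrtowcs replacement would call something like this in a loop
and additionally handle the destination limit, the source pointer
update, and the mbstate_t argument.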

But before integrating it into link-grammar or anywhere else, I would
like a code review. The attached source code is MIT-licensed, so I can
put it in any open source project I want without worrying about
license issues, and so can you.

--
René Rhéaume
Re: dev-libs/link-grammar and uclibc mbsrtowcs
On 6/26/16 1:35 PM, René Rhéaume wrote:
> Currently, dev-libs/link-grammar fails its test suite on uclibc. In
> tests/test-suite.log, the text "link-grammar: Error: Affix dictionary:
> QUOTES: Invalid utf8 character" is found. By looking up the "Invalid
> utf8 character" message in the link-grammar source code, I found out
> it's the call to mbsrtowcs that fails. I tried to check in uClibc
> sources how that function can be configured, and from the documents
> inside uClibc sources, I learned ctype and wchar support is a mess.
>
> Because link-grammar loads from UTF-8, and that UTF-8 can be
> translated to wide character strings using bit masks and bit shifts
> (no big fat table needed), I made up my own implementation of
> mbsrtowcs for UTF-8, reading the manual pages for mbsrtowcs and
> mbrtowc and the Wikipedia article on UTF-8.
>
> But before integrating in link-grammar or somewhere else, I would like
> a code review on it. The attached source code is MIT-licensed, so I
> can put in any open source project I want without worrying about the
> license issues, so do you.
>

Why not make a patch against uclibc and try to get it upstream?

--
Anthony G. Basile, Ph. D.
Chair of Information Technology
D'Youville College
Buffalo, NY 14201
(716) 829-8197