Mailing List Archive

[DBMail 0001038]: UTF8 support. Sortfield generated ignoring multibyte symbols.
The following issue has been SUBMITTED.
======================================================================
http://www.dbmail.org/mantis/view.php?id=1038
======================================================================
Reported By: ALyarskiy
Assigned To:
======================================================================
Project: DBMail
Issue ID: 1038
Category: General
Reproducibility: sometimes
Severity: major
Priority: normal
Status: new
target:
======================================================================
Date Submitted: 20-Jan-14 12:42 CET
Last Modified: 20-Jan-14 12:42 CET
======================================================================
Summary: UTF8 support. Sortfield generated ignoring multibyte
symbols.
Description:
dm_messages.c:
function _header_cache
.....
    if(issubject) {
        char *s, *t = dm_base_subject(value);
        s = dbmail_iconv_str_to_db(t, charset);
        g_strlcpy(sortfield, s, CACHE_WIDTH-1);
        g_free(s);
        g_free(t);
    }
.....

It does not work correctly with multibyte strings.
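
A likely failure mode, stated here as an assumption rather than something the report confirms: g_strlcpy counts bytes, so when the converted subject is longer than the buffer, the copy can end in the middle of a multibyte UTF-8 character and leave an invalid sequence in sortfield. A minimal sketch of a boundary-aware copy in plain C follows. utf8_safe_copy is a hypothetical helper, not a DBMail or GLib function, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes,
 * including the terminating NUL) without cutting a UTF-8 sequence
 * in half. Returns the number of bytes written, excluding the NUL.
 * Assumes src is valid UTF-8. */
size_t utf8_safe_copy(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size)
        len = dst_size - 1;

    /* src[len] is the first byte that will NOT be copied. If it is a
     * continuation byte (10xxxxxx), the cut falls inside a character,
     * so back up until the cut sits on a character boundary. */
    while (len > 0 && ((unsigned char)src[len] & 0xC0) == 0x80)
        len--;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With a 4-byte buffer and a string of two 2-byte Cyrillic characters, a plain byte copy would keep three bytes and split the second character; this version keeps only the first character.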
======================================================================

Issue History
Date Modified Username Field Change
======================================================================
20-Jan-14 12:42 ALyarskiy New Issue
======================================================================

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev
[DBMail 0001038]: UTF8 support. Sortfield generated ignoring multibyte symbols. [ In reply to ]
A NOTE has been added to this issue.
----------------------------------------------------------------------
(0003625) paul (administrator) - 20-Jan-14 12:51
http://www.dbmail.org/mantis/view.php?id=1038#c3625
----------------------------------------------------------------------
Please provide steps to reproduce, or better yet: a patch that is validated
to fix this problem by a unit-test that exercises it.

[DBMail 0001038]: UTF8 support. Sortfield generated ignoring multibyte symbols. [ In reply to ]
A NOTE has been added to this issue.
----------------------------------------------------------------------
(0003626) ALyarskiy (reporter) - 22-Jan-14 08:19
http://www.dbmail.org/mantis/view.php?id=1038#c3626
----------------------------------------------------------------------
One possible approach I see at the moment is to always process headers as
UTF-8, like this:

    char *s, *t = dm_base_subject(value);

    s = dbmail_iconv_str_to_utf8(t, charset);
    ... PROCESSING ...
    ... if db_encoding != utf8:
        dbmail_iconv_str_to_db()

The other option is to fix the database encoding to UTF-8.

DB backend limitations:
- Oracle supports 4-byte characters
- Postgres supports 4-byte characters
- SQLite supports 4-byte characters
- MySQL's default utf8 is limited to 3 bytes per character, but since
version 5.5.3 it can handle 4-byte characters by using the "utf8mb4" encoding

4-byte characters are the supplementary characters, i.e. code points above
U+FFFF (http://www.i18nguy.com/unicode/supplementary-test.html).

So there is a possible problem with MySQL versions prior to 5.5.3; later
versions will require a database update to switch from 3-byte to 4-byte
UTF-8
(http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html).

At the moment I have a patch that works with UTF-8 headers (Postgres
backend). I will provide it after some tests.
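
The 3-byte versus 4-byte distinction discussed in this note can be checked directly on a string: in UTF-8, every 4-byte sequence starts with a lead byte of the form 11110xxx. The following is a minimal sketch in plain C. utf8_has_4byte is a hypothetical helper name, not part of DBMail, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the UTF-8 string s contains at
 * least one 4-byte sequence, i.e. a supplementary character that a
 * 3-byte-only storage encoding cannot hold. Assumes valid UTF-8. */
bool utf8_has_4byte(const char *s)
{
    assert(s != NULL);

    for (; *s != '\0'; s++) {
        /* Lead byte of a 4-byte sequence: 11110xxx */
        if (((unsigned char)*s & 0xF8) == 0xF0)
            return true;
    }
    return false;
}
```

A check like this could be used to decide whether a header value is safe to store in a 3-byte utf8 column, though how DBMail should react in that case is not settled in this thread.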

[DBMail 0001038]: UTF8 support. Sortfield generated ignoring multibyte symbols. [ In reply to ]
A NOTE has been added to this issue.
----------------------------------------------------------------------
(0003627) ALyarskiy (reporter) - 23-Jan-14 09:04
http://www.dbmail.org/mantis/view.php?id=1038#c3627
----------------------------------------------------------------------
OK, here is the patch. This is my first experience with C, so the patch
should be reviewed by a real developer =)

The patch includes:
1. A new function, _header_exists, to check whether a header already exists.
Trying to insert and then checking for errors is not a clean approach.
2. UTF-8 header support, assuming a maximum character size of 4 bytes (there
are some issues with MySQL, see the previous note).
3. A few added trace messages.

Example header:
RAW=[=?koi8-r?Q?[XXXXXX]_[33666]_=FA=C1=D0=D2=CF=D3_=CE=C1_=D4=C5=C8._=D0=CF?=
=?koi8-r?B?xMTF0tbL1SAo+sHQ0s/TIMTP0C4gyc7Gz9LNwcPJySksIM7Fy8/S0sXL1M7P?=
=?koi8-r?B?xSDazsHexc7JxSDEz9AuIMHU0snC1dTBINPPINrOwd7FzsnFzSDQzyDVzc/M?=
=?koi8-r?B?3sHOycAgxMzRINTJ0MEgxM/Hz9fP0sEsIM7FINPX0drBzs7Px88g0yDc1MnN?=
=?koi8-r?Q?_=C1=D4=D2=C9=C2=D5=D4=CF=CD?=]
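
The subject above mixes RFC 2047 "Q" and "B" encoded-words in koi8-r. As an illustration of what the Q parts contain, here is a minimal sketch of a decoder for the payload of a single Q encoded-word (the text between "?Q?" and "?="). q_decode is a hypothetical helper, not a DBMail function. The bytes it produces are still in the declared charset (KOI8-R here) and would need a charset conversion before they are valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Return the value of a hex digit, or -1 if c is not one. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode the payload of one RFC 2047 "Q"
 * encoded-word into raw bytes. out must have room for at least
 * strlen(in) + 1 bytes. Returns the number of bytes written,
 * excluding the terminating NUL. */
size_t q_decode(const char *in, unsigned char *out)
{
    assert(in != NULL && out != NULL);

    size_t n = 0;
    while (*in != '\0') {
        if (*in == '_') {
            /* In Q encoding an underscore stands for a space. */
            out[n++] = ' ';
            in++;
        } else if (*in == '=' &&
                   hex_value((unsigned char)in[1]) >= 0 &&
                   hex_value((unsigned char)in[2]) >= 0) {
            /* "=XX" is one byte given as two hex digits. */
            out[n++] = (unsigned char)(hex_value((unsigned char)in[1]) * 16 +
                                       hex_value((unsigned char)in[2]));
            in += 3;
        } else {
            /* Any other character is taken literally. */
            out[n++] = (unsigned char)*in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, the payload "_=C1=D4" from the last encoded-word decodes to three bytes: a space followed by 0xC1 and 0xD4.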

[DBMail 0001038]: UTF8 support. Sortfield generated ignoring multibyte symbols. [ In reply to ]
A NOTE has been added to this issue.
----------------------------------------------------------------------
(0003628) paul (administrator) - 23-Jan-14 09:43
http://www.dbmail.org/mantis/view.php?id=1038#c3628
----------------------------------------------------------------------
Ok, you're on to something here, but I'm rejecting the patch for the
following reasons:

- please use git-diff to generate the patch, or better yet: fork on
github, clone, hack, test, commit, push, and send me a pull request. Since
you're working off the 3.1 code, that will allow easy forward porting to the
master branch.

- the patch does way too much. You're fixing a non-existent problem with
the new _header_exists function. That case is already well covered in the
code. Also, *all* queries *must* happen inside a TRY/CATCH/FINALLY block.

- I don't see any unit-tests that demonstrate the problem and the fix:
please expand tests/check_dbmail_message.c

Issue History
Date Modified Username Field Change
======================================================================
20-Jan-14 12:42 ALyarskiy New Issue
20-Jan-14 12:51 paul Note Added: 0003625
22-Jan-14 08:19 ALyarskiy Note Added: 0003626
23-Jan-14 09:04 ALyarskiy Note Added: 0003627
23-Jan-14 09:04 ALyarskiy File Added: utf8_header.patch
23-Jan-14 09:43 paul Note Added: 0003628
======================================================================
