Mailing List Archive

[Bug 3085] New: Allow UTF-8 for log output
https://bugs.exim.org/show_bug.cgi?id=3085

Bug ID: 3085
Summary: Allow UTF-8 for log output
Product: Exim
Version: N/A
Hardware: All
OS: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Logging
Assignee: unallocated@exim.org
Reporter: forza@tnonline.net
CC: exim-dev@lists.exim.org

This is probably not a bug, but more of a request for comments.

I am logging to syslog instead of files. The syslog is handled by syslog-ng,
and I parse the logfiles with Fail2Ban.

The exim.conf:

### Logging
log_selector = +all
log_file_path = syslog
syslog_timestamp = false
syslog_duplication = false
syslog_processname = exim
SYSLOG_LONG_LINES = yes


Now, my issue is that sometimes Fail2Ban fails to read some of the lines and
outputs a warning like this:

2024-03-17T19:23:33.870+00:00 warning fail2ban.filter[2922]: WARNING Error
decoding line from '/var/log/exim.log' with 'UTF-8'.
2024-03-17T19:23:33.870+00:00 warning fail2ban.filter[2922]: WARNING Consider
setting logencoding to appropriate encoding for this jail. Continuing to
process line ignoring invalid characters: b'2024-03-17T19:23:33.698+00:00
notice exim[5673]: [12\\21] F From: "\xbe\xe7\xb9\xcc\xbc\xf8"
<msoony@gmail.com>\n'


So, this leads me to my current question. Can Exim be set to output UTF-8
encoded logs to syslog? Apparently, the syslog format according to RFC 5424
says "MSG SHOULD be UNICODE, encoded using UTF-8", but it seems to allow plain
US-ASCII too.

https://datatracker.ietf.org/doc/html/rfc5424#section-6.4

I believe syslog-ng could handle non-UTF-8 messages, using flags(sanitize-utf8)
on the source, however the manual specifies:

"The HEADER part of the message must be in plain ASCII format, the parameter
values of the STRUCTURED-DATA part must be in UTF-8, while the MSG part should
be in UTF-8. The different parts of the message are explained in the following
sections."

Perhaps I am overthinking all of this. I'd appreciate some thoughts on correct
logging configurations.

--
You are receiving this mail because:
You are on the CC list for the bug.

--
## subscription configuration (requires account):
## https://lists.exim.org/mailman3/postorius/lists/exim-dev.lists.exim.org/
## unsubscribe (doesn't require an account):
## exim-dev-unsubscribe@lists.exim.org
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: [Bug 3085] New: Allow UTF-8 for log output
On Sat, 23 Mar 2024, Exim Bugzilla via Exim-dev wrote:

> https://bugs.exim.org/show_bug.cgi?id=3085
>
> Bug ID: 3085
> Summary: Allow UTF-8 for log output
> Product: Exim
> Version: N/A
> Hardware: All
> OS: Linux
> Status: NEW
> Severity: bug
> Priority: medium
> Component: Logging
> Assignee: unallocated@exim.org
> Reporter: forza@tnonline.net
> CC: exim-dev@lists.exim.org
>
> This is probably not a bug, but more of a request for comments.
>
> I am logging to syslog instead of files. The syslog is handled by syslog-ng,
> and I parse the logfiles with Fail2Ban.
>
> The exim.conf:
>
> ### Logging
> log_selector = +all
> log_file_path = syslog
> syslog_timestamp = false
> syslog_duplication = false
> syslog_processname = exim
> SYSLOG_LONG_LINES = yes
>
>
> Now, my issue is that sometimes Fail2Ban fails to read some of the lines and
> outputs a warning like this:
>
> 2024-03-17T19:23:33.870+00:00 warning fail2ban.filter[2922]: WARNING Error
> decoding line from '/var/log/exim.log' with 'UTF-8'.
> 2024-03-17T19:23:33.870+00:00 warning fail2ban.filter[2922]: WARNING Consider
> setting logencoding to appropriate encoding for this jail. Continuing to
> process line ignoring invalid characters: b'2024-03-17T19:23:33.698+00:00
> notice exim[5673]: [12\\21] F From: "\xbe\xe7\xb9\xcc\xbc\xf8"
> <msoony@gmail.com>\n'

[. So syslog-ng is writing exim's logging to /var/log/exim.log
I guess there are reasons to go the indirect way. ]

How do the relevant lines look in /var/log/exim.log - perhaps with
grep "2024-03-17T19:23:33.698+00:00" /var/log/exim.log
I guess the result would be something like:
2024-03-17T19:23:33.698+00:00 notice exim[5673]: [12\\21] F From: "????" <msoony@gmail.com>
?

> So, this leads me to my current question. Can Exim be set to output
> UTF-8 encoded logs to syslog?

> Apparently, the syslog format
> according to RFC 5424 says "MSG SHOULD be UNICODE, encoded using
> UTF-8", but it seems to allow plain US-ASCII too.

[. For a piece of text, if the plain US-ASCII encoding is correct
then that byte stream is automatically valid UTF-8 and represents
that text correctly.
It is impossible to support UTF-8 and not handle
(true 7bit) plain US-ASCII correctly ! ]
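
The subset relationship can be checked in a few lines. This is only a
minimal sketch in Python; the sample string is an illustration, not a line
taken from the actual log.

```python
# Pure 7-bit ASCII text produces identical bytes under ASCII and UTF-8.
text = "F From: <msoony@gmail.com>"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same = ascii_bytes == utf8_bytes
print(same)

# Decoding those ASCII bytes as UTF-8 gives back the original text.
round_trip = ascii_bytes.decode("utf-8")
print(round_trip == text)
```

So a reader that expects UTF-8 needs no special handling for lines that
are plain ASCII.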

> https://datatracker.ietf.org/doc/html/rfc5424#section-6.4
>
> I believe syslog-ng could handle non-UTF-8 messages, using flags(sanitize-utf8)
> on the source, however the manual specifies:
>
> "The HEADER part of the message must be in plain ASCII format, the parameter
> values of the STRUCTURED-DATA part must be in UTF-8, while the MSG part should
> be in UTF-8. The different parts of the message are explained in the following
> sections."
>
> Perhaps I am overthinking all of this. I'd appreciate some thoughts on correct
> logging configurations.

I think you are looking in the wrong place for the problem.
It is not that exim is disallowing UTF-8 output in the log,
but that occasionally the output is not valid UTF-8.

The fundamental issue is we have "garbage in",
so will inevitably have "garbage out".

Exim is trying to log some "text" - the display-name of the From: header -
which should be ASCII (unless SMTPUTF8 is enabled, in which case it can be
UTF-8) but in this case is not UTF-8 or ASCII, but some unknown byte-stream.
[. Do you happen to know what language or
character set this sender writes their name in ? ]
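
One way to investigate is to take the bytes from the Fail2Ban warning and
try decoding them with a few legacy encodings. The following is a rough
sketch in Python. The candidate list is purely a guess and does not come
from the report, and try_decode is a hypothetical helper name.

```python
raw = b'\xbe\xe7\xb9\xcc\xbc\xf8'  # display-name bytes from the Fail2Ban warning

def try_decode(data, encoding):
    """Return the decoded text, or None if data is not valid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Strict UTF-8 and ASCII both reject these bytes.
utf8_result = try_decode(raw, "utf-8")
ascii_result = try_decode(raw, "ascii")
print("utf-8:", utf8_result, "ascii:", ascii_result)

# Candidate encodings are assumptions chosen for illustration only.
candidates = ["euc_kr", "gb2312", "shift_jis", "iso8859_15"]
for enc in candidates:
    # ascii() keeps the output printable whatever the terminal encoding is.
    print(enc, ascii(try_decode(raw, enc)))
```

A successful decode does not prove the encoding is the right one.
Single-byte character sets accept almost any byte sequence, so the result
still has to be judged by whether it reads as a plausible name.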

As I understand it, exim logs this byte-stream as-is and there is nothing that
syslog-ng or fail2ban could reasonably do to interpret it correctly.
I believe that if you reverted to having exim log to a file,
the same issue would be there, probably with exactly the same byte-stream
as the syslog.

The best "fix" might be for exim to log this byte-stream coded as hex,
but in many cases that would be less readable than doing nothing.
For example
From: "André Aitchison" <andrew@aitchison.me.uk>
where the e-acute was encoded in LATIN-9 is not valid UTF-8,
but it is much clearer left like that than logged as
From: "\x41\x6e\x64\x72\xe9\x20\x41\x69\x74\x63\x68\x69\x73\x6f\x6e" <andrew@aitchison.me.uk>
- and then exim would have to spend time figuring out when the display-name
was not valid UTF-8.
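
A middle ground would be to escape only the bytes that are not valid
UTF-8 and leave everything else readable. The sketch below shows the idea
in Python using the standard "backslashreplace" error handler. It is an
illustration only, not an existing exim option, and escape_invalid_utf8 is
a hypothetical name.

```python
def escape_invalid_utf8(data: bytes) -> str:
    """Decode as UTF-8, turning only undecodable bytes into \\xNN escapes."""
    return data.decode("utf-8", errors="backslashreplace")

# The same display-name encoded two ways.
latin9_name = "André Aitchison".encode("iso8859_15")  # e-acute is byte 0xe9
utf8_name = "André Aitchison".encode("utf-8")         # e-acute is 0xc3 0xa9

# Only the invalid byte is escaped; valid UTF-8 passes through unchanged.
print(ascii(escape_invalid_utf8(latin9_name)))
print(ascii(escape_invalid_utf8(utf8_name)))
```

One caveat: a literal backslash already present in the input is left as
it is, so the escaped form cannot always be mapped back to the original
bytes without ambiguity.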

I have not used fail2ban for email logs.
Is the message merely annoying, or is this stopping you from blocking
<msoony@gmail.com> because other lines in the log indicate a problem ?

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk
