Mailing List Archive

[Bug 638] rfc2047 encoding doesn't work correctly
------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=638

Derrick <derrick.rice@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |derrick.rice@gmail.com




--- Comment #1 from Derrick <derrick.rice@gmail.com> 2008-10-22 04:02:41 ---
This is still an open issue, and I want to comment on some of the discussion
found in the email thread mentioned previously.

The thread seems to touch on a separate, but related issue: encoding and
decoding line endings. Specifically, encoding a LF and decoding a CR LF is
ambiguous. If we assume that a LF in the source is meant to be an end of line,
then it seems appropriate to encode it as CR LF, the end of line notation for
email. The reverse is true: an encoded CR LF is likely meant to be an end of
line, so it seems appropriate to decode it as a LF in unix.

This introduces problems when:
a) A LF in the source is /not/ meant to be an end of line, and it should be
written into the encoded string as a LF only.
b) a BR LF in an encoded string is /not/ meant to be an end of line, and it
should be decoded as BR LF, not as a LF.

Regardless, my point is that this issue is separate and unrelated to handling
line breaks BETWEEN encoded words. These line breaks are never encoded or
decoded, and any decision made regarding the above should not affect what is
done with the line breaks between words.

According to rfc2047, it seems the following is appropriate:

a) when encoding, 'LF space' should be placed between encoded words. These
will be translated to 'BR LF space' when transmitted, resulting in correctly
formatted rfc2047 headers.
b) when decoding, 'LF space' between encoded words should be removed. These
are the result of 'BR LF space' between encoded words in a transmitted email
message.

What concerns are there that observing both (a) and (b) will have undesired
effects? I'd like to patch rfc2047.c to correctly handle whitespace between
encoded words.
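
To make (a) concrete, here is a minimal Python sketch of the encoding side. It is not the code in rfc2047.c; encode_words is a hypothetical name, and the choice of UTF-8 with base64 ("B") encoding is an assumption made only for illustration. It splits the text into encoded words of at most 75 characters and joins them with 'LF space':

```python
import base64

MAX_ENCODED_WORD = 75  # RFC 2047 limit per encoded-word, delimiters included
PREFIX = "=?utf-8?B?"  # assumption: UTF-8 charset, base64 encoding
SUFFIX = "?="


def encode_words(text, sep="\n "):
    """Split text into base64 encoded-words joined by sep (hypothetical helper)."""
    # Characters left for base64 text, rounded down to a multiple of 4.
    room = (MAX_ENCODED_WORD - len(PREFIX) - len(SUFFIX)) // 4 * 4
    max_bytes = room // 4 * 3  # raw bytes that fit in one encoded-word
    words = []
    chunk = b""
    for ch in text:
        raw = ch.encode("utf-8")
        # Never split a multi-byte character across two encoded-words.
        if chunk and len(chunk) + len(raw) > max_bytes:
            words.append(PREFIX + base64.b64encode(chunk).decode("ascii") + SUFFIX)
            chunk = b""
        chunk += raw
    if chunk:
        words.append(PREFIX + base64.b64encode(chunk).decode("ascii") + SUFFIX)
    return sep.join(words)
```

The separator is a parameter so that the same sketch can produce the single SPACE separator or 'LF space'.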


--
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email

--
## List details at http://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 638] rfc2047 encoding doesn't work correctly [ In reply to ]

Kjetil Torgrim Homme <kjetilho@ifi.uio.no> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |kjetilho@ifi.uio.no




--- Comment #2 from Kjetil Torgrim Homme <kjetilho@ifi.uio.no> 2008-10-22 16:53:13 ---
To the original poster: use $rh_ instead, and add a condition to see if it's
needed.

condition = ${if !match {$h_Subject:} {\\[${tr{$local_part}{_}{-}}\\]}}
headers_remove = Subject
headers_add = Subject: [${tr{$local_part}{_}{-}}]\n $rh_Subject:

(untested)

Alternatively, you can check the $h_ value against [^ -~] and only apply
${rfc2047: if it matches -- then you know the result will be encoded words, and
a ${sg of SPACE into LF SPACE will do the right thing.
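
The logic of that alternative, sketched in Python rather than Exim's expansion language (the function names are hypothetical, and the encoded input is assumed to contain nothing but encoded words separated by single spaces):

```python
import re

# Anything outside the printable ASCII range SPACE (0x20) to TILDE (0x7E).
NON_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def needs_encoding(value):
    """True if the header value contains a character that must be encoded."""
    return NON_PRINTABLE_ASCII.search(value) is not None


def fold_encoded_words(encoded):
    """Turn each SPACE into LF SPACE, in the spirit of the ${sg suggestion."""
    return encoded.replace(" ", "\n ")
```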

To Derrick: what is BR?

I stand by my comment in the referenced mail thread: the simplest way of fixing
this problem is to unfold header lines when $h_ is used. it will be very hard
(for exim.conf writers) to avoid adding extraneous whitespace otherwise.

I don't think the separator used in ${rfc2047: should change, since sites which
use a ${sg of SPACE into LF SPACE (as suggested above) will see their
configuration break (you get two LFs in a row, ending the headers).

When it comes to decoding RFC 2047, *any* white space between two encoded words
shall be removed. it doesn't matter if it's just one SPACE, or LF TAB TAB LF
TAB. Of course, ${rfc2047d: does the right thing here. I haven't verified
that $h_ behaves the same way, but it would surprise me if it didn't.
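
For illustration only, the decoding rule can be sketched in Python. This is not how ${rfc2047d: is implemented; drop_gaps and the regular expressions are assumptions, and the pattern for an encoded word is deliberately simplified:

```python
import re

# An encoded-word: =?charset?encoding?encoded-text?=  (simplified pattern)
ENCODED_WORD = r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?="

# White space sitting between two adjacent encoded-words. The lookahead
# leaves the second word unconsumed so it can start the next match.
GAP = re.compile(r"(" + ENCODED_WORD + r")[ \t\r\n]+(?=" + ENCODED_WORD + r")")


def drop_gaps(header_value):
    """Remove white space between adjacent encoded-words only."""
    return GAP.sub(r"\1", header_value)
```

White space between an encoded word and ordinary text is left alone; only the gap between two encoded words disappears, however long it is.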


[Bug 638] rfc2047 encoding doesn't work correctly [ In reply to ]




--- Comment #3 from Derrick <derrick.rice@gmail.com> 2008-10-24 00:55:45 ---
(In reply to comment #2)
> To Derrick: what is BR?

Sorry, replace BR with CR. I had 'break' in my head.

> I don't think the separator used in ${rfc2047: should change, since sites which
> use a ${sg of SPACE into LF SPACE (as suggested above) will see their
> configuration break (you get two LFs in a row, ending the headers).

This is true, but this occurs because rfc2047 had this deficiency to begin with,
and users started using this workaround (as suggested above). The correct 2047
behavior would insert these LF's, rather than expect them to be inserted with a
conditional ${sg.

But yes, not breaking existing configurations seems the appropriate course,
here. Documentation ought to be updated to indicate that rfc2047 doesn't
actually follow rfc2047 to the letter.
http://exim.org/exim-html-current/doc/html/spec_html/ch11.html#SECTexpop

I don't have a good understanding of $h_ right now, so I'll look into that
further before commenting on your suggestion there.


[Bug 638] rfc2047 encoding doesn't work correctly [ In reply to ]

Kjetil Torgrim Homme <kjetilho@ifi.uio.no> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |LATER




--- Comment #4 from Kjetil Torgrim Homme <kjetilho@ifi.uio.no> 2008-10-24 01:42:35 ---
apologies for being a pedant, but Exim follows RFC 2047 to the letter. RFC
2047 says that the length of an encoded word can not surpass 76 letters, but it
says nothing about the length of lines. this is left to RFC 2822 and 2821 (now
RFC 5322 and 5321 although I don't think anything has changed). 2821/5321
restricts the length of a line to 1000 octets, while 2822/5322 says SHOULD NOT
exceed 78 characters (excluding CRLF) in header fields.

anyway -- in a hypothetical Exim 5, I would like to see it use CRLF internally,
and functions like rfc2047 should separate words with CRLF SP. but for Exim 4,
I think it's best to leave everything alone. the workarounds are known, and
not overly onerous.


[Bug 638] rfc2047 encoding doesn't work correctly [ In reply to ]

Brad "anomie" Jorsch <anomie@users.sourceforge.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |anomie@users.sourceforge.net




--- Comment #5 from Brad "anomie" Jorsch <anomie@users.sourceforge.net> 2008-10-24 14:31:57 ---
(In reply to comment #4)

Actually, RFC 2047 does specify the length of lines:

> An 'encoded-word' may not be more than 75 characters long, including
> 'charset', 'encoding', 'encoded-text', and delimiters. If it is
> desirable to encode more text than will fit in an 'encoded-word' of
> 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
> be used.
>
> While there is no limit to the length of a multiple-line header
> field, each line of a header field that contains one or more
> 'encoded-word's is limited to 76 characters.
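
A rough Python sketch of how the two limits in that passage could be checked against a folded header. The function name and the simplified pattern for an encoded word are assumptions, not anything taken from Exim:

```python
import re

# Simplified pattern for =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def check_limits(header_text):
    """Return (line_number, problem) tuples for the two RFC 2047 limits."""
    problems = []
    # splitlines() accepts both LF and CRLF line endings.
    for number, line in enumerate(header_text.splitlines(), start=1):
        words = ENCODED_WORD.findall(line)
        for word in words:
            if len(word) > 75:
                problems.append((number, "encoded-word longer than 75 characters"))
        # The 76-character limit applies only to lines holding an encoded-word.
        if words and len(line) > 76:
            problems.append((number, "line with encoded-word longer than 76 characters"))
    return problems
```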


[Bug 638] rfc2047 encoding doesn't work correctly [ In reply to ]




--- Comment #6 from Kjetil Torgrim Homme <kjetilho@ifi.uio.no> 2008-10-24 16:10:55 ---
(again apologies for my pedantry -- this sub-discussion has little or no
bearing on the bug, and I promise I'll restrain myself henceforth.)

well, yes: IF the result of the encoding is to be used in a header, the
restrictions on headers from RFC 2822 (or 5322) apply, and RFC 2047 reminds the
reader of this fact. however, encoded-words may be used in other contexts,
however uncommon this may be, and so the maximum line length is not a property
of the RFC 2047 encoding itself.


[Bug 638] rfc2047 encoding doesn't work correctly [ In reply to ]




--- Comment #7 from Brad "anomie" Jorsch <anomie@users.sourceforge.net> 2008-10-24 17:04:29 ---
(In reply to comment #6)

The 76 character limit is actually more strict than the 1000 character limit
imposed on header length by RFC 2822. I suppose you're correct in that if RFC
2047 encoding is used in other contexts than RFC 2822 message headers, that
clause wouldn't strictly apply.

I hadn't heard about 5322 (why not 5822?), I'll have to read that one of these
days.

