Mailing List Archive

RFC 2047 encoding of long strings
Hello,

working on my list of things to check, I am pleased to see that
parse_quote_2047() splits long strings into several MIME words, but it
separates them by a space.

Unless Exim folds lines at another place, that may create illegal
long lines. What would break if Exim used a newline and a space?

You may still need to introduce fold marks in string expressions before
and after encoded parts, when building a new header line, but that's
no problem.
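
To make the folding question concrete, here is a minimal Python sketch (not Exim's C code for parse_quote_2047()) that splits a long string into several Q-encoded MIME words and joins them either with a space or with CR LF plus a space. The names q_encode_byte, encode_words and fold are made up for this illustration; the 75-character cap per encoded-word and the 76-character cap per line containing one are the limits stated in RFC 2047.

```python
CRLF_SPACE = "\r\n "
MAX_ENCODED_WORD = 75  # RFC 2047 limit for a single encoded-word


def q_encode_byte(b):
    """Q-encode one byte: letters/digits literal, space as "_", rest as =XX."""
    ch = chr(b)
    if ch.isascii() and ch.isalnum():
        return ch
    if ch == " ":
        return "_"
    return "=%02X" % b


def encode_words(text, charset="utf-8"):
    """Split text into a list of encoded-words, each at most 75 characters."""
    prefix = "=?%s?Q?" % charset
    suffix = "?="
    budget = MAX_ENCODED_WORD - len(prefix) - len(suffix)
    words, current = [], ""
    for ch in text:
        # Encode one whole character at a time, so a multi-byte character
        # is never split across two encoded-words.
        piece = "".join(q_encode_byte(b) for b in ch.encode(charset))
        if current and len(current) + len(piece) > budget:
            words.append(prefix + current + suffix)
            current = ""
        current += piece
    if current:
        words.append(prefix + current + suffix)
    return words


def fold(words, separator=CRLF_SPACE):
    """Join encoded-words; the default separator starts a continuation line."""
    return separator.join(words)


if __name__ == "__main__":
    words = encode_words("Gr\u00fc\u00dfe aus K\u00f6ln " * 10)
    print(" ".join(words))   # one long line
    print(fold(words))       # one encoded-word per folded line
```

With a plain space as separator the result is a single line that grows with the input; with CR LF plus a space every line stays within the 76-character limit, and the whitespace between adjacent encoded-words is ignored on decoding either way.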

Michael

--
## List details at http://www.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
Re: RFC 2047 encoding of long strings
On Fri, 2006-01-06 at 15:57 +0100, Michael Haardt wrote:
> working on my list of things to check, I am pleased to see that
> parse_quote_2047() splits long strings into several MIME words, but it
> separates them by a space.
>
> Unless Exim folds lines at another place, that may create illegal
> long lines. What would break if Exim used a newline and a space?
>
> You may still need to introduce fold marks in string expressions before
> and after encoded parts, when building a new header line, but that's
> no problem.

I'm a bit puzzled by the comment for this function:

Now it is being used for much longer texts in ACLs and via the
${rfc2047: expansion item.

surely such encoded-words don't make much sense in an ACL warning
statement, as RFC 2047 encoding only makes sense in the context of the
e-mail headers To/Cc/From/Subject.

as such, adding a LF between encoded words can never be wrong, since it
has no semantic meaning. (the LF becomes CR LF later, of course.)

however, note that Exim will happily encode any LF in the source text as
=0A. I think that behaviour should stay, but it would IMHO be useful if
the ${h_foo:} "normalised" this aspect and removed CR LF as appropriate
as part of the RFC 2047 decoding in find_header. it'd probably be best
to only remove the LF if the header contains any encoded words.
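
The suggested normalisation can be sketched in Python as follows. This is only an illustration of the idea, not the behaviour of find_header; the function name normalise_header and the regular expressions are assumptions made for the example. It removes the line break of a fold only when the header value contains at least one encoded-word, and leaves plain headers untouched.

```python
import re

# Rough shape of an RFC 2047 encoded-word: =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def normalise_header(value):
    """Drop fold line breaks, but only if the value has encoded-words."""
    if ENCODED_WORD.search(value) is None:
        return value
    # A fold is a line break followed by a space or tab; keep the
    # whitespace and remove only the CR LF (or bare LF).
    return re.sub(r"\r?\n(?=[ \t])", "", value)
```

A header such as "=?utf-8?Q?a?=\n =?utf-8?Q?b?=" comes back with a single space between the two words, so re-encoding it would no longer produce =0A, while "plain\n text" is returned unchanged.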

--
Kjetil T.



Re: RFC 2047 encoding of long strings
On Fri, Jan 06, 2006 at 06:58:11PM +0100, Kjetil Torgrim Homme wrote:
> I'm a bit puzzled by the comment for this function:
>
> Now it is being used for much longer texts in ACLs and via the
> ${rfc2047: expansion item.
>
> surely such encoded-words don't make much sense in an ACL warning
> statement, as RFC 2047 encoding only makes sense in the context of the
> e-mail headers To/Cc/From/Subject.

It may be useful when adding headers in ACLs and I think it is used for
all unstructured headers and at certain places in structured ones. But
I may need to read that part again.

> as such, adding a LF between encoded words can never be wrong, since it
> has no semantic meaning. (the LF becomes CR LF later, of course.)

That is my intention, but it wouldn't be the first time a quick patch breaks
something, and said function does not just implement ${rfc2047.

> however, note that Exim will happily encode any LF in the source text as
> =0A. I think that behaviour should stay, but it would IMHO be useful if
> the ${h_foo:} "normalised" this aspect and removed CR LF as appropriate
> as part of the RFC 2047 decoding in find_header. it'd probably be best
> to only remove the LF if the header contains any encoded words.

If =0A is present in the encoded header, it should get decoded to LF and
nothing else. An LF in the header, on the other hand, is not so easy.
Unix uses LF as end of line, and looking at it that way, its EOL should
be translated as =0D=0A and =0D=0A should be decoded to LF. With Unix,
there is no such thing as a spare LF.

Yes, that screws things up when converting back and forth.
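
A small Python sketch of that round-trip problem, under the convention described above (the local end of line is LF, it is encoded as =0D=0A, and both =0D=0A and =0A decode to LF). The helpers encode_newlines and decode_newlines are hypothetical and handle only the line-ending part of the encoding.

```python
def encode_newlines(text):
    """Encode the local end of line (LF) as an encoded CR LF pair."""
    return text.replace("\n", "=0D=0A")


def decode_newlines(encoded):
    """Decode an encoded CR LF pair, and also an encoded bare LF, to LF."""
    # Replace the pair first so its =0A half is not decoded on its own.
    return encoded.replace("=0D=0A", "\n").replace("=0A", "\n")


if __name__ == "__main__":
    original = "line one=0Aline two"           # encoded bare LF
    round_trip = encode_newlines(decode_newlines(original))
    print(original == round_trip)
```

Both "=0A" and "=0D=0A" decode to the same LF, so encoding the decoded text again cannot tell which of the two the original contained.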

Sieve uses CR LF as end of line, but if compiled without further settings,
the Exim Sieve implementation uses just LF, as that's what the environment
uses. A string that spans across two lines MUST contain CR LF, though.
Same problem, now I can't embed a single LF. Btw: 4.60 contains some
bugs in that area. Sorry about that, but I am currently working on
fixing them.

I think I remember that OS-9 uses CR as EOL terminator, so they have
the same trouble, just the other way round. It's not just Unix. :-)

Michael

Re: RFC 2047 encoding of long strings
On Fri, 2006-01-06 at 20:44 +0100, Michael Haardt wrote:
> On Fri, Jan 06, 2006 at 06:58:11PM +0100, Kjetil Torgrim Homme wrote:
> > however, note that Exim will happily encode any LF in the source text as
> > =0A. I think that behaviour should stay, but it would IMHO be useful if
> > the ${h_foo:} "normalised" this aspect and removed CR LF as appropriate
> > as part of the RFC 2047 decoding in find_header. it'd probably be best
> > to only remove the LF if the header contains any encoded words.
>
> If =0A is present in the encoded header, it should get decoded to LF and
> nothing else. An LF in the header, on the other hand, is not so easy.
> Unix uses LF as end of line, and looking at it that way, its EOL should
> be translated as =0D=0A and =0D=0A should be decoded to LF. With Unix,
> there is no such thing as a spare LF.

indeed. a CR LF in the original can be changed into an encoded bare LF
unless you're very careful, and I want this to be easier to do right.

I started to edit rfc2047_decode2, but I realised it won't do, we need
to remove the LF in find_header since I'm sure people will write

${rfc2047:Re: ${h_foo:}: I'm away in Japanese}

which still has the LF problem if the foo header is ASCII only.

and changing ${header_:} seems fraught with subtle changes to people's
configurations.

so instead I think we need an operator to do RFC 2047 decoding
explicitly, rather than relying on the implicit conversion done by
${h_foo:}

we can then use

${rfc2047_decode:${sg{$rh_foo:}{\x0a}{}}}

to do it right. of course it's a little more verbose than just
${h_foo:}, but at least it is possible to do it right.
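
The same two steps, written out as a Python sketch rather than as an Exim expansion: first strip every LF from the raw header (the ${sg ...} part), then decode the encoded-words explicitly (the proposed ${rfc2047_decode: part). The function names strip_lf, decode_word and rfc2047_decode are assumptions for this example, and the decoder covers only the basic B and Q forms.

```python
import base64
import re

ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def strip_lf(raw_header):
    """Remove every LF from the raw header value."""
    return raw_header.replace("\n", "")


def decode_word(match):
    """Decode a single encoded-word match to text."""
    charset, encoding, payload = match.group(1), match.group(2), match.group(3)
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # Q encoding: "_" stands for a space, "=XX" for the byte 0xXX.
        data = payload.replace("_", " ").encode("ascii")
        raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]), data)
    return raw.decode(charset, errors="replace")


def rfc2047_decode(text):
    """Decode all encoded-words in text, leaving other text untouched."""
    # Whitespace between two adjacent encoded-words is not part of the text.
    text = re.sub(r"(\?=)[ \t]+(?==\?)", r"\1", text)
    return ENCODED_WORD.sub(decode_word, text)


if __name__ == "__main__":
    raw = "=?utf-8?Q?caf=C3=A9?=\n =?utf-8?Q?_au_lait?="
    print(rfc2047_decode(strip_lf(raw)))
```

Because the LF is removed before decoding, the result is the same whether the header was ASCII only or contained encoded-words, which is the case the implicit conversion does not cover.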

another option is to introduce a ${mheader_} or something to do the
magic for us. this may be in addition to the decode function.

--
Kjetil T.



Re: RFC 2047 encoding of long strings
Folks,

I'd like to revisit the rfc2047 encoding. I realize that this is an
older issue, but it looks to me like it is still unresolved. I'm
mailing exim-dev for a broader audience, hopefully to get some
attention in the CR. I recently made a comment summarizing the
discussions that have taken place on this thread. I hope some of you
can find a moment to visit my comment and respond.

http://bugs.exim.org/show_bug.cgi?id=638

Note: This is my first attempt to contribute to any open source
project, let alone exim, so I decided to err on the side of spam. If
this is a gross violation of best practice, please drop me a hint ;-).
I hope someone takes interest in this, so that I can have one of
these "open source cooperative experiences" that is all the rage.
Kidding aside, thanks in advance for any time spent helping me revive
this old issue.

Derrick F. Rice
Tufts U. / Akamai Tech.
