Mailing List Archive: Reworking the header handling

Now I feel like doing something about Exim's header handling! (See
http://www.exim.org/bugzilla/show_bug.cgi?id=354 or Exim 4 WishList item 333)

Some questions to discuss:

How much can be done without breaking backwards compatibility? I'm primarily
thinking about what is visible when. How many configurations would break if
headers_remove removed headers previously added by headers_add?

Can some things be done before completely revamping the header handling, i.e.
which wishes are actually blocked by, or could even be included in, the
aforementioned bug 354?

The somewhat peculiar system filter mechanism, with its whole set of
configuration options, could be deprecated and replaced with something else.
What's special with the system filter is that it's run once per message,
whereas routers are run once per recipient. I think adding recipients from
ACLs has been suggested at some time. Another idea was a kind of routers that
only run once per message. What else is needed to get the same features as are
possible with system filters?

Let's see if I've understood what happens to headers throughout the course of
a delivery...

First, header lines can be added (but not removed) in ACLs (the add_header
modifier or the message modifier in a warn statement). Header lines added in
the MAIL, RCPT and PREDATA ACLs are accumulated and duplicates are removed.
They are then added to the message and become visible during the DATA and
MIME ACLs. Header lines added in the DATA and MIME ACLs are accumulated in
the same way, duplicates are removed and finally the result is added to the
message.
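A minimal C sketch of the two-stage accumulation described above, assuming header lines are plain strings and that "duplicate" means byte-for-byte identical. The names line_list, acl_add_header() and flush_acl_headers() are hypothetical and are not Exim's internals; the point is only that queued lines stay invisible until a flush appends them to the message.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINES 32

/* Hypothetical, simplified model (not Exim's internals): a header line is
   just a string, and a list is a fixed-size array of them. */
typedef struct {
  const char *lines[MAX_LINES];
  int count;
} line_list;

/* Queue a line added by an ACL, unless an identical line is already queued. */
static void acl_add_header(line_list *pending, const char *line) {
  for (int i = 0; i < pending->count; i++)
    if (strcmp(pending->lines[i], line) == 0)
      return;                                /* duplicate: drop it */
  assert(pending->count < MAX_LINES);
  pending->lines[pending->count++] = line;
}

/* Append everything queued so far to the message headers (making it
   visible) and empty the queue for the next stage. */
static void flush_acl_headers(line_list *message, line_list *pending) {
  for (int i = 0; i < pending->count; i++) {
    assert(message->count < MAX_LINES);
    message->lines[message->count++] = pending->lines[i];
  }
  pending->count = 0;
}
```

In this model the first flush happens after the PREDATA ACL, before the DATA and MIME ACLs run, and the second once those ACLs have finished.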

Then the local_scan() function is called. It can do whatever it wants.

If the message is accepted it is written to the spool in its current state.
Then it can be delivered.

In every delivery attempt, the system filter is run first. It can add and
remove header lines, and all such changes happen immediately. The altered
header is saved to the -H spool file, so if delivery fails and the filter
doesn't use the first_delivery condition, the same header fields will be
added on the next attempt.

Then all the recipient addresses are routed. Every router can add and remove
header lines for the recipient in question, but the message is not actually
modified and thus the new header lines do not become visible; instead the
additions and removals are accumulated in a data structure associated with
each address. They are not written to disk, so on a second attempt all these
header modifications are rolled back.
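A hypothetical sketch of the per-address bookkeeping described above, assuming each recipient is rebuilt from the spool at the start of every attempt. The address type and the functions address_from_spool() and router_modify() are illustrative only, not Exim's real address structure.

```c
#include <assert.h>
#include <string.h>

#define MAX_MODS 16

/* Hypothetical per-recipient state (not Exim's real struct): each address
   carries its own pending header changes; the message itself is untouched. */
typedef struct {
  const char *recipient;
  const char *add[MAX_MODS];     /* lines a router asked to add */
  int n_add;
  const char *remove[MAX_MODS];  /* header names a router asked to remove */
  int n_remove;
} address;

/* Addresses are rebuilt from the spool on every attempt, so they start
   with no header modifications: earlier routing changes are "rolled back". */
static address address_from_spool(const char *recipient) {
  address a;
  memset(&a, 0, sizeof a);
  a.recipient = recipient;
  return a;
}

/* What a router's header additions/removals amount to in this model:
   record them on this one address only. NULL means "nothing to record". */
static void router_modify(address *a, const char *add, const char *remove) {
  if (add != NULL) {
    assert(a->n_add < MAX_MODS);
    a->add[a->n_add++] = add;
  }
  if (remove != NULL) {
    assert(a->n_remove < MAX_MODS);
    a->remove[a->n_remove++] = remove;
  }
}
```

Because nothing is written back to the spool, a second attempt starts from a clean address, and changes made for one recipient never appear on another.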

When all addresses have been routed and assigned transports, they are batched
together subject to batch_max, extra headers etc. The transports write the
message copies to their destinations, omitting the header fields that have
been set to be removed and appending any extra header lines. Expansion
variables still reflect the way the message looked after passing the system
filter.
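The transport's output step can be sketched as a filter-then-append pass. This assumes removals match on the header field name, case-insensitively, and that extra lines go after the surviving headers. write_headers(), header_is() and append() are hypothetical helpers, not Exim functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if header line `line` has field name `name` (case-insensitive). */
static bool header_is(const char *line, const char *name) {
  size_t n = strlen(name);
  for (size_t i = 0; i < n; i++)
    if (line[i] == '\0' ||
        tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
      return false;
  return line[n] == ':';
}

/* Append `line` to the NUL-terminated buffer `out` (no truncation handling). */
static void append(char *out, size_t outlen, const char *line) {
  size_t used = strlen(out), len = strlen(line);
  assert(used + len < outlen);
  memcpy(out + used, line, len + 1);
}

/* Hypothetical transport step: copy the spooled headers, omitting any whose
   name is in `remove_names`, then append the `extra` lines. Returns the
   number of header lines written into `out`. */
static int write_headers(char *out, size_t outlen,
                         const char *const *headers, int n_headers,
                         const char *const *remove_names, int n_remove,
                         const char *const *extra, int n_extra) {
  int written = 0;
  assert(outlen > 0);
  out[0] = '\0';
  for (int i = 0; i < n_headers; i++) {
    bool drop = false;
    for (int j = 0; j < n_remove && !drop; j++)
      drop = header_is(headers[i], remove_names[j]);
    if (!drop) {
      append(out, outlen, headers[i]);
      written++;
    }
  }
  for (int i = 0; i < n_extra; i++) {
    append(out, outlen, extra[i]);
    written++;
  }
  return written;
}
```

Note that the input is the header as the system filter left it, which is why expansion variables still reflect that state rather than the per-address changes.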

To add to this we have address rewriting and possibly other things I have
forgotten. What *have* I forgotten?

--
Magnus Holmgren holmgren@lysator.liu.se
(No Cc of list mail needed, thanks)

On Mon, 25 Sep 2006, Magnus Holmgren wrote:

> Now I feel like doing something about Exim's header handling! (See
> http://www.exim.org/bugzilla/show_bug.cgi?id=354 or Exim 4 WishList item 333)

Oh, dear. I was hoping to put all that off till Exim 5, and also hoping
to have retired by then. :-)

> How much can be done without breaking backwards compatibility? I'm primarily
> thinking about what is visible when. How many configurations would break if
> headers_remove removed headers previously added by headers_add?

My feeling is that quite a lot of configurations will break if/when the
header handling is rationalized, which is why I think it is an "Exim 5"
issue. However, I agree that it would be good to sort out the current
mess, which has been hacked up over the years and could well do with
rationalizing.

> Can some things be done before completely revamping the header handling, i.e.
> which wishes are actually blocked by, or could even be included in, the
> aforementioned bug 354?

I am not sure if anything can be done that will not cause pain to
somebody. (Of course, I could be wrong.) I think it might therefore be
best to do a "proper job" rather than tinker any more.

> The somewhat peculiar system filter mechanism, with its whole set of
> configuration options, could be deprecated and replaced with something else.
> What's special with the system filter is that it's run once per message,
> whereas routers are run once per recipient. I think adding recipients from
> ACLs has been suggested at some time. Another idea was a kind of routers that
> only run once per message. What else is needed to get the same features as are
> possible with system filters?

A router that runs once per message would be odd because there would be
no single recipient address. I think it would have to be called
something other than a "router". On the other hand, perhaps this could
be done by a run of the routers with $local_part = $domain = "" (an
empty string) and some special flag set so that you could specify which
routers would act (compare verify_only). Of course $recipients could be
made available. We'd have to work out how to use something like that to
do what can now be done in a system filter.
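A speculative sketch of how router selection might work for such a pass. Everything here is hypothetical: Exim has no per-message routing pass and no option called per_message_only; the flag is modeled by analogy with verify_only, and keeping ordinary routers out of the per-message pass is a design choice assumed for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical router option, by analogy with verify_only. */
typedef struct {
  const char *name;
  bool per_message_only;
} router;

/* Hypothetical routing context; in the per-message pass both the local
   part and the domain are empty strings. */
typedef struct {
  const char *local_part;
  const char *domain;
  bool per_message_pass;
} route_context;

/* In this sketch a router acts only in the kind of pass it is marked for. */
static bool router_acts(const router *r, const route_context *ctx) {
  assert(r != NULL && ctx != NULL);
  return r->per_message_only == ctx->per_message_pass;
}

/* Run the once-per-message pass; returns how many routers would act. */
static int run_per_message_pass(const router *routers, int n) {
  route_context ctx = {"", "", true};
  int acted = 0;
  for (int i = 0; i < n; i++)
    if (router_acts(&routers[i], &ctx))
      acted++;
  return acted;
}
```

Making the two kinds of pass mutually exclusive keeps existing routers unaffected, at the cost of needing a separate router definition for anything that should run in both.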

However, I agree that the system filter is starting to look untidy, and
some of the original reasons for implementing it are now handled much
better in ACLs. Rather than deprecate it, however, an "Exim 5" style
change would be to remove it altogether, as long as equivalent features
were provided. (On the other hand, if the new features were entirely
orthogonal, retaining the filter for compatibility is not unreasonable.
I suppose we could arrange a way of omitting it from the binary to save
space and plan to delete it in "Exim 6".)

> Let's see if I've understood what happens to headers throughout the course of
> a delivery...
>
> First, header lines can be added (but not removed) in ACLs (the add_header
> modifier or the message modifier in a warn statement). Header lines added in
> the MAIL, RCPT and PREDATA ACLs are accumulated and duplicates are removed.
> They are then added to the message and become visible during the DATA and
> MIME ACLs. Header lines added in the DATA and MIME ACLs are accumulated in
> the same way, duplicates are removed and finally the result is added to the
> message.

Yes. It was implemented that way at first because the data structure for
headers is not yet set up when the MAIL and RCPT ACLs are run. Perhaps I
should have made a bigger change and made it set up the full data
structure right at the start (MAIL) but at the time I took the easy way
out. (Always leads to trouble in the end... :-)

More history: Why isn't the header structure set up at MAIL time?
Because I used the same code for non-SMTP messages. In both cases the
header structure is set up at the start of receiving the incoming
headers by which time the same code is running for all types of incoming
message.

> Then the local_scan() function is called. It can do whatever it wants.

Yes.

> If the message is accepted it is written to the spool in its current state.
> Then it can be delivered.

Yes. There is also the possibility of some headers being rewritten if
address rewriting takes place. This is done before the spool is written.

> In every delivery attempt, the system filter is run first. It can add and
> remove header lines, and all such changes happen immediately. The altered
> header is saved to the -H spool file, so if delivery fails and the filter
> doesn't use the first_delivery condition, the same header fields will be
> added on the next attempt.

Correct (unless the filter uses "if first_delivery" to do it only once).
The removed header lines are also saved on the spool. In fact, no header
line is actually removed; it is just flagged as "not wanted, invisible".
This flag is also used when headers are automatically rewritten as part
of address rewriting. The idea was that there was a "history" of the
headers written to the spool that could be investigated if anything went
wrong.
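The "flag, don't delete" approach can be sketched as a linked list where removal and rewriting only change a type marker. This is a simplified model: spooled_header and the helper functions are hypothetical, and the '*' marker (HTYPE_OLD) mirrors our understanding of how the -H spool file marks such lines, but should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define HTYPE_OLD '*'   /* assumed marker: kept in the spool, no longer wanted */

/* Hypothetical simplified header list (not Exim's actual struct). */
typedef struct spooled_header {
  struct spooled_header *next;
  int type;             /* ' ' for an ordinary line, HTYPE_OLD if "removed" */
  const char *text;
} spooled_header;

/* "Remove" a header line: only flag it, so it survives as history. */
static void header_remove(spooled_header *h) {
  assert(h != NULL);
  h->type = HTYPE_OLD;
}

/* Rewrite a header line: flag the old version and link the new one in
   directly after it, so both versions remain in the list. */
static void header_rewrite(spooled_header *old, spooled_header *replacement,
                           const char *new_text) {
  header_remove(old);
  replacement->type = ' ';
  replacement->text = new_text;
  replacement->next = old->next;
  old->next = replacement;
}

/* Count the lines a delivery would actually see. */
static int visible_count(const spooled_header *h) {
  int n = 0;
  for (; h != NULL; h = h->next)
    if (h->type != HTYPE_OLD)
      n++;
  return n;
}
```

Every version stays on the list, so the history is available for inspection while deliveries see only the unflagged lines.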

> Then all the recipient addresses are routed. Every router can add and remove
> header lines for the recipient in question, but the message is not actually
> modified and thus the new header lines do not become visible; instead the
> additions and removals are accumulated in a data structure associated with
> each address. They are not written to disk, so on a second attempt all these
> header modifications are rolled back.

Correct.

> When all addresses have been routed and assigned transports, they are batched
> together subject to batch_max, extra headers etc. The transports write the
> message copies to their destinations, omitting the header fields that have
> been set to be removed and appending any extra header lines. Expansion
> variables still reflect the way the message looked after passing the system
> filter.

Correct. The transports can also specify the addition/removal of
headers, but they can't remove what the routers added.
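One way to picture why a transport cannot remove router additions is through ordering. In this model, every removal list is matched only against the message's own headers, and router and transport additions are appended afterwards without filtering. assemble(), strlist and the helpers are hypothetical, and the relative order of the two sets of additions is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef struct { const char *const *v; int n; } strlist;

/* True if header line `line` has field name `name` (case-insensitive). */
static bool has_name(const char *line, const char *name) {
  size_t n = strlen(name);
  for (size_t i = 0; i < n; i++)
    if (line[i] == '\0' ||
        tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
      return false;
  return line[n] == ':';
}

static bool in_list(const char *line, strlist names) {
  for (int i = 0; i < names.n; i++)
    if (has_name(line, names.v[i]))
      return true;
  return false;
}

/* Hypothetical assembly order: filter the message headers through both
   removal lists, then append router and transport additions unfiltered.
   Returns the number of lines stored in `out`. */
static int assemble(const char **out, int max, strlist msg,
                    strlist router_remove, strlist router_add,
                    strlist tp_remove, strlist tp_add) {
  int k = 0;
  for (int i = 0; i < msg.n; i++) {
    if (in_list(msg.v[i], router_remove) || in_list(msg.v[i], tp_remove))
      continue;
    assert(k < max);
    out[k++] = msg.v[i];
  }
  for (int i = 0; i < router_add.n; i++) { assert(k < max); out[k++] = router_add.v[i]; }
  for (int i = 0; i < tp_add.n; i++) { assert(k < max); out[k++] = tp_add.v[i]; }
  return k;
}
```

Here the transport's removal of X-Router has no effect on the line the router added, which matches the limitation described above.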

> To add to this we have address rewriting and possibly other things I have
> forgotten. What *have* I forgotten?

Probably the same thing that I have forgotten...

The above represents "off the cuff" responses without very much deep
thought. The only general thought I've had about this is to wonder
exactly what is needed: For instance, do we want a way of seeing all the
headers at all times? For an individual address, that is
address-specific. If address A adds a header, then it shouldn't be
visible while processing address B. Do we want a way of specifying "an
original header, received with the message" or "a header added by an
ACL"? Or even "the version of this header before it was rewritten"?

I don't know the answers because I don't have any feel for what people
use the various header-manipulating features for, other than the most
simple cases.

There is one other can of worms related to headers: Resent-xxx.

Philip
--
Philip Hazel University of Cambridge Computing Service
Get the Exim 4 book: http://www.uit.co.uk/exim-book

--
## List details at http://www.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##