Rejecting spam from <>: SES, message-id caches and cookies (long)

Feb 28, 2004, 7:51 AM

Post #1 of 17 (10710 views)

I've been thinking a fair bit over the past few months about the
correct way to deal with unwanted mail from the <> address, be it spam
explicitly sent from that address, or bounces from joe jobs.

There are two main approaches to this that I've seen discussed, and
I've seen references to both solutions being deployed (though how
widespread they are is difficult to determine).

The first is to use a timestamped, cryptographically signed envelope
sender, as is proposed in SES. In fact, I currently do something
similar using TMDA's dated address feature.
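
To make the idea concrete, here is a minimal sketch of a dated, signed
envelope sender. The address layout (local-expiry-tag@domain), the
function names and the 12-character truncated HMAC-SHA256 tag are all
assumptions made for illustration; this is neither the SES wire format
nor TMDA's dated address format.

```python
import hashlib
import hmac


def sign_sender(local, domain, key, expires):
    """Build a dated, signed envelope sender (hypothetical layout)."""
    payload = f"{local}-{expires}".encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()[:12]
    return f"{local}-{expires}-{tag}@{domain}"


def check_sender(address, key, now):
    """Return True if the address carries a valid, unexpired signature."""
    localpart, _, _domain = address.rpartition("@")
    try:
        # Split from the right so hyphens in the real local part survive.
        local, expires, tag = localpart.rsplit("-", 2)
        expires = int(expires)
    except ValueError:
        return False  # not a signed address at all
    payload = f"{local}-{expires}".encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()[:12]
    # Constant-time comparison of the tags, then the expiry check.
    return hmac.compare_digest(tag.encode(), expected.encode()) and now <= expires
```

A bounce addressed to anything that fails check_sender() would then be
a candidate for rejection.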

The second approach is to record the message-ids of all messages that
you emit, and then scan incoming mail from the <> address for a
message-id you recognise.

I'm also going to propose a third class of solution for discussion:
place a cryptographically signed cookie in an extension header.

With the first approach (signed envelope sender), there are a couple
of points to note with regard to how it interacts with other anti-spam
technology already in widespread use.

The first issue is Sender Address Verification (SAV). A number of
MTAs have the option to verify all envelope senders by issuing the
following sequence of commands to the primary MX associated with the
sender's domain:

MAIL FROM:<>
RCPT TO:<address-to-test>
RSET

The idea is to test whether the originating domain thinks the address
is a valid address. If the RCPT TO command gives a hard error, then
an MTA using SAV will return a hard error for the transaction, too.
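
The probe above can be sketched as follows. This is an illustration
only: the function name is made up, conn is assumed to be an
already-greeted session that behaves like Python's smtplib.SMTP (whose
mail(), rcpt() and rset() methods each return a (code, message) pair),
and treating anything other than 2xx or 5xx as "no verdict" is an
assumption rather than something any particular MTA is known to do.

```python
def sender_is_verifiable(conn, address):
    """Run the SAV probe over an open SMTP session.

    Returns True if the remote MX accepts the address, False on a hard
    (5xx) rejection, and None when no verdict can be reached.
    """
    code, _ = conn.mail("")       # MAIL FROM:<>
    if code >= 400:
        return None               # the probe itself was refused
    code, _ = conn.rcpt(address)  # RCPT TO:<address-to-test>
    conn.rset()                   # RSET: abandon the transaction, send no mail
    if 500 <= code <= 599:
        return False
    if 200 <= code <= 299:
        return True
    return None
```

A receiver that rejects unsigned addresses at RCPT TO makes this
function return False for the sender's ordinary address.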

Therefore, as Shevek notes, any SES scheme needs to postpone rejecting
the SMTP transaction until after the DATA command. Rejecting unsigned
addresses at the RCPT TO command will cause MTAs that use SAV to fail
to validate your address, and hence reject mail from you.

Since SAV is a standard feature of the latest versions of a number of
popular MTAs (including exim and postfix), it's reasonable to expect
it to be widely deployed, so any SES scheme needs to play nice with
it.

The second potential problem with SES and similar schemes is how they
interact with challenge/response systems, such as TMDA.
Challenge/response systems send a message to the sender (typically the
envelope sender) the first time they receive a message from that
sender -- this is the challenge. The sender is required to respond to
the challenge in some way (typically by replying to it or by clicking
on a URL contained in the challenge) in order to verify that they
really received the challenge. Some systems incorporate a CAPTCHA
into the challenge to verify that the sender is human, too. Once the
sender has successfully responded, they are whitelisted, and won't
have to go through this process again.

With an SES-like scheme, every message will come from a new, unknown
envelope sender, and hence will trigger a challenge and require a
response before the message can be delivered.

TMDA has a solution to this; since it implements both
challenge/response and signed envelope senders, it has to. The
solution is that any message using a signed sender should also
contain an X-Primary-Address header, containing the primary (unsigned)
address of the sender. Provided the value of the header matches a
rudimentary sanity check (the domain matches that in the envelope
sender) this address is whitelisted, rather than the envelope sender
address. Subsequent whitelist checks test both the envelope sender
and the X-Primary-Address header, and let the message through if
either matches.
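
The whitelisting rule just described might look roughly like this.
The function names and the plain set used as a whitelist are
assumptions for illustration, not TMDA's actual code. Whether the
domain check is repeated on later lookups is not stated above; this
sketch does not repeat it.

```python
def domain_of(address):
    """Return the lower-cased domain part of an address."""
    return address.rpartition("@")[2].lower()


def whitelist_candidate(envelope_sender, primary_address):
    """Pick the address to whitelist once a challenge has been answered."""
    # Sanity check: the header's domain must match the envelope sender's.
    if primary_address and domain_of(primary_address) == domain_of(envelope_sender):
        return primary_address.lower()
    return envelope_sender.lower()


def is_whitelisted(envelope_sender, primary_address, whitelist):
    """Later lookups pass if either address is already whitelisted."""
    candidates = {envelope_sender.lower()}
    if primary_address:
        candidates.add(primary_address.lower())
    return not candidates.isdisjoint(whitelist)
```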

I would therefore suggest that any SES scheme should add an
X-Primary-Address header, since if nothing else this will ensure it
plays nice with TMDA.

An alternative solution would be for challenge/response systems to try
to parse the signed envelope sender and guess the underlying real
sender, but the problem with that is that there are probably too many
different encoding schemes in use (eg TMDA's dated addresses are
completely unlike SRS addresses). Perhaps a challenge/response system
could fall back to this if there are known widely deployed SES systems
that don't add an X-Primary-Address header.

The third (related) problem with SES and similar schemes is that they
interact badly with greylisting.

Greylisting is a technique where mail from a new, unrecognised
envelope sender is delayed for a while (typically a couple of hours)
by returning a 4xx code from the SMTP transaction. The idea is
generally to see whether the originating IP address identifies itself
as a spammer during this period (eg by hitting a spam trap) in which
case the IP address will be blacklisted and subsequent delivery
attempts will be rejected with a 5xx code. If nothing untoward
happens during the holding period, then the envelope sender address is
whitelisted, and subsequent delivery attempts from this envelope
sender will succeed without delay.

The problem with an SES sender talking to a greylisting receiver is
that _every_ message will be delayed for a couple of hours, since it
will always be from a new, unknown sender address that hasn't been
whitelisted.
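
A toy model of the greylisting behaviour described above shows why
this happens. The class, its in-memory tables and the specific reply
codes (451 to defer, 550 to reject) are assumptions chosen for
illustration; real greylisting implementations differ in what they key
on and how long they hold mail.

```python
HOLD_SECONDS = 2 * 60 * 60  # "a couple of hours"


class Greylist:
    def __init__(self):
        self.first_seen = {}          # envelope sender -> time first seen
        self.whitelist = set()        # senders that survived the hold
        self.blacklisted_ips = set()  # IPs caught misbehaving (eg spam trap)

    def check(self, ip, sender, now):
        """Return the SMTP reply code for a delivery attempt."""
        if ip in self.blacklisted_ips:
            return 550
        if sender in self.whitelist:
            return 250
        first = self.first_seen.setdefault(sender, now)
        if now - first >= HOLD_SECONDS:
            self.whitelist.add(sender)
            return 250
        return 451  # unknown sender: try again later
```

A stable sender is delayed once and then whitelisted. A sender whose
address changes with every message never matches an existing entry, so
every message starts its own holding period.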

The problem here is much the same as for challenge/response, and the
potential solutions would appear to be the same: either whitelist the
address indicated in the X-Primary-Address header, or attempt to
intuit the correct address yourself by parsing the signed sender
address. There's an added wrinkle, though, with using
X-Primary-Address. The greylisting system can no longer reject with
4xx after RCPT TO; it needs to see the entire message in order to
check whether the address in the X-Primary-Address header has been
whitelisted, so will have to reject after the DATA phase.

Message-ID systems are immune to these problems since they don't
involve modifying the envelope sender. All bounce messages should
contain the full headers of the original message, so the check should
be reliable. Because Message-ID systems tend to play nicer with other
anti-spam technologies, I'm inclined to favour them over SES schemes.
However, although they're ideal for implementing in an MUA, they cause
scalability problems if you want to implement them in a large MTA
cluster: each MTA needs access to the complete list of message-ids of
all messages sent by all MTAs in the cluster, in order to be able to
validate bounces.
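
For a single host, the check could be as simple as the sketch below.
The function name and the regular expression are assumptions; it
expects the original headers to be quoted unindented in the bounce,
and sent_ids stands in for whatever shared store a cluster would
actually need.

```python
import re

# Matches a Message-ID header line anywhere in the bounce, including
# the copy quoted from the original message.
MESSAGE_ID_RE = re.compile(
    r"^Message-ID:\s*(<[^>\s]+>)", re.IGNORECASE | re.MULTILINE
)


def bounce_is_ours(bounce_text, sent_ids):
    """Return True if the bounce quotes a Message-ID that we emitted."""
    return any(mid in sent_ids for mid in MESSAGE_ID_RE.findall(bounce_text))
```

The scalability problem is entirely in sent_ids: every MTA in the
cluster has to be able to look up ids recorded by every other MTA.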

As an aside, there's a subcategory of message-id schemes which use a
cryptographically-signed message ID, rather than maintaining a list of
all message-ids. Unfortunately this is only implementable in an MUA,
since by the time an MTA receives the message it might already have a
message-id, preventing the MTA from making its own choice here.

So I'd like to propose a third class of solution to this problem: the
cryptographically signed cookie (CSC).

The idea is that you add an additional header (X-CSC-Cookie, say) to
all outgoing mail. For sake of argument, this could contain a
timestamp, and a keyed hash of the timestamp and message-id of the
message. When you receive a bounce, you just need to extract the
Message-ID and X-CSC-Cookie fields to validate it. This avoids the
problems of SES, and is completely scalable in large MTA clusters,
since the MTA only needs to know the key in order to perform
verification.
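
A minimal sketch of such a cookie, under stated assumptions: the
"timestamp.hexdigest" layout, HMAC-SHA256 as the keyed hash, the
function names and the max_age expiry policy are all choices made for
illustration, not part of any existing scheme.

```python
import hashlib
import hmac


def make_cookie(key, message_id, timestamp):
    """Return a cookie value binding the timestamp to the Message-ID."""
    mac = hmac.new(key, f"{timestamp}:{message_id}".encode(), hashlib.sha256)
    return f"{timestamp}.{mac.hexdigest()}"


def cookie_is_valid(key, message_id, cookie, now, max_age):
    """Check a cookie recovered from the headers quoted in a bounce."""
    stamp, _, _digest = cookie.partition(".")
    try:
        issued = int(stamp)
    except ValueError:
        return False
    # Recompute the whole cookie and compare in constant time.
    expected = make_cookie(key, message_id, issued)
    fresh = 0 <= now - issued <= max_age
    return fresh and hmac.compare_digest(cookie.encode(), expected.encode())
```

Verification needs only the key and the two header values from the
bounce, so no per-message state has to be shared between MTAs.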

There is another problem which all these solutions face, which is
worth analysing, and that is how they handle automatically generated
messages that are _not_ bounces, eg vacation messages, challenges from
challenge/response systems, etc.

I'll dismiss the challenge/response case very quickly by observing
that all challenges sent by a default TMDA install are syntactically
identical to bounces. They are sent from the <> address to the
envelope sender of the original message, and include the full headers
of the original message. So any of the above schemes (and indeed any
other scheme that correctly handles bounce messages) will necessarily
do the right thing for TMDA challenges, too. Of course, this may not
be true of other challenge/response systems.

So, what happens when an autoresponder sends a message from the <>
sender address?

Well, if the autoresponder sends a message to the envelope sender,
then SES is fine. If it sends it to some other address (header
sender, address already on file) then SES will lose.

Message-ID schemes will work either if the message includes full
headers (typically not the case in vacation messages) or if it quotes
the Message-ID in an In-Reply-To or References header.
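
The In-Reply-To/References case could be checked along these lines.
This is a sketch using Python's standard email parser; the function
name and the sent_ids set are assumptions for illustration.

```python
import re
from email import message_from_string


def autoreply_refers_to_ours(raw_message, sent_ids):
    """Return True if the reply's own headers cite one of our Message-IDs."""
    msg = message_from_string(raw_message)
    cited = " ".join(
        msg.get_all("In-Reply-To", []) + msg.get_all("References", [])
    )
    # Both headers carry one or more <id@host> tokens.
    return any(mid in sent_ids for mid in re.findall(r"<[^<>\s]+>", cited))
```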

The Cookie scheme fares somewhat worse than the Message-ID scheme
here, though, since it needs the full headers of the message.

And of course, if the autogenerated message was some kind of
notification that was not generated in response to a message, all the
above schemes lose.

Just some random thoughts,

-roy


Re: Rejecting spam from <>: SES, message-id caches and cookies (long)

dwmw2 at infradead

Feb 28, 2004, 12:22 PM

Post #2 of 17 (10477 views)