Mailing List Archive

RE: CBV
> From: Stuart D. Gathman
> Sent: Friday, May 07, 2004 9:27 PM
>
>
> On Fri, 7 May 2004, Seth Goodman wrote:
>
> > > The CBV would work with the recipient sending the recipient
> > > as MAIL FROM and the sender as RCPT TO.
> >
> > This particular one hasn't been brought up before, and it's very clever.
> > The only difficulty is that recipient addresses change when they go
> > through forwarding hops and the sender has no way of knowing what those
> > new addresses are. I can't think of a way around this particular
> > limitation, but if you can, it would be a much better way to avoid replay
> > attacks.
>
> This is no different than forwarders that don't do SRS with SPF. The
> solution is to just whitelist the forwarder. If the forwarder does do
> SRS, then there is no problem. The forwarder would do the CBV (or not)
> in the case of a bounce.

I think David is correct in pointing out that CBV's need to have a null
sender to prevent looping of CBV's. Putting your own address in RCPT TO:
also makes it look like you're trying to use them as a relay. Maybe you can
see a way around this, but it sounds like a problem.

As you first suggested, the sender can still put the original RCPT TO: in
the MAIL FROM:. For a single recipient, this is no worse than including the
body hash in terms of string length. Let's ignore, for the moment, the
problem of multiple recipients at the same domain. Since forwarding is set
up by the recipient, they can whitelist any forwarding addresses they set up
and proceed with the CBV. If the original RCPT TO: address that the sender
put in MAIL FROM: is not one of the recipient's forwarding accounts, they
can reject it as a forgery. This works very nicely for single recipients.
The only cost is putting the full recipient address in MAIL FROM:, which
reduces the size left for the actual local part of the sender. Since it
does not require a body check, it has a definite advantage as an anti-replay
measure.
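
A minimal sketch of the recipient-side check just described, assuming the
original RCPT TO: address has already been extracted from the signed return
path (how it is encoded there is not settled in this thread). The function
names, the lowercase address sets and the overhead figure in the usage note
are hypothetical.

```python
# Sketch: recipient-side check for a return path that carries the original
# RCPT TO: address. Names and the overhead figure are hypothetical.

LOCAL_PART_LIMIT = 64  # RFC 2821 limit on the local part, in octets


def embedded_recipient_ok(embedded_rcpt: str,
                          my_addresses: set[str],
                          my_forwarders: set[str]) -> bool:
    """Accept only if the address the sender put in MAIL FROM: is one of
    ours or one of the forwarding accounts we set up; otherwise treat the
    return path as a forgery. Both sets are expected in lowercase."""
    addr = embedded_rcpt.lower()  # assumption: compare case-insensitively
    return addr in my_addresses or addr in my_forwarders


def room_left_for_sender(embedded_rcpt: str, other_overhead: int) -> int:
    """Octets left for the sender's own local part once the recipient
    address and any other SES fields (other_overhead) are included."""
    return LOCAL_PART_LIMIT - other_overhead - len(embedded_rcpt)
```

With a 16-character recipient address and, say, 10 octets of other SES
fields, room_left_for_sender("seth@example.org", 10) leaves 38 octets for
the sender's local part.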

That leaves us with multiple recipients at the same domain to deal with.
One approach would be to shorten the recipient name by replacing it with a
short hash, thereby allowing more of them to fit. Each recipient address
would need a single hash string. I don't think this would work out for the
following reasons. To attack such a hashed address, let's say someone has a
database with every currently valid email address on the planet. They then
calculate the hash for every address and index it by hash value. When they
harvest an SES address, they look up the known addresses that match that
hash value. As long as the number of such addresses is small, say ten hash
collisions per hash value, it would not be worth the effort because there
are not that many places to harvest an SES-signed address. My guess is that
the hash length would have to be pretty long to accomplish this, but we'd need
to guess the number of possible email addresses active at any one time to
determine this.
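
To put numbers on the collisions-per-hash-value target: if N active
addresses hash uniformly onto b-bit values, each value is shared by roughly
N / 2^b addresses. The sketch assumes 6 bits per base-64 character; the
thread deliberately leaves N open, so the 10**9 in the usage note is only a
placeholder, not an estimate.

```python
def addresses_per_hash(population: int, hash_chars: int,
                       bits_per_char: int = 6) -> float:
    """Expected number of valid addresses sharing one hash value, assuming
    the hash spreads `population` addresses uniformly."""
    return population / 2 ** (hash_chars * bits_per_char)


def chars_needed(population: int, max_per_hash: float,
                 bits_per_char: int = 6) -> int:
    """Smallest number of base-64 characters that brings the expected
    collisions per hash value down to max_per_hash or fewer."""
    chars = 0
    while addresses_per_hash(population, chars, bits_per_char) > max_per_hash:
        chars += 1
    return chars
```

With the placeholder population of 10**9, four characters leave about 60
addresses per hash value and five leave fewer than one, so
chars_needed(10**9, 10) returns 5.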

The database method is intriguing, however. Since the recipient can't
decode the database key, the MX receiving the CBV would have to implement
the EXPN command, or something equivalent. I know that most people choose
to shut off VRFY to discourage dictionary attacks, and I don't know how
people feel about turning on EXPN. If allowing EXPN is acceptable, this is
a viable way to foil replay attacks before DATA, though it requires a
database at the sending end.

If we have a database at the sending end, we could also encode the local
part of the sender address in that same database key, just like the database
option in SRS. This would result in local parts that are guaranteed to fit
in the 64-byte limit, but it obscures the local part of the sender address,
which is not desirable. We can get around that by storing the sender
address along with the recipient list in the database so that EXPN would
give the sender address at the end of the list.
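
A rough sketch of the sending-end database, assuming a simple in-memory
table. The key format, the SES0= prefix on the key and the ReturnPathStore
name are illustrative assumptions, not a proposed spec; the point is that
the key is opaque to the recipient, and EXPN on it returns the recipient
list with the sender address last.

```python
import secrets
from typing import Optional


class ReturnPathStore:
    """Hypothetical sender-side table mapping an opaque key, used as the
    local part of MAIL FROM:, to the sender and the recipient list."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[str, list[str]]] = {}

    def issue(self, sender: str, recipients: list[str], domain: str) -> str:
        # 10 hex characters: short enough to fit easily in 64 octets
        key = secrets.token_hex(5)
        while key in self._records:  # regenerate on the rare collision
            key = secrets.token_hex(5)
        self._records[key] = (sender, list(recipients))
        return f"SES0={key}@{domain}"  # illustrative layout only

    def expn(self, key: str) -> Optional[list[str]]:
        """What EXPN on the key would return: the recipients, then the
        sender address last. None means the key is unknown."""
        record = self._records.get(key)
        if record is None:
            return None
        sender, recipients = record
        return recipients + [sender]
```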

--

Seth Goodman
RE: CBV
> From: Seth Goodman
> Sent: Saturday, May 08, 2004 5:07 PM
>
>

<...>

> I know that most people choose
> to shut off VRFY to discourage dictionary attacks, and I don't know how
> people feel about turning on EXPN.

Replying to my own post:

Another thought that _might_ make allowing EXPN more acceptable is to only
accept the command _after_ receiving a MAIL FROM:<>, RCPT TO:<...> that
passes. This means that the SMTP-client has a valid MAIL FROM: string that
came from your outgoing MTA. Even if the MAIL FROM: was harvested from a
promiscuous sending account and the CBV passes, expanding the recipient list
will only yield the address of the attacker who harvested the MAIL FROM:. If the
CBV doesn't pass, you simply deny the EXPN command. I don't know if this
violates RFC2821, but it does make it safe to permit EXPN under limited
circumstances.
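
The gating rule above can be sketched as a small per-session state machine.
This assumes hypothetical verify/expand hooks into the local MTA and keeps
only the state the rule needs; it is not a full SMTP server, and whether
the behaviour is RFC 2821-compliant is still the open question raised above.

```python
from typing import Callable, Optional


class CallbackSession:
    """Sketch of the EXPN gate: EXPN is honoured only for an address that
    already passed RCPT TO: in this session under a null MAIL FROM:<>.
    `verify` and `expand` are hypothetical hooks into the local MTA."""

    def __init__(self, verify: Callable[[str], bool],
                 expand: Callable[[str], Optional[list[str]]]) -> None:
        self.verify = verify
        self.expand = expand
        self.null_sender = False
        self.verified_rcpt: Optional[str] = None

    def mail_from(self, reverse_path: str) -> str:
        self.null_sender = (reverse_path == "")  # MAIL FROM:<>
        self.verified_rcpt = None
        return "250 OK"

    def rcpt_to(self, address: str) -> str:
        if not self.verify(address):
            return "550 No such user"
        if self.null_sender:
            self.verified_rcpt = address  # the CBV passed
        return "250 OK"

    def expn(self, address: str) -> str:
        # The refusal codes here are placeholders, not taken from RFC 2821.
        if address != self.verified_rcpt:
            return "502 EXPN not permitted"
        members = self.expand(address)
        if not members:
            return "550 Nothing to expand"
        last = len(members) - 1
        return "\r\n".join(
            f"250{' ' if i == last else '-'}{m}" for i, m in enumerate(members))
```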

--

Seth Goodman
RE: CBV
On Fri, 7 May 2004, Seth Goodman wrote:
>
> Nobody wants this, but there is a vulnerability to replay attacks. Maybe
> we've given it too much airtime, since the vulnerability only really exists
> for promiscuous sales type accounts that send mail to anyone who asks.

IME, using only an unpublished sender address (not signed, not changing,
valid as a recipient address) I'm very well protected from the hundreds
of virus bounces etc. I get each day.

--
Tony Finch <dot@dotat.at> http://dotat.at/
RE: CBV
> From: Tony Finch
> Sent: Monday, May 10, 2004 4:06 PM
>
>
> On Fri, 7 May 2004, Seth Goodman wrote:
> >
> > Nobody wants this, but there is a vulnerability to replay attacks. Maybe
> > we've given it too much airtime, since the vulnerability only really
> > exists for promiscuous sales type accounts that send mail to anyone who
> > asks.
>
> IME, using only an unpublished sender address (not signed, not changing,
> valid as a recipient address) I'm very well protected from the hundreds
> of virus bounces etc. I get each day.

Thank you for this observation. This at least provides anecdotal support
for the idea that private bounce addresses can be used for typical email and
forum participation without being discovered. Though the final gateway MTA
is supposed to put MAIL FROM: into the Return-Path: header, mailing lists
and other public services that are not broken don't pass that header on
when they redistribute a message.

Unless you have the bad luck to send a message to a broken service or
someone gives a spam complaint to a spammer without obfuscating your
address, your bounce address will likely remain unknown to spammers. How
long have you been using the same unpublished bounce address without having
it harvested?

--

Seth Goodman
RE: CBV
On Tue, 11 May 2004, Seth Goodman wrote:
>
> How long have you been using the same unpublished bounce address without
> having it harvested?

Nearly two years now, though I've been using it more to send messages to
public forums recently. I also used the address for three years between
1994 and 1997, though probably not very much -- I think I mostly used a
different address then, and Google seems to concur.

--
Tony Finch <dot@dotat.at> http://dotat.at/
RE: CBV
On Thu, 2004-05-06 at 22:36, Seth Goodman wrote:
> > From: Mark Shewmaker
> > Sent: Thursday, May 06, 2004 6:04 AM

[Deleting a lot of stuff on how and where we're in agreement, (very well
worded stuff, btw, helpful for people following along in the archives
later.)]

> Computing
> the outer hash that protects the MAIL FROM: requires knowing the hash
> secret. Unless you can crack that, you can't verify that a modified address
> string will produce the correct outer hash value. Therefore, the outer hash
> length only has to be long enough to prevent cracking the hash secret.

I would note that it's harder for an attacker to find valid
inner-hash/outer-hash pairs than to simply figure out a valid inner
hash. A brute-force test of an inner hash is a CPU-bound activity that
can be done entirely on the attacker's machines in a couple of
milliseconds, whereas checking whether a particular inner/outer pair
(that is, a forged header/body) matches requires a CBV test--far more
expensive in terms of clock time, and detectable by the server.

> > For completeness:
> >
> > B27= A 27-character base-64 representation of all 160 sha1
> > bits of all of the following ("SES0", TT,
> > local-part@domain, message body.)
> >
> > Note that C27 is not included in this mail_from, but
> > it is included in the computation of H27.
>
> I assume that B27 above was a typo and you meant C27.

Basically yes, or rather they're the same.

I messed up with editing my response to you there. Before sending it
out, I figured 'B' for "body (more or less) checksum" might be easier to
follow along than using a generic 'C' for "checksum", but proceeded to
change only some of the C's to B's. Oops!

> As I've argued above, the outer hash doesn't need to be 160-bits long.
> Since the attacker would need the hash secret to check possible new return
> paths, the outer hash only needs to be long enough to prevent cracking the
> hash secret. Shevek did look into this and suggested 24-bits as adequate.
> Therefore, I suggest that we stay with a 24-bit outer hash.

I am not at all confident that that is a good idea.

24 bits gives just 16 million combinations.

Since the end verification takes place on the server, an attacker would
have to present ~8 million combinations to expect a 50% chance of
making a match by chance.

We can detect that many attempts. :-)

However, with that small an outer hash, validation might become possible
without contacting the server because an attacker will have many valid
mail_froms to look at.

(We should assume a dedicated attacker can collect many valid MAIL
FROMs for any given inner-local-part, hundreds, thousands maybe. It's
not unrealistic to think this--places that send enormous amounts of
emails for constant local-parts are the very places most valuable for an
attacker to focus on.)

While having many valid H27=sha1(rest_of_mail_from+secret)'s from many
different mail_froms might result in the secret being hard to compute
when sha1 outputs 160bit, the secret may not be as hard to find when the
output is truncated to 24 bits.

I worry that
truncated_to_24_bits(H27=sha1sum(rest_of_mail_from+secret_1)) is
equivalent to simpler_hash(rest_of_mail_from+secret_2), in which secret_2
is much easier to find and vulnerable to a known plaintext attack. In
fact, I would also worry that there could be *many* potential secret_2's
for any secret_1.

If, let's say, an attacker can collect 4000 good mail_froms from a
single day, then he'll have 1/4000 of the whole problem space to study,
(4k/16M=1/4k), which I think is a conveniently large percentage to have
at hand for a known plaintext attack.
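
The figures above, worked out explicitly. This assumes a 4-character
base-64 tag (24 bits) and an attacker whose online guesses each cost one
CBV; the helper names are illustrative.

```python
import math

TAG_BITS = 24  # four base-64 characters at 6 bits each


def tag_space(bits: int = TAG_BITS) -> int:
    return 2 ** bits


def guesses_for(p: float, bits: int = TAG_BITS) -> int:
    """Distinct guesses, each checked by a CBV against the real server,
    needed to hit one fixed tag with probability p."""
    return math.ceil(p * tag_space(bits))


def sampled_fraction(samples: int, bits: int = TAG_BITS) -> float:
    """Fraction of the tag space covered by harvested (address, tag) pairs,
    assuming every harvested tag is distinct."""
    return samples / tag_space(bits)
```

tag_space() gives 16,777,216, guesses_for(0.5) gives 8,388,608, and
1 / sampled_fraction(4000) is about 4194, so "1/4000" is a slight rounding.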

So if there's no reason to keep HHHH at 4 characters, I'd suggest making
it of variable length.

> Next let's consider the timestamp. The timestamp has two possible
> functions: to date the signed return path so it can be expired and to act
> as a salt for the outer hash. For the purpose of expiring the timestamp,
> all we really need is resolution in days. Once we add an inner hash, there
> is no longer a need to salt the outer hash calculation. Therefore, I
> suggest that we stay with two base-32 digits with one day resolution.

I can grudgingly accept that logic. I would suggest that while a two
digit timestamp seems sufficient, that there is still no need for SES to
nail it down to two digits as a requirement.

If we later find that there is a need for sub-second timestamps here,
(doubtful), and servers start using T6's, recipients parsing the SES
format should be required to still deal with such a longer T.

(Or SES sending machines might have other reasons to later overload the
T field by adding in another character of information. There's no need
to cut off that possibility.)
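
A sketch of the two-digit TT field under stated assumptions: the base-32
digit alphabet and the 21-day expiry default are placeholders, and only the
two-digit form is handled, not the longer T fields discussed above. Because
32 * 32 = 1024, two digits cover the day number mod 1024 exactly, so the
stamp wraps roughly every 2.8 years and the age check uses modular
subtraction.

```python
BASE32_DIGITS = "0123456789abcdefghijklmnopqrstuv"  # assumed digit alphabet
SECONDS_PER_DAY = 86400


def encode_tt(unix_time: float) -> str:
    """Two base-32 digits of the UNIX day number mod 1024 (32 * 32 = 1024)."""
    day = int(unix_time // SECONDS_PER_DAY) % 1024
    return BASE32_DIGITS[day // 32] + BASE32_DIGITS[day % 32]


def decode_tt(tt: str) -> int:
    # This sketch reads only the two-digit form; longer T fields would
    # need an agreed meaning for the extra characters.
    hi, lo = tt[0].lower(), tt[1].lower()
    return BASE32_DIGITS.index(hi) * 32 + BASE32_DIGITS.index(lo)


def age_in_days(tt: str, now: float) -> int:
    """Days since the stamp, correct across the 1024-day wraparound as
    long as the true age is under 1024 days."""
    today = int(now // SECONDS_PER_DAY) % 1024
    return (today - decode_tt(tt)) % 1024


def is_expired(tt: str, now: float, max_age_days: int = 21) -> bool:
    return age_in_days(tt, now) > max_age_days  # max age is a placeholder
```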

--
Mark Shewmaker
mark@primefactor.com
RE: CBV
On Fri, 2004-05-07 at 01:00, Seth Goodman wrote:
> > From: Mark Shewmaker
> > Sent: Thursday, May 06, 2004 6:04 AM
>
> Rereading your message, it looks like I misunderstood your proposal quite a
> bit. I didn't notice that both of your hashes in MAIL FROM: used the local
> secret. I also didn't really understand the subtleties of the two kinds of
> CBV's.
>
> Now that I understand it better, I think that the two forms of CBV's are a
> complication that would be nice to avoid, if we can.

I don't know if I agree with that or not. Combining the desired
functions of:

  o Asking the sender to verify-this-MAIL-FROM (normal CBV function), in
    a way that is mostly resistant to replay attacks, and works for
    anyone who does simple CBV checks, and
  o Allowing the recipient to verify checksums in a way that is totally
    resistant to replay attacks, but requires the recipient to understand
    how to do the check,

can be done in two separate CBVs as I described. If they are done
separately, then each of the above two items can be almost perfectly
handled. (If they are done with one mail-from, you have to make more
tradeoffs in numbers of bits available.)

But having two CBV types would require recipients to pick among separate
CBV checking strategies for each message. For those who do CBV checks:

  o If they don't understand this different SES type format, or
    are simply not interested in nonstandard checks:

      o They'll simply do standard CBVs.

  o If they do understand this different SES type format and want
    to do the extended checks, then they can:

      o In your all-in-one CBV suggestion:

          o Do a standard CBV check at the first RCPT TO:
          o Reject before DATA if this CBV fails.
          o Do the checksum test after DATA
          o Reject if the checksum test fails.

      o In my dual-CBV suggestion:

          o Do a standard CBV check at the first RCPT TO:
          o Reject if the CBV fails.
          o Generate a checksum-containing MAIL FROM after DATA
          o Do a second CBV check with this generated MAIL FROM
          o Reject if this second CBV test fails.

        Note that if you're going to do the second CBV test, there's
        no real need to do the first one.
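
The two recipient-side flows in the list above, sketched as functions. The
callables (cbv, receive_data, checksum_matches, checksum_mail_from) are
hypothetical stand-ins for the MTA's own machinery, and the returned strings
just name the outcome. The do_first flag models the note that the pre-DATA
CBV is optional in the dual-CBV scheme.

```python
from typing import Callable

Cbv = Callable[[str], bool]  # hypothetical: True if the callback passes


def all_in_one(mail_from: str, cbv: Cbv, receive_data: Callable[[], str],
               checksum_matches: Callable[[str, str], bool]) -> str:
    """Single CBV before DATA, then a local checksum test after DATA."""
    if not cbv(mail_from):
        return "reject before DATA"
    message = receive_data()
    if not checksum_matches(mail_from, message):
        return "reject after DATA"
    return "accept"


def dual_cbv(mail_from: str, cbv: Cbv, receive_data: Callable[[], str],
             checksum_mail_from: Callable[[str, str], str],
             do_first: bool = True) -> str:
    """Optional CBV before DATA, then a second CBV on a MAIL FROM built
    from the received message."""
    if do_first and not cbv(mail_from):
        return "reject before DATA"
    message = receive_data()
    if not cbv(checksum_mail_from(mail_from, message)):
        return "reject after DATA"
    return "accept"
```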

Now currently, for non-SES messages, CBV is so very effective that no
one in their right mind really wants to wait until after DATA to do
the test. You get rid of so many forgeries so quickly there's...well,
there's simply no reason at all to wait until after DATA; the idea is
simply so ridiculous as to be hard to even consider.

But.. let's assume that SES is extremely effective in actual practice.
That is, let's assume that it's effective enough that forgers are likely
to use/make MAIL FROMs from machines that don't sign their MAIL FROMs
anyway.

Given that assumption, then any addresses that you recognize as
SES-signed are almost certainly valid. Obviously, you're going to do
CBV checks anyway just to be sure, but unlike the general non-SES
CBV-check case where you'll end up rejecting the vast majority of the
emails after the first RCPT TO:, here the vast majority will presumably
pass that first before-DATA check, meaning that for that vast majority
you'll be going through the DATA phase anyway.

So given that situation, (ie given the assumption that you're going to
be getting to the point where you're almost always accepting DATA
anyway), the arguments against DATA-time checks mostly disappear.

So a recipient wanting to do this second CBV test could simply create
the new MAIL FROM based on the DATA received, and do merely that second
CBV then. (If you're going to do the second one, there's no real need
to do the first one.)

Given two types of callbacks, you get to keep and use more checksum bits
each way, compared to having to have one MAIL FROM that works for
everything.

The one disadvantage with this two-types-of-CBV's is that you don't know
a-priori what the good checksum will be, so you have to wait to
CBV-check that checksum until after DATA. (You can't do any of the
tests ahead of time.)

So, you call this a complication, which..it is.

I guess my question is what part of the complication is objectionable to
you:

  1. The extra work on the mail server creating a MAIL FROM?

       Our suggestions require equivalent work here.

  2. The extra work on the recipient?

       (Mine is only slightly more complicated, imho.)

  3. The extra work on the mail server checking CBV's?

       (Admittedly, the mail server has to check among two types of
       validity here.)

  4. The need for multiple CBV's?

       (You really don't need to do more than one CBV for either method.)

(As an aside, both of our schemes are similarly extendable to PKI-based
techniques.)

(I have to say that the idea of doing only one CBV even for the
checksumming recipients only recently occurred to me, and it does depend
on the assumption that most SES'd messages are not forged and that
therefore even the pre-DATA CBV tests will mostly all pass for SES'd
messages, but I think that's a very reasonable assumption.)

>
>

[good hashing discussion deleted.]

> We could really use an
> opinion from a crypto expert here.

We can hash out, (haha), some other more general vulnerabilities first,
but yeah, we do need a real crypto expert at some point.


> Here's a slightly modified long inner hash SES address format:
>
> SES0=HHHH=C27=TT=local-part@domain
>
> where
>
>   HHHH = first four base-64 digits of the SHA-1
>          hash of "C27=TT=local-part@domain"
>          prepended with the hash secret
>          (MSA login ID + password) for
>          local-part@domain
>
>   C27  = 27 base-64 digits comprising the SHA-1
>          hash of the concatenation of the From:,
>          Sender:, Reply-To:, Date: and Subject:
>          headers plus the unencoded message body
>
>   TT   = first two base-32 digits of the UNIX
>          integer day number mod 1024
>
>
> This is longer than what I'd prefer, but it should be very secure. The
> outer hash is protected by the hash secret and the inner hash is too long to
> brute force a forgery. This version gives us 23 characters for "local-part"
> before exceeding the 64-byte limit. All CBV's are done the same way. The
> body hash in the envelope sender address ties this address to the message
> content and replay attacks are not feasible. Any of these address formats
> would survive rewriting by SRS.
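
A sketch of how a sending MTA might assemble the address format quoted
above. Several details are assumptions rather than anything the thread
fixes: the URL-safe base-64 alphabet with padding stripped, the base-32
digit alphabet, and how the five headers are canonicalised before hashing.
It also swaps in HMAC-SHA1 for the "SHA-1 of the data prepended with the
hash secret" construction described above, since later messages in this
thread treat the outer hash as a truncated HMAC. The length check
reproduces the 23-character local-part budget: 5 + 5 + 28 + 3 = 41 octets
of overhead out of 64.

```python
import base64
import hashlib
import hmac

LOCAL_PART_LIMIT = 64
BASE32_DIGITS = "0123456789abcdefghijklmnopqrstuv"  # assumed alphabet
SIGNED_HEADERS = ("From", "Sender", "Reply-To", "Date", "Subject")


def b64(digest: bytes) -> str:
    # URL-safe base-64 with padding stripped, so no '=' can appear inside
    # a field (assumption: the thread does not fix the base-64 alphabet).
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def tt_field(unix_time: float) -> str:
    day = int(unix_time // 86400) % 1024
    return BASE32_DIGITS[day // 32] + BASE32_DIGITS[day % 32]


def c27_field(headers: dict[str, str], body: bytes) -> str:
    # How the headers are canonicalised and joined is an assumption.
    h = hashlib.sha1()
    for name in SIGNED_HEADERS:
        h.update(f"{name}: {headers.get(name, '')}\r\n".encode("utf-8"))
    h.update(body)
    return b64(h.digest())  # 160 bits -> 27 characters


def ses_address(local: str, domain: str, secret: bytes,
                headers: dict[str, str], body: bytes,
                unix_time: float, h_len: int = 4) -> str:
    inner = f"{c27_field(headers, body)}={tt_field(unix_time)}={local}@{domain}"
    # HMAC-SHA1 swapped in for "SHA-1 of the data prepended with the secret".
    tag = b64(hmac.new(secret, inner.encode("utf-8"), hashlib.sha1).digest())
    address = f"SES0={tag[:h_len]}={inner}"
    if len(address.rpartition("@")[0]) > LOCAL_PART_LIMIT:
        raise ValueError("local part exceeds 64 octets")
    return address
```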

Hmm. Although I'm still not convinced that single-type-callbacks are
the best way to go...

The fact that only one type of callback will exist does have the expense
of a maximum 23 character local-part, because C27 now can't be
shortened.

Let me suggest that HHHH could still be of variable length.

With a variable-length H, SES-aware recipients would still be able to
extract C27 and do their checksum tests.

SES and non-SES aware recipients that do CBVs would still do their same
CBVs, unchanged.

On receiving a CBV test, the sending machine would be able to see from
the encapsulated local-part what size HHHH it would have made, so it
could still easily validate incoming CBVs on the fly.

And we'd get the advantage of longer H's when possible.
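
A sketch of the variable-length H idea, assuming the same field layout and
URL-safe base-64 tags as in the quoted format and HMAC-SHA1 as the outer
hash. The parser accepts any H of four or more characters and any T of two
or more; the sender decides which H length it would have issued for a given
local part through tag_length_for, a hypothetical per-user policy.

```python
import base64
import hashlib
import hmac
from typing import Callable, NamedTuple, Optional


class SesParts(NamedTuple):
    tag: str
    c27: str
    tt: str
    local: str
    domain: str


def parse_ses(address: str) -> Optional[SesParts]:
    """Split SES0=H...=C27=T...=local-part@domain; H and T may be longer
    than their minimums of four and two characters."""
    local_part, at, domain = address.rpartition("@")
    fields = local_part.split("=", 4)  # local-part itself may contain '='
    if not at or len(fields) != 5 or fields[0].upper() != "SES0":
        return None
    _, tag, c27, tt, local = fields
    if len(tag) < 4 or len(c27) != 27 or len(tt) < 2 or not local:
        return None
    return SesParts(tag, c27, tt, local, domain)


def expected_tag(secret: bytes, p: SesParts, length: int) -> str:
    inner = f"{p.c27}={p.tt}={p.local}@{p.domain}".encode("utf-8")
    digest = hmac.new(secret, inner, hashlib.sha1).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")[:length]


def sender_accepts_cbv(address: str, secret: bytes,
                       tag_length_for: Callable[[str], int]) -> bool:
    """Sender-side check of an incoming CBV, using the H length this MTA
    would have issued for the encapsulated local part."""
    p = parse_ses(address)
    if p is None:
        return False
    length = tag_length_for(p.local)
    if len(p.tag) != length:
        return False
    return hmac.compare_digest(p.tag, expected_tag(secret, p, length))
```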

(Would this still survive SRS rewrites? I..get confused here.)

Minor issues.

C27: I don't know if the set of headers you suggest are the optimal
set to use. There may be some debate on this.

(I'd suggest adding Message-Id: and References: .)

H4: Note of clarification:

We've been talking as if H4 or H27 has to be an (hmac'd) hash.
In reality it could be any algorithm, even random numbers generated
by the server and kept in a database. If there are vulnerabilities
found in doing (sha1sum(body+header+secret)), then any SES sender
can swap out that algorithm for another completely transparently to
the receiving machine.

It's only the meaning of C27 that has to be carved in stone.

--
Mark Shewmaker
mark@primefactor.com
RE: CBV
> From: Mark Shewmaker
> Sent: Saturday, May 15, 2004 12:46 PM
>
>
> On Thu, 2004-05-06 at 22:36, Seth Goodman wrote:
> > > From: Mark Shewmaker
> > > Sent: Thursday, May 06, 2004 6:04 AM
>
> [Deleting a lot of stuff on how and where we're in agreement, (very well
> worded stuff, btw, helpful for people following along in the archives
> later.)]
>
> > Computing
> > the outer hash that protects the MAIL FROM: requires knowing the hash
> > secret. Unless you can crack that, you can't verify that a modified
> > address string will produce the correct outer hash value. Therefore, the
> > outer hash length only has to be long enough to prevent cracking the hash
> > secret.
>
> I would note that it's harder for an attacker to find valid
> inner-hash/outer-hash pairs than to simply figure out a valid inner
> hash. A brute-force test of an inner hash is a CPU-bound activity that
> can be done entirely on the attacker's machines in a couple of
> milliseconds, whereas checking whether a particular inner/outer pair
> (that is, a forged header/body) matches requires a CBV test--far more
> expensive in terms of clock time, and detectable by the server.

Without knowing the key for the outer truncated HMAC, I think an attacker is
pretty much stuck. Remember, there is more than one possible key that will
give the same HMAC on a given block of data. This particular situation gets
worse for the attacker when we shorten the result string, even though his
effort to find a candidate key is made easier. Shortening the hash result
weakens the protection quite a bit, but it is still an extremely difficult
problem.

Shevek can better address the required compute resources to crack the
shortened hash than I can, since he researched the minimum length required
for SRS. However, even if the attacker did succeed in getting one candidate
key from the truncated hash result, at very considerable CPU effort, they
have no assurance that this is the actual key used by the mail sender. From
a probabilistic standpoint, since the number of possible keys that would
yield the same truncated HMAC result from the same data block is larger when
the hash string is truncated more, it is rather unlikely that the attacker's
first successful candidate key would be the correct key. After all the work
of cracking multiple candidate hash secrets to get the correct one, all the
attacker has is the hash key for one local address at the domain in
question. All you have to do to stop the joe-job is to change the hash
secret for that one single user. There are easier ways to make a living and
that's all we need to accomplish.
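
A toy demonstration of the point that several keys can reproduce the same
truncated HMAC. It deliberately truncates to 8 bits, not 24, so the effect
appears among a few thousand candidate keys; the candidate key names are
made up. The last step also shows the other side of the argument: each
additional harvested (data, tag) pair filters the surviving candidates
again, which is the effect the many-samples concern rests on.

```python
import hashlib
import hmac


def truncated_tag(key: bytes, data: bytes, n_bytes: int) -> bytes:
    return hmac.new(key, data, hashlib.sha1).digest()[:n_bytes]


def keys_consistent_with(candidates: list[bytes], data: bytes,
                         tag: bytes) -> list[bytes]:
    """Every candidate key that reproduces the observed truncated tag."""
    return [k for k in candidates
            if truncated_tag(k, data, len(tag)) == tag]


# Toy setup: an 8-bit tag so the effect shows up among a few thousand keys.
data = b"C27=TT=local-part@domain"
candidates = [f"candidate-{i}".encode() for i in range(4096)]
true_key = candidates[1234]
observed = truncated_tag(true_key, data, 1)
matches = keys_consistent_with(candidates, data, observed)
# Each wrong key matches with probability 1/256, so roughly 16 of the
# 4095 wrong keys are expected to fit alongside the true one.

# A second harvested pair narrows the field again.
other = b"another signed return path"
narrowed = keys_consistent_with(matches, other, truncated_tag(true_key, other, 1))
```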

What's more important here is my assertion, backed up by Tony's anecdotal
observation, that signed return-paths are, in general, not harvestable.
Since part of the spec would be that the receiving MTA strip the signature
part and just put the usable email address in the Return-Path: header, only
malicious parties that run hacked MTA's would have access to the full signed
return path. Normal users of email don't send mail to such parties. Mailing
lists already rewrite the return path so the original SES signed address is
gone.


<...>

> > As I've argued above, the outer hash doesn't need to be 160-bits long.
> > Since the attacker would need the hash secret to check possible new
> > return paths, the outer hash only needs to be long enough to prevent
> > cracking the hash secret. Shevek did look into this and suggested
> > 24-bits as adequate. Therefore, I suggest that we stay with a 24-bit
> > outer hash.
>
> I am not at all confident that that is a good idea.
>
> 24 bits gives just 16 million combinations.
>
> Since the end verification takes place on the server, an attacker would
> have to present ~8 million combinations to expect a 50% chance of
> making a match by chance.
>
> We can detect that many attempts. :-)

Hopefully, you'd cut them off well before then :) Guessing the outer hash
is therefore not a realistic attack method.

>
> However, with that small an outer hash, validation might become possible
> without contacting the server because an attacker will have many valid
> mail_froms to look at.
>
> (We should assume a dedicated attacker can collect many valid MAIL
> FROMs for any given inner-local-part, hundreds, thousands maybe. It's
> not unrealistic to think this--places that send enormous amounts of
> emails for constant local-parts are the very places most valuable for an
> attacker to focus on.)

I still assert that signed return path harvesting is only possible from a
relatively small number of promiscuous mail senders. Those are the only
accounts that need to worry about replay attacks, so the rest of us can use
the original, much simpler form of SES address with no protection against
replay attacks. Owners of promiscuous accounts, or those who, because of
their job functions, must communicate with spammers, can use one of the
protected SES formats we have been discussing. I'll list those in a
separate post to give everyone the most up-to-date versions to criticize.


>
> While having many valid H27=sha1(rest_of_mail_from+secret)'s from many
> different mail_froms might result in the secret being hard to compute
> when sha1 outputs 160bit, the secret may not be as hard to find when the
> output is truncated to 24 bits.
>
> I worry that
> truncated_to_24_bits(H27=sha1sum(rest_of_mail_from+secret_1)) is
> equivalent to simpler_hash(rest_of_mail_from+secret_2), in which secret_2
> is much easier to find and vulnerable to a known plaintext attack. In
> fact, I would also worry that there could be *many* potential secret_2's
> for any secret_1.

Shortening the hash to 24-bits does make it easier to crack, but recall that
a good hash key for a SHA-1 is 64-bytes, which is 512-bits. Each MTA can
generate keys any way they want and the attacker has no knowledge of the
local key-generation mechanism. Unless the MTA uses a cryptographically
inferior method of generating keys (we can address this by providing
recommendations), the attacker is left to guess a 512-bit key. Again, I
would call on Shevek to provide the specifics, since he already did the
detailed research on this, but it appears pretty obvious that at some
minimum of hash length, this job becomes intractable. On such a matter, I
would definitely trust Shevek's research over either my own or your
intuition.
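
One way to follow the key-generation recommendation above, assuming
Python's secrets module as the CSPRNG and a hypothetical in-memory store
keyed by local part. A 64-byte key matches the SHA-1 block size; HMAC
hashes any longer key down first, so 64 bytes is the longest key used
as-is. Per-user keys also make the "change the hash secret for that one
single user" remedy a one-line operation.

```python
import secrets

KEY_BYTES = 64  # 512 bits, the SHA-1 block size


class PerUserKeys:
    """Hypothetical store of one random hash secret per local part."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def key_for(self, local_part: str) -> bytes:
        if local_part not in self._keys:
            # secrets draws from the OS CSPRNG rather than a guessable PRNG
            self._keys[local_part] = secrets.token_bytes(KEY_BYTES)
        return self._keys[local_part]

    def rotate(self, local_part: str) -> bytes:
        """Replace one user's secret; signed return paths issued under the
        old key stop verifying, other users are unaffected."""
        self._keys[local_part] = secrets.token_bytes(KEY_BYTES)
        return self._keys[local_part]
```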

>
> If, let's say, an attacker can collect 4000 good mail_froms from a
> single day, then he'll have 1/4000 of the whole problem space to study,
> (4k/16M=1/4k), which I think is a conveniently large percentage to have
> at hand for a known plaintext attack.

As I've argued above, being able to collect this many signed return paths
would be pretty difficult. If you have an account that is prone to this,
you can use one of the more secure SES alternatives.

>
> So if there's no reason to keep HHHH at 4 characters, I'd suggest making
> it of variable length.

I changed the description to a minimum of four base-64 characters.


>
> > Next let's consider the timestamp. The timestamp has two possible
> > functions: to date the signed return path so it can be expired and to
> > act as a salt for the outer hash. For the purpose of expiring the
> > timestamp, all we really need is resolution in days. Once we add an
> > inner hash, there is no longer a need to salt the outer hash
> > calculation. Therefore, I suggest that we stay with two base-32 digits
> > with one day resolution.
>
> I can grudgingly accept that logic. I would suggest that while a two
> digit timestamp seems sufficient, that there is still no need for SES to
> nail it down to two digits as a requirement.
>
> If we later find that there is a need for sub-second timestamps here,
> (doubtful), and servers start using T6's, recipients parsing the SES
> format should be required to still deal with such a longer T.
>
> (Or SES sending machines might have other reasons to later overload the
> T field by adding in another character of information. There's no need
> to cut off that possibility.)

I agree, so I also changed the description to a minimum of two base-32
digits with more digits possible to the left or right of the radix point.

--

Seth Goodman
RE: CBV
> From: Mark Shewmaker
> Sent: Saturday, May 15, 2004 12:46 PM
>
>
> On Fri, 2004-05-07 at 01:00, Seth Goodman wrote:
> > > From: Mark Shewmaker
> > > Sent: Thursday, May 06, 2004 6:04 AM
> >
> > Rereading your message, it looks like I misunderstood your proposal
> > quite a bit. I didn't notice that both of your hashes in MAIL FROM: used
> > the local secret. I also didn't really understand the subtleties of the
> > two kinds of CBV's.
> >
> > Now that I understand it better, I think that the two forms of CBV's are
> > a complication that would be nice to avoid, if we can.
>
> I don't know if I agree with that or not. Combining the desired
> functions of:
>
>   o Asking the sender to verify-this-MAIL-FROM (normal CBV function), in
>     a way that is mostly resistant to replay attacks, and works for
>     anyone who does simple CBV checks, and
>   o Allowing the recipient to verify checksums in a way that is totally
>     resistant to replay attacks, but requires the recipient to understand
>     how to do the check,
>
> can be done in two separate CBVs as I described. If they are done
> separately, then each of the above two items can be almost perfectly
> handled. (If they are done with one mail-from, you have to make more
> tradeoffs in numbers of bits available.)

A new idea to foil replay attacks was introduced more recently by Stuart
Gathman. Basically, his idea was to associate a MAIL FROM: with a given set
of RCPT TO: addresses. This would limit the replay attack to the same
address that harvested the return path, making the replay attack useless.
This makes it unnecessary to include a body check and also makes it possible
to detect the replay attack before DATA, which is what we really want. It
is possible to use an extended precision timestamp as the database key and
it still works with one CBV. See my next post in this thread for details.
I think this beats our solution, but maybe not.

>
> But having two CBV types would require recipients to pick among separate
> CBV checking strategies for each message. For those who do CBV checks:
>
>   o If they don't understand this different SES type format, or
>     are simply not interested in nonstandard checks:
>
>       o They'll simply do standard CBVs.
>
>   o If they do understand this different SES type format and want
>     to do the extended checks, then they can:
>
>       o In your all-in-one CBV suggestion:
>
>           o Do a standard CBV check at the first RCPT TO:
>           o Reject before DATA if this CBV fails.
>           o Do the checksum test after DATA
>           o Reject if the checksum test fails.
>
>       o In my dual-CBV suggestion:
>
>           o Do a standard CBV check at the first RCPT TO:
>           o Reject if the CBV fails.
>           o Generate a checksum-containing MAIL FROM after DATA
>           o Do a second CBV check with this generated MAIL FROM
>           o Reject if this second CBV test fails.
>
>         Note that if you're going to do the second CBV test, there's
>         no real need to do the first one.

Except that it requires you to download the whole message, including
attachments.

>
> Now currently, for non-SES messages, CBV is so very effective that no
> one in their right mind really wants to wait until after DATA to do
> the test. You get rid of so many forgeries so quickly there's...well,
> there's simply no reason at all to wait until after DATA; the idea is
> simply so ridiculous as to be hard to even consider.

Agreed.

>
> But.. let's assume that SES is extremely effective in actual practice.
> That is, let's assume that it's effective enough that forgers are likely
> to use/make MAIL FROMs from machines that don't sign their MAIL FROMs
> anyway.

That would be great.

>
> Given that assumption, then any addresses that you recognize as
> SES-signed are almost certainly valid. Obviously, you're going to do
> CBV checks anyway just to be sure, but unlike the general non-SES
> CBV-check case where you'll end up rejecting the vast majority of the
> emails after the first RCPT TO:, here the vast majority will presumably
> pass that first before-DATA check, meaning that for that vast majority
> you'll be going through the DATA phase anyway.

If the spammers are aware that it is pointless to forge an SES return path,
why do you think they would be any more willing to forge anything else that
could be detected by the same protocol? Either a site implements SES or it
doesn't, no?

>
> So given that situation, (ie given the assumption that you're going to
> be getting to the point where you're almost always accepting DATA
> anyway), the arguments against DATA-time checks mostly disappear.

Well, the after-DATA checks kind of made SES look like a watered-down PKI
scheme, which made all of us a little twitchy, yourself included, if I
recall. I included the body check scheme in my next post which lists the
current SES variations, but I think that Stuart's RCPT TO: idea makes it
possible to foil the replay attack before DATA, so I think that one is the
best so far.

>
> So a recipient wanting to do this second CBV test could simply create
> the new MAIL FROM based on the DATA received, and do merely that second
> CBV then. (If you're going to do the second one, there's no real need
> to do the first one.)
>
> Given two types of callbacks, you get to keep and use more checksum bits
> each way, compared to having to have one MAIL FROM that works for
> everything.
>
> The one disadvantage with this two-types-of-CBV's is that you don't know
> a priori what the good checksum will be, so you have to wait to
> CBV-check that checksum until after DATA. (You can't do any of the
> tests ahead of time.)
>
> So, you call this a complication, which..it is.
>
> I guess my question is what part of the complication is objectionable to
> you:
>
> 1. The extra work on the mail server creating a MAIL FROM?
>
> Our suggestions require equivalent work here.

Responding to the second CBV is extra work and bandwidth for the domain MX.

>
> 2. The extra work on the recipient?
>
> (Mine is only slightly more complicated, imho.)

Doing a second CBV is extra work and bandwidth for the recipient MTA.

>
> 3. The extra work on the mail server checking CBV's?
>
> (Admittedly, the mail server has to check among two types of
> validity here.)

This isn't so bad, and you do get extra information for your work.

>
> 4. The need for multiple CBV's?
>
> (You really don't need to do more than one CBV for either method.)

This is the main objection that I would have. It may not bother some people
as much, but it has been hard to get some people to accept CBV's at all.

>
> (As an aside, both of our schemes are similarly extendable to PKI-based
> techniques.)
>
> (I have to say that the need to do only one CBV even for the
> checksumming recipients only recently occurred to me, and it does depend
> on the assumption that most SES'd messages are not forged and that
> therefore even the pre-DATA CBV tests will mostly all pass for SES'd
> messages, but I think that's a very reasonable assumption.)
>
> >
> >
>
> [good hashing discussion deleted.]
>
> > We could really use an
> > opinion from a crypto expert here.
>
> We can hash out, (haha), some other more general vulnerabilities first,
> but yeah, we do need a real crypto expert at some point.
>
>
> > Here's a slightly modified long inner hash SES address format:
> >
> > SES0=HHHH=C27=TT=local-part@domain
> >
> > where
> >
> > HHHH = first four base-64 digits of the SHA-1
> > hash of "C27=TT=local-part@domain"
> > prepended with the hash secret
> > (MSA login ID + password) for
> > local-part@domain
> >
> > C27 = 27 base-64 digits comprising the SHA-1
> > hash of the concatenation of the From:,
> > Sender:, Reply-To:, Date: and Subject:
> > headers plus the unencoded message body
> >
> > TT = first two base-32 digits of the UNIX
> > integer day number mod 1024
> >
> >
> > This is longer than what I'd prefer, but it should be very secure. The
> > outer hash is protected by the hash secret and the inner hash is too long
> > to brute force a forgery. This version gives us 23 characters for
> > "local-part" before exceeding the 64-byte limit. All CBV's are done the
> > same way. The body hash in the envelope sender address ties this address
> > to the message content and replay attacks are not feasible. Any of these
> > address formats would survive rewriting by SRS.
>
> Hmm. Although I'm still not convinced that single-type-callbacks are
> the best way to go...
>
> The fact that only one type of callback will exist does have the expense
> of a maximum 23 character local-part, because C27 now can't be
> shortened.
>
> Let me suggest that HHHH could still be of variable length.

It can. It's only a minor complication.

>
> With a variable-length H, SES-aware recipients would still be able to
> extract C27 and do their checksum tests.
>
> SES and non-SES aware recipients that do CBVs would still do their same
> CBVs, unchanged.
>
> On receiving a CBV test, the sending machine would be able to see from
> the encapsulated local-part what size HHHH it would have made, so it
> could still easily validate incoming CBVs on the fly.
>
> And we'd get the advantage of longer H's when possible.
>
> (Would this still survive SRS rewrites? I..get confused here.)

I can't think of any reason that it wouldn't. SRS should be able to
encapsulate any valid address and the final gateway MTA should be able to
unencapsulate it after any number of forwarding hops.

>
> Minor issues.
>
> C27: I don't know if the set of headers you suggest are the optimal
> set to use. There may be some debate on this.
>
> (I'd suggest adding Message-Id: and References: .)
>
> H4: Note of clarification:
>
> We've been talking as if H4 or H27 has to be an (hmac'd) hash.
> In reality it could be any algorithm, even random numbers generated
> by the server and kept in a database. If there are vulnerabilities
> found in doing (sha1sum(body+header+secret)), then any SES sender
> can swap out that algorithm for another completely transparently to
> the receiving machine.

Exactly. No one but the sender needs to be able to evaluate it.

>
> It's only the meaning of C27 that has to be carved in stone.

Yes, and even that could be changed if future developments in the crypto
world make a SHA-1 MAC insecure.

--

Seth Goodman
RE: CBV [ In reply to ]
On Sat, 15 May 2004, Seth Goodman wrote:

> A new idea to foil replay attacks was introduced more recently by Stuart
> Gathman. Basically, his idea was to associate a MAIL FROM: with a given set
> of RCPT TO: addresses. This would limit the replay attack to the same
> address that harvested the return path, making the replay attack useless.
> This makes it unnecessary to include a body check and also makes it possible
> to detect the replay attack before DATA, which is what we really want. It
> is possible to use an extended precision timestamp as the database key and
> it still works with one CBV. See my next post in this thread for details.
> I think this beats our solution, but maybe not.

I'll call this idea "SESR" - Signed Envelope Sender and Recipients -
to distinguish it from plain SES until someone has a better idea.
Here is my reiteration with a few more details thought about:

To validate a message, a recipient does CBV with either
a MAIL FROM of <> or a MAIL FROM selected from the message envelope recipients.
The CBV will fail for any MAIL FROM other than <> or one of the
original message recipients. This can be accomplished cryptographically
(by using one of the algorithms where any k out of m keys can
decrypt / validate a message, treating each recipient as a key), or with
a database. I am not a crypto expert, but I can hunt up where I've
seen such algorithms. **

Existing practice is to use <> as the MAIL FROM for CBV (is this
true?). As long as we always treat <> as a valid CBV sender,
we stay compatible. Recipients who still use <> will still be subject
to the limited replay attacks allowed by SES currently. However, those
recipients who use the enhanced CBV and send an address selected from a
RCPT TO for the CBV MAIL FROM enjoy greater protection.
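A minimal sketch of the database variant described above, in Python. Everything here is hypothetical: an in-memory dict stands in for the sender's database, the null sender <> is represented as an empty string, and the class and method names are made up for illustration.

```python
from typing import Dict, Iterable, Set


class SesrStore:
    """Hypothetical SESR database: signed MAIL FROM -> original RCPT TOs."""

    def __init__(self) -> None:
        self._sent: Dict[str, Set[str]] = {}

    def record(self, signed_sender: str, rcpts: Iterable[str]) -> None:
        # Called when a message leaves with this signed return path.
        self._sent[signed_sender] = set(rcpts)

    def cbv_ok(self, signed_sender: str, cbv_mail_from: str) -> bool:
        """Decide a CBV: MAIL FROM must be <> ("") or an original recipient."""
        rcpts = self._sent.get(signed_sender)
        if rcpts is None:
            return False  # never issued (or already expired): fail the CBV
        if cbv_mail_from == "":
            return True   # plain <> keeps compatibility with plain SES
        return cbv_mail_from in rcpts
```

If only the <> branch were ever used, this would reduce to plain SES; the replay protection comes from requiring the CBV's MAIL FROM to be one of the stored recipients.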

PROBLEMS

Do we need a flag to distinguish SESR from SES in the signature? If we send
something other than <> in the MAIL FROM for a CBV to an MTA that has never
heard of this idea, will the CBV still work as intended? If not, we need
to know when to use the old <> for CBV. I am also worried that
existing CBV implementations might send something like 'postmaster@rcpt.com'
as the MAIL FROM - and this would require some special cases equivalent to
<>.

Someone posted to complain that my idea was a tortured abuse
of the original intent of <>. I admit it - it's true. However,
everything associated with SPF/SRS/SES tortures the original intent
of SMTP. We are brought to this pass by the torture and abuse inflicted
by spammers. The only relevant question for any suggested scheme is
whether it is sufficiently compatible with existing software.

**
If there are n recipients, then the algorithm uses k = 2 and m = n + 1.
The m keys are the hash secret plus the n recipients. Two keys are
needed to validate. The hash secret is used for one, so only one
of the recipients will work for the other (other than highly improbable
collisions).

--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
RE: CBV [ In reply to ]
On Sat, 15 May 2004, Stuart D. Gathman wrote:
>
> To validate a message, a recipient does CBV with either
> a MAIL FROM of <> or a MAIL FROM selected from the message envelope recipients.

It is often the case that none of the original recipients are available.

> Existing practice is to use <> as the MAIL FROM for CBV (is this
> true?)

Postfix does callouts using postmaster@.... in the reverse path.

> If we send something other than <> in the MAIL FROM for a CBV to an MTA
> that has never heard of this idea, will the CBV still work as intended?

Not necessarily. The MTA may reject non-bounces to sender-only addresses.

--
Tony Finch <dot@dotat.at> http://dotat.at/
RE: CBV [ In reply to ]
On Sun, 16 May 2004, Tony Finch wrote:

> > To validate a message, a recipient does CBV with either
> > a MAIL FROM of <> or a MAIL FROM selected from the message envelope
> > recipients.

> It is often the case that none of the original recipients are available.

If you are not one of the recipients, then you certainly don't want
the mail - unless you have arranged for it to be forwarded by
some outfit that doesn't do SRS, in which case you would have
the appropriate whitelist.

SES/SESR doesn't break sender forwarding (e.g. the hospital
sending baby pictures with your email as MAIL FROM), although spam filters
will likely look askance at an unauthenticated sender. With cryptographic
validation, you could put the hash secret on your Palm Pilot (or Zaurus
or whatever) and have it give you a valid SES/SESR sender address
for use with sender forwarders.

> > Existing practice is to use <> as the MAIL FROM for CBV (is this
> > true?)
>
> Postfix does callouts using postmaster@.... in the reverse path.

Then this would need to be a special case: treat postmaster@... identically
to <>.
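A sketch of that special case, assuming a hypothetical helper that the sender's MX runs on the reverse path of an incoming CBV before applying the SESR recipient check. Matching "postmaster" case-insensitively is an assumption.

```python
def normalize_cbv_sender(reverse_path: str) -> str:
    """Return "" for <> and postmaster@..., else the bare address."""
    addr = reverse_path.strip()
    if addr.startswith("<") and addr.endswith(">"):
        addr = addr[1:-1]
    local, at, _domain = addr.rpartition("@")
    # Postfix-style callouts use postmaster@<domain>; treat them like <>.
    if at and local.lower() == "postmaster":
        return ""
    return addr
```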

> > If we send something other than <> in the MAIL FROM for a CBV to an MTA
> > that has never heard of this idea, will the CBV still work as intended?

> Not necessarily. The MTA may reject non-bounces to sender-only addresses.

Then we would need to flag SESR and send only plain <> for plain SES.

--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
RE: CBV [ In reply to ]
Here are the latest SES methods incorporating some suggestions and
refinements for all to criticize. All of the methods below validate that
the message originated from the address in the return path, including the
local part of that address.

I) This format is the simplest and is suitable for most email senders. This
format has no protection against replay attack, so it is assumed that these
senders do not send messages to known spammers, malicious individuals or
organizations.

SES0=HHHH=TT=local-part@domain

where

HHHH = at least the first four base-64 digits of the
SHA-1 HMAC of "TT=local-part@domain" using a
key unique for local-part@domain

TT = at least the first two base-32 digits of the
integer part of the UNIX day number mod 1024;
additional digits for the integer or fractional
day parts may be added

This format has 13 characters of overhead, which allows 51 characters in the
local-part before exceeding the 64-byte limit. All messages sent out by
that local-part with the same timestamp will have the same HMAC. If this is
a concern, adding fractional day digits to the timestamp acts to salt the
hash and the HMAC value will change whenever the timestamp does. For
example, adding three fractional day digits to the timestamp will give a
timestamp resolution of 2.6 seconds, which should be adequate for typical
users. This reduces the available local-part to 48 characters, which is
still fairly long.
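A minimal Python sketch of format I, assuming per-address HMAC keys are already available and that "base-32" means the RFC 4648 alphabet (the post does not name one). The function names are hypothetical. It also shows the optional fractional-day digits used to salt the HMAC.

```python
import base64
import hashlib
import hmac

# Assumed alphabet: the format only says "base-32" without naming one.
B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def b32_digits(value: int, width: int) -> str:
    """Encode value as exactly `width` base-32 digits, most significant first."""
    out = ""
    for _ in range(width):
        value, d = divmod(value, 32)
        out = B32[d] + out
    return out


def make_tt(unix_time: float, frac_digits: int = 0) -> str:
    """TT: UNIX day number mod 1024 (two digits) plus optional fractional-day digits."""
    tt = b32_digits(int(unix_time // 86400) % 1024, 2)
    if frac_digits:
        scale = 32 ** frac_digits
        frac = int((unix_time % 86400) * scale / 86400)
        tt += b32_digits(min(frac, scale - 1), frac_digits)
    return tt


def ses0_sign(local: str, domain: str, key: bytes, tt: str, hlen: int = 4) -> str:
    """Build SES0=HHHH=TT=local-part@domain with HMAC-SHA1 over "TT=local-part@domain"."""
    mac = hmac.new(key, f"{tt}={local}@{domain}".encode(), hashlib.sha1).digest()
    return f"SES0={base64.b64encode(mac).decode()[:hlen]}={tt}={local}@{domain}"


def ses0_verify(address: str, key: bytes) -> bool:
    """Recompute the HMAC for an address seen in a CBV (expiry checks omitted)."""
    try:
        tag, hhhh, tt, rest = address.split("=", 3)
        local, domain = rest.rsplit("@", 1)
    except ValueError:
        return False
    if tag != "SES0" or len(hhhh) < 4:  # enforce the four-digit minimum
        return False
    expected = ses0_sign(local, domain, key, tt, len(hhhh))
    return hmac.compare_digest(expected.encode(), address.encode())
```

Because only the sender ever recomputes HHHH, the verifier can truncate its own HMAC to whatever length arrived (with a floor of four digits), so a longer HHHH costs recipients nothing.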


II) A relatively small number of accounts have, by their nature, a
significant chance of having their signed return path harvested and used in
a replay attack. To partially mitigate the threat of replay attack, use the
same format as in I) above but extend the timestamp field with enough
fractional day digits to act as a unique message identifier for that
local-part on that day. Adding four base-32 fractional day digits to the
timestamp field gives 82msec time resolution and a maximum of 32^4 (about 1
million) messages per day for a given local-part. This reduces the available
local-part to 47 characters, which is still fairly long.

For each local-part, the MTA maintains a list of invalidated timestamps that
are not expired. For any incoming CBV for a local-part, the MX first checks
for expiration of the timestamp, then for invalidation of the timestamp and
finally validates the SHA-1 HMAC. This method requires the owner of that
local-part address to realize that a joe-job is taking place and to request
that the specific return-path timestamp being used in the attack be
invalidated. It only requires a minimal extra burden on the originating
gateway MTA to maintain an invalidated timestamp list for each local
address.
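The check order in II) can be sketched as follows, under the same assumptions as format I (RFC 4648 base-32 alphabet, per-address HMAC key). The 14-day lifetime, the function names and the shape of the invalidation list are hypothetical choices for illustration.

```python
import base64
import hashlib
import hmac
from typing import Dict, Set

B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # assumed base-32 alphabet


def b32_value(digits: str) -> int:
    value = 0
    for ch in digits:
        value = value * 32 + B32.index(ch)  # ValueError on a bad digit
    return value


def cbv_check(address: str, key: bytes, today: int,
              invalidated: Dict[str, Set[str]], max_age: int = 14) -> bool:
    """Method II order: expiry, then invalidation list, then HMAC.

    today is the integer UNIX day number; invalidated maps
    "local-part@domain" to the set of revoked TT strings.
    """
    try:
        tag, hhhh, tt, rest = address.split("=", 3)
        local, domain = rest.rsplit("@", 1)
        issued = b32_value(tt[:2])
    except ValueError:
        return False
    if tag != "SES0" or len(hhhh) < 4 or len(tt) < 2:
        return False
    # 1. Expiry: the first two TT digits are the day number mod 1024.
    if (today - issued) % 1024 > max_age:
        return False
    # 2. Invalidation: the owner asked to revoke this exact timestamp.
    if tt in invalidated.get(f"{local}@{domain}", set()):
        return False
    # 3. HMAC over "TT=local-part@domain", truncated to len(HHHH).
    mac = hmac.new(key, f"{tt}={local}@{domain}".encode(), hashlib.sha1).digest()
    expected = base64.b64encode(mac).decode()[: len(hhhh)]
    return hmac.compare_digest(expected.encode(), hhhh.encode())
```

Because the day number is kept mod 1024, the expiry test compares ages modulo 1024 as well.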


III) Where much stronger and automatic protection against replay attacks is
desired, another option is to use II) above, but the sending MTA
additionally maintains a database of all outgoing timestamps that are not
expired for each local sending address. The database record for each
timestamp contains the list of RCPT-TO: addresses for that particular
message. Since the database records function as mailing list expansions,
there may be existing mechanisms in the MTA to accomplish the database
function.

For any incoming CBV for a local-part, the MX first checks for expiration of
the timestamp, then for invalidation of the timestamp and finally validates
the SHA-1 HMAC. If the result is a 250 response, the SMTP-client then
issues an EXPN command for "TT=local-part@domain". The SMTP-server looks up
the timestamp entry for the local-part in the database (or does mailing list
expansion, if implemented that way) and sends the SMTP-client the list of
original RCPT TO: addresses that were on the outgoing message. If the CBV
resulted in a 5xx response, the EXPN command is declined.
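A sketch of the sending side of III), assuming an in-memory store keyed by the EXPN argument "TT=local-part@domain". The class name, the 14-day lifetime and the 550 text are illustrative; the multi-line 250 reply follows the usual SMTP continuation convention ("250-" on all but the last line).

```python
from typing import Dict, List, Tuple


class OutgoingLog:
    """Hypothetical method III store: "TT=local-part@domain" -> recipients."""

    def __init__(self, max_age_days: int = 14) -> None:
        self.max_age_days = max_age_days
        self._db: Dict[str, Tuple[int, List[str]]] = {}

    def record(self, key: str, rcpts: List[str], day: int) -> None:
        # key is "TT=local-part@domain" with the extended-precision TT.
        self._db[key] = (day, list(rcpts))

    def expire(self, today: int) -> None:
        # Entries expire together with the timestamp that indexes them.
        self._db = {k: v for k, v in self._db.items()
                    if today - v[0] <= self.max_age_days}

    def expn_reply(self, key: str, cbv_passed: bool) -> List[str]:
        """SMTP reply lines for EXPN; decline unless the CBV just returned 250."""
        entry = self._db.get(key) if cbv_passed else None
        if entry is None or not entry[1]:
            return ["550 Access denied"]
        rcpts = entry[1]
        lines = [f"250-<{r}>" for r in rcpts[:-1]]
        lines.append(f"250 <{rcpts[-1]}>")
        return lines
```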

Since all the recipients listed on a given message will be served by the
same destination MTA, that MTA should have access to a whitelist of any
forwarding accounts that its users have set up that forward to that MTA.
If the expanded list of original recipients does not match the recipients
of the current message (allowing for whitelisted forwarders), the MTA can
reject the message as a forgery before DATA. Just as in II) above, if the
owner of a sending account notices that a joe-job is taking place with
their return address, they can request that the specific return-path
timestamp be invalidated to give additional protection in case the final
recipient does not issue the EXPN command during the CBV.
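On the receiving side, the comparison might look like this sketch. It assumes the MTA keeps a hypothetical map from each local mailbox to the outside addresses its user has set up to forward there, and it compares addresses case-insensitively, which is a simplification since RFC 2821 treats local-parts as case-sensitive.

```python
from typing import Dict, Iterable, Set


def recipients_match(current_rcpts: Iterable[str], expanded: Iterable[str],
                     forwarders: Dict[str, Set[str]]) -> bool:
    """True if every RCPT TO on this message was an original recipient,
    directly or via a forwarding address the local user whitelisted.

    forwarders maps a lower-cased local mailbox to the outside addresses
    that forward to it.
    """
    original = {a.lower() for a in expanded}
    for rcpt in current_rcpts:
        aliases = {rcpt.lower()}
        aliases |= {f.lower() for f in forwarders.get(rcpt.lower(), set())}
        if not aliases & original:
            return False  # likely replay: reject before DATA
    return True
```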

This approach gives the best immunity to replay attacks while still being
able to detect and reject them before DATA. However, it does require the
originating gateway MTA to maintain a database with the list of all
recipients for every outgoing message. The database entries expire when the
timestamp indexing them expires.


IV) Another option for very strong protection against replay attacks is to
include a MAC in the MAIL FROM: that covers the entire DATA part, minus the
headers that are unknown or may change during message transit. This has the
advantage that no database is required at the sending end. It has two
disadvantages: first, the MAC cannot be verified until the end of the DATA
phase, and second, including a strong MAC in the MAIL FROM: reduces the
available length for the local part of the address. Here is the format:

SES0=HHHH=TT=C27=local-part@domain

where

HHHH = at least the first four base-64 digits of the
SHA-1 HMAC of "TT=C27=local-part@domain" using a
key unique for local-part@domain

TT = at least the first two base-32 digits of the
integer part of the UNIX day number mod 1024;
additional digits for the integer or fractional
day parts may be added

C27 = 27 base-64 digits comprising the SHA-1 MAC of
the From:, Sender:, Reply-To:, Date:, Subject:,
In-Reply-To: and References: headers plus the
unencoded message body; the MAC may be shortened
down to the first 14 base-64 digits, as
necessary to accommodate the local-part

This format has between 28 and 41 characters of overhead (C shortened to 14
digits or kept at 27), which limits the local-part to 36 down to 23
characters, respectively, before exceeding the 64-byte limit. The final
gateway MTA should do a CBV upon receiving the
MAIL FROM: command to make sure that the message originated with the
purported sending MTA. Though the initial CBV will detect casual forgeries
and allow rejection before DATA, more sophisticated replay attacks will be
detected only at the end of DATA. The recipient MTA can still reject the
message at the end of DATA and have no further responsibility for dealing
with it.
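A sketch of how C27 and the format IV address could be computed. It assumes the caller has already unfolded the covered headers and decoded the body; the exact byte string fed to SHA-1 (header order, "Name: value" lines, CRLF separators) is an assumption, since the format does not pin one down. Function names are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Dict

COVERED = ("From", "Sender", "Reply-To", "Date", "Subject",
           "In-Reply-To", "References")


def c_digits(headers: Dict[str, str], body: bytes, digits: int = 27) -> str:
    """SHA-1 over the covered headers plus the decoded body, as base-64 digits."""
    if not 14 <= digits <= 27:
        raise ValueError("C must be 14..27 base-64 digits")
    by_name = {k.lower(): v for k, v in headers.items()}
    h = hashlib.sha1()
    for name in COVERED:
        value = by_name.get(name.lower())
        if value is not None:
            h.update(f"{name}: {value.strip()}\r\n".encode("utf-8"))
    h.update(b"\r\n")  # separate headers from body
    h.update(body)
    return base64.b64encode(h.digest()).decode("ascii")[:digits]


def ses0_format_iv(local: str, domain: str, key: bytes, tt: str, c: str) -> str:
    """SES0=HHHH=TT=C=local-part@domain; HMAC-SHA1 over "TT=C=local-part@domain"."""
    mac = hmac.new(key, f"{tt}={c}={local}@{domain}".encode(), hashlib.sha1).digest()
    return f"SES0={base64.b64encode(mac).decode()[:4]}={tt}={c}={local}@{domain}"
```

Note that C is a plain SHA-1 hash that any recipient can recompute after DATA; only HHHH depends on the sender's secret.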

Here is a brief, "back-of-the-envelope" justification for shortening the
MAC, only when necessary, of course. Shortening the C27 MAC down to 14
digits still keeps 84 bits, above the 80-bit minimum recommended in
RFC2104. The SHA-1 MAC takes on the _order_ of a millisecond to compute
with today's hardware. In order to brute force a message to give an
identical MAC shortened to 14 base-64 digits (84 bits), one would need to
compute roughly 2^42 trials even at the birthday bound; matching one
specific harvested MAC (a second preimage) would take on the order of 2^84
trials, so the following estimate is conservative.
Assuming the originating MTA will expire the timestamp in two weeks, the
average MAC computation time would have to be 275nsec. This is a factor of
3600 faster than is possible today. As a hardware engineer, my opinion is
that this is _very_ unlikely in the foreseeable future with silicon-based
transistors (or SiGe, or HEMFET's, etc.), proponents of Moore's law
notwithstanding. Parallelizing the problem is certainly possible, but then
communications bandwidth becomes the limiting factor. Assuming that typical
message data covered by the MAC is 1Kbyte, this would require 29Gbits/sec
communication bandwidth, exclusive of overhead. While this is within reach
in the foreseeable future using an advanced TCA backplane architecture with
multiple bit lanes, since this means using _local_ CPU's you would need a
very large CPU farm which would be prohibitively expensive. This kind of
network bandwidth will not be available outside of a single facility for a
very long time, so the idea of using an army of hijacked PC's would not be
practical.
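The arithmetic above can be reproduced in a few lines of Python. The inputs are the post's own assumptions (2^42 trials, a two-week timestamp lifetime, about 1 ms per MAC, 1 Kbyte of covered data, taken here as 1000 bytes); the 2^84 line is added only for comparison, since matching a specific harvested MAC is a second-preimage search rather than a birthday collision.

```python
TWO_WEEKS_S = 14 * 86400   # timestamp lifetime assumed in the text
TRIALS = 2 ** 42           # trial count used in the text
MSG_BITS = 1000 * 8        # "1Kbyte" of covered data, taken as 1000 bytes

per_trial_s = TWO_WEEKS_S / TRIALS            # time budget per MAC
speedup = 1e-3 / per_trial_s                  # vs. ~1 ms per MAC today
bandwidth_bps = TRIALS * MSG_BITS / TWO_WEEKS_S

# For comparison only: a second-preimage search over 84 bits.
per_trial_2nd_s = TWO_WEEKS_S / 2 ** 84

print(f"time per trial: {per_trial_s * 1e9:.0f} ns")
print(f"speedup needed: {speedup:.0f}x")
print(f"bandwidth:      {bandwidth_bps / 1e9:.0f} Gbit/s")
print(f"2^84 case:      {per_trial_2nd_s:.1e} s per trial")
```

Running it gives roughly 275 ns per trial, a required speedup of about 3600, and about 29 Gbit/s, matching the figures in the post.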

----------------------------------

Here is a question for those more knowledgeable than myself on the
practicalities of SMTP. What are the _practical_ consequences, aside from
breaking RFC compliance, of exceeding the 64-byte limit for the local part
of MAIL FROM: addresses? How do the majority of MTA's deal with this error
today?

--

Seth Goodman


RE: CBV [ In reply to ]
> From: Stuart D. Gathman
> Sent: Sunday, May 16, 2004 8:10 PM
>
>
> On Sun, 16 May 2004, Tony Finch wrote:
>
> > > To validate a message, a recipient does CBV with either
> > > a MAIL FROM of <> or a MAIL FROM selected from the message envelope
> > > recipients.

You really can't do a CBV with anything but MAIL FROM:<>. We could come up
with something that flags that this is a special query, but that changes
SMTP and would be a very tough row to hoe.

I proposed a slight variation to this that is compatible with current SMTP
in my previous post listing various ways to do SES. The recipient does a
normal null-sender CBV. If the CBV returns a 250, the recipient then issues
the EXPN command with the extended precision timestamp-indexed local address
for an argument. The SMTP-server expands this list into the list of
original RCPT TO: addresses. This mechanism is part of RFC2821, but
requires that the originating gateway MTA store the list expansion for the
timestamped local address for each outgoing message.

This accomplishes your original goal of connecting a given MAIL FROM:
exclusively with a given list of RCPT TO:'s and it does so without breaking
SMTP. Though many MTA's have shut off the EXPN command today, they would
only need to enable it under this one very limited circumstance: the MX has
just answered a CBV with a 250, proving that the SMTP-client was in
possession of a MAIL FROM: that came from that MX, and the SMTP-client then
asks to expand _that exact_ return address to get the list of RCPT TO:
addresses. Any other attempted use of EXPN could still be denied.
Therefore, this does not open a new security hole.
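A sketch of that gating on the sender's MX, assuming a hypothetical per-connection object with the SES checks and the recipient-list lookup passed in as callables. It unlocks EXPN only for the "TT=local-part@domain" part of an SES address that was accepted in a null-sender (<>) transaction on the same connection.

```python
from typing import Callable, List, Optional, Set


class MxSession:
    """Hypothetical per-connection state on the sender's MX."""

    def __init__(self, verify: Callable[[str], bool],
                 expand: Callable[[str], Optional[List[str]]]) -> None:
        self._verify = verify    # SES checks: expiry, invalidation, HMAC
        self._expand = expand    # "TT=local-part@domain" -> original RCPT TOs
        self._null_sender = False
        self._expandable: Set[str] = set()

    def mail_from(self, reverse_path: str) -> int:
        self._null_sender = reverse_path == ""  # CBVs use MAIL FROM:<>
        self._expandable.clear()
        return 250

    def rcpt_to(self, address: str) -> int:
        if not self._verify(address):
            return 550
        parts = address.split("=", 2)           # SES0, HHHH, TT=local@domain
        if self._null_sender and len(parts) == 3 and parts[0] == "SES0":
            self._expandable.add(parts[2])      # unlock EXPN for this address only
        return 250

    def expn(self, arg: str) -> List[str]:
        rcpts = self._expand(arg) if arg in self._expandable else None
        if not rcpts:
            return ["550 Access denied"]
        return [f"250-<{r}>" for r in rcpts[:-1]] + [f"250 <{rcpts[-1]}>"]
```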

I personally think this variation is the best of the bunch of the SES
proposals when replay attacks are possible, but let's see what others think.

>
> > It is often the case that none of the original recipients are available.
>
> If you are not one of the recipients, then you certainly don't want
> the mail - unless you have arranged for it to be forwarded by
> some outfit that doesn't do SRS, in which case you would have
> the appropriate whitelist.

Agreed, the recipients who use this have to whitelist their own forwarders.
I don't see this as a significant limitation.

>
> SES/SESR doesn't break sender forwarding (e.g. the hospital
> sending baby pictures with your email as MAIL FROM). Although spam filters
> will likely look askance at an unauthenticated sender. With cryptographic
> validation, you could put the hash secret on your Palm Pilot (or Zaurus
> or whatever) and have it give you a valid SES/SESR sender address
> for use with sender forwarders.

That is one of the nicer properties of any SES scheme: if you hand an MSA a
SES-signed return-path, that MSA can do a CBV to the appropriate MX and
validate your right to use that return path before accepting the message.
This authentication is down to the user level, not just the domain. With
this mechanism, people can again send mail from anywhere, as long as they
can authenticate who they are by providing a valid SES signature that will
pass a CBV to their domain MX.

>
> > > Existing practice is to use <> as the MAIL FROM for CBV (is this
> > > true?)
> >
> > Postfix does callouts using postmaster@.... in the reverse path.
>
> Then this would need to be a special case: treat postmaster@... identically
> to <>.

This special case would be needed only because a few MTA's do CBV's with
this mechanism, in violation of RFC2821. It is not a good practice and
hopefully those who do it will eventually stop.


>
> > > If we send something other than <> in the MAIL FROM for a CBV to an MTA
> > > that has never heard of this idea, will the CBV still work as intended?
>
> > Not necessarily. The MTA may reject non-bounces to sender-only addresses.
>
> Then we would need to flag SESR and send only plain <> for plain SES.

Or use another mechanism that does not break RFC2821. We could flag SES
signed addresses that will expand the recipient list by using, for example,
SES1 as the initial string. I would argue against this, however, as it
would give attackers a signal as to which SES addresses could be hijacked.
I think it is better to have all the SES signed addresses look the same, and
if an MX does not care to support RCPT TO: address list expansion, it can
simply decline the EXPN command.

--

Seth Goodman
