Mailing List Archive

Answers to some queries
I've seen some comments on IRC, so I thought I'd better clarify them,
since I'm in and out of buildings at the moment.

DATABASING IS OPTIONAL
======================

I think people are being confused by the fact that I offered two systems:

One conforming to the SRS "standard" and possibly allowing compaction
without a store. (The SRS rewriting system).

The other offering the SRS functionality, but within the existing SMTP
rules. (The database-driven system).

The choice of system has significant impact on what data goes where.

On Fri, 6 Feb 2004, Shevek wrote:

> What information should be encoded into the SRS address?
>
> Clearly required fields:
> * Original sender.

In general, there is no database. In that case, in order to do the
reversal, the original sender must be embedded in the SRS address. The
secondary system in my Perl code offers the ability to use a database, in
which case all that is required is the cryptographic hash. In general, I do
not see people using the database solution, unless they have a particular
need to store more data than is stored in the standard SRS format.

IF someone does choose to use a database, they could use either the
standard SRS remailing address as the key or just the hash (which is what
mine does). In the first case, compaction is possible. In the second, it
is not.

> * Cryptographic hash with secret to avoid remailing.
>
> Useful fields:
> * Timestamp.

Again, IF you have a database, then this is not required. And if you want
a system that much more complex, then you can probably run a database
anyway.

If you're going to limit the number of bounces that can go through a
particular SRS address, then you need to store a counter per address,
which means you're running a database anyway, which means ... use the
database solution.
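Why a bounce limit implies state can be seen in a minimal Python sketch. The `BounceCounter` class is hypothetical, and an in-memory dict stands in for the database; it is not part of the Perl Mail::SRS code:

```python
from collections import defaultdict


class BounceCounter:
    """Hypothetical per-address bounce limiter; a dict stands in for the database."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)  # rewritten address -> bounces seen so far

    def allow(self, srs_address):
        """Record one bounce for srs_address and say whether to deliver it."""
        self.counts[srs_address] += 1
        return self.counts[srs_address] <= self.limit


counter = BounceCounter(limit=2)
print([counter.allow("srs0+abc@forwarder.example") for _ in range(3)])
# [True, True, False]
```

Whatever backs `counts` has to persist between bounces, which is exactly the point: this variant means running a database.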

CRYPTOGRAPHY AND FORMAT
=======================

About the cryptography and whether the recipient host can decode the SRS
to work out where the original mail came from:

I want to remove the concept of "reversible hash" from the discussion.
There's no such concept, and even if there were, there wouldn't be any
point. There are N bytes of data in the sender address (where there is no
bound on N), and simple information theory says that you can't compress N
bytes of arbitrary data into a 16-byte MD5 hash.

Cryptographically, the best way to do it is to give the data in-clear and
add a MAC (message authentication code). I mean, simply, what is desired
is that we pass the information, and prove it authentic. This is served by
passing... the information and an unforgeable authentication code.
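"Data in clear plus a MAC" can be sketched in a few lines of Python. This is an illustration, not Mail::SRS: the secret, the ':' separator, HMAC-SHA1 and the 8-character truncation are all assumptions chosen for brevity. Nothing is compressed or hidden; the MAC only proves that this host issued the string.

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"hypothetical-site-secret"  # assumption: a secret known only to the forwarder


def mac(data: str) -> str:
    """Short MAC over data: HMAC-SHA1, hex, truncated to 8 chars (illustrative choices)."""
    return hmac.new(SECRET, data.encode("utf-8"), hashlib.sha1).hexdigest()[:8]


def wrap(sender: str) -> str:
    """Pass the sender in clear, prefixed by a MAC proving this host issued it."""
    return mac(sender) + ":" + sender


def unwrap(token: str) -> Optional[str]:
    """Return the sender if the MAC verifies, else None."""
    tag, _, sender = token.partition(":")
    if hmac.compare_digest(tag.encode("utf-8"), mac(sender).encode("utf-8")):
        return sender
    return None


token = wrap("alice@example.org")
print(unwrap(token))  # alice@example.org
print(unwrap(token.replace("alice", "mallory")))
```

The second call returns None because the tag was computed over a different sender, which is how a forged address gets rejected.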

PRIVACY AND ANONYMITY
=====================

This is where the database-driven solution comes into its own. You want to
forward a mail. You don't want to pass on the sender address. But you want
to be able to return bounces. The database solution provides all of these.
But if you don't want to pass on the sender address, then you have to
store the mapping somewhere. I happened to use a DBM because it was the
simplest solution available. You may well find the same. Just put the
fields into a struct and add it as a value for the hash key.
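The database-driven variant can be sketched with Python's standard `dbm` module. This is a hypothetical illustration, not the Perl Mail::SRS code: the key derivation (truncated HMAC-SHA1 over sender and time), the JSON record standing in for the struct, and the function names are all assumptions.

```python
import dbm
import hashlib
import hmac
import json
import os
import tempfile
import time
from typing import Optional

SECRET = b"hypothetical-site-secret"  # assumption: per-site secret


def forward(db, sender: str, now: Optional[float] = None) -> str:
    """Store the fields for sender in db and return the key used.

    Only the key goes into the rewritten address, so the original sender
    is not disclosed.
    """
    stamp = int(time.time() if now is None else now)
    key = hmac.new(SECRET, f"{sender}|{stamp}".encode("utf-8"), hashlib.sha1).hexdigest()[:16]
    db[key] = json.dumps({"sender": sender, "timestamp": stamp})  # the "struct"
    return key


def reverse(db, key: str) -> Optional[str]:
    """Return the original sender for a bounce addressed to key, if known."""
    try:
        return json.loads(db[key])["sender"]
    except KeyError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    with dbm.open(os.path.join(tmp, "srs-map"), "c") as db:
        key = forward(db, "alice@example.org")
        print(key, "->", reverse(db, key))
```

Since only `key` appears in the outgoing address, the recipient learns nothing about the sender; the price is that the forwarder must keep, and eventually expire, the mapping.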

I'm afraid I'm not at a terminal all the time. I can work to whatever
schedule people want to set, but please mail me to keep me updated, since
I am likely to miss things on IRC.

S.

--
Shevek http://www.anarres.org/
I am the Borg. http://www.gothnicity.org/

-------
To unsubscribe, change your address, or temporarily deactivate your subscription,
please go to http://v2.listbox.com/member/?listname@Ë`Ì{5¤¨wâÇSÓ°)h
Re: Answers to some queries
Shevek <shevek@anarres.org> [2004-02-06/20:24]:
> I think people are being confused by the fact that I offered two
> systems [...]

This, too, is one of the things that were not entirely clear to me from
reading the manual page of the "original" Mail::SRS: whether or not the
module keeps state.


> In general, I do not see people using the database solution, unless
> they have a particular need to store more data than is stored in the
> standard SRS format.
>
> IF someone does choose to use a database, they could use either the
> standard SRS remailing address as the key or just the hash (which is
> what mine does). In the first case, compaction is possible. In the
> second, it is not.

Compaction happens automatically if you use only a unique, short random
string (with, say, 128 bits of entropy) as both the rewritten local part
and the key into the database. I can imagine short addresses being a good
reason for using the database approach. The middlemen cannot be cut out
later on. But since the primary reason for cutting out the middleman is
to shorten the address, with bounces travelling through fewer mail servers
being only a positive side effect, I personally think that's OK.
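The fixed length of such a random local part is easy to check. A Python sketch; the `srs-` prefix, base32 as the alphabet and the function name are assumptions, and the result would double as the database key:

```python
import base64
import secrets


def random_local_part(prefix: str = "srs") -> str:
    """Hypothetical rewritten local part: prefix plus 128 random bits in base32."""
    raw = secrets.token_bytes(16)  # 128 bits of entropy
    token = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    return f"{prefix}-{token}"  # base32 uses only a-z and 2-7, safe in any local part


lp = random_local_part()
print(lp, len(lp))  # length is always 30, however long the original sender was
```

Sixteen bytes need 26 base32 characters once padding is stripped, so every rewritten address has the same short local part no matter how many forwarding hops preceded it.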


> If you're going to limit the number of bounces that can go through a
> particular SRS address, then you need to store a counter per address,
> which means you're running a database anyway, which means ... use the
> database solution.

But as noted in my last post, I think limiting the number of bounces per
address is harmful, as there is no way to know how many bounces a single
resent message will legitimately generate.


> I can work to whatever schedule people want to set, but please mail me
> to keep me updated, since I am likely to miss things on IRC.

Me too. Please inform the list of anything important that was discussed,
if you hope to get a comment from me :)

Cheers,
Dan


--
Daniel Roethlisberger <daniel@roe.ch>
OpenPGP key id 0x804A06B1 (1024/4096 DSA/ElGamal)
144D 6A5E 0C88 E5D7 0775 FCFD 3974 0E98 804A 06B1
!->

Re: several messages
I think a lot of the comments in this mail come down to the fact that
Mail::SRS 0.11 and 0.12 are a total and utter rewrite of Mail::SRS and do
not do any escaping/encoding. So...

On Fri, 6 Feb 2004, Daniel Roethlisberger wrote:

> - Make sure SRS works with IDN. Currently, with IDNA punycode, the only
> thing to note is the double dashes (--) in domain names.

The new Mail::SRS which I wrote is a complete throw-out-and-rewrite of
Mail::SRS. It does not use double dashes (--). I do not know enough about
Punycode to comment further, but from a brief reading of the RFC, I think
we're now fine.

> But if we assume that the domain name can only consist of [a-zA-Z.-],
> we might have a nasty surprise when one day somebody decides to
> extend the characters allowed in SMTP to those allowed by DNS (though
> little known, DNS itself imposes no restrictions on the character set
> whatsoever; see RFC 2181 for details).

This would be a pain in the arse, since it's generally assumed by SMTP
that neither the @ nor the % is used, and frequently + and = are also
assumed "available". The point of the new encoding is that we need to
make only one assumption. I chose + fairly arbitrarily from the available
choices. It can't be - because that _is_ used in current domain names.

> - Make sure SRS works with the full address specification as per RFC
> 2821/2822, including the quoted form (which can get nasty when taking
> messages apart and replacing parts).

I will have a look at this. Can you please give some test examples that I
can put in the testsuite? This is presumably just a question of writing or
using an appropriate address parser.

> - Escaping some parts of the address, and not others, seems to be more
> prone to implementation errors: too much en/decoding, or not enough.
[AND]
> For these reasons, I am all in favour of encoding everything, including
> the original sender domain, and not relying on any assumptions about the
> domain names, neither for the encoding/decoding of special characters,
> nor for the parsing. If SRS is going to be a widely adopted standard, it
> should be as "clean" and straightforward to implement as possible.

I do not escape _any_ part of the address. It is no longer necessary.
This should be much simpler to implement. Simply count the + signs.

Well, the timestamp is a 2-digit base64 number, which is simple enough,
but that isn't quite what you're referring to as "encoding", I don't
think.

It's also important to consider that if we don't escape, then the
addresses are still visible to ordinary regexp engines, simple sh and sed
scripts, etc. This is a major point in favour of the nonencoding method.

> Encoding the full unix epoch time into the address would serve as the
> required "datestamp", and additionally allow to track the time the
> message was resent, and make sure that the same rewritten address will
> not be used too many times.

It would also take 8 bytes of base64 at 6 bits per byte. Currently I use
2, and wrap around every 64K days. This means that with a 1 month window,
the chances of getting a random valid timestamp are 1 in 2114. If we want
to move this up to 8 bytes and have a full Unix timestamp, please say so
now.
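The "1 in 2114" figure can be checked with a short Python sketch. It assumes the stamp is days since the Unix epoch modulo 2^16, as the 64K-day wraparound implies, and a 31-day window; how Mail::SRS actually encodes the stamp into the address is not shown here:

```python
import time
from typing import Optional

DAY = 86400      # seconds per day
WRAP = 1 << 16   # assumption: a 16-bit day counter, i.e. the "64K days" wraparound
WINDOW = 31      # assumption: a one-month validity window, in days


def day_stamp(now: Optional[float] = None) -> int:
    """Days since the Unix epoch, modulo 2**16."""
    t = time.time() if now is None else now
    return int(t // DAY) % WRAP


def stamp_valid(stamp: int, now: Optional[float] = None) -> bool:
    """Accept stamps from the last WINDOW days, counting correctly across wraparound."""
    age = (day_stamp(now) - stamp) % WRAP
    return age < WINDOW


# A forger guessing a stamp at random hits the window with probability WINDOW / WRAP.
print(WRAP // WINDOW)  # 2114
```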

> Have you considered the special case of locally generated messages with
> legitimately faked envelope sender address, with nonlocal recipient?

If I understand correctly what you're saying, then surely SPF would block
these messages. I'm not sure that I quite understand, though. You must send
such messages out with a sender @this-host, in which case your hackish MSA
must perform the SRS transformation before submitting to sendmail.

Does this require modifications to MSAs such as PHP? Or would it be
sufficient to add it to sendmail when used as an MSA as well as a
forwarding MTA?

> One case where this kind of rewriting will happen: say I ssh into my
> private host "gateway.private.net" and use mutt or whatever local MUA,
> or I use Squirrelmail via the web, (which both can directly use the
> local sendmail command for sending the message). Then I send mail with a
> nonlocal sender address "me@somecorp.com" because it is impossible or
> inconvenient to use somecorp's real mailserver to send the message.

This is surely exactly what SPF is trying to prevent: Messages being sent
from arbitrary hosts and arbitrary domains.

> While this scenario may look somewhat constructed, I really do have that
> scenario here (both webmail and local MUA), and I'm confident that
> people will find other reasons why we must cater for rewriting such
> messages too.

That should be handled in the MSA.

On Fri, 6 Feb 2004, Daniel Roethlisberger wrote:

> Shevek <shevek@anarres.org> [2004-02-06/20:24]:
> > I think people are being confused by the fact that I offered two
> > systems [...]
>
> This too, is one of the things that was not entirely clear to me from
> reading the manual page of the "original" Mail::SRS -- whether or not the
> module keeps state or not.

The original Mail::SRS offered only one system. Anything below 0.11 is the
"old" system and was totally unaware of the possible existence of the
"new" system. The documentation overview for the "new" system is not yet
complete; I haven't actually slept this week, and I'm working on it.

> > IF someone does choose to use a database, they could use either the
> > standard SRS remailing address as the key or just the hash (which is
> > what mine does). In the first case, compaction is possible. In the
> > second, it is not.
>
> Compaction is automatically done already when you only use a unique
> random short string (with say 128 bits of entropy) as the rewritten
> local part and key into the database. I can imagine short addresses to
> be a good reason for using the database approach. The middlemen cannot
> be cut later on. But as the primary reason for cutting the middle man is
> to shorten the address, with the bounce travel through less mailservers
> being only a positive side-effect, I personally think that's ok.

What I meant here was that if this goes through Yet Another SRS Stage,
then that SRS stage has no option but to forward the bounces to the
database host. It can't find out the _original_ sender. This is not
necessarily a bad thing, and certainly does not represent a problem of any
sort.

S.

--
Shevek http://www.anarres.org/
I am the Borg. http://www.gothnicity.org/

Re: Re: several messages
Shevek <spf@anarres.org> [2004-02-06/22:32]:
> > But if we assume that the domain name can only consist of [a-zA-Z.-],
> > we might have a nasty surprise when one day somebody decides to
> > extend the characters allowed in SMTP to those allowed by DNS (though
> > little known, DNS itself imposes no restrictions on the character set
> > whatsoever; see RFC 2181 for details).

> This would be a pain in the arse, ...

I agree that such a change would have severe consequences for a lot of
software. Nevertheless, it has been proposed before, e.g.:
http://cr.yp.to/djbdns/idn.html
http://www.apng.org/idns/

Somebody might decide to run their private, experimental, fully
8bit-domain capable mail environment.

I am just saying that the encoding and parsing might as well be designed
without assuming anything about the domains; it doesn't get more
complicated, quite the opposite to my taste.

> ... since it's generally assumed by SMTP that neither the @ nor the %
> is used, and frequently + and = are also assumed "available".

Not sure whether you mean just the domain, or the local part as well.

It should not be assumed that +, % or = are not used in local parts.
Even the @ can be there, if it is escaped or quoted or both (e.g.,
"foo@bar"@domain.com, "foo\@bar"@domain.com and foo\@bar@domain.com
are all equivalent and valid addresses as per RFC 2821/2822 unless my
ability to read BNF has utterly failed me; though I'm not sure whether
all M[STU]A software correctly regards them as valid; I guess some
doesn't).

As for domains, yes, SMTP specifies that no special characters except
the dash and the dot may be present in domain parts (but we don't have
to rely on this).

> The point of the new encoding is that we only need make one
> assumption. I chose + fairly arbitrarily out of the choices. It can't
> be - because that _is_ used in current domain names.

Why can't it be '-'?

'some-thing--weird' becomes 'some--thing----weird' ('-' as escape
character, like '\' in the shell) or 'some--thing---weird' (prepend '-'
to every run of '-', like the dot-stuffing of lines beginning with '.'
in SMTP DATA and in POP3 multi-line responses)
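Both '-' variants can be reproduced with a short Python sketch. It covers encoding only, and the function names are hypothetical:

```python
import re


def escape_doubling(field: str) -> str:
    """'-' as escape character: every literal '-' becomes '--'."""
    return field.replace("-", "--")


def escape_runs(field: str) -> str:
    """Prepend one '-' to every run of '-' characters."""
    return re.sub(r"-+", lambda m: "-" + m.group(0), field)


print(escape_doubling("some-thing--weird"))  # some--thing----weird
print(escape_runs("some-thing--weird"))      # some--thing---weird
```

After either step, a lone '-' is free to serve as the delimiter. Decoding still needs a rule for an escaped '-' that touches a delimiter: under either variant, the field pairs ('a-', 'b') and ('a', '-b') both join to 'a---b'.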

> I do not escape _any_ part of the address. It is no longer necessary.
> This should be much simpler to implement. Simply count the + signs.

The '+' is not safe unless you encode every non-delimiter occurrence of
'+' into something like '++' or '_+' (with the latter, '_' would become
'__' too). And you must not forget to ignore the escaped pluses when
counting them.
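The '_+' variant uses an escape character distinct from the delimiter, which keeps decoding unambiguous. A Python sketch; the function names are hypothetical. "Counting the pluses" then means counting only unescaped ones, which the decoder does by consuming each escape pair before it looks for delimiters:

```python
from typing import List

DELIM, ESC = "+", "_"  # '_' escapes both itself and the '+' delimiter


def encode_fields(fields: List[str]) -> str:
    """Escape each field, then join with the delimiter."""
    def esc(field: str) -> str:
        # Escape the escape character first so its doubling is not re-escaped.
        return field.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)
    return DELIM.join(esc(f) for f in fields)


def decode_fields(encoded: str) -> List[str]:
    """Split on unescaped delimiters, dropping escape characters."""
    fields, current, i = [], [], 0
    while i < len(encoded):
        c = encoded[i]
        if c == ESC and i + 1 < len(encoded):
            current.append(encoded[i + 1])  # escaped character is taken literally
            i += 2
            continue
        if c == DELIM:
            fields.append("".join(current))  # unescaped '+' ends a field
            current = []
        else:
            current.append(c)
        i += 1
    fields.append("".join(current))
    return fields


encoded = encode_fields(["root+db", "server.db.here.edu"])
print(encoded)                 # root_+db+server.db.here.edu
print(decode_fields(encoded))  # ['root+db', 'server.db.here.edu']
```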

No character allowed in local parts is safe, and as we must restrict
ourselves to characters allowed in local parts ... I guess we are out of
luck here, and by nature of the problem, we *must* do some form of
encoding/escaping. Please do correct me if I'm totally wrong here.


> > - Make sure SRS works with the full address specification as per RFC
> > 2821/2822, including the quoted form (which can get nasty when
> > taking messages apart and replacing parts).

> I will have a look at this. Can you please give some test examples
> that I can put in the testsuite? This is presumably just a question of
> writing or using an appropriate address parser.

See above. Note that !#$%&'*+-/=?^_`{|}~ can appear in local parts
unquoted, and that pretty much everything in US-ASCII can appear as a
quoted pair, that is, prepended with a backslash (like \@). Quoted local
parts ("blah"@) can contain everything in US-ASCII except white-space
controls, backslash and the double quote (though the excluded characters
can appear in quoted local parts when escaped with a backslash).

I hope this was clear enough (and correct); if in doubt, check RFC 2821
and 2822. The definitions are spread over the two documents in BNF (look
for the productions for 'Local-part' and recurse from there).
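These rules translate into a small check for whether a local part can be written bare or must be quoted. A simplified Python sketch with hypothetical names: it handles dot-atoms and quoted-strings only, ignores folding white space and obsolete syntax, and does not check that the input is printable US-ASCII.

```python
import string

# RFC 2822 atext: characters allowed unquoted in an atom
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(local: str) -> bool:
    """True if local is one or more atext atoms separated by single dots."""
    return all(atom and set(atom) <= ATEXT for atom in local.split("."))


def quote_local_part(local: str) -> str:
    """Return local bare if it is a dot-atom, otherwise as a quoted-string.

    Backslash and double quote become quoted-pairs; everything else is kept
    as-is (simplification: no check for control characters or non-ASCII).
    """
    if is_dot_atom(local):
        return local
    escaped = local.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(quote_local_part("root+db"))  # root+db
print(quote_local_part("foo@bar"))  # "foo@bar"
```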

> Well, the timestamp is a 2-digit base64 number, which is simple
> enough, but that isn't quite what you're referring to as "encoding", I
> don't think.

No, by encoding I meant only the encoding/escaping of characters that
have a special meaning in the rewriting scheme, i.e. the delimiters, plus
the '@' of the original sender address: everything which could
potentially mess up the parsing. (Yes, I've also used the term encoding
for encoding the time, sorry about that.)

> It's also important to consider that if we don't escape, then the
> addresses are still visible to ordinary regexp engines, simple sh and
> sed scripts, etc. This is a major point in favour of the nonencoding
> method.

Yes, absolutely. But as pointed out above, you cannot avoid encoding
entirely, and once you start encoding, you end up encoding at least a
small number of characters in order to end up with a fully bijective
encoding (the character replacing the @, the delimiter, and the chosen
magic escape character(s) depending on the encoding/escaping used; some
of them could be the same magic character, though that will make the
scheme somewhat less comprehensible).


> > Encoding the full unix epoch time into the address would serve as
> > the required "datestamp", and additionally allow to track the time
> > the message was resent, and make sure that the same rewritten
> > address will not be used too many times.

> It would also take 8 bytes of base64 at 6 bits per byte. Currently I
> use 2, and wrap around every 64K days. This means that with a 1 month
> window, the chances of getting a random valid timestamp are 1 in 2114.

Which is longer than any spammer will walk this earth (and the secret
used in the MAC should not be the same for such a long period of time
anyway :)).

> If we want to move this up to 8 bytes and have a full Unix timestamp,
> please say so now.

I agree that it is not necessary, and saving 4 bytes might be reason
enough to restrict ourselves to days mod 2^16 (though we might make use of
a third byte without getting a longer base64 string).
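The remark about a third byte follows from base64 working on 3-byte groups. A quick Python check on raw byte counts; whether Mail::SRS pads its stamp is not stated in this thread:

```python
import base64


def b64_lengths(nbytes: int):
    """Length of the base64 encoding of nbytes raw bytes, padded and unpadded."""
    encoded = base64.b64encode(bytes(nbytes))
    return len(encoded), len(encoded.rstrip(b"="))


for n in (2, 3, 4):
    print(n, b64_lengths(n))
# 2 (4, 3)
# 3 (4, 4)
# 4 (8, 6)
```

With padding, 2 and 3 raw bytes both take 4 characters, while a 4-byte Unix timestamp takes 8, which is where the 4 saved bytes come from.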


> > Have you considered the special case of locally generated messages
> > with legitimately faked envelope sender address, with nonlocal
> > recipient?

> If I understand correctly what you're saying, then surely SPF would
> block these messages.

Yes, but not SPF on the local host; rather, SPF on the host such a
message gets sent to. SPF would not be applied to authenticated SMTP
connections; at least my users require the ability to relay with
arbitrary envelope senders, because their mailer cannot send mail with
an envelope sender that differs from the From: line.

> You must send such messages out with a sender @this-host, in which
> case your hackish MSA must perform the SRS transformation before
> submitting to sendmail. This requires modifications to MSAs such as
> PHP? [...] That should be handled in the MSA.

Sendmail is the MSA. So I assume you mean the piece of software calling
the sendmail binary?

It is infeasible to fix every webmail application and local mail client
(MUA) which supports multiple "identities" to either send mail with a
local envelope sender no matter what the From: line says, or have them
do authenticated SMTP to a different SMTP server depending on the sender
address.

> Or would it be sufficient to add it to sendmail when used as an MSA as
> well as a forwarding MTA?

I currently apply return path rewriting to every message leaving my
server which has a non-local return path. There are three kinds of
messages that fall into this category:

a) those forwarded by aliases or .forward-style forwarding,
b) those which were generated locally by an MUA via the sendmail
(or exim or whatever) binary, and
c) those accepted for relay by authenticated SMTP.
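The decision in a) to c) depends only on the envelope sender, so one piece of code can serve all three. A hypothetical Python sketch. Two things are assumptions: the list of local domains, and the rule that a null return path is left alone.

```python
from typing import Iterable


def needs_rewrite(return_path: str, local_domains: Iterable[str]) -> bool:
    """True if an outgoing message's return path is non-local and should be rewritten.

    return_path is the bare envelope sender (no angle brackets). The same test
    applies whether the message was forwarded (a), submitted through the local
    sendmail binary (b), or relayed via authenticated SMTP (c).
    """
    if not return_path:  # assumption: the null sender <> used by bounces is never rewritten
        return False
    domain = return_path.rsplit("@", 1)[-1].lower()
    return domain not in {d.lower() for d in local_domains}


print(needs_rewrite("me@somecorp.com", ["gateway.private.net"]))         # True
print(needs_rewrite("me@gateway.private.net", ["gateway.private.net"]))  # False
```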

Whether b) and c) are possible on a given server depends on the setup; I
know that I depend on that functionality, and if any given return path
rewriting scheme cannot be adapted to work in those situations, I'd be
highly unlikely to adopt it. If SRS is to be widely adopted, it should
work in as many situations as possible without breaking the concept as a
whole.

Yes, b) and c) are a different problem from a), but they all have the
very same solution (rewriting the return path), and it would be stupid
not to solve all of them with the same solution (read: piece of code).


Cheers,
Dan



--
Daniel Roethlisberger <daniel@roe.ch>
OpenPGP key id 0x804A06B1 (1024/4096 DSA/ElGamal)
144D 6A5E 0C88 E5D7 0775 FCFD 3974 0E98 804A 06B1
!->

Re: Re: several messages
----- Original Message -----
From: "Shevek" <spf@anarres.org>
To: <spf-devel@v2.listbox.com>
Sent: Friday, February 06, 2004 11:32 PM
Subject: [spf-devel] Re: several messages

Hello Shevek,

Nice to meet you. :) I have not looked at Mail::SRS 0.13 yet (only 0.12),
but reading your below comment, a question popped up nonetheless.

> I do not escape _any_ part of the address. It is no longer necessary.
> This should be much simpler to implement. Simply count the + signs.

Then how will you deal with sendmail's "plussed users"?

root+db: root, dbadmin@server.db.here.edu

Which makes <root+db@dbadmin@server.db.here.edu> a perfectly valid address.

Cheers,

- Mark

System Administrator Asarian-host.org

---
"If you were supposed to understand it,
we wouldn't call it code." - FedEx

Re: Re: several messages
----- Original Message -----
From: "Mark" <admin@asarian-host.net>
To: <spf-devel@v2.listbox.com>
Sent: Saturday, February 07, 2004 10:45 AM
Subject: Re: [spf-devel] Re: several messages


> Which makes <root+db@dbadmin@server.db.here.edu> a perfectly valid
> address.

Doh. I did not have my coffee yet; I meant:

Which makes <root+db@server.db.here.edu> a perfectly valid address.

- Mark

System Administrator Asarian-host.org

---
"If you were supposed to understand it,
we wouldn't call it code." - FedEx

Re: Re: several messages
On Sat, 7 Feb 2004, Mark wrote:

> Hello Shevek,

Hi,

> Nice to meet you. :) I have not looked at Mail::SRS 0.13 yet (only 0.12),
> but reading your below comment, a question popped up nonetheless.

Things change again a little for 0.13. The API simplifies to match what
will presumably be the C API, documentation improves considerably, and
there are interactive and code examples in the distribution which I also
recommend strongly to anyone with queries about this proposed
implementation.

> > I do not escape _any_ part of the address. It is no longer necessary.
> > This should be much simpler to implement. Simply count the + signs.
>
> Then how will you deal with sendmail's "plussed users"?
>
> root+db: root, dbadmin@server.db.here.edu
>
> Which makes <root+db@dbadmin@server.db.here.edu> a perfectly valid address.

If we were to rewrite it forwards, since the username ends up in the last
field, the + really doesn't matter and will be preserved without escaping.

It isn't a valid SRS address since it doesn't start with ^srs\d, so it
won't get reversed.
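Both points (the original local part sits in the last field, and only local parts matching ^srs\d are reversed) can be sketched in Python. The five-field layout tag+mac+stamp+domain+local below is hypothetical and not necessarily the Mail::SRS 0.13 format:

```python
import re
from typing import NamedTuple, Optional

SRS_RE = re.compile(r"^srs\d")  # only local parts starting with srs<digit> are SRS addresses


class SrsFields(NamedTuple):
    tag: str
    mac: str
    stamp: str
    domain: str
    local: str


def parse_srs(local_part: str) -> Optional[SrsFields]:
    """Split a hypothetical tag+mac+stamp+domain+local local part.

    maxsplit=4 leaves everything after the fourth '+' in the last field, so a
    '+' inside the original local part (e.g. sendmail's root+db) survives.
    Domains are assumed never to contain '+'.
    """
    if not SRS_RE.match(local_part):
        return None
    parts = local_part.split("+", 4)
    if len(parts) != 5:
        return None
    return SrsFields(*parts)


print(parse_srs("srs0+a1b2c3+Zx+server.db.here.edu+root+db"))
print(parse_srs("root+db"))  # None: not an SRS address, so not reversed
```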

I strongly recommend running "make teach" in version 0.13.

I need to add a lot more failure cases to the test suite. Most of the
cases in there are success cases at the moment, except for the individual
subsystem tests. People are encouraged to submit test cases, preferably
difficult ones, and especially any for which the code does the wrong
thing. I hope there will be none of the latter.

Thank you for your questions.

S.

--
Shevek http://www.anarres.org/
I am the Borg. http://www.gothnicity.org/
