Mailing List Archive

[Fwd: [spf-discuss] Support for Internationalized Explanations]
This message really belongs in here...

-----Forwarded Message-----
From: Chris Haynes <chris@harvington.org.uk>
To: spf-discuss@v2.listbox.com
Subject: [spf-discuss] Support for Internationalized Explanations
Date: Sun, 25 Jul 2004 14:57:06 +0100

I raised in the IETF MARID WG list (ietf-mxcomp@imc.org) the fact that the
current protocol draft (draft-ietf-marid-spf-3-protocol-00.txt) does not appear
to permit rejection explanations to be written in languages other than English
(to be more precise, provides no support for characters other than US-ASCII).

Meng asked me to clarify my concerns.

I thought it would be a good idea to give an informal presentation here first,
so that the SPF experts here can help verify / improve the proposal (or show it's
not needed) before it goes back to MARID.

-----------------------

draft-ietf-marid-spf-3-protocol-00.txt, section 5.2 "exp: Explanation" explains
how the explanation string - an expanded macro - "allows the publishing domain
to communicate further information via the SMTP receiver to legitimate senders
in the form of a short message or URL."

My observations / assumptions are:
1) This communication takes place by putting the explanation string into a
rejection message.

2) As the draft says, it is the 'legitimate senders' who are intended to read
this message, i.e. they are frequently humans who were intending to communicate
with the recipient, and who probably share with the intended recipient his/her
preferred language of communication.

3) I think it is now widely accepted across the IETF and W3C that all new
specifications etc. should provide support for all languages, not just for
English - especially where non-technical end-users may be involved.

I therefore conclude that it should be possible to create and transmit the
explanation message using a wide range of International characters.

I've checked RFC1035 section 3.3.14 (which defines DNS TXT records). As far as I
can tell, it assumes that only 7-bit US-ASCII characters may be used in the
<character-string>s which comprise a TXT record.

I'm presuming that it is impractical to change any of the DNS-related RFCs to
provide support for internationalised TXT strings.

So we need a way of representing international characters using a sequence of
DNS-compatible US-ASCII characters, so that no changes to the DNS RFCs, servers,
clients or associated infrastructure are needed.

So, given all the above assumptions and constraints, here is my proposal,
expressed informally.....

----------------
The very, very brief overview in three steps.......

A) Use the well-known '%HH' notation to encode international Unicode
characters in the DNS TXT record,

B) Extend the SPF macro expander to convert the '%HH' inserts into UTF-8 octets

C) Use MIME to label the rejection message as using the UTF-8 character set.

----------------------------
The detailed walk-through and rationale...



We permit all characters from the Unicode character set to be used in the
explanations in rejection messages.

We choose the UTF-8 encoding, which encodes each character as a sequence of one
to four octets.


Note:
UTF-8 has the desirable property that 7-bit, US-ASCII is a sub-set of UTF-8,
i.e. all strings written in US-ASCII are automatically also valid UTF-8 strings.


We now have to hold the sequence of UTF-8 octets in a DNS TXT record.

We do this as follows:

a) An octet which has a value <= 127 (decimal) is placed in the TXT string
unaltered (i.e. it is a US-ASCII character which is entered unmodified),

b) If an octet has a value >= 128 (decimal), the value is converted into a pair
of hexadecimal characters HH. A literal '%' is inserted into the TXT, followed
by the two hexadecimal characters.


Example:
The name "Dürst" has a u-umlaut in it. The u-umlaut is represented in UTF-8 by
the two octets C3 and BC. A TXT record comprising his name would read
"D%C3%BCrst".


The above enables TXT records to contain messages comprised of any desired
sequence of Unicode characters.
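
As a rough illustration of rules (a) and (b) above, here is a minimal Python
sketch. The function name encode_explanation is hypothetical (it is not part of
any SPF library), and the sketch assumes the text contains no literal '%' or
other macro syntax that would need escaping.

```python
def encode_explanation(text: str) -> str:
    """Encode text for a TXT record using the proposed %HH notation."""
    out = []
    for octet in text.encode("utf-8"):
        if octet <= 127:
            # Rule (a): US-ASCII octets are placed in the string unaltered.
            out.append(chr(octet))
        else:
            # Rule (b): octets >= 128 become '%' plus two hex characters.
            out.append("%{:02X}".format(octet))
    return "".join(out)


print(encode_explanation("Dürst"))  # D%C3%BCrst
```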

The SPF processor has to be able to convert these messages into UTF-8 mail
responses, so it has to convert each '%HH' triplet back into an 8-bit octet,
which becomes part of the mailed response message.

Fortunately, section 5.2 of the protocol draft states that the explanation
string obtained from the TXT record is a macro string that is to be
macro-expanded before being shown to the sender.

All we need to do is extend the scope of the macro language to add a new macro
definition (section 7.1) to cover this.


The ABNF definition of 'macro-char' has appended to it a further alternative

/ "%" hex hex

and, still using RFC2234 notation, we add the definition

hex = DIGIT / %x41-46 / %x61-66 ; 0-9 / A-F / a-f


Note:
I've permitted both upper-case and lower-case letters. I could have used
RFC2234's HEXDIG, but that would have allowed only upper-case letters to be
used.



We re-word the sentence stating that

"A '%' character not followed by .. must be interpreted as a literal."

to give validity to the 'HH' character pair.


We add a sub-section stating that

"The message string is represented as a sequence of octet values.

"US-ASCII characters in the expanded message are represented by their
(single) octet values.

"Each three-character sequence %HH in the macro sequence, where each
character H is a character in 'hex', expands into a single octet value, which
replaces the three characters in the macro-string.

"Once all such %HH substitutions have been made the resulting sequence of
octets is a valid UTF-8 octet sequence, and is the response message."
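
A minimal Python sketch of the expansion step this sub-section describes. The
name expand_hh is hypothetical; the sketch handles only %HH triplets and ignores
the draft's other macros (%{...}, %%, %_, %-), which a real macro expander
would have to process as well.

```python
import re

# "%" followed by two hex characters, upper or lower case.
_HH = re.compile(rb"%([0-9A-Fa-f]{2})")


def expand_hh(macro_string: str) -> bytes:
    """Replace each %HH triplet with a single octet and check the result."""
    # The TXT string is assumed to be pure US-ASCII; anything else raises here.
    raw = macro_string.encode("ascii")
    octets = _HH.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    # The result must be valid UTF-8; this raises UnicodeDecodeError if not.
    octets.decode("utf-8")
    return octets


print(expand_hh("D%C3%BCrst").decode("utf-8"))  # Dürst
```

Note that the function returns octets rather than a character string, in line
with the requirement that the response message is a sequence of octet values.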



All SPF response messages MUST use MIME and be described as having a character
set of "UTF-8".

Example - for a plain response message encoded according to RFC1522, the
response headers would include

MIME-Version: 1.0
Content-type: text/plain; charset=UTF-8
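
One way to produce such headers, sketched with Python's standard email package.
This assumes the explanation is carried as the body of a plain-text MIME
message, as in the example above; build_response is a hypothetical helper name.

```python
from email.message import EmailMessage


def build_response(explanation_octets: bytes) -> EmailMessage:
    """Wrap expanded explanation octets in a MIME message labelled as UTF-8."""
    msg = EmailMessage()
    # set_content() adds MIME-Version and a text/plain Content-Type with the
    # given charset, and picks a suitable Content-Transfer-Encoding itself.
    msg.set_content(explanation_octets.decode("utf-8"), charset="utf-8")
    return msg


msg = build_response(b"D\xc3\xbcrst says: mail rejected")
print(msg["MIME-Version"])
print(msg["Content-Type"])
```

Python writes the charset name in lower case; charset names are
case-insensitive, so this is equivalent to the header shown above.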

---------------------------------
Compatibility with early SPF implementations:

This proposal invites SPF record-writers to use %HH sequences in the TXT
explanation values.

Current SPF macro processors will not be expecting the '%HH' characters and are
required to pass them as literal characters. Strictly speaking, existing macro
parsers are required to copy through the "%" as a literal; the two HH characters
should then be parsed as VCHARs and also copied through.

Use of the %HH by record-writers will thus not break existing SPF-compliant
macro processing systems.

(However, this proposal goes against the draft's recommendation that
record-writers 'SHOULD NOT rely' on the above literal pass-through).
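
To illustrate the pass-through behaviour, here is a Python sketch of how an
existing macro processor might treat '%' under the current draft. It assumes
the draft's three escapes (%% for '%', %_ for a space, %- for a URL-encoded
space) and omits %{...} macros entirely; legacy_expand is a hypothetical name.

```python
import re

# The three escapes assumed from the current draft; any other '%' is literal.
_ESCAPES = {"%%": "%", "%_": " ", "%-": "%20"}


def legacy_expand(macro_string: str) -> str:
    """Expand only the known escapes, copying every other character through."""
    return re.sub(r"%[%_-]", lambda m: _ESCAPES[m.group(0)], macro_string)


# The %HH triplets are not recognised, so they come out unchanged.
print(legacy_expand("D%C3%BCrst"))  # D%C3%BCrst
```

The sender would therefore see the raw '%C3%BC' text rather than the intended
character, but nothing in the existing processor breaks.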


-------------------------
Changes needed to early SPF implementations:

1) TXT record publishers to use the %HH method to encode any desired
international characters,

2) Macro parsers need to detect and convert the %HH triplets into octet values,

3) Message-handling code is to ensure that the rejection explanation produced
by the macro processor can comprise full 8-bit octet value sequences (e.g. it is
stored as a byte array; the output is _not_ a sequence of characters),

4) All rejection messages MUST be flagged as using 'charset="UTF-8"', using the
appropriate MIME headers.

-------------------------
Note:
To those who read my original MARID posting...
I confused things there by also referring to URIs and IRIs. In the above I've
simply 'borrowed' the %HH UTF-8 encoding method used by the IRI internet
draft - which we can't reference because unrelated IETF drafts should not
reference one another. There is no functional interdependence between what they
are doing and what we are doing here, I am just proposing to solve the same
problem in the same way, leveraging all the detailed research and analysis that
they did.


--------------------------------
Comments, error corrections and omissions invited.

If all goes well, I'll then prepare a much briefer proposal for MARID,
concentrating on the proposed changes to the text of the protocol draft.


Chris Haynes


-------
Sender Policy Framework: http://spf.pobox.com/
Archives at http://archives.listbox.com/spf-discuss/current/
Send us money! http://spf.pobox.com/donations.html
To unsubscribe, change your address, or temporarily deactivate your subscription,
please go to http://v2.listbox.com/member/?listname=spf-discuss@v2.listbox.com
--
James Couzens,
Programmer
( ( (
((__)) __lib__ __SPF__ '. ___ .'
(00) (o o) (0~0) ' (> <) '
---nn-(o__o)-nn---ooO--(_)--Ooo--ooO--(_)--Ooo---ooO--(_)--Ooo---

http://libspf.org -- ANSI C Sender Policy Framework library
http://libsrs.org -- ANSI C Sender Rewriting Scheme library
-----------------------------------------------------------------
PGP: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x7A7C7DCF

-------
To unsubscribe, change your address, or temporarily deactivate your subscription,
please go to http://v2.listbox.com/member/?listname=spf-devel@v2.listbox.com
Re: [Fwd: [spf-discuss] Support for Internationalized Explanations]
"James Couzens" opined:
>This message really belongs in here...

Really??

My message is about changes to the overall protocol.

I thought this list was intended just for those 'developing an SPF client'
(quote from spf.pobox.com).

Chris



RE: [Fwd: [spf-discuss] Support for Internationalized Explanations]
How about we scrap both/all implementations, SPF and SRS alike,
sit down, document a new API and framework, and work on it
together as friends? :)

If the SPF battle goes on long enough people will just get bored
of it and switch - see XFree86. Microsoft have a similar (if not
better, in my opinion) implementation that, while cooperating
with SPF, may well become the de facto standard if the SPF
implementation issues aren't resolved amicably. At which point
SPF as a standard and a name for libraries becomes moot.

--
Matt Sealey <matt@genesi.co.uk>
Genesi, Manager, Developer Relations
