Mailing List Archive

Message-ID with IPv6 domain-literal
An unknown MUA (user agent header removed by sender) writes its Message-IDs as <omissis@[IPv6::ffff:193.168.1.30]>.

Is the header syntactically correct?

A custom SpamAssassin rule added a penalty for syntax error, and another for using a non-public address.
Re: Message-ID with IPv6 domain-literal [ In reply to ]
On 9/21/21 7:09 AM, Rupert Gallagher wrote:
> An unknown MUA (user agent header removed by sender) writes its
> Message-IDs as <omissis@[IPv6::ffff:193.168.1.30]>.

Ew.

> Is the header syntactically correct?

After looking at the ABNF in RFC 5322 for 90 seconds, I /think/ that it is
using obs-id-right syntax. -- I say think because I see the left and
right square brackets are part of domain-literal, which chains up to
obs-id-right which itself chains up to message-id. But I stopped at the
dtext and didn't check to see if the colon character is allowed or not.
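
The dtext question left open above can be checked mechanically. The following is a minimal Python sketch, assuming the RFC 5322 section 3.4.1 definition dtext = %d33-90 / %d94-126 and the section 3.6.4 rule id-right = dot-atom-text / no-fold-literal / obs-id-right. The obsolete forms and CFWS are deliberately left out, and the names (MSG_ID, is_valid_msg_id) are illustrative only.

```python
import re

# atext per RFC 5322 section 3.2.3: letters, digits and a fixed set of specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# dtext per section 3.4.1: %d33-90 / %d94-126, i.e. printable ASCII
# except "[", "\" and "]". obs-dtext is intentionally not modelled.
DTEXT = r"[\x21-\x5a\x5e-\x7e]"
NO_FOLD_LITERAL = rf"\[{DTEXT}*\]"

# msg-id without surrounding CFWS and without the obs-id-left/right forms.
MSG_ID = re.compile(
    rf"<(?P<left>{DOT_ATOM_TEXT})@(?P<right>{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)


def is_valid_msg_id(value: str) -> bool:
    """Return True if value matches the non-obsolete msg-id grammar sketched above."""
    return MSG_ID.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_valid_msg_id("<omissis@[IPv6::ffff:193.168.1.30]>"))
    print(re.fullmatch(DTEXT, ":") is not None)
```

Under these assumptions the colon (decimal 58) falls inside dtext, so the bracketed form matches no-fold-literal without needing the obsolete syntax at all.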

> A custom SpamAssassin rule added a penalty for syntax error, and another
> for using a non-public address.

I get the penalty for the syntax error.

But why the penalty for using non-public addresses* in a Message-ID: string?

I was not aware that Message-ID had any requirements that the content
had to mean anything beyond being syntactically correct. As such I
would expect private / non-globally routed content to be allowed. After
all, isn't the purpose of the Message-ID to be a universally unique
identifier? If so, why does it matter what the content is as long as
it's syntactically correct? What am I missing?



--
Grant. . . .
unix || die
Re: Message-ID with IPv6 domain-literal [ In reply to ]
On 2021-09-21 at 12:25:30 UTC-0400 (Tue, 21 Sep 2021 10:25:30 -0600)
Grant Taylor <gtaylor@tnetconsulting.net>
is rumored to have said:

> But why the penalty for using non-public addresses* in a Message-ID: string?

Empirical evidence. The use of a non-public address in a Message-ID correlates to a message being spam. In my experience, so does using an IP literal of any sort in a Message-ID, but that may be an idiosyncrasy in my mail.

> I was not aware that Message-ID had any requirements that the content had to mean anything beyond being syntactically correct. As such I would expect private / non-globally routed content to be allowed. After all, isn't the purpose of the Message-ID to be a universally unique identifier? If so, why does it matter what the content is as long as it's syntactically correct? What am I missing?

Private IP addresses in general cannot specify globally unique devices (consider 127.0.0.1 or the very-popular 192.168.1.1) and therefore a Message-ID using an IP literal as the RHS part with a non-public IP cannot assure uniqueness.
Re: Message-ID with IPv6 domain-literal [ In reply to ]
On Tue, 21 Sep 2021, Bill Cole wrote:

> On 2021-09-21 at 12:25:30 UTC-0400 (Tue, 21 Sep 2021 10:25:30 -0600)
> Grant Taylor <gtaylor@tnetconsulting.net>
> is rumored to have said:
>
>> But why the penalty for using non-public addresses* in a Message-ID: string?
>
> Empirical evidence. The use of a non-public address in a Message-ID correlates to a message being spam. In my experience, so does using an IP literal of any sort in a Message-ID, but that may be an idiosyncrasy in my mail.
>
>> I was not aware that Message-ID had any requirements that the content had to mean anything beyond being syntactically correct. As such I would expect private / non-globally routed content to be allowed. After all, isn't the purpose of the Message-ID to be a universally unique identifier? If so, why does it matter what the content is as long as it's syntactically correct? What am I missing?
>
> Private IP addresses in general cannot specify globally unique devices (consider 127.0.0.1 or the very-popular 192.168.1.1) and therefore a Message-ID using an IP literal as the RHS part with a non-public IP cannot assure uniqueness.

That is valid for Private IP addresses.

However "[IPv6::ffff:193.168.1.30]" is the representation of IPv4: 193.168.1.30
which is a Public IP address, thus that 'hit' is in error.
This should be considered a parsing bug.
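
The distinction drawn here can be sketched with Python's standard ipaddress module. This is only an illustration, not the logic of any actual SpamAssassin rule: it assumes the literal is read as the IPv4-mapped address ::ffff:a.b.c.d, and the helper names are made up for the example.

```python
import ipaddress


def effective_address(text: str):
    """Parse an address and unwrap an IPv4-mapped IPv6 address to its IPv4 form."""
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        # ::ffff:a.b.c.d carries an IPv4 address; classify that one instead.
        return addr.ipv4_mapped
    return addr


def is_non_public(text: str) -> bool:
    """Return True if the (unwrapped) address lies in a private or loopback range."""
    return effective_address(text).is_private


if __name__ == "__main__":
    for candidate in ("::ffff:193.168.1.30", "::ffff:192.168.1.30", "127.0.0.1"):
        print(candidate, effective_address(candidate), is_non_public(candidate))
```

The mapped address is unwrapped explicitly because the way is_private treats the ::ffff:0:0/96 range has differed between Python versions; classifying the embedded IPv4 address avoids depending on that.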


--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Message-ID with IPv6 domain-literal [ In reply to ]
On 9/21/21 11:03 AM, Bill Cole wrote:
> Empirical evidence. The use of a non-public address in a Message-ID
> correlates to a message being spam. In my experience, so does using an
> IP literal of any sort in a Message-ID, but that may be an idiosyncrasy
> in my mail.

Fair enough. To each their own.

> Private IP addresses in general cannot specify globally unique devices
> (consider 127.0.0.1 or the very-popular 192.168.1.1) ...

Agreed. However, I don't think the non-uniqueness of the IP address
actually matters.

> ... therefore a Message-ID using an IP literal as the RHS part with a
> non-public IP cannot assure uniqueness.

The use of a domain name or IP literal is RECOMMENDED, not even a
SHOULD, much less MUST.

The thing that MUST be the case is that the message ID is unique. So to
me, it doesn't matter if multiple servers use the same IP literal (or
domain name) as long as the entire message ID is universally / globally
unique.

I am still not seeing anything beyond RECOMMENDED that states that the
RHS of a message ID needs to have any form of uniqueness. Hence why I
think it's okay for multiple systems to have the same RHS.
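
To make the point concrete, here is a small sketch in Python in which uniqueness comes entirely from the LHS. It assumes a random UUID is unique enough for the purpose; make_message_id is a hypothetical helper, not something any particular MUA is known to use.

```python
import uuid


def make_message_id(rhs: str) -> str:
    """Build a Message-ID whose uniqueness rests on a random LHS, not on the RHS."""
    # uuid4().hex is 32 lowercase hex characters, which is valid dot-atom-text.
    return f"<{uuid.uuid4().hex}@{rhs}>"


if __name__ == "__main__":
    # Two hosts sharing the same non-unique RHS still get distinct IDs.
    print(make_message_id("[192.168.1.30]"))
    print(make_message_id("[192.168.1.30]"))
```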

Aside: I agree that the RHS ideally is universally / globally unique to
divide the message ID space such that it's per sending system.

I simply don't see any requirement for the RHS of the message ID to be
unique. In fact I only see a requirement for the message ID in its
entirety to be unique.

I guess this is a "spirit of the RFC" (RHS = unique) vs "letter of the
RFC" (LHS + RHS = unique) type thing.

What am I missing?



--
Grant. . . .
unix || die
Re: Message-ID with IPv6 domain-literal [ In reply to ]
Grant Taylor <gtaylor@tnetconsulting.net> writes:

> What am I missing?

You are missing that SA is not a standards conformance test suite. It
is a tool to guess if a message is spam. Bill said that some forms of
Message-ID are correlated with spamminess. So whether the form that is
correlated is compliant to the spec or not is not a relevant question.
Re: Message-ID with IPv6 domain-literal [ In reply to ]
On 9/21/21 2:00 PM, Greg Troxel wrote:
> You are missing that SA is not a standards conformance test suite. It
> is a tool to guess if a message is spam. Bill said that some forms of
> Message-ID are correlated with spamminess. So whether the form that is
> correlated is compliant to the spec or not is not a relevant question.

Fair enough.

Rupert's original question was about syntax, which seems to be more RFC
based than convention applied by SpamAssassin. This seems perfectly
legitimate to me, just different than what I understood Rupert's
question to be about.

Thank you for clarification.



--
Grant. . . .
unix || die
Re: Message-ID with IPv6 domain-literal [ In reply to ]
Grant Taylor <gtaylor@tnetconsulting.net> writes:

> On 9/21/21 2:00 PM, Greg Troxel wrote:
>> You are missing that SA is not a standards conformance test suite. It
>> is a tool to guess if a message is spam. Bill said that some forms of
>> Message-ID are correlated with spamminess. So whether the form that is
>> correlated is compliant to the spec or not is not a relevant question.
>
> Fair enough.
>
> Rupert's original question was about syntax, which seems to be more
> RFC based than convention applied by SpamAssassin. This seems
> perfectly legitimate to me, just different than what I understood
> Rupert's question to be about.
>
> Thank you for clarification.

It could be a fair question if a SA plugin/rule is trying to evaluate
"is this field correct according to the standards", and gets that wrong,
as a separate issue from "is it a clue of spam". I mean that a rule
that is "MESSAGE_ID_SYNTAX_ERROR" is buggy even if it fires on spammy
but legit message ids, but that the same rule called
"MESSAGE_ID_IS_ICKY" isn't buggy.

As a separate comment, I didn't go read the RFC, but my quick reaction
about the message-id values with IPv6 literals with embedded IPv4
addresses was: these are not reasonable values, and reasonable software
would not emit them. So to me, the question of whether they are
technically compliant was not likely to be that important, within the
context of spam filtering.

Greg
Re: Message-ID with IPv6 domain-literal [ In reply to ]
My mistake in quoting. The IP was 192.168.1.30, a LAN address.

Re: Message-ID with IPv6 domain-literal [ In reply to ]
A LAN address is not the "Internet address of the particular host", and therefore, by RFC 5322 line 969, the header in the OP is not RFC compliant.

-------- Original Message --------
On Sep 21, 2021, 20:54, Grant Taylor wrote:

The use of a domain name or IP literal is RECOMMENDED, not even a
SHOULD, much less MUST.
Re: Message-ID with IPv6 domain-literal [ In reply to ]
On 9/23/21 2:38 AM, Rupert Gallagher wrote:
> A LAN address is not the "Internet address of the particular host", and
> therefore, by RFC 5322 line 969, the header in the OP is not RFC compliant.

Sure it is. What you refer to as a "LAN address" is in fact an Internet
(Protocol) address just like what you are referring to as an "Internet
address". The only effective difference is the values assigned to them.
Both of them function identically from a technological standpoint.
Particularly as viewed by the end systems.

The only difference between them is an agreed upon convention of how /
where the different address values are used. -- That is a human
imposed requirement and definitely not a technical requirement.

The closest that it comes to a technical requirement is that the
Internet at large does not have routes for RFC 1918 IP addresses. This
lack of routes has been chosen by humans for the aforementioned agreed
upon convention. There is no /technical/ reason that RFC 1918 IP
addresses can't be routed across the Internet. -- We have all
experienced leaks of RFC 1918 addresses at some point.

What's more is that RFC 5322 § 3.6.4 ¶ 5 states: The message identifier
is intended to be machine readable and not necessarily meaningful to humans.

Further, the entire message ID is what's to be globally unique. And
using a domain or a globally routed IP address via domain-literal on the
RHS is the RECOMMENDED way to achieve global uniqueness. But there are
other ways.

If we take meaning for humans out, we can have something like the
"Message-ID: <omissis@43f011297907b952855484a6635191ff>"

That's the same domain-literal converted to an MD5 hash. It complies
with obs-id-right -> domain -> obs-domain -> atom -> atext.

"Message-ID: <omissis@[IPv6::ffff:193.168.1.30]>"

So why does "43f011297907b952855484a6635191ff" work for the id-right
when you say that "[IPv6::ffff:193.168.1.30]" doesn't work for the
id-right? They are both the same information, just different
representations.

If you don't like MD5 because it's lossy, how about Base64
"W0lQdjY6OmZmZmY6MTkzLjE2OC4xLjMwXQ==".
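
For reference, both representations can be reproduced with Python's standard library. This is a sketch only; it assumes the hash and the encoding were taken over the bracketed literal exactly as written, which the message does not spell out.

```python
import base64
import hashlib

# Assumption: the input is the bracketed literal, brackets included.
LITERAL = b"[IPv6::ffff:193.168.1.30]"


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest as 32 hex characters (one-way, not reversible)."""
    return hashlib.md5(data).hexdigest()


def b64(data: bytes) -> str:
    """Return the standard Base64 encoding (reversible)."""
    return base64.b64encode(data).decode("ascii")


if __name__ == "__main__":
    print(md5_hex(LITERAL))
    print(b64(LITERAL))
    # Base64 round-trips back to the original literal; MD5 cannot.
    print(base64.b64decode(b64(LITERAL)) == LITERAL)
```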

You seem to be enforcing that the id-right be meaningful to humans, when
RFC 5322 explicitly states that such is not necessary.

If you do not super-impose human conventions on top of the Message-ID,
then the Message-ID that Rupert asked about is perfectly valid.

If you do super-impose human conventions on top of the Message-ID, then
say that you are doing so. But know that you are going above and beyond
the RFC, which I believe is in the same spirit as what greylisting did
years ago. Do so if you want to, but admit that you are doing so.



--
Grant. . . .
unix || die
Re: Message-ID with IPv6 domain-literal [ In reply to ]
The RFC 5322 as cited is concerned about domains and their internet address, where the sender's address needs to be resolvable through DNS by the recipient. If the email infrastructure serves local messages in a company, then LAN addresses get the job done. But delivering messages across autonomous systems calls for *public* fully qualified domain names and their *public* IP addresses, or the delivery will fail.

Re: Message-ID with IPv6 domain-literal [ In reply to ]
Anyway, this part of the original RFC 822 reads loud and clear on the matter. Each new RFC aiming to improve it seems to be the result of spamming lobbies aiming to hide themselves. The latest grammar for MIDs is horrible.

Re: Message-ID with IPv6 domain-literal [ In reply to ]
On 9/24/21 10:17 AM, Rupert Gallagher wrote:
> The RFC 5322 as cited is concerned about domains and their internet
> address, where the sender's address needs to be resolvable through DNS
> by the recipient.

"where the sender's address" seems to be discussing the email address,
which is completely independent from the Message-ID.

"needs to be resolvable through DNS by the recipient" seems to be
discussing the recipient's email system's ability to resolve something,
which can include B2B partners across any intermediate network, be it a
VPN or the public Internet. It also seems to mean that it doesn't
matter if other DNS servers are able to resolve it or not.

> If the email infrastructure serves local messages in a company,
> then LAN addresses get the job done. But delivering messages across
> autonomous systems calls for *public* fully qualified domain names
> and their *public* IP addresses, or the delivery will fail.

Again, email addresses and IP addresses are independent of the content
of the Message-ID.

You may dislike the content of the Message-ID. That's fine. That's
your prerogative to have. But your prerogative does not negate the fact
that the email was successfully delivered using a Message-ID that you
question. The simple fact that the message arrived at your MTA such
that SpamAssassin could score based on the questionable Message-ID is
evidence to the fact that the message was successfully delivered.



--
Grant. . . .
unix || die
Re: Message-ID with IPv6 domain-literal [ In reply to ]
On 9/24/2021 12:30 PM, Grant Taylor wrote:
> On 9/24/21 10:17 AM, Rupert Gallagher wrote:

This is a good case study of interpretation, subjectivity, and why there
can only be one Artificial Super Intelligence.

Put two ASIs in a room and their cores would meltdown arguing over
whether to use a while() or a foreach() loop.

:)

-- Jared Hall
Re: Message-ID with IPv6 domain-literal [ In reply to ]
-------- Original Message --------
On Sep 24, 2021, 18:30, Grant Taylor < gtaylor@tnetconsulting.net> wrote:

On 9/24/21 10:17 AM, Rupert Gallagher wrote:
>> The RFC 5322 as cited is concerned about domains and their internet
>> address, where the sender's address needs to be resolvable through DNS
>> by the recipient.

>"where the sender's address" seems to be discussing the email address,
>which is completely independent from the Message-ID.

Nope.

The Message-ID is generated by the MUA, whose only reference is the sender's address. Bad MUAs use the LAN hostname of the sending machine, and thus generate non-RFC-compliant headers. If the MUA does not include the Message-ID, then the server intervenes by adding its own RFC-compliant header, explicitly marked as added by the server. Note that if the aim of the RFC's new nasty grammar were limited to the uniqueness of the ID, without any reference to the domain, then a random string [a-zA-Z0-9]{N} would be enough for a big enough integer N. But no, the RFC's grammar takes pains over domains and domain literals, so they are important and must have semantics.

>"needs to be resolvable through DNS by the recipient" seems to be discussing the recipient's email system's ability to resolve something, which can include B2B partners across any intermediate network, be it a VPN or the public Internet. It also seems to mean that it doesn't matter if other DNS servers are able to resolve it or not.

All domains and domain literals in the headers are required to be resolvable. B2B is no exception to the rule. VPNs, being private networks by definition, get the same treatment as LANs: they are not public, and thus private to anyone outside those private networks, both by RFC and by law (GDPR).

>> If the email infrastructure serves local messages in a company, then LAN addresses get the job done. But delivering messages across autonomous systems calls for *public* fully qualified domain names and their *public* IP addresses, or the delivery will fail.

> Again, email addresses and IP addresses are independent of the content of the Message-ID.

I disagree.

> You may dislike the content of the Message-ID. That's fine. That's your prerogative to have. But your prerogative does not negate the fact that the email was successfully delivered using a Message-ID that you question.

Those e-mails are systematically rejected by our servers.

> The simple fact that the message arrived at your MTA such that SpamAssassin could score based on the questionable Message-ID is evidence to the fact that the message was successfully delivered.

They arrive in a special mailbox for admin verification, like a spam log. The end users do not see them at all.

RG