Mailing List Archive

Rule to detect non-standard headers that aren't X- prefixed
Anyone have a rule to detect the following nonsense headers seen in this message I got?

Return-Path: <cowser@uakron.edu>
Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
    by mail (envelope-sender <cowser@uakron.edu>) (MIMEDefang) with ESMTP id 23C2ch8H717309
    for <xyzzy@redfish-solutions.com>; Mon, 11 Apr 2022 20:38:50 -0600
To: "xyzzy@redfish-solutions.com" <xyzzy@redfish-solutions.com>
From: "Nabil, Home Depot" <cowser@uakron.edu>
Message-ID: <35ee7c.8b8cf6.ac82@uakron.edu>
Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)
Minicomputers-Exhume: sides
Subject: Nabil, 1 searches this week
Malthus-Films: 88976dea
List-Unsubscribe: <https://uakron.edu/?e=d567f7ae55e4&t=lun&midToken=39e56a34&ek=email_notification_single_search_appearance_01&li=7&m=unsub&ts=unsub&loid=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
Parasitic-Homogeneity: db5da28ba3e69a
MIME-Version: 1.0
Capitalizations-Grievously: oilers
Content-type: multipart/mixed; boundary="----------=_1649731129-716331-86"

Obviously, the following bogus header names are present:

Minicomputers-Exhume
Malthus-Films
Parasitic-Homogeneity
Capitalizations-Grievously

The list of legitimate headers is quite small, per RFC-2822 Section 3.6 and 3.6.7 (odd that 3.6.8 doesn't call out the X-* requirement).

I'd like to fingerprint messages based on non-standard header names.

Has anyone undertaken this already? I tried playing with:

header __L_NON_STD_HEADERS ALL !~ /^(Return-Path|Received|Resent-Date|Resent-From|Resent-Sender|Resent-To|Resent-Cc|Resent-Bcc|Resent-Message-ID|Date|From|Sender|Reply-To|To|Cc|Bcc|Message-ID|In-Reply-To|References|Subject|Comments|Keywords|Content-Type|Content-Transfer-Encoding|MIME-Version|DKIM-Signature|X-([A-Z][a-z]+(-[A-Z][a-z]*)*))\:/m

But that will only match if *none* of the headers are standard ones, so that won't work... I really need to examine the headers one-by-one.
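To illustrate why the negated whole-block match behaves this way, here is a minimal Python sketch (not SpamAssassin itself; the allow-list is shortened and the variable names are made up for illustration). Negating a search over the whole header block is true only when no line matches, whereas the goal is a per-line test:

```python
import re

# Shortened allow-list for illustration; the rule above lists many more names.
STANDARD = re.compile(
    r"^(Return-Path|Received|Date|From|To|Subject|Message-ID|MIME-Version):",
    re.M,
)

headers = "\n".join([
    "To: xyzzy@redfish-solutions.com",
    "Minicomputers-Exhume: sides",
    "Subject: Nabil, 1 searches this week",
    "Malthus-Films: 88976dea",
])

# Analogue of `ALL !~ /.../m`: true only when NO line matches the allow-list.
whole_block_negated = STANDARD.search(headers) is None

# What is actually wanted: test each header line on its own.
unknown_lines = [line for line in headers.split("\n") if not STANDARD.match(line)]

print(whole_block_negated)
print(unknown_lines)
```

Because `To:` and `Subject:` are present, the negated whole-block test is false even though two bogus lines are in the block.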

Thanks,

-Philip
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On 5/10/2022 6:10 PM, Philip Prindeville wrote:
> Anyone have a rule to detect the following nonsense headers seen in this message I got?

Interesting. Those look more like something that Bayesian learning would
be best to handle.

But have you built corpora of spam and ham?  Make a list of headers
that appear in the ham and spam corpora and xor out the spam ones.  Then
write a rule that fires if any of those exist.  They look like they might
change a lot, and they are randomized to avoid these types of issues, so
I see your dilemma; a plugin might be needed.
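As a rough sketch of the corpus comparison, assuming each corpus is available as a list of raw message strings (the two one-message corpora below are hypothetical stand-ins), the header names seen only in spam can be collected with a set difference. This is a plain difference rather than a strict XOR:

```python
from email import message_from_string


def header_names(raw_messages):
    """Return the set of lower-cased header names seen in a list of raw messages."""
    names = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        names.update(name.lower() for name in msg.keys())
    return names


# Tiny stand-in corpora (hypothetical data, for illustration only).
ham = ["From: a@example.com\nSubject: hi\nX-Mailer: Foo\n\nbody\n"]
spam = ["From: b@example.com\nSubject: deal\nMalthus-Films: 88976dea\n\nbody\n"]

# Header names that appear in spam but never in ham.
spam_only = header_names(spam) - header_names(ham)
print(sorted(spam_only))
```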

Regards,
KAM

--
Kevin A. McGrail
KMcGrail@Apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
> On May 10, 2022, at 4:58 PM, Kevin A. McGrail <kmcgrail@apache.org> wrote:
>
> On 5/10/2022 6:10 PM, Philip Prindeville wrote:
>> Anyone have a rule to detect the following nonsense headers seen in this message I got?
>
> Interesting. Those look more like something that Bayesian learning would be best to handle.
>
> But have you built corpora of spam and ham? Make a list of headers that appear in the ham and spam corpora and xor out the spam ones. Then write a rule that fires if any of those exist. They look like they might change a lot, and they are randomized to avoid these types of issues, so I see your dilemma; a plugin might be needed.
>
> Regards,
> KAM


You're correct that they're different in every message received.
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>
> You're correct that they're different in every message received.
>
So write a rule that fires on any header name that *doesn't* match
anything in the list of legit headers as defined in the relevant RFCs.

Of course you may need to extend that list to include some extras, such
as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.

Martin
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
> On May 10, 2022, at 5:57 PM, Martin Gregorie <martin@gregorie.org> wrote:
>
> On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>>
>> You're correct that they're different in every message received.
>>
> So write a rule that fires on any header name that *doesn't* match
> anything in the list of legit headers as defined in the relevant RFCs.


See my original message.

I can't think of a single way to match each header, and then test for any of them not matching the pattern...


>
> Of course you may need to extend that list to include some extras, such
> as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.


That's the easy part.


>
> Martin
>
>
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
> Minicomputers-Exhume: sides
> Malthus-Films: 88976dea
> Parasitic-Homogeneity: db5da28ba3e69a
> Capitalizations-Grievously: oilers

It looks like the pattern is
/[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}/
or something close to that.
Obviously it can mutate, but generally these are made by a tool, and until a
new version of the tool comes along, they will be stable.

Try something like
header LW_BOGUS_HEADERS ALL =~ /[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}\n/is
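A quick way to sanity-check a pattern like this outside SpamAssassin is to run it over the sample headers in Python. The sketch below assumes the `{1.20}` quantifier was meant as `{1,20}` and mirrors the `/is` flags with `re.I | re.S`. The `Thread-Topic` line is an extra test input, not from the original message, added to show that an unanchored, case-insensitive pattern may also hit ordinary two-word headers:

```python
import re

# Pattern from the message, assuming `{1.20}` was meant as `{1,20}`.
# re.I and re.S mirror the /is flags on the suggested rule.
BOGUS = re.compile(
    r"[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}:\s{1,10}[\w\d]{3,20}\n",
    re.I | re.S,
)

sample = (
    "Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)\n"
    "Minicomputers-Exhume: sides\n"
    "Subject: Nabil, 1 searches this week\n"
    "Malthus-Films: 88976dea\n"
    "Parasitic-Homogeneity: db5da28ba3e69a\n"
    "MIME-Version: 1.0\n"
    "Capitalizations-Grievously: oilers\n"
)

# Lines of the sample that the pattern hits.
hits = [m.group(0).strip() for m in BOGUS.finditer(sample)]
print(hits)

# A legitimate-looking header with a short one-word value also matches.
legit_hit = BOGUS.search("Thread-Topic: Lunch\n")
print(legit_hit is not None)
```

On this sample the pattern picks out the four bogus lines, but the second check suggests it would need anchoring or an allow-list before being scored heavily.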
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On 2022-05-10 at 18:10:23 UTC-0400 (Tue, 10 May 2022 16:10:23 -0600)
Philip Prindeville <philipp_subx@redfish-solutions.com>
is rumored to have said:

> Anyone have a rule to detect the following nonsense headers seen in
> this message I got?

No, and complicating your circumstance: RFC6648

Here's the title & abstract:


Deprecating the "X-" Prefix and Similar Constructs
in Application Protocols

Abstract

Historically, designers and implementers of application protocols
have often distinguished between standardized and unstandardized
parameters by prefixing the names of unstandardized parameters with
the string "X-" or similar constructs. In practice, that convention
causes more problems than it solves. Therefore, this document
deprecates the convention for newly defined parameters with textual
(as opposed to numerical) names in application protocols.



--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On 2022-05-10 at 20:20:14 UTC-0400 (Tue, 10 May 2022 18:20:14 -0600)
Philip Prindeville <philipp_subx@redfish-solutions.com>
is rumored to have said:

>> On May 10, 2022, at 5:57 PM, Martin Gregorie <martin@gregorie.org>
>> wrote:
>>
>> On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>>>
>>> You're correct that they're different in every message received.
>>>
>> So write a rule that fires on any header name that *doesn't* match
>> anything in the list of legit headers as defined in the relevant
>> RFCs.
>
>
> See my original message.
>
> I can't think of a single way to match each header, and then test for
> any of them not matching the pattern...

As documented in the POD in Mail::SpamAssassin::Conf, a header rule
checking "ALL:raw" actually matches against the pristine header section,
in which you could check for lines that do not begin with the 'standard'
headers.

Unfortunately, as noted elsewhere in the thread, this pattern uses
one-time header names AND there is nothing wrong about using random
words as header names without a leading 'X-' so it's likely a low-yield
approach.



--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Tue, 2022-05-10 at 18:19 -0600, Philip Prindeville wrote:
> I can't think of a single way to match each header, and then test for
> any of them not matching the pattern...
>
>
I had in mind a subrule that triggers on valid header names, combined
with a meta rule that inverts the subrule result. At least, that's what
I'd try as a starting point.

Martin
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> See my original message.
>
> I can't think of a single way to match each header, and then test for any of them not matching the pattern...

Simply use regex negative lookahead.

ALL =~ /^(?!Foo|Bar):/m

It will hit any line _not_ starting with Foo: or Bar:
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> > See my original message.
> >
> > I can't think of a single way to match each header, and then test for any of them not matching the pattern...
>
> Simply use regex negative lookahead.
>
> ALL =~ /^(?!Foo|Bar):/m
>
> It will hit any line _not_ starting with Foo: or Bar:

Oops I think it was buggy.. more like:

ALL =~ /^(?!(?:Foo|Bar):)/m

Unless you want to write colon to all alternations

ALL =~ /^(?!Foo:|Bar:)/m
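The difference between the first attempt and the corrected form can be checked with a small Python sketch (Python's `re` handles `(?!...)` and multi-line `^` the same way for this case; the helper name `lines_hit` and the sample headers are made up for illustration):

```python
import re

headers = "From: a@example.com\nFoo: 1\nMalthus-Films: 88976dea\nBar: 2"

# First attempt: the lookahead consumes nothing, so the ':' would have to sit
# at the very start of a line; ordinary header lines never match.
buggy = re.compile(r"^(?!Foo|Bar):", re.M)

# Corrected form: the colon is part of what the lookahead rejects.
fixed = re.compile(r"^(?!(?:Foo|Bar):)", re.M)


def lines_hit(pattern, text):
    """Return the lines at whose start the pattern matches."""
    return [line for line in text.split("\n") if pattern.match(line)]


print(lines_hit(buggy, headers))
print(lines_hit(fixed, headers))
```

The buggy pattern hits nothing here, while the corrected one hits every line that does not start with `Foo:` or `Bar:`.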
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> > On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> > > See my original message.
> > >
> > > I can't think of a single way to match each header, and then test for any of them not matching the pattern...
> >
> > Simply use regex negative lookahead.
> >
> > ALL =~ /^(?!Foo|Bar):/m
> >
> > It will hit any line _not_ starting with Foo: or Bar:
>
> Oops I think it was buggy.. more like:
>
> ALL =~ /^(?!(?:Foo|Bar):)/m

And to have debug logging show the offending header name (to easily inspect
what was matched), you need to match some additional text, since a lookahead
by itself doesn't consume any string:

ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
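As a sketch of what the extra `[^:]+` buys, the Python snippet below applies the same shape of pattern to a trimmed, unfolded copy of the sample headers and lists the names it captures. The allow-list is an illustrative subset, and `re.I` is an assumption added here because header field names are case-insensitive (the sample uses `Content-type`, which a case-sensitive `Content-Type` entry would not cover):

```python
import re

# Illustrative allow-list; a real rule would need a longer one.
ALLOWED = ("Return-Path", "Received", "To", "From", "Message-ID", "Date",
           "Subject", "List-Unsubscribe", "MIME-Version", "Content-Type")
names = "|".join(re.escape(n) for n in ALLOWED)

# Same shape as /^(?!(?:Foo|Bar):)[^:]+/m, with the allow-list filled in.
pattern = re.compile(rf"^(?!(?:{names}):)[^:]+", re.M | re.I)

sample = "\n".join([
    "Return-Path: <cowser@uakron.edu>",
    "To: <xyzzy@redfish-solutions.com>",
    "From: <cowser@uakron.edu>",
    "Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)",
    "Minicomputers-Exhume: sides",
    "Subject: Nabil, 1 searches this week",
    "Malthus-Films: 88976dea",
    "Parasitic-Homogeneity: db5da28ba3e69a",
    "MIME-Version: 1.0",
    "Capitalizations-Grievously: oilers",
    "Content-type: multipart/mixed",
])

# Each match is the header name of a line not covered by the allow-list.
unknown = pattern.findall(sample)
print(unknown)
```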
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Tue, 10 May 2022, Philip Prindeville wrote:

> Anyone have a rule to detect the following nonsense headers seen in this message I got?
>
> Return-Path: <cowser@uakron.edu>
> Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
> by mail (envelope-sender <cowser@uakron.edu>) (MIMEDefang) with ESMTP id 23C2ch8H717309
> for <xyzzy@redfish-solutions.com>; Mon, 11 Apr 2022 20:38:50 -0600
> To: "xyzzy@redfish-solutions.com" <xyzzy@redfish-solutions.com>
> From: "Nabil, Home Depot" <cowser@uakron.edu>
> Message-ID: <35ee7c.8b8cf6.ac82@uakron.edu>
> Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)
> Minicomputers-Exhume: sides
> Subject: Nabil, 1 searches this week
> Malthus-Films: 88976dea
> List-Unsubscribe: <https://uakron.edu/?e=d567f7ae55e4&t=lun&midToken=39e56a34&ek=email_notification_single_search_appearance_01&li=7&m=unsub&ts=unsub&loid=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
> Parasitic-Homogeneity: db5da28ba3e69a
> MIME-Version: 1.0
> Capitalizations-Grievously: oilers
> Content-type: multipart/mixed; boundary="----------=_1649731129-716331-86"
>
> Obviously, the following bogus header names are present:
>
> Minicomputers-Exhume
> Malthus-Films
> Parasitic-Homogeneity
> Capitalizations-Grievously

Take a look at __RAND_HEADER and RAND_HEADER_MANY


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Of the twenty-two civilizations that have appeared in history,
nineteen of them collapsed when they reached the moral state the
United States is in now. -- Arnold Toynbee
-----------------------------------------------------------------------
3 days until the 74th anniversary of Israel's independence
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
> On May 11, 2022, at 1:44 AM, Henrik K <hege@hege.li> wrote:
>
> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>> See my original message.
>>
>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
>
> Simply use regex negative lookahead.
>
> ALL =~ /^(?!Foo|Bar):/m
>
> It will hit any line _not_ starting with Foo: or Bar:
>


Ah, that did it.

Of course, if I get false positives, I'll have to search for the header names I forgot to include manually...
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
> On May 11, 2022, at 1:53 AM, Henrik K <hege@hege.li> wrote:
>
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>>
>>>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
>>>
>>> Simply use regex negative lookahead.
>>>
>>> ALL =~ /^(?!Foo|Bar):/m
>>>
>>> It will hit any line _not_ starting with Foo: or Bar:
>>
>> Oops I think it was buggy.. more like:
>>
>> ALL =~ /^(?!(?:Foo|Bar):)/m
>
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
>
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
>


How do you look at what a rule is matching? I've never figured that out...

-Philip
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Fri, May 13, 2022 at 12:22:48PM -0600, Philip Prindeville wrote:
>
> How do you look at what a rule is matching? I've never figured that out...

Debug output:
spamassassin -t -D rules < message.eml 2>&1 | grep 'got hit'
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
> On May 11, 2022, at 9:24 AM, John Hardin <jhardin@impsec.org> wrote:
>
> On Tue, 10 May 2022, Philip Prindeville wrote:
>
>> Anyone have a rule to detect the following nonsense headers seen in this message I got?
>>
>> Return-Path: <cowser@uakron.edu>
>> Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
>> by mail (envelope-sender <cowser@uakron.edu>) (MIMEDefang) with ESMTP id 23C2ch8H717309
>> for <xyzzy@redfish-solutions.com>; Mon, 11 Apr 2022 20:38:50 -0600
>> To: "xyzzy@redfish-solutions.com" <xyzzy@redfish-solutions.com>
>> From: "Nabil, Home Depot" <cowser@uakron.edu>
>> Message-ID: <35ee7c.8b8cf6.ac82@uakron.edu>
>> Date: Mon, 11 Apr 2022 22:38:48 +0000 (UTC)
>> Minicomputers-Exhume: sides
>> Subject: Nabil, 1 searches this week
>> Malthus-Films: 88976dea
>> List-Unsubscribe: <https://uakron.edu/?e=d567f7ae55e4&t=lun&midToken=39e56a34&ek=email_notification_single_search_appearance_01&li=7&m=unsub&ts=unsub&loid=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
>> Parasitic-Homogeneity: db5da28ba3e69a
>> MIME-Version: 1.0
>> Capitalizations-Grievously: oilers
>> Content-type: multipart/mixed; boundary="----------=_1649731129-716331-86"
>>
>> Obviously, the following bogus header names are present:
>>
>> Minicomputers-Exhume
>> Malthus-Films
>> Parasitic-Homogeneity
>> Capitalizations-Grievously
>
> Take a look at __RAND_HEADER and RAND_HEADER_MANY
>
>

For my test messages, __RAND_HEADER_MANY isn't firing.

Also, Return-Path: is listed in RFC-2822, and many delivering (terminal) MTA's add it, including Sendmail.

-Philip
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
> On May 11, 2022, at 1:53 AM, Henrik K <hege@hege.li> wrote:
>
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>>
>>>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
>>>
>>> Simply use regex negative lookahead.
>>>
>>> ALL =~ /^(?!Foo|Bar):/m
>>>
>>> It will hit any line _not_ starting with Foo: or Bar:
>>
>> Oops I think it was buggy.. more like:
>>
>> ALL =~ /^(?!(?:Foo|Bar):)/m
>
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
>
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
>


Ended up using .*$ instead of [^:]* but that worked too.

Is it possible to count how many times we didn't see matching headers and then count those, setting some threshold, like 3 or more unknown headers?

Thanks,

-Philip
Re: Rule to detect non-standard headers that aren't X- prefixed [ In reply to ]
On Mon, May 23, 2022 at 10:48:51PM -0600, Philip Prindeville wrote:
>
>
> > On May 11, 2022, at 1:53 AM, Henrik K <hege@hege.li> wrote:
> >
> > On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
> >> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> >>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> >>>> See my original message.
> >>>>
> >>>> I can't think of a single way to match each header, and then test for any of them not matching the pattern...
> >>>
> >>> Simply use regex negative lookahead.
> >>>
> >>> ALL =~ /^(?!Foo|Bar):/m
> >>>
> >>> It will hit any line _not_ starting with Foo: or Bar:
> >>
> >> Oops I think it was buggy.. more like:
> >>
> >> ALL =~ /^(?!(?:Foo|Bar):)/m
> >
> > And for debug logging to log the missing header (to easily inspect what was
> > matched) you need some additional string matching, lookahead itself doesn't
> > save any string
> >
> > ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> >
>
>
> Ended up using .*$ instead of [^:]* but that worked too.
>
> Is it possible to count how many times we didn't see matching headers and then count those, setting some threshold, like 3 or more unknown headers?

tflags multiple should work

header UNKNOWN_HDR ALL ...
tflags UNKNOWN_HDR multiple maxhits=3
meta UNKNOWN_HDR_TOOMANY UNKNOWN_HDR >= 3
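The counting idea can be pictured with a rough Python analogue. This is only a sketch of the logic, not how SpamAssassin implements `tflags multiple`; the short allow-list and function names are hypothetical. Hits are counted up to a cap, and a second check compares the count with the threshold:

```python
import re

# Hypothetical stand-in for the elided UNKNOWN_HDR pattern.
UNKNOWN_HDR = re.compile(r"^(?!(?:From|To|Subject|Date):)[^:]+", re.M)
MAXHITS = 3  # mirrors maxhits=3: stop counting once the cap is reached


def unknown_hdr_hits(headers):
    """Count unknown header lines, stopping at MAXHITS."""
    hits = 0
    for _ in UNKNOWN_HDR.finditer(headers):
        hits += 1
        if hits >= MAXHITS:
            break
    return hits


def unknown_hdr_toomany(headers):
    """Analogue of the meta rule: fire when the hit count reaches 3."""
    return unknown_hdr_hits(headers) >= 3


spammy = ("From: a@example.com\nMinicomputers-Exhume: sides\n"
          "Malthus-Films: 88976dea\nParasitic-Homogeneity: db5da28ba3e69a\n"
          "Capitalizations-Grievously: oilers")
mostly_clean = "From: a@example.com\nSubject: hi\nThread-Topic: Lunch"

print(unknown_hdr_hits(spammy), unknown_hdr_toomany(spammy))
print(unknown_hdr_hits(mostly_clean), unknown_hdr_toomany(mostly_clean))
```

Capping the count keeps the work bounded on messages with very many unknown headers, since only "3 or more" matters for the threshold.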