Mailing List Archive: CONTENT_AFTER_HTML: better not discuss formatting!!

Feb 7, 2022, 5:27 PM

(Instances of html have been changed to htnl in this message to
avoid tripping the rule I'm talking about.)

A legit message arrived at my server, for me and another user, and it
scored 8 for them and I think about 11 for me. This is really unusual.
The big issues were:

Sent by sendgrid: points from both KAM and URIBL_GREY, each
reasonable separately, and I think URIBL_GREY newly lists sendgrid.

From: was someone's (the class teacher's) gmail address, but it was sent
out via sendgrid by a school, and there was no DKIM, so it lit up all
sorts of rules: FREEMAIL_FORGED, From:/envelope mismatch with freemail,
and ought-to-have-DKIM-from-Google-but-doesn't.

So I wrote to the person, since they probably had no idea, explained
the above, and added some other "deliverability hygiene" :-) comments:

> with more minor issues:
>
> The message is html only, rather than also having text/plain.
>
> The message body doesn't have enclosing <htnl> </htnl> tags, so it is
> malformed.

Then I got a reply back with the content he was trying to send, etc.
But it had:

* 2.5 CONTENT_AFTER_HTML More content after HTML close tag

but that one was text/plain only, and I could see nothing wrong. Reading
72_active.cf, I found:

rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
which fires on a text/plain part that discusses html formatting!

So I'll be reducing that score...
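
To make the false positive concrete, here is a small Python sketch (Python standing in for the Perl regex engine; the constructs in this particular pattern behave the same in both) that runs the quoted pattern, with the real "html" spelling restored, over a made-up plain-text reply that merely mentions the tag. The sample texts and the function name are illustrative assumptions:

```python
import re

# The rawbody subrule quoted above, with the real "html" spelling restored.
# re.IGNORECASE mirrors the Perl /i flag.
CONTENT_AFTER_HTML = re.compile(r"<\/html>\s*[a-z0-9]", re.IGNORECASE)

# Hypothetical text/plain reply that only talks about HTML formatting.
plain_text_reply = (
    "Thanks for the notes. I wasn't sure where the </html> tag goes,\n"
    "so I left it out.\n"
)

# A well-formed HTML body with nothing after the closing tag.
clean_html = "<html><body><p>Hello</p></body></html>\n"


def subrule_hits(text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return CONTENT_AFTER_HTML.search(text) is not None


print(subrule_hits(plain_text_reply))  # True: "</html> tag" matches
print(subrule_hits(clean_html))        # False: only a newline follows
```

The pattern has no notion of which MIME part the text came from, so any prose that mentions the closing tag followed by a word will match.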

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

lwilton at earthlink

Feb 7, 2022, 6:31 PM

> But, it had:
>
> * 2.5 CONTENT_AFTER_HTML More content after HTML close tag
>
> but one was only text/plain and I could see nothing wrong. reading
> 72_active.cf I found:
>
> rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>
> which fires on a text/plain part that discusses html formatting!

Note you show __CONTENT_AFTER_HTML and CONTENT_AFTER_HTML, which are not the
same rule. I suspect the meta for CONTENT_AFTER_HTML contains some other
things that should in theory make it not hit in this case.
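
Loren's distinction matters because, in SpamAssassin, rules whose names begin with a double underscore are sub-rules that carry no score of their own; only a rule built on top of them (such as a meta) adds points. A minimal Python sketch of that relationship, where `__HYPOTHETICAL_HAS_HTML_PART` is an invented sub-rule name used only for illustration and not something taken from 72_active.cf:

```python
# Sub-rule results for a text/plain-only reply like the one in this thread.
# __HYPOTHETICAL_HAS_HTML_PART is an invented name for illustration.
hits = {
    "__CONTENT_AFTER_HTML": True,           # the rawbody pattern matched
    "__HYPOTHETICAL_HAS_HTML_PART": False,  # no text/html part present
}


def meta_content_after_html(rule_hits: dict[str, bool]) -> bool:
    """Hypothetical meta: score only if the pattern hit AND an HTML part exists."""
    return rule_hits["__CONTENT_AFTER_HTML"] and rule_hits["__HYPOTHETICAL_HAS_HTML_PART"]


print(meta_content_after_html(hits))  # False: the sub-rule alone adds nothing
```

This only shows the shape of a guarded meta; it does not claim to reproduce what the real CONTENT_AFTER_HTML meta contained.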

I've personally never seen this rule hit, and didn't know it existed. Are
you sure it isn't a local rule? I have a rule of my own that gives 1 point
for extra trash after the /html end tag. I see it frequently on spam and UCE
that has a tracking tag in the HTML section after the official end of the
html.

Loren

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

jhardin at impsec

Feb 7, 2022, 6:38 PM

On Mon, 7 Feb 2022, Greg Troxel wrote:

> and then I got a reply back with the content he was trying to send etc.
> But, it had:
>
> * 2.5 CONTENT_AFTER_HTML More content after HTML close tag
>
> but one was only text/plain and I could see nothing wrong. reading
> 72_active.cf I found:
>
> rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
> which fires on a text/plain part that discusses html formatting!

Ah, I'll see if I can add something to that so it only fires when there's
an actual HTML body part. Thanks for the report.

Pity there's not an "htmlbody" rule type...
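
Since there is no "htmlbody" rule type, one way to picture what John is after is a check that runs the pattern only over decoded text/html parts. The Python sketch below does that with the standard library `email` package; `html_parts_only_hit` and the sample messages are hypothetical, and this is not how SpamAssassin evaluates rules:

```python
import email
import re
from email import policy

# Same pattern as the rawbody subrule, with "html" spelled out.
PATTERN = re.compile(r"</html>\s*[a-z0-9]", re.IGNORECASE)


def html_parts_only_hit(raw_message: bytes) -> bool:
    """Apply PATTERN only to decoded text/html parts (a hypothetical 'htmlbody')."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            if PATTERN.search(part.get_content()):
                return True
    return False


PLAIN_ONLY = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"\n"
    b"I left out the </html> tag by mistake.\n"
)

HTML_WITH_TRAILER = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/html; charset=us-ascii\n"
    b"\n"
    b"<html><body><p>Hi</p></body></html>\n"
    b"random filler words\n"
)

print(html_parts_only_hit(PLAIN_ONLY))         # False: no text/html part
print(html_parts_only_hit(HTML_WITH_TRAILER))  # True: text follows </html>
```

Restricting the match to text/html parts is what keeps the plain-text reply from tripping the check while still catching trailing junk in a real HTML part.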

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
USMC Rules of Gunfighting #2: Anything worth shooting
is worth shooting twice. Ammo is cheap. Your life is expensive.
-----------------------------------------------------------------------
5 days until Abraham Lincoln's and Charles Darwin's 213th Birthdays

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

jhardin at impsec

Feb 7, 2022, 6:43 PM

On Mon, 7 Feb 2022, Loren Wilton wrote:

>> But, it had:
>>
>> * 2.5 CONTENT_AFTER_HTML More content after HTML close tag
>>
>> but one was only text/plain and I could see nothing wrong. reading
>> 72_active.cf I found:
>>
>> rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>>
>> which fires on a text/plain part that discusses html formatting!
>
> Note you show __CONTENT_AFTER_HTML and CONTENT_AFTER_HTML, which are not the
> same rule. I suspect the meta for CONTENT_AFTER_HTML contains some other
> things that should in theory make it not hit in this case.
>
> I've personally never seen this rule hit, and didn't know it existed. Are you
> sure it isn't a local rule? I have a rule of my own that gives 1 point for
> extra trash after the /html end tag. I see it frequently on spam and UCE that
> has a tracking tag in the HTML section after the official end of the html.

No, I added that after observing multiple spams with random garbage after
the closing HTML tag in the HTML body part. Presumably it was an attempt
at Bayes poison, checksum avoidance, or some other filter evasion
technique.

I'll tighten it up.

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

lwilton at earthlink

Feb 8, 2022, 1:28 AM

> No, I added that after observing multiple spams with random garbage after
> the closing HTML tag in the HTML body part. Presumably it was an attempt
> at Bayes poison, checksum avoidance, or some other filter evasion
> technique.
>
> I'll tighten it up.

FWIW, here is the rule I use. It obviously could be better, but I haven't
noticed that it misfires.

full __GOODEHTML1 m'</html>'i

full __GOODEHTML2 m'</html>(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary

meta LW_BADEHTML1 (__GOODEHTML1 && !__GOODEHTML2)

describe LW_BADEHTML1 Bad ending - something after </HTML>

score LW_BADEHTML1 1
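
For readers who want to see what Loren's rule accepts and rejects, here is a Python translation of the two `full` sub-rules and the meta. `full` rules see the whole raw message, so the samples are raw strings; the function name and samples are illustrative assumptions. Python's `$` without MULTILINE behaves like Perl's (end of string, or just before a final newline), and `re.DOTALL` stands in for `/s`:

```python
import re

# __GOODEHTML1: any closing html tag, case-insensitive (/i).
GOODEHTML1 = re.compile(r"</html>", re.IGNORECASE)

# __GOODEHTML2: the tag followed by at most 50 whitespace characters or
# quoted-printable "=0A" sequences, then end of message, "--" (a MIME
# boundary line), or "=" (flags /is).
GOODEHTML2 = re.compile(
    r"</html>(?:\s|=0A){0,50}(?:$|--|=)", re.IGNORECASE | re.DOTALL
)


def lw_badehtml1(full_message: str) -> bool:
    """meta LW_BADEHTML1 (__GOODEHTML1 && !__GOODEHTML2)"""
    return bool(GOODEHTML1.search(full_message)) and not GOODEHTML2.search(full_message)


clean = "<html><body>hi</body></html>\n"
before_boundary = "<html><body>hi</body></html>\n\n--BOUNDARY--\n"
tracking_tail = "<html><body>hi</body></html>\n<img src=x>\n"

print(lw_badehtml1(clean))            # False
print(lw_badehtml1(before_boundary))  # False
print(lw_badehtml1(tracking_tail))    # True
```

One side effect worth knowing: because a bare "=" satisfies the last group, a quoted-printable soft line break right after the tag (as in "</html>=" at the end of an encoded line) also counts as a clean ending, so anything after it goes unflagged. That fits Loren's own remark that the rule "obviously could be better".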

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

Feb 8, 2022, 4:18 AM

John Hardin <jhardin@impsec.org> writes:

> On Mon, 7 Feb 2022, Greg Troxel wrote:
>
>> and then I got a reply back with the content he was trying to send etc.
>> But, it had:
>>
>> * 2.5 CONTENT_AFTER_HTML More content after HTML close tag
>>
>> but one was only text/plain and I could see nothing wrong. reading
>> 72_active.cf I found:
>>
>> rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>> which fires on a text/plain part that discusses html formatting!
>
> Ah, I'll see if I can add something to that so it only fires when
> there's an actual HTML body part. Thanks for the report.
>
> Pity there's not an "htmlbody" rule type...

Agreed - I think the way you are trying to tighten is correct.

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

sausers-20150205 at billmail

Feb 8, 2022, 9:02 AM

On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800)
Loren Wilton <lwilton@earthlink.net>
is rumored to have said:

>> No, I added that after observing multiple spams with random garbage after the closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes poison, checksum avoidance, or some other filter evasion technique.
>>
>> I'll tighten it up.
>
> FWIW, here is the rule I use. It obviously could be better, but I haven't noticed that it misfires.
>
> full __GOODEHTML1 m'</html>'i
>
> full __GOODEHTML2 m'</html>(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary

TANGENTIAL:

I would advise against using such alternative regex syntax in rules. As you obviously figured out, you CAN (for now...) use any valid Perl syntax for writing a regex match, but I do not believe that we want to bless that as something which will never break.

--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

kdeugau at vianet

Feb 8, 2022, 10:14 AM

Bill Cole wrote:
> On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800)
> Loren Wilton <lwilton@earthlink.net>
> is rumored to have said:
>
>>> No, I added that after observing multiple spams with random garbage after the closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes poison, checksum avoidance, or some other filter evasion technique.
>>>
>>> I'll tighten it up.
>>
>> FWIW, here is the rule I use. It obviously could be better, but I haven't noticed that it misfires.
>>
>> full __GOODEHTML1 m'</html>'i
>>
>> full __GOODEHTML2 m'</html>(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary
>
> TANGENTIAL:
>
> I would advise against using such alternative regex syntax in rules. As you obviously figured out, you CAN (for now...) use any valid Perl syntax for writing a regex match, but I do not believe that we want to bless that as something which will never break.

Maybe it's just inexperience with deep regex voodoo, but I'm not seeing
anything odd in those.

Are you talking about the use of m'' as the regex delimiter?

-kgd

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

sausers-20150205 at billmail

Feb 8, 2022, 10:51 AM

On 2022-02-08 at 13:14:06 UTC-0500 (Tue, 8 Feb 2022 13:14:06 -0500)
Kris Deugau <kdeugau@vianet.ca>
is rumored to have said:
[...]
> Are you talking about the use of m'' as the regex delimiter?

Yes.

It will probably work just fine for the foreseeable future, as long as the input validation of rules files is lenient.

It isn't beyond the realm of possibility that someday we'll tighten up syntax checking. We've had security issues in the past which involved the hypothetical potential to sneak in malicious code via rules. I don't expect that we'll have another one bad enough to make a rewrite of the config parser justified, but it could happen, and I don't think we'd design it today as it was done 20 years ago.
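
To illustrate Bill's point about lenient validation, here is a hypothetical Python sketch of two rule-pattern checks: a strict one that accepts only the `/pattern/flags` form and a lenient one that also accepts `m` followed by a repeated delimiter character, as in Loren's `m'...'is`. This is not SpamAssassin's config parser; bracket-style delimiters such as `m{...}` are deliberately left out to keep it short:

```python
import re

# Hypothetical pattern-syntax checks; NOT SpamAssassin's parser.
SLASH_FORM = re.compile(r"^/(.*)/([a-z]*)$", re.DOTALL)
# m, one non-space delimiter char, the body, the same char again, flags.
M_FORM = re.compile(r"^m(\S)(.*)\1([a-z]*)$", re.DOTALL)


def parse_strict(rule_text: str) -> tuple[str, str]:
    """Accept only /pattern/flags; return (pattern, flags)."""
    match = SLASH_FORM.match(rule_text)
    if match is None:
        raise ValueError(f"unsupported pattern syntax: {rule_text!r}")
    return match.group(1), match.group(2)


def parse_lenient(rule_text: str) -> tuple[str, str]:
    """Accept /pattern/flags or m<delim>pattern<delim>flags."""
    try:
        return parse_strict(rule_text)
    except ValueError:
        match = M_FORM.match(rule_text)
        if match is None:
            raise  # neither form matched; re-raise the strict error
        return match.group(2), match.group(3)


print(parse_lenient(r"/<\/html>\s*[a-z0-9]/i"))  # ('<\\/html>\\s*[a-z0-9]', 'i')
print(parse_lenient("m'</html>'i"))              # ('</html>', 'i')
```

Under the strict check, every rule written with `m'...'` would stop loading, which is the breakage Loren worries about in his reply.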

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

lwilton at earthlink

Feb 8, 2022, 12:22 PM

>> Are you talking about the use of m'' as the regex delimiter?
>
> Yes.
>
> It will probably work just fine for the foreseeable future, as long as the
> input validation of rules files is lenient.

I think you may have a very hard time removing the m<char> matching
delimiters from SA. I suspect there are at least hundreds of rules like that
in the release database. I have about a hundred local rules of my own that
use that.

Any time I have more than one backslash in a pattern, I use an alternate
delimiter (usually single quote) so that I don't have to escape all the
backslashes in the rule body. I'm not a fan of obfuscated rule bodies where
it is impossible to tell what it is intended to match. My experience is that
any time you have to write \\\\ or \\\\\\ multiple times in a rule body, you
are almost guaranteed to get the number of backslashes wrong, and the rule
won't work. But of course it may work in some cases (like the one you used
to test it) while not working in general.
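
Loren's backslash-counting problem is easy to reproduce. The Python sketch below (Python again standing in for Perl, so the details of string quoting differ) shows two things: the escaped slash in `<\/html>` only exists because `/` is the delimiter, and once a pattern passes through an extra layer of string-literal escaping, a single literal backslash takes four characters to write, and getting the count wrong silently changes what matches. The Windows-style path sample is an invented example:

```python
import re

sample = "text</html>more"

# "/" delimiter: the slash in the tag must be escaped as \/ ...
slash_delimited = r"<\/html>"
# ... while a quote delimiter (as in m'</html>') needs no escape.
quote_delimited = r"</html>"

# Both patterns match the same text; "\/" is simply a literal "/".
print(bool(re.search(slash_delimited, sample)))  # True
print(bool(re.search(quote_delimited, sample)))  # True

# To match ONE literal backslash, the regex needs "\\". Written in a
# non-raw string literal, that becomes four backslashes.
path = "C:\\temp"     # the characters C : \ t e m p
right = "C:\\\\temp"  # regex C:\\temp -> literal backslash, then "temp"
wrong = "C:\\temp"    # regex C:\temp  -> \t means TAB

print(bool(re.search(right, path)))  # True
print(bool(re.search(wrong, path)))  # False: looks for a TAB character
```

The `wrong` pattern still compiles and would happily match a string that contains a real tab, which is exactly the "works on the test case, fails in general" trap described above.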

I don't have time in my life to deal with that sort of thing. It caused me
enough grief when I started writing rules 20 years ago, which is why I
started using m'.

BTW, that particular rule dates from RulesEmporium days, which was what,
2005 or so?

Loren

Re: CONTENT_AFTER_HTML: better not discuss formatting!!

jhardin at impsec

Feb 8, 2022, 12:52 PM

On Tue, 8 Feb 2022, Loren Wilton wrote:

>>> Are you talking about the use of m'' as the regex delimiter?
>>
>> Yes.
>>
>> It will probably work just fine for the foreseeable future, as long as the
>> input validation of rules files is lenient.
>
> I think you may have a very hard time removing the m<char> matching
> delimiters from SA. I suspect there are at least hundreds of rules like that
> in the release database. I have about a hundred local rules of my own that
> use that.

Indeed.
