Mailing List Archive

More undetected hidden test spam signs
I just got a batch of spams containing

<span style="display:none">

That was followed by about 2K bytes of garbage containing GUIDs and links,
putatively to some YouTube video. The span was then terminated correctly,
followed by the body of the spam and then the same kind of garbage for about
another 2KB.

The small font rules didn't seem to catch this.

Loren
Re: More undetected hidden test spam signs [ In reply to ]
On 16 Dec 2020, at 23:21, Loren Wilton <lwilton@earthlink.net> wrote:
> I just got a batch of spams containing
>
> <span style="display:none">

Interesting. I remember in the early days of html spam there were various rules to tag messages as spam when they had content that did not display. (Possibly pre-SpamAssassin or at least pre my use of SpamAssassin).

--
>You are forgetting something: the Nazgul are immune to non-magical
> weapons.
>
"Any sufficiently advanced technology is indistinguishable from magic."
Re: More undetected hidden test spam signs [ In reply to ]
On Wed, 16 Dec 2020 22:21:12 -0800
Loren Wilton wrote:

> I just got a batch of spams containing
>
> <span style="display:none">
>
> That was followed by about 2K bytes of garbage containing GUIDs and
> links to putatively some youtube video. The span was then terminated
> correctly, the body of the spam, and then the same garbage for about
> another 2KB.
>
> The small font rules didn't seem to catch this.

There is an existing sub-rule that just misses this:

rawbody __STY_INVIS /\bstyle\s*=\s*"[^">]{0,80}(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)/i

It's looking for a ";" after the "none".
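
A minimal sketch of the miss, using Python's re module as a stand-in for the Perl regex engine that SpamAssassin uses. The strict pattern is the sub-rule as quoted above; the relaxed pattern is a hypothetical illustration and not necessarily the change that went into the rule set. Both sample tags are made up.

```python
import re

# The sub-rule as quoted above: the ";" after the value is mandatory.
STY_INVIS_STRICT = re.compile(
    r'\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)',
    re.IGNORECASE,
)

# Hypothetical relaxation: accept ";" or the closing quote after the value.
STY_INVIS_RELAXED = re.compile(
    r'\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden|display\s*:\s*none)\s*[;"]',
    re.IGNORECASE,
)

spam_tag = '<span style="display:none">'             # form seen in the spam
other_tag = '<div style="color:red; display: none;">'  # form with the ";"

strict_hits = [bool(STY_INVIS_STRICT.search(t)) for t in (spam_tag, other_tag)]
relaxed_hits = [bool(STY_INVIS_RELAXED.search(t)) for t in (spam_tag, other_tag)]
print(strict_hits, relaxed_hits)
```

The strict pattern only sees the second tag; the relaxed one sees both.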
Re: More undetected hidden test spam signs [ In reply to ]
On Thu, 17 Dec 2020, @lbutlr wrote:

> On 16 Dec 2020, at 23:21, Loren Wilton <lwilton@earthlink.net> wrote:
>> I just got a batch of spams containing
>>
>> <span style="display:none">
>
> Interesting. I remember in the early days of html spam there were various rules to tag messages as spam when they had content that did not display. (Possibly pre-SpamAssassin or at least pre my use of SpamAssassin).

Such rules are there. Unfortunately, for whatever reason, lots of ham uses
"invisible" text so it's not useful as a spam sign by itself and it's hard
to come up with any useful combination rules.

https://ruleqa.spamassassin.org/?rule=%2Fsty_invis

Perhaps this would be useful if it hits bayes but not hard enough to push
it over the threshold:

meta INVIS_TEXT_BAYES __STY_INVIS && (BAYES_80 || BAYES_95 || BAYES_99 || BAYES_999)


N.B.: I just fixed a minor error in __STY_INVIS that made it fail to see
that specific form of "invisible text".

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
"Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
does quite what I want. I wish Christopher Robin was here."
-- Peter da Silva in a.s.r
-----------------------------------------------------------------------
8 days until Christmas
Re: More undetected hidden test spam signs [ In reply to ]
On 17 Dec 2020, at 09:58, John Hardin <jhardin@impsec.org> wrote:
> Such rules are there. Unfortunately, for whatever reason, lots of ham uses "invisible" text so it's not useful as a spam sign by itself and it's hard to come up with any useful combination rules.

In the "Archive" folder on my work email there are 76,200 emails and 113,566 occurrences of the pattern "display:\s*none". Who knew?

One archived email I noticed had 24 occurrences of the string, about a third of them followed by "!important".
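
A count like this could be reproduced roughly as follows. This is only a sketch: the function name count_hidden and the sample bodies are made up, and how the messages are read out of the mail store is left out entirely.

```python
import re

DISPLAY_NONE = re.compile(r'display:\s*none', re.IGNORECASE)
DISPLAY_NONE_IMPORTANT = re.compile(
    r'display:\s*none\s*!important', re.IGNORECASE
)

def count_hidden(messages):
    """Return (total occurrences, occurrences followed by !important)."""
    total = sum(len(DISPLAY_NONE.findall(m)) for m in messages)
    important = sum(len(DISPLAY_NONE_IMPORTANT.findall(m)) for m in messages)
    return total, important

# Made-up message bodies, for illustration only.
sample = [
    '<p style="display:none">x</p><img style="display: none !important">',
    '<td style="DISPLAY:none!important;">y</td>',
    'plain text, nothing hidden',
]
print(count_hidden(sample))
```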

I used to have a dehtmlizer tool that stripped the HTML down to bare text and links by piping the html mime part of the messages through lynx --dump, but that proved to be problematic in its own way and I haven't gotten pipes working with sieve anyway.


--
I AM ZOMBOR! (kelly) ZOMBOR!
Re: More undetected hidden test spam signs [ In reply to ]
On Thu, 17 Dec 2020 08:58:07 -0800 (PST)
John Hardin wrote:

> On Thu, 17 Dec 2020, @lbutlr wrote:
>
> > On 16 Dec 2020, at 23:21, Loren Wilton <lwilton@earthlink.net>
> > wrote:
> >> I just got a batch of spams containing
> >>
> >> <span style="display:none">
> >
> > ... various rules to tag messages as spam when they had content that
> > did not display.
>
> Such rules are there. Unfortunately, for whatever reason, lots of ham
> uses "invisible" text so it's not useful as a spam sign by itself and
> it's hard to come up with any useful combination rules.

The trouble with this kind of thing is that you can make anything look
marginally useful with the right meta rule - even something like
__RCVD_ON_MONDAY.

rawbody rules are relatively expensive; if they don't show some kind of
initial promise, they aren't worth pursuing IMO.

> Perhaps this would be useful if it hits bayes but not hard enough to
> push it over the threshold:
>
> meta INVIS_TEXT_BAYES __STY_INVIS && (BAYES_80 || BAYES_95 ||
> BAYES_99 || BAYES_999)

__STY_INVIS has an S/O of 0.122 in QA hitting 6.4% of ham. In my corpus
the semi-colon doesn't make much difference to the historic numbers.
Unless __STY_INVIS is dominating spam I wouldn't do the above. If it
works it's most likely a sign that Bayes itself is underscored.

Strangely the S/O is even worse for __STY_INVIS_MANY (__STY_INVIS > 5).
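
To put the S/O figure in perspective, here is a small worked sketch. It assumes S/O is computed as spam% / (spam% + ham%); that definition is an assumption here, not something stated in the thread. The spam hit rate it prints is only what the two quoted numbers would imply under that assumption, not a reported QA figure.

```python
def so_ratio(spam_pct, ham_pct):
    """S/O as spam% / (spam% + ham%); assumed definition."""
    return spam_pct / (spam_pct + ham_pct)

def implied_spam_pct(so, ham_pct):
    """Invert the formula above to get the spam hit rate it implies."""
    return so * ham_pct / (1.0 - so)

ham_pct = 6.4  # ham hit rate quoted in the message above
so = 0.122     # S/O quoted in the message above

spam_pct = implied_spam_pct(so, ham_pct)
print(round(spam_pct, 2))
```

Under that assumption the rule would be hitting well under 1% of spam while hitting 6.4% of ham, which is why it looks unpromising on its own.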
Re: More undetected hidden test spam signs [ In reply to ]
On Thu, 17 Dec 2020, John Hardin wrote:

> On Thu, 17 Dec 2020, @lbutlr wrote:
>
>> On 16 Dec 2020, at 23:21, Loren Wilton <lwilton@earthlink.net> wrote:
>>> I just got a batch of spams containing
>>>
>>> <span style="display:none">
>>
>> Interesting. I remember in the early days of html spam there were various
>> rules to tag messages as spam when they had content that did not display.
>> (Possibly pre-SpamAssassin or at least pre my use of SpamAssassin).
>
> Such rules are there. Unfortunately, for whatever reason, lots of ham uses
> "invisible" text so it's not useful as a spam sign by itself and it's hard to
> come up with any useful combination rules.

I think I may have figured it out - tracking images. Like:

<img src="long unique tracking uri" width="0" height="0" border="0" style="visibility: hidden !important; display:none !important; max-height: 0; width: 0; line-height: 0; mso-hide: all;">

The src link gets visited to retrieve the image so the message is tracked,
but the display of the image is suppressed.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
Re: More undetected hidden test spam signs [ In reply to ]
>>> On 16 Dec 2020, at 23:21, Loren Wilton <lwilton@earthlink.net> wrote:
>>>> I just got a batch of spams containing
>>>>
>>>> <span style="display:none">
>>>
>> Such rules are there. Unfortunately, for whatever reason, lots of ham
>> uses "invisible" text so it's not useful as a spam sign by itself and
>> it's hard to come up with any useful combination rules.
>
> I think I may have figured it out - tracking images. Like:
>
> <img src="long unique tracking uri" width="0" height="0" border="0"
> style="visibility: hidden !important; display:none !important; max-height:
> 0; width: 0; line-height: 0; mso-hide: all;">

Note in your example the display:none is in a contained tag and not in an
opening tag of a span. The tag is probably fairly long because the URL is
probably huge, but it is still the one item that is hidden.

I put in a local rawbody rule for
m'<span style="display:none">.{100,}(?:$|</span>)'is
and so far I haven't gotten any hits on ham.

Of course that is a pretty heavy rule, but it would seem to indicate that
hidden spans may not be that common in ham.
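
A rough port of that local rule to Python, to show what it does and does not hit. The sample bodies are made up, and the way SpamAssassin splits rawbody text into chunks is not modelled; each sample is treated as one buffer.

```python
import re

# Port of m'<span style="display:none">.{100,}(?:$|</span>)'is
# (Perl flag i -> re.IGNORECASE, s -> re.DOTALL).
HIDDEN_SPAN = re.compile(
    r'<span style="display:none">.{100,}(?:$|</span>)',
    re.IGNORECASE | re.DOTALL,
)

# Made-up bodies, for illustration only.
filler = "3f2504e0-4f89-11d3-9a0c-0305e82c3301 " * 50
samples = {
    "long hidden span": '<span style="display:none">' + filler + "</span>Buy now",
    "short preheader": '<span style="display:none">preheader</span>',
    "hidden img only": '<img src="x" style="display:none">' + "text " * 40,
}

hits = {name: bool(HIDDEN_SPAN.search(body)) for name, body in samples.items()}
print(hits)
```

Only the long hidden span is flagged; the short span and the hidden image tag are not.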
Re: More undetected hidden test spam signs [ In reply to ]
On Tue, 22 Dec 2020, Loren Wilton wrote:

>>>> On 16 Dec 2020, at 23:21, Loren Wilton <lwilton@earthlink.net> wrote:
>>>>> I just got a batch of spams containing
>>>>>
>>>>> <span style="display:none">
>>>>
>>> Such rules are there. Unfortunately, for whatever reason, lots of ham uses
>>> "invisible" text so it's not useful as a spam sign by itself and it's hard
>>> to come up with any useful combination rules.
>>
>> I think I may have figured it out - tracking images. Like:
>>
>> <img src="long unique tracking uri" width="0" height="0" border="0"
>> style="visibility: hidden !important; display:none !important; max-height:
>> 0; width: 0; line-height: 0; mso-hide: all;">
>
> Note in your example the display:none is in a contained tag and not in an
> opening tag of a span. The tag is probably fairly long because the URL is
> probably huge, but it is still the one item that is hidden.

Right, but __STY_INVIS is currently tag-blind (it only looks for the
style="" clause), so it hits that, and if lots of ham is hiding tracking
images that way that might explain the poor S/O.
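
A sketch of what "tag-blind" means in practice, again with Python's re standing in for Perl. TAG_BLIND is the sub-rule as quoted earlier in the thread; SPAN_ONLY is a hypothetical tag-aware variant, not an existing rule. The sample tags are simplified and made up (no "!important"), because the pattern as quoted needs the ";" directly after the value.

```python
import re

# The style test as quoted earlier: any tag carrying the style="" matches.
TAG_BLIND = re.compile(
    r'\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)',
    re.IGNORECASE,
)

# Hypothetical variant: the same test, but only inside a <span ...> tag.
SPAN_ONLY = re.compile(
    r'<span\b[^>]{0,80}\bstyle\s*=\s*"[^">]{0,80}'
    r'(?:visibility\s*:\s*hidden\s*;|display\s*:\s*none\s*;)',
    re.IGNORECASE,
)

# Made-up, simplified sample tags.
tracking_img = ('<img src="https://tracker.example/abc123" '
                'width="0" height="0" style="display:none;">')
hidden_span = '<span class="x" style="display:none;">'

blind_hits = [bool(TAG_BLIND.search(t)) for t in (tracking_img, hidden_span)]
span_hits = [bool(SPAN_ONLY.search(t)) for t in (tracking_img, hidden_span)]
print(blind_hits, span_hits)
```

The tag-blind pattern flags the hidden tracking image as well as the span; the span-only variant ignores the image.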

> I put in a local rawbody rule for
> m'<span style="display:none">.{100,}(?:$|</span>)'is
> and so far I haven't gotten any hits on ham.

How much spam hits that very simple case? I had a __SPAN_INVIS rule
(currently commented out) but IIRC it also had poor S/O. It wasn't as
simple as yours, though - perhaps I'm allowing for too many
syntactically-valid cases to try to avoid trivial avoidance by spam?

> Of course that is a pretty heavy rule

It would be lighter if you didn't look for the tag closing. Is there a
reason you care about the closing for that?
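
A sketch comparing the rule with and without the closing test, treating each sample as a single buffer (rawbody chunking is not modelled). With the s flag, .{100,} can run to the end of the buffer, where $ then succeeds, so in this simplified setting the closing alternative does not change which samples match. NO_CLOSE is a hypothetical lighter form, and the samples are made up.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# The local rule as posted, and a hypothetical variant without the closing test.
WITH_CLOSE = re.compile(
    r'<span style="display:none">.{100,}(?:$|</span>)', FLAGS
)
NO_CLOSE = re.compile(r'<span style="display:none">.{100}', FLAGS)

# Made-up samples, for illustration only.
samples = [
    # long hidden run inside the span
    '<span style="display:none">' + "x" * 2000 + "</span>body",
    # short hidden span followed by a long visible tail
    '<span style="display:none">hi</span>' + "visible text " * 20,
    # short hidden span, short tail
    '<span style="display:none">hi</span>short tail',
    # no hidden span at all
    "<p>no hidden span at all</p>" + "y" * 200,
]

with_close = [bool(WITH_CLOSE.search(s)) for s in samples]
no_close = [bool(NO_CLOSE.search(s)) for s in samples]
print(with_close, no_close)
```

Note the second sample: both forms match it even though the hidden span itself is tiny, because the 100 characters are counted from the opening tag and may include text after the closing tag.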

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
Re: More undetected hidden test spam signs [ In reply to ]
> Right, but __STY_INVIS is currently tag-blind (it only looks for the
> style="" clause), so it hits that, and if lots of ham is hiding tracking
> images that way that might explain the poor S/O.

I suspect that might be the case.

The vast majority of invisible garbage I see is hidden in a <style> ...
</style> pair, typically two per spam and about 50K in each one. Looking at
the definition of the <style> tag, it says that it should only appear in the
<head> section. Of course this "bayes killer" (sic) stuff appears in the
body, so in theory the whole <style> tag should be worth some points for
being out of place. So far though I haven't been able to craft a rule that
will check if <style> is in the body and not the head.
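
For illustration only, the check itself can be sketched outside SpamAssassin with Python's html.parser. This is not a SpamAssassin rule, and the class and function names are made up. It also assumes the message has an explicit <head> element; a <style> in a message with no <head> at all would be counted as misplaced.

```python
from html.parser import HTMLParser

class StyleOutsideHead(HTMLParser):
    """Count <style> start tags seen while not inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = 0

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "style" and not self.in_head:
            self.misplaced += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def misplaced_style_count(html):
    """Return the number of <style> tags opened outside <head>."""
    parser = StyleOutsideHead()
    parser.feed(html)
    parser.close()
    return parser.misplaced

# Made-up documents, for illustration only.
ok = ("<html><head><style>p{color:red}</style></head>"
      "<body><p>hi</p></body></html>")
bad = ("<html><head></head><body><style>random words</style>"
       "<p>hi</p><style>more words</style></body></html>")

print(misplaced_style_count(ok), misplaced_style_count(bad))
```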

The next most common is 0 point font stuff, again appearing between a
<font...> and </font> tag. I haven't done much yet, but I've been
considering trying to find a "valid length" for 0 point font stuff to hide
tracking cookies, and dinging stuff that is just hiding random word garbage.
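
One way to start looking for such a length is to measure how much text sits inside zero-size font tags. The sketch below is an assumption-laden illustration: the pattern only handles <font> tags with an inline font-size of 0, the MAX_PLAUSIBLE cutoff is a made-up placeholder rather than a measured value, and the samples are invented.

```python
import re

# Content of <font> tags whose inline style sets font-size to 0, 0px or 0pt.
# The [;"'] after the value keeps sizes such as 0.8em from matching.
ZERO_FONT = re.compile(
    r'<font\b[^>]*font-size\s*:\s*0(?:px|pt)?\s*[;"\'][^>]*>(.*?)</font>',
    re.IGNORECASE | re.DOTALL,
)

def zero_font_lengths(html):
    """Return the length of each zero-size font run found in html."""
    return [len(m.group(1)) for m in ZERO_FONT.finditer(html)]

MAX_PLAUSIBLE = 200  # hypothetical cutoff, to be tuned against real mail

# Made-up samples, for illustration only.
tracker = '<font style="font-size:0px">id=abc123</font>'
filler = '<font style="font-size:0px">' + "lorem " * 500 + "</font>"
small_not_zero = '<font style="font-size:0.8em">fine print</font>'

for body in (tracker, filler, small_not_zero):
    lengths = zero_font_lengths(body)
    print(lengths, any(n > MAX_PLAUSIBLE for n in lengths))
```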

>> I put in a local rawbody rule for
>> m'<span style="display:none">.{100,}(?:$|</span>)'is
>> and so far I haven't gotten any hits on ham.
>
> How much spam hits that very simple case?

Probably not much, but that is most likely because for the last month or two
I've seemingly only been getting spam from two different spammers, and they
have rigid and very predictable spam formats for all of their spam. One is
sending short spams that just have a pair of image links. The other is
sending 100KB spams that today are using <font style="font-size:0px"> 50K of
stuff stuff </font> in the format. Last month they were hiding this in the
<style> tag as I mentioned above.

>> Of course that is a pretty heavy rule
>
> It would be lighter if you didn't look for the tag closing. Is there a
> reason you care about the closing for that?

It was written as an initial test rule to try to search for a split length
between ham and spam. Of course since it is rawbody and rawbody globs text,
the length will be a bit random, which might make the determination useless.
At this point I haven't had enough hits on it (because of my limited spam
sources) to be able to decide if 100 is too much or too little.