Mailing List Archive

Re: svn commit: r1885178 - /spamassassin/trunk/rulesrc/sandbox/jhardin/40_local_419replyto.cf
John,

1st: thank you very much for working on generated rules and for all the rest of your work on rules.

I am curious about whether these very long regexes have been proven to actually work in full, or if it is possible that they are getting mishandled quietly. I don't see any hits on either rule on any of the mail systems I work with going back a month, so I am wondering if it is worthwhile to construct test messages that should hit due to elements in the latter parts of the patterns or if you've already done such tests.

--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: svn commit: r1885178 - /spamassassin/trunk/rulesrc/sandbox/jhardin/40_local_419replyto.cf
On Wed, 6 Jan 2021, Bill Cole wrote:

> John,
>
> 1st: thank you very much for working on generated rules and for all the rest of your work on rules.
>
> I am curious about whether these very long regexes have been proven to
> actually work in full, or if it is possible that they are getting
> mishandled quietly. I don't see any hits on either rule on any of the
> mail systems I work with going back a month, so I am wondering if it is
> worthwhile to construct test messages that should hit due to elements in
> the latter parts of the patterns or if you've already done such tests.

I did some quick searching to see if there's any documented max RE size
and couldn't find any such. I'd wager that's based on available resources
rather than being a hard size limit.
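
For anyone who wants to check the tail of a very long pattern directly, here is a minimal sketch of the idea in Python. It says nothing about Perl's regex engine or about how SpamAssassin compiles rules; it only shows the shape of such a test. The addresses, the count of 5000, and the helper name build_alternation are all made up for illustration and are not taken from the actual rule file.

```python
import re

def build_alternation(addresses):
    """Join literal addresses into one anchored, case-insensitive alternation."""
    body = "|".join(re.escape(a) for a in addresses)
    return re.compile(r"^(?:" + body + r")$", re.IGNORECASE)

# Hypothetical placeholder addresses, NOT the ones in the real rule.
addresses = [f"scammer{i}@example.com" for i in range(5000)]
pattern = build_alternation(addresses)

# If a long pattern were being truncated or mishandled, the alternatives
# near the end are the ones most likely to stop matching.
first_hit = pattern.match(addresses[0]) is not None
last_hit = pattern.match(addresses[-1]) is not None
miss = pattern.match("nobody@example.com") is None

print("pattern length:", len(pattern.pattern))
print("first alternative hits:", first_hit)
print("last alternative hits:", last_hit)
print("unlisted address misses:", miss)
```

The same approach carries over to a real test message: put an address that appears only near the end of the pattern into the Reply-To header and confirm the rule fires.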

If there is a hard length limit on REs, and SA silently breaks when that
limit is exceeded, it doesn't appear to have hit it yet:

Jan 6 09:28:07.206 [30594] dbg: rules: ran header rule __REPTO_419_FRAUD_GM_LOOSE ======> got hit: "zminhong65@gmail.com"
Jan 6 09:28:07.207 [30594] dbg: rules: ran header rule REPTO_419_FRAUD_GM ======> got hit: "zminhong65@gmail.com"

Jan 6 09:28:40.398 [30728] dbg: rules: ran header rule REPTO_419_FRAUD ======> got hit: "zimcargoservicehelpdesks@tlen.pl"
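
If you want to pull rule hits like these out of a longer debug log, something along the following lines may help. It is only a sketch: the regex is written against the three lines quoted above, and the names HIT_RE and extract_hits are hypothetical. Other SpamAssassin versions or rule types may format their debug output differently.

```python
import re

# Matches lines shaped like the debug output quoted above.
HIT_RE = re.compile(
    r'dbg: rules: ran (?P<kind>\w+) rule (?P<rule>\S+) =+> got hit: "(?P<match>[^"]*)"'
)

def extract_hits(log_text):
    """Return (rule name, matched text) pairs found in debug output."""
    hits = []
    for line in log_text.splitlines():
        m = HIT_RE.search(line)
        if m:
            hits.append((m.group("rule"), m.group("match")))
    return hits

sample = (
    'Jan 6 09:28:07.206 [30594] dbg: rules: ran header rule '
    '__REPTO_419_FRAUD_GM_LOOSE ======> got hit: "zminhong65@gmail.com"\n'
    'Jan 6 09:28:07.207 [30594] dbg: rules: ran header rule '
    'REPTO_419_FRAUD_GM ======> got hit: "zminhong65@gmail.com"\n'
    'Jan 6 09:28:40.398 [30728] dbg: rules: ran header rule '
    'REPTO_419_FRAUD ======> got hit: "zimcargoservicehelpdesks@tlen.pl"\n'
)

for rule, matched in extract_hits(sample):
    print(rule, matched)
```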


But in retesting this I did find and fix a minor RE error that caused it
to miss addresses in yahoo.com.XX domains.
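
The message does not show the actual error or the fix, so the following is only an illustration of the general kind of miss, written in Python rather than as a SpamAssassin rule. Both patterns are hypothetical: one anchors directly after yahoo.com, the other allows an optional two-letter country suffix.

```python
import re

# Hypothetical patterns, not the ones in 40_local_419replyto.cf.
strict = re.compile(r"@yahoo\.com$", re.IGNORECASE)
relaxed = re.compile(r"@yahoo\.com(?:\.[a-z]{2})?$", re.IGNORECASE)

# Placeholder addresses for illustration only.
samples = [
    "someone@yahoo.com",
    "someone@yahoo.com.hk",
    "someone@yahoo.com.evil.example",
]

for addr in samples:
    print(addr, bool(strict.search(addr)), bool(relaxed.search(addr)))
```

The optional group stays anchored with `$` so that a longer hostname that merely starts with yahoo.com is still rejected.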

Thanks for asking!


There's no guarantee you'll see hits. The only feed I have for this is 419
spams sent to me and my wife, and a few 419 spamples that others have
provided, so the sample set is probably rather small even though it feels
like I get a metric buttload of such garbage.

If anybody has a well-vetted 419 scam corpus that they'd be willing to
extract reply-to addresses from to contribute to this, feel free to
contact me privately.


The reason I decided to do this is that I'm still getting 419 pitches
having Gmail contact addresses that I started reporting *more than six
months ago* (and continue to report every time I get another one). I would
assume that if Google had actually suspended the accounts after (multiple)
reports then the 419 scammers would have stopped using those contact
addresses in their pitches because they couldn't receive replies, thus it
looks to me like Google just doesn't give a shit about my 419 collector
mailbox reports. However, I recognize that assumption may be flawed, and
anyone with actual contacts inside the Gmail administrative team is
invited to email me privately to discuss this.

I figure if I'm still seeing them then others are probably seeing them too
and would benefit from them being scored in addition to the body-based 419
scam tests.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Je ne suis pas Charlie. Je suis armé.
-----------------------------------------------------------------------
Tomorrow: the 6th anniversary of the Charlie Hebdo massacre