Mailing List Archive

Additive scoring on a single rule for multiple matches
All:

I've been poking around in the wiki and the list archives but couldn't
find a good answer for this question:

Is it possible to score a single rule additively? That is, the rule's
final score is its base score multiplied by the number of times it
matched, rather than a fixed score for whether or not it matched?

There are several good applications for this, particularly in relation
to HTML mail.

For example, it would be fairly simple to write a rule that hits on
invalid HTML tags. However, one or two invalid HTML tags don't indicate
much (a lot of geeks use them in their messages for things like
<grumble>f'ing Msoft!</grumble>), so the simple presence of an invalid
tag shouldn't affect the spamminess score much (say, 0.05). Having
twenty or thirty, however, would be a really *good* indicator of
spamminess.
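
To make the idea concrete, here is a minimal sketch in Python (not
SpamAssassin code) of what "additive" scoring for such a rule could look
like. The tag whitelist, the function names, and the cap are all
assumptions made up for illustration; only the 0.05 per-hit figure comes
from the paragraph above.

```python
import re

# Hypothetical, deliberately tiny whitelist; a real check would need
# the full set of valid HTML tag names.
KNOWN_TAGS = {
    "html", "head", "body", "p", "br", "a", "b", "i",
    "div", "span", "table", "tr", "td", "font", "img",
}

# Matches the name of an opening tag such as <p> or <grumble>.
# Closing tags (</p>) are skipped so each element is counted once.
TAG_RE = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)")


def count_invalid_tags(html):
    """Count opening tags whose name is not in KNOWN_TAGS."""
    return sum(
        1 for name in TAG_RE.findall(html)
        if name.lower() not in KNOWN_TAGS
    )


def additive_score(hits, per_hit=0.05, cap=2.0):
    """Score grows with the number of hits, up to an assumed cap."""
    return min(hits * per_hit, cap)


message = "<p><grumble>f'ing Msoft!</grumble></p>"
hits = count_invalid_tags(message)
score = additive_score(hits)
```

With these assumed numbers, a single <grumble> contributes 0.05, while
thirty invalid tags would contribute about 1.5. The cap is there only so
that one rule cannot dominate the total on its own.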

Is there some way to do this currently?

Is there support for this in CVS?

--
John Hardin KA7OHZ
Internal Systems Administrator/Guru voice: (425) 672-1304
Apropos Retail Management Systems, Inc. fax: (425) 672-0192
-----------------------------------------------------------------------
Failure to plan ahead on someone else's part does not constitute an
emergency on my part.
- David W. Barts in a.s.r
RE: Additive scoring on a single rule for multiple matches
Yes and no. :-)

Not in any release of SA yet, although someone did write an eval to add
to SA that does just this. I have it, but like everything in my office,
it is lost somewhere in the vastness of knowledge... (OK, clutter!) I'll
see if I can dig it up.

--Chris

> -----Original Message-----
> From: John Hardin [mailto:johnh@aproposretail.com]
> Sent: Wednesday, February 18, 2004 2:27 PM
> To: SpamAssassin list
> Subject: Additive scoring on a single rule for multiple matches
Re: Additive scoring on a single rule for multiple matches
On Wednesday 18 February 2004 11:49 am, Chris Santerre wrote:
> Yes and no. :-)
>
> Not in any release of SA as of yet. Although someone did write an eval to
> add to SA that does just this. I have it, and like everything in my office,
> it is lost somewhere in the vastness of knowledge...(OK, clutter!) I'll see
> if I can dig it up.
>
> --Chris
>
> > Is it possible to score a single rule additively? That is, the rule's
> > final score is the sum of the number of times it matched, rather than
> > simply whether or not it matched?
> >
> > There are several good applications for this, particularly in relation
> > to HTML mail.

This is exactly what I've been complaining about with regard to too-long SA
reports (but kind of a different tack on it). These huge sets of little rules
(Tripwire, etc) would be much improved with a single additive score like Mr.
Hardin described.
--
Matt
Systems Administrator
Local Access Communications
360.330.5535
Re: Additive scoring on a single rule for multiple matches
On Wed, 18 Feb 2004 12:15:05 -0800, Matthew Trent wrote:
> This is exactly what I've been complaining about with regard to too-long SA
> reports (but kind of a different tack on it). These huge sets of little rules
> (Tripwire, etc) would be much improved with a single additive score like Mr.
> Hardin described.

In the meantime you could always try my
http://www.snoweye.com/john/metatripwire.cf which converts the standard
tripwire rules to __subrules and adds a number of meta rules to count
the number of tripwire hits.
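
As a rough illustration of that counting approach (this is a Python
sketch of the logic, not the contents of metatripwire.cf, and the rule
names and thresholds below are invented for the example): each subrule
contributes no score by itself, and a handful of meta rules fire once
the number of subrule hits reaches a threshold.

```python
# Hypothetical thresholds and meta rule names, for illustration only.
THRESHOLDS = [
    (1, "TW_HITS_1"),
    (3, "TW_HITS_3"),
    (5, "TW_HITS_5"),
]


def count_hits(subrule_results):
    """Number of subrules that matched (values are True/False)."""
    return sum(1 for hit in subrule_results.values() if hit)


def meta_rules_fired(subrule_results):
    """Names of the meta rules whose threshold has been reached."""
    n = count_hits(subrule_results)
    return [name for minimum, name in THRESHOLDS if n >= minimum]


# Example with four hypothetical subrules, three of which matched.
results = {
    "__TW_A": True,
    "__TW_B": False,
    "__TW_C": True,
    "__TW_D": True,
}
fired = meta_rules_fired(results)
```

In this example three subrules hit, so the report would list only the
meta rules for "at least 1" and "at least 3" instead of one line per
subrule.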

John.

--
-- Over 2400 webcams from ski resorts around the world - www.snoweye.com
-- Translate your technical documents and web pages - www.tradoc.fr
Re: Additive scoring on a single rule for multiple matches
Hi,

On Wed, 18 Feb 2004 12:15:05 -0800 Matthew Trent <mtrent@localaccess.com> wrote:

> On Wednesday 18 February 2004 11:49 am, Chris Santerre wrote:
> >
> > > Is it possible to score a single rule additively? That is, the rule's
> > > final score is the sum of the number of times it matched, rather than
> > > simply whether or not it matched?
> >
> > Yes and no. :-)
> >
> > Not in any release of SA as of yet. Although someone did write an eval to
> > add to SA that does just this. I have it, and like everything in my office,
> > it is lost somewhere in the vastness of knowledge...(OK, clutter!) I'll see
> > if I can dig it up.
>
> This is exactly what I've been complaining about with regard to too-long SA
> reports (but kind of a different tack on it). These huge sets of little rules
> (Tripwire, etc) would be much improved with a single additive score like Mr.
> Hardin described.

The huge rulesets are likely more efficient than code to count the
number of occurrences. What people sometimes forget is that the rules
that work great on a machine filtering 1,000 messages a day for three
people don't work at all for sites filtering 500,000 messages a day for
1,000 people.

-- Bob
Re: Additive scoring on a single rule for multiple matches
On Thursday 19 February 2004 06:35 am, Bob Apthorpe wrote:
> Hi,
> The huge rulesets are likely more efficient than code to count the
> number of occurrences. What people sometimes forget is that the rules
> that work great on a machine filtering 1,000 messages a day for three
> people don't work at all for sites filtering 500,000 messages a day for
> 1,000 people.

My site happens to handle well over 600,000/day for 10,000 users and I still
can't imagine much of a performance hit for a few more addition operations...
--
Matt
Re: Additive scoring on a single rule for multiple matches
On Thu, 2004-02-19 at 06:35, Bob Apthorpe wrote:

> The huge rulesets are likely more efficient than code to count the
> number of occurrences.

I am skeptical. Give us numbers and I'll believe you.

My supposition is that the additive rules will represent a fairly small
percentage of the total number of rules.

--
John Hardin KA7OHZ
-----------------------------------------------------------------------
11 days until ICQ Corp goes away - have you installed Jabber yet?