Mailing List Archive

Why single periods in regex in spamassassin rules?
I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i

I'm wondering if there is a good reason why a single period is used
instead of something like \s+, which would catch multiple spaces whereas
a single period doesn't.
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
On 23.04.21 13:03, Steve Dondley wrote:
>I'm looking at KAM.cf. There is this rule:
>
>body __KAM_WEB2 /INDIA based
>IT|indian.based.website|certified.it.company/i
>
>I'm wondering if there is a good reason why a single period is used
>instead of something like \s+ which would catch multiple spaces
>whereas a single period doesn't.

Generally, it's safer not to give regular expressions an unlimited range; use a bounded quantifier, e.g.

\s{1,3}


--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Support bacteria - they're the only culture some people have.
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
On Fri, Apr 23, 2021 at 01:03:33PM -0400, Steve Dondley wrote:
> I'm looking at KAM.cf. There is this rule:
>
> body __KAM_WEB2 /INDIA based
> IT|indian.based.website|certified.it.company/i
>
> I'm wondering if there is a good reason why a single period is used instead
> of something like \s+ which would catch multiple spaces whereas a single
> period doesn't.

It would make no difference, because the body text is normalized so that
consecutive spaces are collapsed into single spaces.

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRulesAdvanced
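
To illustrate the point about normalization, here is a minimal Python sketch.
It only imitates the whitespace collapsing described above: normalize_body is
a hypothetical helper, not SpamAssassin code, and Python's re module stands in
for the Perl regex engine that SpamAssassin actually uses.

```python
import re


def normalize_body(text: str) -> str:
    """Hypothetical stand-in: collapse every whitespace run to one space."""
    return re.sub(r"\s+", " ", text)


# The same phrase written with "." and with "\s+" between the words.
DOT_RULE = re.compile(r"indian.based.website", re.IGNORECASE)
SPACE_RULE = re.compile(r"indian\s+based\s+website", re.IGNORECASE)

raw = "Indian   based \n  website"
body = normalize_body(raw)

# Against the raw text, only the \s+ version can span the long gaps.
raw_hits = (bool(DOT_RULE.search(raw)), bool(SPACE_RULE.search(raw)))
# Against the normalized text, both versions match.
body_hits = (bool(DOT_RULE.search(body)), bool(SPACE_RULE.search(body)))

print(raw_hits, body_hits)
```

If the text a body rule sees has already been collapsed this way, the single
period loses nothing compared with \s+ for runs of spaces.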
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
On 2021-04-23 01:37 PM, Henrik K wrote:
> On Fri, Apr 23, 2021 at 01:03:33PM -0400, Steve Dondley wrote:
>> I'm looking at KAM.cf. There is this rule:
>>
>> body __KAM_WEB2 /INDIA based
>> IT|indian.based.website|certified.it.company/i
>>
>> I'm wondering if there is a good reason why a single period is used
>> instead of something like \s+ which would catch multiple spaces
>> whereas a single period doesn't.
>
> It would make no difference, because body is normalized from
> consecutive spaces into single spaces.
>
> https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRulesAdvanced

Makes sense. And thanks for the link. I was looking for some kind of
guidance on writing rules. Google didn't help much.
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
On Fri, 23 Apr 2021, Steve Dondley wrote:

> I'm looking at KAM.cf. There is this rule:
>
> body __KAM_WEB2 /INDIA based
> IT|indian.based.website|certified.it.company/i
>
> I'm wondering if there is a good reason why a single period is used instead of
> something like \s+ which would catch multiple spaces whereas a single period
> doesn't.

Because '/indian.based.website/' will match 'indian-based_website' but \s will
not.
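
A small Python sketch of that difference, using Python's re module as a
stand-in for Perl regexes. The sample strings other than
'indian-based_website' are made up for illustration.

```python
import re

# "." accepts any single character between the words; "\s" only whitespace.
DOT_RULE = re.compile(r"indian.based.website", re.IGNORECASE)
SPACE_RULE = re.compile(r"indian\sbased\swebsite", re.IGNORECASE)

samples = [
    "indian based website",   # plain spaces
    "indian-based_website",   # hyphen and underscore as separators
    "Indian|based+website",   # hypothetical variant with pipe and plus
]

# Map each sample to (matched by ".", matched by "\s").
results = {
    s: (bool(DOT_RULE.search(s)), bool(SPACE_RULE.search(s))) for s in samples
}

for sample, hits in results.items():
    print(f"{sample!r}: dot={hits[0]} space={hits[1]}")
```

Only the first sample is caught by the \s version, while the period version
catches all three.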


--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
On Fri, 23 Apr 2021 13:52:40 -0500 (CDT)
David B Funk wrote:

> On Fri, 23 Apr 2021, Steve Dondley wrote:
>
> > I'm looking at KAM.cf. There is this rule:
> >
> > body __KAM_WEB2 /INDIA based
> > IT|indian.based.website|certified.it.company/i
> >
> > I'm wondering if there is a good reason why a single period is used
> > instead of something like \s+ which would catch multiple spaces
> > whereas a single period doesn't.
>
> Because '/indian.based.website/' will match 'indian-based_website'
> but \s will not.

\W+ might be better though
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
On Fri, 23 Apr 2021, RW wrote:

> On Fri, 23 Apr 2021 13:52:40 -0500 (CDT)
> David B Funk wrote:
>
>> On Fri, 23 Apr 2021, Steve Dondley wrote:
>>
>>> I'm looking at KAM.cf. There is this rule:
>>>
>>> body __KAM_WEB2 /INDIA based
>>> IT|indian.based.website|certified.it.company/i
>>>
>>> I'm wondering if there is a good reason why a single period is used
>>> instead of something like \s+ which would catch multiple spaces
>>> whereas a single period doesn't.
>>
>> Because '/indian.based.website/' will match 'indian-based_website'
>> but \s will not.
>
> \W+ might be better though

Not unbounded it isn't. \W{1,5} might be better without being runaway.
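
A hedged Python sketch comparing the unbounded and bounded forms (Python's re
stands in for Perl here; the sample strings are invented). It also shows a
side effect worth knowing: \W does not match an underscore, because "_" counts
as a word character, so neither \W form matches 'indian-based_website'.

```python
import re

UNBOUNDED = re.compile(r"indian\W+based\W+website", re.IGNORECASE)
BOUNDED = re.compile(r"indian\W{1,5}based\W{1,5}website", re.IGNORECASE)

short_gap = "indian - based -- website"               # gaps of 3 and 4 chars
long_gap = "indian" + "-" * 40 + "based website"      # 40-char first gap
underscore = "indian-based_website"                   # "_" is a word char


def hits(text: str) -> tuple[bool, bool]:
    """Return (unbounded matched, bounded matched) for the given text."""
    return bool(UNBOUNDED.search(text)), bool(BOUNDED.search(text))


short_hits = hits(short_gap)
long_hits = hits(long_gap)
underscore_hits = hits(underscore)

print(short_hits, long_hits, underscore_hits)
```

The bounded quantifier caps how far each gap may stretch, so it gives up on
the 40-character separator instead of scanning across it.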

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Are you a mildly tech-literate politico horrified by the level of
ignorance demonstrated by lawmakers gearing up to regulate online
technology they don't even begin to grasp? Cool. Now you have a
tiny glimpse into a day in the life of a gun owner. -- Sean Davis
-----------------------------------------------------------------------
329 days since the first private commercial manned orbital mission (SpaceX)
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
On 4/23/21 2:52 PM, David B Funk wrote:
> On Fri, 23 Apr 2021, Steve Dondley wrote:
>
>> I'm looking at KAM.cf. There is this rule:
>>
>> body    __KAM_WEB2  /INDIA based
>> IT|indian.based.website|certified.it.company/i
>>
>> I'm wondering if there is a good reason why a single period is used
>> instead of something like \s+ which would catch multiple spaces
>> whereas a single period doesn't.
>
> Because '/indian.based.website/' will match 'indian-based_website' but
> \s will not.
>
>
This is the real reason (or at least, it was for all of my contributions
to KAM.cf). I was also concerned about tricks like &nbsp;, which is
visibly a space but has all the technical characteristics of
non-whitespace. Using "." was easier than knowing everything about
Unicode code points.
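
The &nbsp; case can be sketched in Python as follows. This rests on two
assumptions that are not from the thread: the entity has already been decoded
to the single character U+00A0, and re.ASCII is used to imitate a matcher
whose \s only knows ASCII whitespace. If the text were matched as raw UTF-8
bytes instead, the no-break space would occupy two bytes and one "." would
not span it.

```python
import re

NBSP = "\u00a0"  # the character that &nbsp; decodes to
text = f"indian{NBSP}based{NBSP}website"

# re.ASCII limits \s to space, tab, newline, CR, form feed, vertical tab.
SPACE_RULE = re.compile(r"indian\sbased\swebsite", re.IGNORECASE | re.ASCII)
DOT_RULE = re.compile(r"indian.based.website", re.IGNORECASE | re.ASCII)

space_hit = bool(SPACE_RULE.search(text))  # \s rejects the no-break space
dot_hit = bool(DOT_RULE.search(text))      # "." accepts any non-newline char

print(space_hit, dot_hit)
```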
Re: Why single periods in regex in spamassassin rules? [ In reply to ]
Completely agree with Joe. Normally, if we did that, it was because we saw
some situation where they were using something other than a space: perhaps a
pipe, a plus, a non-printable character, or something else. So we made the
rest of the rule like that to future-proof it against other variants of the
same spam.

On Sun, Apr 25, 2021, 08:51 Joe Quinn <headprogrammingczar@gmail.com> wrote:

> On 4/23/21 2:52 PM, David B Funk wrote:
> > On Fri, 23 Apr 2021, Steve Dondley wrote:
> >
> >> I'm looking at KAM.cf. There is this rule:
> >>
> >> body __KAM_WEB2 /INDIA based
> >> IT|indian.based.website|certified.it.company/i
> >>
> >> I'm wondering if there is a good reason why a single period is used
> >> instead of something like \s+ which would catch multiple spaces
> >> whereas a single period doesn't.
> >
> > Because '/indian.based.website/' will match 'indian-based_website' but
> > \s will not.
> >
> >
> This is the real reason (or at least, it was for all of my contributions
> to KAM.cf). I was also concerned about tricks like &nbsp;, which is
> visibly a space but has all the technical characteristics of
> non-whitespace. Using "." was easier than knowing everything about
> unicode codepoints.
>
>