Mailing List Archive

Problems matching the last word in multi-OR Regex
Hi,

Situation: I have 2 twin servers running exactly the same OS and SA (3.4.4).
I have an email with the word 'dog' inside.
I have this rule:

  body    __ANIMALS    /cat|mouse|bird|dog/i

Problem: rule __ANIMALS hits on one server, but on the other one it does not!

I have noticed that if I switch the order of the words in the rule, like this:

  body    __ANIMALS    /cat|mouse|dog|bird/i

and 'dog' is not the last word, then it hits on both servers.

I have tried many permutations, and it only fails with the word that appears last in a regular expression with multiple ORs.
Has anyone seen this before? Is that a known bug?
Thanks...

Pete.
Re: Problems matching the last word in multi-OR Regex
On 2022-12-15 at 07:03:25 UTC-0500 (Thu, 15 Dec 2022 12:03:25 +0000
(UTC))
Pedro David Marco via users <pedrod_marco@yahoo.com>
is rumored to have said:

> HI,
> Situation:i have 2 twin servers running exactly the same OS, and SA.
> (3.4.4)
> i have an email with the word 'dog' inside.
> i have this rule:      body    __ANIMALS   
> /cat|mouse|bird|dog/i
>
> Problem:Rule  __ANIMALS  its in one server, but in the other one,
> does not!
>
> i have noticed that if i switch the rule words order, like this:
>
>   body    __ANIMALS    /cat|mouse|dog|bird/i
>
> and 'dog' is not the latest word, then it hits on both servers.
>
> I have tried many permutations and it only fails with the word that
> appears the last in regular expressions with multiple OR
> Has anyone seed this before? is that a known bug?  

This is absolutely NOT a known bug. I'm not sure how it is possible for
something so fundamental to still be lurking in SA undiscovered. I don't
think the basic parsing of REs in rules has changed since v2.

It would help a great deal if you could open a bug at
https://bz.apache.org/SpamAssassin/ with sample messages that are hit or
not by different variants of the rule.




> Thanks...
>
> Pete.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Problems matching the last word in multi-OR Regex
On 12/15/2022 7:03 AM, Pedro David Marco via users wrote:
> HI,
>
> Situation:
> i have 2 twin servers running exactly the same OS, and SA. (3.4.4)
>
> i have an email with the word 'dog' inside.
>
> i have this rule:
>   body    __ANIMALS /cat|mouse|bird|dog/i
>
> Problem:
> Rule  __ANIMALS  its in one server, but in the other one, does not!
>
> I have tried many permutations and it only fails with the word that
> appears the last in regular expressions with multiple OR
>
> Has anyone seed this before? is that a known bug?
>
No.

How are you testing?
sa-compile up to date on both servers?
spamd and/or milters restarted on both?
Re: Problems matching the last word in multi-OR Regex
> body __ANIMALS /cat|mouse|bird|dog/i

There is a possible problem with your rule. It probably isn't related to what you are seeing, but could be a problem for you anyway.

There is no word boundary in the regex, so 'cat' will match catamaran, 'mouse' will match mousehouse, and both 'bird' and 'dog' will match birddog.

You can solve this by adding word boundaries:

body __ANIMALS /\b(?:cat|mouse|bird|dog)\b/i

or

body __ANIMALS /\bcat\b|\bmouse\b|\bbird\b|\bdog\b/i
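
To see the difference between the unanchored and the bounded pattern, here is a minimal sketch. It uses Python's `re` module purely as a stand-in for the Perl regex engine that SpamAssassin uses (an assumption made for illustration; `\b` and `(?:...)` behave the same way for these ASCII patterns), and the sample strings are made up:

```python
import re

# Python's re module stands in for Perl regexes here; this is only an
# illustration, not how SpamAssassin itself evaluates body rules.
unanchored = re.compile(r"cat|mouse|bird|dog", re.IGNORECASE)
bounded = re.compile(r"\b(?:cat|mouse|bird|dog)\b", re.IGNORECASE)

# Hypothetical sample texts, not taken from any real message.
samples = ["my dog barks", "a catamaran race", "the birddog pointed", "Dog."]

for text in samples:
    print(f"{text!r}: unanchored={bool(unanchored.search(text))} "
          f"bounded={bool(bounded.search(text))}")
```

The unanchored pattern matches all four samples, while the bounded one only matches those where an animal name stands as a whole word.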
Re: Problems matching the last word in multi-OR Regex
On Thu, Dec 15, 2022 at 09:17:54AM -0500, Bill Cole wrote:
> On 2022-12-15 at 07:03:25 UTC-0500 (Thu, 15 Dec 2022 12:03:25 +0000 (UTC))
> Pedro David Marco via users <pedrod_marco@yahoo.com> is rumored to have said:
>
> > HI,
> > Situation:i have 2 twin servers running exactly the same OS, and SA.
> > (3.4.4)

Are there different versions of some external plugins installed,
maybe?

> > i have an email with the word 'dog' inside.
> > i have this rule:      body    __ANIMALS    /cat|mouse|bird|dog/i
> >
> > Problem:Rule  __ANIMALS  its in one server, but in the other one, does
> > not!

Interesting. Is there perhaps some syntax error elsewhere in the file?
You can check with "spamassassin --lint"

Also, maybe there is another rule with the same name defined elsewhere
(maybe an editor backup file that SA includes?)
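
One rough way to look for a duplicate definition is sketched below. It is only a sketch under assumptions: the function name `find_duplicate_rules`, the file names in the example, and the short list of rule types in the pattern are all made up for illustration, and it does not replicate how SA actually parses or orders its configuration files.

```python
import re
from collections import defaultdict

# Matches the start of a rule definition line such as
#   body __ANIMALS /cat|mouse|bird|dog/i
# Only a few common rule types are listed here (an assumption); extend as needed.
RULE_LINE = re.compile(r"^\s*(body|rawbody|header|uri|full|meta)\s+(\S+)\s+")

def find_duplicate_rules(files):
    """files maps a file name to its text.

    Returns {rule_name: [(file_name, line_number), ...]} for every
    rule name that is defined more than once across all files.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            match = RULE_LINE.match(line)
            if match:
                seen[match.group(2)].append((name, line_no))
    return {rule: places for rule, places in seen.items() if len(places) > 1}

# Hypothetical in-memory example; the file names are invented.
example = {
    "local.cf": "body __ANIMALS /cat|mouse|bird|dog/i\n",
    "old_copy.cf": "# leftover copy\nbody __ANIMALS /cat|mouse|dog|bird/i\n",
}
print(find_duplicate_rules(example))
```

To run it against a real setup, the mapping could be built from the `*.cf` files in the local rules directory, whose location varies by installation.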

> > i have noticed that if i switch the rule words order, like this:
> >
> >   body    __ANIMALS    /cat|mouse|dog|bird/i
> >
> > and 'dog' is not the latest word, then it hits on both servers.
> >
> > I have tried many permutations and it only fails with the word that
> > appears the last in regular expressions with multiple OR
> > Has anyone seed this before? is that a known bug?  
>
> This is absolutely NOT a known bug. I'm not sure how it is possible for
> something so fundamental to still be lurking in SA undiscovered. I don't
> think the basic parsing of REs in rules has changed since v2.
>
> It would help a great deal if you could open a bug at
> https://bz.apache.org/SpamAssassin/ with sample messages that are hit or not
> by different variants of the rule.

I agree. Do mention the issue in this thread when you open it, so
interested parties may follow.


One other obscure situation that comes to mind: "sa-compile" was used
in the past for a previous version of the regex, but something went
wrong with the system clock, so SA does not detect that the changed
regex needs recompiling and continues to use the old, outdated
version.

Or are you using spamc/spamd, and spamd did not reload the new rule?

Or maybe the word "dog" was copy/pasted instead of typed, and so it
includes some invisible UTF-8 characters.
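
A quick way to check a rule line for such characters is sketched here. The helper name `suspicious_chars` and the pasted example with a zero-width space are assumptions made for illustration; the real rule line would have to be read from the actual rule file on the failing server.

```python
import unicodedata

def suspicious_chars(line):
    """Return (index, 'U+XXXX', name) for every control or non-ASCII
    character in a rule line. Tabs are ignored."""
    found = []
    for i, ch in enumerate(line):
        if ch == "\t":
            continue
        if ord(ch) < 32 or ord(ch) > 126:
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return found

# Hand-typed line versus a hypothetical pasted line that carries a
# zero-width space right after 'dog'.
clean = "body __ANIMALS /cat|mouse|bird|dog/i"
pasted = "body __ANIMALS /cat|mouse|bird|dog\u200b/i"

print(suspicious_chars(clean))
print(suspicious_chars(pasted))
```

An empty list means the line is plain printable ASCII; anything else points at the position and the Unicode name of the odd character.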

I'd suggest creating a new, unique name for the rule (e.g.
NEWANIMALS_20230621), typing the rule content manually instead of
copy/pasting, and checking whether that rule matches by using
"spamassassin -t" on it.

That should rule out most of the other possible issues above.

--
Opinions above are GNU-copylefted.