Mailing List Archive

Multiple regex on same URL
I have written a small, simple patch (tested only in SA 3.4.2 so far, sorry) to be able to check up to three regular expressions against the "same" URL. It seems to work well, but... is there any crazy (with all due respect) volunteer for checks, tests, etc.?
Disclaimer: I am not a super Perl developer, so the code may be ugly for Perl monks :-( sorry..
Regards,
-----------Pedro.
Re: Multiple regex on same URL [ In reply to ]
On 07.07.20 10:18, Pedro David Marco wrote:
> I have written a small simple patch (tested in SA 3.4.2 so far, sorry) to
> be able to check up to three regex expressions on the "same" URL. It
> seems to work well, but... any crazy (with all due respect) volunteer for
> checks, tests, etc.?

>Disclaimer: I am not a super Perl developer, so the code may be ugly for Perl monks :-( sorry..
>Regards,
>-----------Pedro.

try posting the patch or a link to it.

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"To Boot or not to Boot, that's the question." [WD1270 Caviar]
Re: Multiple regex on same URL [ In reply to ]
On Tue, Jul 07, 2020 at 10:18:30AM +0000, Pedro David Marco wrote:
> I have written a small simple patch (tested in SA 3.4.2 so far, sorry) to be
> able to check up to three regex expressions on the "same" URL. It seems to work
> well, but... any crazy (with all due respect) volunteer for checks, tests, etc.?
>
> Disclaimer: I am not a super Perl developer, so the code may be ugly for perl
> monks :-( sorry..

What exactly do you mean by checking multiple regexes on the "same" URL? Give
an example. Most likely it's already possible without any changes.
Re: Multiple regex on same URL [ In reply to ]
>On Tuesday, July 7, 2020, 01:05:36 PM GMT+2, Henrik K <hege@hege.li> wrote:

>What exactly do you mean by checking multiple regexes on the "same" URL? Give an example. Most likely it's already possible without any changes.

For example, checking whether a URL matches Regex1 BUT does NOT match Regex2 can be done with lookahead/lookbehind, but that is CPU-expensive and may be too complex to maintain...

----Pedro 
Re: Multiple regex on same URL [ In reply to ]
On Tue, Jul 07, 2020 at 11:41:01AM +0000, Pedro David Marco wrote:
>
> >On Tuesday, July 7, 2020, 01:05:36 PM GMT+2, Henrik K <hege@hege.li> wrote:
>
>
> >What exactly do you mean by checking multiple regexes on the "same" URL? Give
> >an example. Most likely it's already possible without any changes.
>
>
> For example, checking whether a URL matches Regex1 BUT does NOT match Regex2
> can be done with lookahead/lookbehind, but that is CPU-expensive and may be
> too complex to maintain...

Why would lookahead be expensive? It's normal regex. It's probably more
expensive to run two separate regexes.

uri FOO /^(?!.*?donotfind)(?=.*?findthis)/
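
The rule above can be tried outside SpamAssassin. The following is a minimal sketch in Python rather than Perl (the lookahead syntax used here behaves the same way in both), and the sample URIs are made up for illustration. "findthis" and "donotfind" are the placeholder tokens from the rule, not real indicators.

```python
import re

# Same pattern as the uri rule above. "findthis" and "donotfind" are the
# placeholder tokens from that rule, not real indicators.
PATTERN = re.compile(r"^(?!.*?donotfind)(?=.*?findthis)")


def uri_hits(uri: str) -> bool:
    """Return True if the URI contains 'findthis' but not 'donotfind'."""
    return PATTERN.search(uri) is not None


if __name__ == "__main__":
    # Hypothetical sample URIs, chosen only to exercise both lookaheads.
    for sample in (
        "http://example.com/findthis",
        "http://example.com/findthis/donotfind",
        "http://example.com/other",
    ):
        print(uri_hits(sample), sample)
```

Both conditions are evaluated against one string, so the rule can only hit when a single URI satisfies both.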

Also, newer SpamAssassin versions already have the URIDetail plugin, which can
also do what you want:

uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/ key2 !~ /value2/ ...
Re: Multiple regex on same URL [ In reply to ]
On 07 Jul 2020, at 07:16, Henrik K <hege@hege.li> wrote:
> On Tue, Jul 07, 2020 at 11:41:01AM +0000, Pedro David Marco wrote:
>>
>>> On Tuesday, July 7, 2020, 01:05:36 PM GMT+2, Henrik K <hege@hege.li> wrote:
>>
>>
>>> What exactly do you mean by checking multiple regexes on the "same" URL? Give
>>> an example. Most likely it's already possible without any changes.
>>
>>
>> For example, checking whether a URL matches Regex1 BUT does NOT match Regex2
>> can be done with lookahead/lookbehind, but that is CPU-expensive and may be
>> too complex to maintain...
>
> Why would lookahead be expensive? It's normal regex. It's probably more
> expensive to run two separate regexes.

Is the ReDoS attack relevant here?

<https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS>
"The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a Regular Expression to enter these extreme situations and then hang for a very long time."



--
Once upon a time, a woman was picking up firewood. She came upon a
poisonous snake frozen in the snow. She took the snake home and
nursed it back to health. One day the snake bit her on the cheek.
As she lay dying, she asked the snake, "Why have you done this to
me?" And the snake answered, "Look, bitch, you knew I was a
snake."
Re: Multiple regex on same URL [ In reply to ]
>On Tuesday, July 7, 2020, 03:16:34 PM GMT+2, Henrik K <hege@hege.li> wrote:

>Also newer SpamAssassin already has URIDetail plugin which can also do what you want:

>  uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/  key2 !~ /value2/ ...
If the same key is used more than once, uri_detail joins the conditions with "OR", but we need an "AND".
-----Pedro
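
The difference between the two ways of combining conditions on one key can be sketched as follows. This is a toy model in Python based only on the behaviour described in this message, not the plugin's actual code; `same_key_conditions` and the sample URI are hypothetical.

```python
import re


def same_key_conditions(value, conditions, combine):
    """Evaluate several conditions against one value and combine the results.

    conditions: list of (operator, pattern) pairs, operator is '=~' or '!~'.
    combine:    any (OR semantics) or all (AND semantics).
    This is an illustrative model, not the URIDetail implementation.
    """
    results = []
    for operator, pattern in conditions:
        matched = re.search(pattern, value) is not None
        results.append(matched if operator == "=~" else not matched)
    return combine(results)


# Hypothetical URI that contains both the wanted and the unwanted token.
uri = "http://example.com/findthis/donotfind"
conditions = [("=~", r"findthis"), ("!~", r"donotfind")]

or_result = same_key_conditions(uri, conditions, any)   # one condition is enough
and_result = same_key_conditions(uri, conditions, all)  # every condition must hold
```

With OR the positive condition alone makes the rule hit even though the URI also contains the excluded token; with AND the same URI is rejected.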
Re: Multiple regex on same URL [ In reply to ]
On Tue, 2020-07-07 at 20:39 +0000, Pedro David Marco wrote:
>
>
> >On Tuesday, July 7, 2020, 03:16:34 PM GMT+2, Henrik K <hege@hege.li> wrote:
>
> > Also newer SpamAssassin already has URIDetail plugin which can also
> > do what you want:
> > uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/ key2 !~ /value2/ ...
> if it uses the same key more than once, then uri_detail joins them
> with "OR", but we need an "AND"
> -----Pedro
>
That should be easy enough to do with a metarule:

uri __SUBRULE1 /(URL alternateslist1)/
uri __SUBRULE2 /(URL alternateslist2)/
meta MYMETARULE (__SUBRULE1 && __SUBRULE2)
score MYMETARULE 6.0

...or something like that

Martin
Re: Multiple regex on same URL [ In reply to ]
>On Tuesday, July 7, 2020, 11:56:22 PM GMT+2, Martin Gregorie <martin@gregorie.org> wrote:

> That should be easy enough to do with a metarule:

>uri  __SUBRULE1 /(URL alternateslist1)/
>uri  __SUBRULE2 /(URL alternateslist2)/
>meta  MYMETARULE (__SUBRULE1 && __SUBRULE2)
>score MYMETARULE 6.0

>...or something like that

>Martin
Thanks Martin, but the meta may be positive if one URL triggers SUBRULE1 and a different URL triggers SUBRULE2...
How can you be sure both SUBRULES are positive on the "same" URL?
-----Pedro
Re: Multiple regex on same URL [ In reply to ]
On Tue, 2020-07-07 at 22:07 +0000, Pedro David Marco wrote:
> Thanks Martin, but the meta may be positive if one URL triggers
> SUBRULE1 and a different URL triggers SUBRULE2...
> How can you be sure both SUBRULES are positive on the "same" URL?
>
I didn't spot the requirement that the URIs must match: I read your
requirement as being that two matches from a group of URLs within a
defined set or with the same second level domain would do. My mistake.

Might it be easier to define and implement with a decent RDBMS and a
clever SQL query?

Martin
Re: Multiple regex on same URL [ In reply to ]
>On Wednesday, July 8, 2020, 12:28:37 AM GMT+2, Martin Gregorie <martin@gregorie.org> wrote:
>I didn't spot the requirement that the URIs must match: I read your
>requirement as being that two matches from a group of URLs within a
>defined set or with the same second level domain would do. My mistake.

Probably my fault, Martin.. my "English" leaves much to be desired...

>Might it be easier to define and implement with a decent RDBMS and a
>clever SQL query?
The simplest way has been to patch the uri_detail plugin so it can combine multiple identical keys with OR or AND on demand... :-)
----Pedro
Re: Multiple regex on same URL [ In reply to ]
On Tue, 7 Jul 2020, Martin Gregorie wrote:

> On Tue, 2020-07-07 at 20:39 +0000, Pedro David Marco wrote:
>>
>>
>> >On Tuesday, July 7, 2020, 03:16:34 PM GMT+2, Henrik K <hege@hege.li> wrote:
>>
>>> Also newer SpamAssassin already has URIDetail plugin which can also
>>> do what you want:
>>> uri_detail SYMBOLIC_TEST_NAME key1 =~ /value1/ key2 !~ /value2/ ...
>> if it uses the same key more than once, then uri_detail joins them
>> with "OR", but we need an "AND"
>> -----Pedro
>>
> That should be easy enough to do with a metarule:
>
> uri __SUBRULE1 /(URL alternateslist1)/
> uri __SUBRULE2 /(URL alternateslist2)/
> meta MYMETARULE (__SUBRULE1 && __SUBRULE2)
> score MYMETARULE 6.0

Unfortunately there's no way to enforce them being checked together on the
*same* URI: uri1 could hit SR1 and uri2 could hit SR2 and the meta would fire, but it
would be inappropriate.

The (?=...)(?!...) construct is better.
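
The failure mode described above can be illustrated with a small model. This is a sketch in Python, not SpamAssassin's rule engine: the two patterns and the sample URIs are hypothetical stand-ins, and `meta_style` only mimics "each sub-rule hits if any URI in the message matches it".

```python
import re

# Hypothetical stand-ins for the two sub-rule patterns.
SUBRULE1 = re.compile(r"login")
SUBRULE2 = re.compile(r"\.example\.net")


def meta_style(uris):
    """Each sub-rule hits if ANY URI matches it; the meta ANDs the two hits."""
    hit1 = any(SUBRULE1.search(u) for u in uris)
    hit2 = any(SUBRULE2.search(u) for u in uris)
    return hit1 and hit2


def same_uri_style(uris):
    """Hits only when one single URI matches both patterns."""
    return any(bool(SUBRULE1.search(u) and SUBRULE2.search(u)) for u in uris)


# Two different URIs, each satisfying only one of the patterns.
mixed = ["http://a.example.org/login", "http://b.example.net/index"]
# One URI satisfying both patterns.
single = ["http://b.example.net/login"]
```

For `mixed`, `meta_style` hits while `same_uri_style` does not, which is the inappropriate match described above; for `single`, both hit.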


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
We have to realize that people who run the government can and do
change. Our society and laws must assume that bad people -
criminals even - will run the government, at least part of the
time. -- John Gilmore
-----------------------------------------------------------------------
Today: Robert Heinlein's 113th birthday
Re: Multiple regex on same URL [ In reply to ]
On Tue, 7 Jul 2020, Martin Gregorie wrote:

> On Tue, 2020-07-07 at 22:07 +0000, Pedro David Marco wrote:
>> Thanks Martin, but the meta may be positive if one URL triggers
>> SUBRULE1 and a different URL triggers SUBRULE2...
>> How can you be sure both SUBRULES are positive on the "same" URL?
>>
> I didn't spot the requirement that the URIs must match: I read your
> requirement as being that two matches from a group of URLs within a
> defined set or with the same second level domain would do. My mistake.
>
> Might it be easier to define and implement with a decent RDBMS and a
> clever SQL query?

Ugh, no.

The (?=...)(?!...) construct is a good way, but if you use * or + you need to
be careful to avoid the possibility of a backtracking DoS - use the
"non-greedy" version. However, that weakness is smaller as we're looking at
URIs rather than the entire message body - there's less to potentially
backtrack over.

I suggest the positive match first, then the negative match, as the
positive match will probably occur in only a small percentage of URIs
scanned and will thus generally fail and short-circuit the evaluation of
the (much more likely to hit) negative lookahead match.
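
A quick way to convince yourself that the ordering is only a performance choice is to check that both orderings accept the same URIs. The sketch below uses Python and the placeholder tokens "findthis" and "donotfind" from earlier in the thread; the sample URIs are hypothetical, and it makes no timing claims.

```python
import re

# Positive lookahead first, as suggested above: URIs without "findthis"
# are rejected before the negative lookahead is ever tried.
POSITIVE_FIRST = re.compile(r"^(?=.*?findthis)(?!.*?donotfind)")
# The reverse ordering, for comparison.
NEGATIVE_FIRST = re.compile(r"^(?!.*?donotfind)(?=.*?findthis)")


def agree(uris):
    """Return True if both orderings accept exactly the same URIs."""
    return all(
        (POSITIVE_FIRST.search(u) is None) == (NEGATIVE_FIRST.search(u) is None)
        for u in uris
    )


# Hypothetical sample URIs covering all four combinations of the two tokens.
samples = [
    "http://example.com/findthis",
    "http://example.com/findthis/donotfind",
    "http://example.com/donotfind",
    "http://example.com/other",
]
```

Since the two lookaheads are independent assertions at the same position, the result is the same either way; only the amount of work done on non-matching URIs differs.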
