Mailing List Archive

regex help wanted
Hi folks,

I am tracking down a problem with a user here in this forum thread:

http://kb.monitorware.com/regex-match-for-port-t8764.html#p14411

He claims that rsyslog does not correctly handle POSIX ERE regular
expressions, but I have now written a simple program (tester) that works
in the same way. I don't think I have set any options wrong. However, I
am not at all a regex expert.

Maybe some of you could shed some light on this? I would really
appreciate some help, as the situation sounds sub-optimal, at least
until we have an explanation for it...

Thanks,
Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
Re: regex help wanted
> He claims that rsyslog does not correctly handle POSIX ERE regular
> expressions, but I have now written a simple program (tester) that works
> in the same way. I don't think I have set any options wrong. However, I
> am not at all a regex expert.

I've not carefully read the portion of the source where you do your
matching; can you point me in a general direction? What library are
you using?


RB
Re: regex help wanted
> -----Original Message-----
> From: rsyslog-bounces@lists.adiscon.com
> [mailto:rsyslog-bounces@lists.adiscon.com] On Behalf Of RB
> Sent: Wednesday, October 29, 2008 3:47 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] regex help wanted
>
> > He claims that rsyslog does not correctly handle POSIX ERE regular
> > expressions, but I have now written a simple program (tester) that
> > works in the same way. I don't think I have set any options wrong.
> > However, I am not at all a regex expert.
>
> I've not carefully read the portion of the source where you do your
> matching; can you point me in a general direction? What library are
> you using?

Sure. There is a very simple sample program inside the forum thread.
This is a direct link:

http://kb.monitorware.com/regex-match-for-port-t8764-15.html#p14423

Well... I'll also include it after my sig. I use the plain old C library
regex functions (regcomp/regexec from regex.h). The problem can be fully
reproduced with the minimal sample, so I think it is better to look at
that than at the actual rsyslog source (though most of the regex
functionality is in msg.c).

Thanks,
Rainer

#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

/* One log line; adjacent string literals are concatenated by the compiler. */
#define STR "%ASA-6-302015: Built outbound UDP connection 25503427 " \
            "for outside:198.14.210.2/53 (198.14.210.2/53) " \
            "to inside:12.66.8.80/61594 (198.39.187.236/54751)"

int main(void)
{
    regex_t preg;
    size_t nmatch = 10;
    regmatch_t pmatch[10];
    char str[] = STR;
    int i;

    /* Pattern exactly as the forum user wrote it. */
    i = regcomp(&preg, "outside:.+?/(.+?)\\s", REG_EXTENDED);
    printf("regcomp returns %d\n", i);
    if(i != 0)
        return 1;   /* nothing was compiled, so nothing to free */
    i = regexec(&preg, str, nmatch, pmatch, 0);
    printf("regexec returns %d\n", i);
    if(i == REG_NOMATCH) {
        printf("found no match!\n");
        regfree(&preg);
        return 1;
    }

    printf("returned substrings:\n");
    for(i = 0 ; i < 10 ; i++) {
        /* regoff_t is not necessarily int, so cast for %d */
        printf("%d: so %d, eo %d", i, (int) pmatch[i].rm_so,
               (int) pmatch[i].rm_eo);
        if(pmatch[i].rm_so != -1) {
            int j;
            printf(", text: '");
            for(j = pmatch[i].rm_so ; j < pmatch[i].rm_eo ; ++j)
                putchar(str[j]);
            putchar('\'');
        }
        putchar('\n');
    }
    regfree(&preg);
    return 0;
}
Re: regex help wanted
The lazy quantifier (?) is not being respected, so rather than
returning the first match of "stuff, a forward slash, more stuff,
space", it returns the last one.

I know zero C, so I'm afraid I can't be much help in figuring out why.
You could try the universal regexp trick: if in doubt, escape it!

-HKS

Re: regex help wanted
> The lazy quantifier (?) is not being respected, so rather than
> returning the first match of "stuff, a forward slash, more stuff,
> space", it returns the last one.

Correct, because the lazy quantifier is not POSIX and therefore does
not work when using POSIX extended REs. I was wondering why the extra
'?', but had forgotten about lazy notation, preferring to make my
regexes a bit more strict.
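
For reference, a minimal sketch of what "a bit more strict" can look like
in plain POSIX ERE: instead of a lazy quantifier, use negated bracket
expressions so that each piece cannot run past its delimiter. The helper
name first_group, the PAT_* macros and the DEMO_MAIN guard are made up for
this illustration and are not part of rsyslog; the sample message is the
one from Rainer's test program.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not rsyslog code): compile `pattern` as a POSIX ERE,
 * match it against `subject`, and copy capture group 1 into `out`.
 * Returns 0 on success, -1 on compile error, no match, or short buffer. */
static int first_group(const char *pattern, const char *subject,
                       char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */
    int rc = -1;

    assert(pattern && subject && out && outsz > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outsz) {
            memcpy(out, subject + m[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}

/* Sample message from the thread, joined into a single line. */
static const char SAMPLE[] =
    "%ASA-6-302015: Built outbound UDP connection 25503427 "
    "for outside:198.14.210.2/53 (198.14.210.2/53) "
    "to inside:12.66.8.80/61594 (198.39.187.236/54751)";

/* Greedy: ".+" may cross slashes and spaces, so the match stretches
 * to the last "slash, stuff, space" in the line. */
#define PAT_GREEDY  "outside:.+/(.+) "

/* Bounded: "[^/]+" stops at the first slash and "[^ ]+" stops at the
 * first space, so no lazy quantifier is needed. */
#define PAT_BOUNDED "outside:[^/]+/([^ ]+) "

#ifdef DEMO_MAIN
int main(void)
{
    char buf[64];

    if (first_group(PAT_GREEDY, SAMPLE, buf, sizeof buf) == 0)
        printf("greedy : '%s'\n", buf);
    if (first_group(PAT_BOUNDED, SAMPLE, buf, sizeof buf) == 0)
        printf("bounded: '%s'\n", buf);
    return 0;
}
#endif
```

Built with something like "cc -DDEMO_MAIN sketch.c" (the file name is
arbitrary), I would expect the greedy pattern to capture 61594 and the
bounded one to capture 53, which is the port the forum user was after.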


RB
Re: regex help wanted
Ah, thanks a lot to both of you. This greatly helps and makes me feel
much better. Of course, we can add additional regex libs, but it has
been proven that the current implementation works OK.

Thanks again,
Rainer
