Hello,
I'm trying to use the HashBL plugin with the check_hashbl_bodyre function.
With that function I would like to match UTF-8 patterns, such as
'([\p{L}\p{M}\d\S]+[\ \t]+[\p{L}\p{M}\d\S]+)'
I'm particularly interested in accented characters, such as /[?????]/.
With Perl, if I try:
```
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

my $txt = ' musica ? ciao ciao.';
my $re = '([\p{L}\p{M}\d\S]+[\ \t]+[\p{L}\p{M}\d\S]+)';
# Capture two tokens separated by spaces or tabs.
if ($txt =~ /$re/s) {
    print "Match: $1\n";
}
```
then $txt matches as expected.
For SpamAssassin, I built my own DNSBL of hashes and wrote this rule:
body HASHBL_MY_SPAM1
eval:check_hashbl_bodyre('spamhash.example.com', 'sha1/max=10/shuffle',
'([\p{L}\p{M}\d\S]+[\ \t]+[\p{L}\p{M}\d\S]+)', '^127\.0\.0\.2')
This rule doesn't match the above $txt when it appears in the body of a mail.
To match the string ' musica ? ciao ciao.' in the mail body as expected,
I have to change the regex as follows:
body HASHBL_MY_SPAM1
eval:check_hashbl_bodyre('spamhash.example.com', 'sha1/max=10/shuffle',
'([\p{L}\p{M}\d\S?????]+[\ \t]+[\p{L}\p{M}\d\S?????]+)', '^127\.0\.0\.2')
So I have to add the accented characters literally to the character classes.
I can't understand why. Are there any limitations in the HashBL plugin with UTF-8?
Maybe I have misunderstood something.
Thank you very much for any hints.
Kind Regards
Marco