http://bugzilla.spamassassin.org/show_bug.cgi?id=3163
------- Additional Comments From quinlan@pathname.com 2004-03-12 13:06 -------
Subject: Re: Obfuscation FP when obfuscating tag starts line or punctuation follows tag.
Hmmm... it seems that having [A-Za-z] or \w in the regular expressions
does improve results. I somehow botched my testing, but caught it when
verifying the check-in results. Anyhow, I restored the better and simpler
of the regular expression combinations I tested (15 in total) and checked
them into SVN.
I'm concerned that [A-Za-z] and \w are too locale-specific, so I'd like
to figure out exactly why they improve results so much over \S.
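To make the locale concern concrete, here is a small sketch that checks which of the three classes match a few sample characters. The characters are arbitrary examples, and `class_hits` is a hypothetical helper. The results assume Perl's default Unicode rules for `\w`, with no `use locale` in effect. Under `use locale`, `\w` can behave differently for code points below 256.

```perl
use strict;
use warnings;
use utf8;                            # the Cyrillic literal below is UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';  # avoid "Wide character" warnings when printing

# Hypothetical helper: report which candidate classes match one character.
# \S matches any non-whitespace; \w matches letters (including non-ASCII
# ones under Unicode rules), digits and '_'; [A-Za-z] matches ASCII letters only.
sub class_hits {
    my ($ch) = @_;
    return (
        nonspace => ($ch =~ /\S/)       ? 1 : 0,
        word     => ($ch =~ /\w/)       ? 1 : 0,
        ascii    => ($ch =~ /[A-Za-z]/) ? 1 : 0,
    );
}

for my $ch ('a', 'ж', '5', '_', '.') {
    my %h = class_hits($ch);
    printf "%s  \\S=%d  \\w=%d  [A-Za-z]=%d\n", $ch, @h{qw(nonspace word ascii)};
}
```

So `[A-Za-z]` drops non-English letters, while `\w` admits digits and `_` and depends on the regex character-set rules in effect.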
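The tests:

# Current rule: the last stored text chunk ($self->{html_text}[-1]) ends in a
# non-space character and the new text ($text) starts with one. Without /m,
# $ also matches just before a trailing newline; /s only changes what '.'
# matches, so it has no effect here.
if ($self->{html_text}[-1] =~ /\S$/s && $text =~ /^\S/s) {
    $self->{html}{obfuscation}++;
}

# Test 1: as above, but \z only matches at the very end of the previous chunk.
if ($self->{html_text}[-1] =~ /\S\z/s &&
    $text =~ /^\S/s)
{
    $self->{html}{t_obfuscation1}++;
}

# Test 2: the run of non-space characters at the end of the previous chunk,
# and the run at the start of the new text, must each contain an ASCII letter.
if ($self->{html_text}[-1] =~ /\S*[A-Za-z]\S*\z/ &&
    $text =~ /^\S*[A-Za-z]\S*/)
{
    $self->{html}{t_obfuscation2}++;
}

# Test 3: as test 2, but using \w (letters, digits and '_') instead of [A-Za-z].
if ($self->{html_text}[-1] =~ /\S*\w\S*\z/ &&
    $text =~ /^\S*\w\S*/)
{
    $self->{html}{t_obfuscation3}++;
}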
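This sketch compares the four tests directly. It wraps the same four regex pairs in a hypothetical helper, `obfuscation_tests`, and runs it on a few made-up pairs: the text before a tag, and the text after it. The inputs are illustrations only. They do not come from the corpus, and real `html_text` chunks may be normalized differently.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the current rule and tests 1-3 to the text
# before a tag ($prev) and the text after it ($next); 1 means the test fires.
sub obfuscation_tests {
    my ($prev, $next) = @_;
    return (
        current => ($prev =~ /\S$/s  && $next =~ /^\S/s) ? 1 : 0,
        t1      => ($prev =~ /\S\z/s && $next =~ /^\S/s) ? 1 : 0,
        t2      => ($prev =~ /\S*[A-Za-z]\S*\z/ && $next =~ /^\S*[A-Za-z]\S*/) ? 1 : 0,
        t3      => ($prev =~ /\S*\w\S*\z/       && $next =~ /^\S*\w\S*/)       ? 1 : 0,
    );
}

# Made-up boundaries: a split word, punctuation after a tag,
# a chunk ending in a newline, and digits on both sides.
my @cases = (['V', 'iagra'], ['Only $10', '.'], ["line\n", 'Next'], ['12', '34']);
for my $c (@cases) {
    my %r = obfuscation_tests(@$c);
    (my $shown = join('|', @$c)) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-14s current=%d t1=%d t2=%d t3=%d\n", $shown, @r{qw(current t1 t2 t3)};
}
```

On these inputs, tests 2 and 3 stop firing when only punctuation follows the tag. Test 1 stops firing when the previous chunk ends in a newline. Tests 2 and 3 disagree only on digit-only boundaries. This may help explain the lower ham hit rates, but it is a guess, not a measurement.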
Results:

OVERALL%    SPAM%     HAM%    S/O  RANK SCORE  NAME
   9.504  16.1123   3.1377  0.837  0.55  1.00  HTML_OBFUSCATE_00_10
   1.697   3.4165   0.0401  0.988  0.96  1.00  HTML_OBFUSCATE_10_20
   1.731   3.5204   0.0067  0.998  0.99  1.00  HTML_OBFUSCATE_20_30
   1.952   3.9709   0.0067  0.998  0.99  1.00  HTML_OBFUSCATE_30_40
   1.755   3.5759   0.0000  1.000  1.00  1.00  HTML_OBFUSCATE_40_50
   1.211   2.4671   0.0000  1.000  1.00  1.00  HTML_OBFUSCATE_50_60
   0.677   1.3791   0.0000  1.000  1.00  1.00  HTML_OBFUSCATE_60_70
   0.527   1.0742   0.0000  1.000  0.99  1.00  HTML_OBFUSCATE_70_80
   0.071   0.1455   0.0000  1.000  0.99  1.00  HTML_OBFUSCATE_80_90
   0.037   0.0762   0.0000  1.000  0.99  1.00  HTML_OBFUSCATE_90_100

   8.956  15.0243   3.1110  0.828  0.53  0.01  T_HTML_OBFUSCATE1_00_10
   1.659   3.3611   0.0200  0.994  0.98  0.01  T_HTML_OBFUSCATE1_10_20
   1.727   3.5066   0.0134  0.996  0.99  0.01  T_HTML_OBFUSCATE1_20_30
   1.938   3.9501   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE1_30_40
   1.761   3.5897   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE1_40_50
   1.211   2.4671   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE1_50_60
   0.673   1.3721   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE1_60_70
   0.527   1.0742   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE1_70_80
   0.071   0.1455   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE1_80_90
   0.034   0.0693   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE1_90_100

   2.941   5.1421   0.8211  0.862  0.59  0.01  T_HTML_OBFUSCATE2_00_10
   1.496   3.0492   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE2_10_20
   1.884   3.8392   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE2_20_30
   2.105   4.2897   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE2_30_40
   1.381   2.8136   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE2_40_50
   1.224   2.4948   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE2_50_60
   0.677   1.3791   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE2_60_70
   0.394   0.8039   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE2_70_80
   0.051   0.1040   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE2_80_90
   0.340   0.6930   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE2_90_100

   3.207   5.5925   0.9079  0.860  0.59  0.01  T_HTML_OBFUSCATE3_00_10
   1.517   3.0908   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE3_10_20
   1.789   3.6452   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE3_20_30
   1.982   4.0402   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE3_30_40
   1.605   3.2710   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE3_40_50
   1.238   2.5225   0.0000  1.000  1.00  0.01  T_HTML_OBFUSCATE3_50_60
   0.524   1.0672   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE3_60_70
   0.524   1.0672   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE3_70_80
   0.068   0.1386   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE3_80_90
   0.034   0.0693   0.0000  1.000  0.99  0.01  T_HTML_OBFUSCATE3_90_100
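In each result row, the fourth number appears to equal the second divided by the sum of the second and third. That is the share of the rule's hits that are spam, the S/O ratio in SpamAssassin's hit-frequencies output. The sketch below recomputes it for three of the 00_10 rows as a check. `spam_ratio` is a hypothetical helper, and returning 0 when neither corpus is hit is this sketch's own choice.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of a rule's hits that are spam,
# computed from the SPAM% and HAM% columns as spam / (spam + ham).
sub spam_ratio {
    my ($spam_pct, $ham_pct) = @_;
    my $total = $spam_pct + $ham_pct;
    return $total > 0 ? $spam_pct / $total : 0;   # 0 for a rule with no hits
}

# [name, SPAM%, HAM%] copied from the 00_10 rows above.
my @rows = (
    [ 'HTML_OBFUSCATE_00_10',    16.1123, 3.1377 ],
    [ 'T_HTML_OBFUSCATE2_00_10',  5.1421, 0.8211 ],
    [ 'T_HTML_OBFUSCATE3_00_10',  5.5925, 0.9079 ],
);
for my $r (@rows) {
    printf "%-24s S/O=%.3f\n", $r->[0], spam_ratio($r->[1], $r->[2]);
}
```

The recomputed values match the fourth column. They also show that tests 2 and 3 raise the ratio in the 00_10 bucket while catching roughly a third of the spam that the current rule does.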
And the 00_05 and 05_10 results (this is using the current rule):

OVERALL%    SPAM%     HAM%    S/O  RANK SCORE  NAME
   8.202  13.6105   2.9909  0.820  0.51  0.01  T_HTML_OBFUSCATEX00_05
   1.302   2.5017   0.1469  0.945  0.82  0.01  T_HTML_OBFUSCATEX05_10
   7.760  12.7443   2.9575  0.812  0.49  0.01  T_HTML_OBFUSCATE1X00_05
   1.197   2.2800   0.1535  0.937  0.80  0.01  T_HTML_OBFUSCATE1X05_10
   2.268   3.7769   0.8145  0.823  0.50  0.01  T_HTML_OBFUSCATE2X00_05
   0.673   1.3652   0.0067  0.995  0.98  0.01  T_HTML_OBFUSCATE2X05_10
   2.537   4.2412   0.8946  0.826  0.50  0.01  T_HTML_OBFUSCATE3X00_05
   0.670   1.3514   0.0134  0.990  0.96  0.01  T_HTML_OBFUSCATE3X05_10
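If the buckets are disjoint ranges, each rule's 00_05 and 05_10 rows should add up to its 00_10 row, within rounding. This applies to the first three numbers on each row. The sketch below checks this. Matching T_HTML_OBFUSCATEX with HTML_OBFUSCATE_00_10, and each nX rule with T_HTML_OBFUSCATEn_00_10, is an assumption based on the rule names. `max_split_error` is a hypothetical helper.

```perl
use strict;
use warnings;

# [name, 00_05 row, 05_10 row, 00_10 row]; each row is [OVERALL%, SPAM%, HAM%]
# copied from the results above. The pairing with 00_10 rows is assumed from names.
my @checks = (
    [ 'T_HTML_OBFUSCATEX',  [7.760 + 0.442, 13.6105, 2.9909], [1.302, 2.5017, 0.1469], [9.504, 16.1123, 3.1377] ],
    [ 'T_HTML_OBFUSCATE1X', [7.760, 12.7443, 2.9575], [1.197, 2.2800, 0.1535], [8.956, 15.0243, 3.1110] ],
    [ 'T_HTML_OBFUSCATE2X', [2.268,  3.7769, 0.8145], [0.673, 1.3652, 0.0067], [2.941,  5.1421, 0.8211] ],
    [ 'T_HTML_OBFUSCATE3X', [2.537,  4.2412, 0.8946], [0.670, 1.3514, 0.0134], [3.207,  5.5925, 0.9079] ],
);

# Hypothetical helper: largest absolute difference, over the three columns,
# between (00_05 + 05_10) and the 00_10 bucket.
sub max_split_error {
    my ($lo, $hi, $whole) = @_;
    my $max = 0;
    for my $i (0 .. 2) {
        my $diff = abs($lo->[$i] + $hi->[$i] - $whole->[$i]);
        $max = $diff if $diff > $max;
    }
    return $max;
}

for my $c (@checks) {
    my ($name, @cols) = @$c;
    printf "%-20s max diff %.4f\n", $name, max_split_error(@cols);
}
```

Every difference here is at most 0.001, which is consistent with rounding in the printed percentages.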