Mailing List Archive

[Bug 3163] Obfuscation FP when obfuscating tag starts line or punctuation follows tag.
http://bugzilla.spamassassin.org/show_bug.cgi?id=3163





------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 07:42 -------
Created an attachment (id=1834)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=1834&action=view)
Proposed fix.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.





------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 07:47 -------
I should point out that the text samples from my first entry are
from a non-spam message. If I could, I'd change the E-mail addresses
and links to something generic. Is there any way to do that?








------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 09:07 -------
With the patch above, obfuscation is not suspected if the run of
non-whitespace after the tag does not contain an alphabetic
character. I've since gotten legitimate E-mail in which the run of
text before the tag consists of only non-whitespace, non-alphabetic
characters (a single open parenthesis). With the patch below,
obfuscation will be suspected only if the runs both before and after
the tag contain at least one alphabetic character.

This (<xx>an let some true 0<yy>bfuscation through, but I'm guessing
that the limited number of remaining obfuscation opportunities would
increase the effectiveness of the keyword scanners and Bayesian
classifiers that obfuscation is trying to evade.
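A minimal sketch of the rule described above, assuming the text before and after a tag is available as two plain strings. The helper name suspect_obfuscation and the sample fragments are hypothetical and are not taken from the attachment:

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the attachment: $before is the text
# that precedes a tag and $after is the text that follows it.
sub suspect_obfuscation {
    my ($before, $after) = @_;
    my ($run_before) = $before =~ /(\S*)\z/;   # trailing non-whitespace run
    my ($run_after)  = $after  =~ /^(\S*)/;    # leading non-whitespace run
    # Suspect obfuscation only when both runs contain an ASCII letter.
    return ($run_before =~ /[A-Za-z]/ && $run_after =~ /[A-Za-z]/) ? 1 : 0;
}

for my $case (['obf', 'uscation'], ['(', 'an'], ['0', 'bfuscation']) {
    my ($before, $after) = @$case;
    printf "%s<tag>%s => %d\n", $before, $after, suspect_obfuscation($before, $after);
}
```

Only the first fragment is flagged. The other two are the kind of obfuscation that slips through, because the run before the tag holds no letter.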

As it stands (see below) the 0-10% obfuscation rule can't be scored
very highly, so the change should help. (So would a 0-5 rule.)

From http://www.pathname.com/~corpus/DETAILS.1day (12 March 2004)
OVERALL% SPAM% HAM% S/O RANK SCORE NAME
14.467 18.3443 3.3682 0.845 0.59 1.00 HTML_OBFUSCATE_00_10
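
For reading the table: the S/O value in the rows quoted in this thread is consistent with SPAM% / (SPAM% + HAM%). That formula is an assumption inferred from the numbers, not taken from the tool's documentation, and the function name s_over_o is hypothetical. A small sketch:

```perl
use strict;
use warnings;

# Assumption: S/O = SPAM% / (SPAM% + HAM%), inferred from the quoted rows.
sub s_over_o {
    my ($spam_pct, $ham_pct) = @_;
    return $spam_pct / ($spam_pct + $ham_pct);
}

# SPAM% and HAM% from the HTML_OBFUSCATE_00_10 row above.
printf "%.3f\n", s_over_o(18.3443, 3.3682);   # prints 0.845
```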






koppel@ece.lsu.edu changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #1834 is|0 |1
obsolete| |



------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 09:08 -------
Created an attachment (id=1835)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=1835&action=view)
Alternate proposed fix.









------- Additional Comments From quinlan@pathname.com 2004-03-12 10:18 -------
> if ( ($self->{html_text}[-1] =~ /(\S*)\z/)[0] =~ /[A-Za-z]/

Yikes, you want

if ($self->{html_text}[-1] =~ /(\S*[A-Za-z]\S*)\z/

instead and a similar change for the other line.

Also, did you test on a corpus?









------- Additional Comments From felicity@kluge.net 2004-03-12 10:22 -------
Subject: Re: Obfuscation FP when obfuscating tag starts line or punctuation follows tag.

On Fri, Mar 12, 2004 at 10:18:29AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Also, did you test on a corpus?

I was going to, although the current rules work pretty well:

5.655 5.7517 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_10_20
5.405 5.4977 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_70_80
1.832 1.8639 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_60_70
1.610 1.6376 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_30_40
1.192 1.2127 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_40_50
0.949 0.9650 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_20_30
0.659 0.6704 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_50_60
0.044 0.0448 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_80_90
0.031 0.0320 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_90_100
17.179 17.4218 2.9925 0.853 0.59 1.00 HTML_OBFUSCATE_00_10










------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 10:32 -------
> Yikes, you want

> if ($self->{html_text}[-1] =~ /(\S*[A-Za-z]\S*)\z/

> instead and a similar change for the other line.

Not sure what the "Yikes" is referring to. If you meant my regexp,

($self->{html_text}[-1] =~ /(\S*)\z/)[0] =~ /[A-Za-z]/

okay, the regexp you (Daniel Quinlan) gave above looks simpler and if
you're sure it will do the same thing without too much backtracking,
great.
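
One way to check that the two forms agree is to run both over a few sample strings. This is only a sketch: the sub names two_step and one_step and the sample strings are made up for illustration, and a handful of samples is not a proof of equivalence.

```perl
use strict;
use warnings;

# Original form: capture the trailing non-whitespace run, then test it for a letter.
sub two_step {
    my ($s) = @_;
    return (($s =~ /(\S*)\z/)[0] =~ /[A-Za-z]/) ? 1 : 0;
}

# Suggested form: a single regexp that requires a letter inside the trailing run.
sub one_step {
    my ($s) = @_;
    return ($s =~ /\S*[A-Za-z]\S*\z/) ? 1 : 0;
}

for my $s ('foo bar', 'foo (', 'foo ', '', 'x;', 'a 1') {
    printf "%-9s two_step=%d one_step=%d\n", "'$s'", two_step($s), one_step($s);
}
```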

If the "Yikes" refers to what it will and won't catch, it didn't seem
like too radical a change to me. I did test it on several messages,
but not enough messages to call a corpus. It correctly identified
true obfuscation and did not flag the false positives that I had been
encountering.

Theo, look at the last line in your table. That rule hits on 17% of
messages and has a non-trivial false positive rate so that one
could not score the rule very highly.










------- Additional Comments From felicity@kluge.net 2004-03-12 10:46 -------
Subject: Re: Obfuscation FP when obfuscating tag starts line or punctuation follows tag.

On Fri, Mar 12, 2004 at 10:32:59AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Theo, look at the last line in your table. That rule hits on 17% of
> messages and has a non-trivial false positive rate so that one
> could not score the rule very highly.

Well, you don't know that (it depends on what other rules the ham hits),
but I like trying to lower FPs in general. If that makes the low-end
one work better, then... I'll try the patch out on the same corpus.
Results in a minute...










------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 10:51 -------
Could you add rules to check for 0-5% and 5-10%, given the distribution?




quinlan@pathname.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #1835 is|0 |1
obsolete| |



------- Additional Comments From quinlan@pathname.com 2004-03-12 11:01 -------
Created an attachment (id=1836)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=1836&action=view)
revision as committed to SVN

This worked just as well for me and is less locale-specific:

if ($self->{html_text}[-1] =~ /\S\z/ && $text =~ /^\S/) {









------- Additional Comments From felicity@kluge.net 2004-03-12 11:04 -------
quinlan apparently did up a new version, but here were my results overall:

old:
17.179 17.4218 2.9925 0.853 0.59 1.00 HTML_OBFUSCATE_00_10
5.655 5.7517 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_10_20
0.949 0.9650 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_20_30
1.610 1.6376 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_30_40
1.192 1.2127 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_40_50
0.659 0.6704 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_50_60
1.832 1.8639 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_60_70
5.405 5.4977 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_70_80
0.044 0.0448 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_80_90
0.031 0.0320 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_90_100

new:
6.633 6.7274 1.1222 0.857 0.58 1.00 HTML_OBFUSCATE_00_10
3.044 3.0958 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_10_20
0.945 0.9608 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_20_30
1.583 1.6098 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_30_40
1.071 1.0889 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_40_50
0.649 0.6597 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_50_60
1.816 1.8468 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_60_70
5.414 5.5062 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_70_80
0.044 0.0448 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_80_90
0.010 0.0107 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_90_100








------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 11:17 -------
Ouch.

Actually, I'm not too surprised the FP's didn't go to zero, which is why
I wanted to see 0-5 and 5-10. What surprises me is how many TP's have
been lost. Perhaps these used non-Roman alphabets. I guess I'll need to
test these things on more messages.








------- Additional Comments From quinlan@pathname.com 2004-03-12 13:06 -------
Subject: Re: Obfuscation FP when obfuscating tag starts line or punctuation follows tag.

Hmmm... it seems that [A-Za-z] or \w in there does improve results. I
somehow botched my testing, but caught it when verifying the check-in
results. Anyhow, I restored the better/simpler of the regular
expression combinations I tested (15 total) and checked them into SVN.

I'm concerned that [A-Za-z] and \w are too locale-specific, so I'd like
to figure out exactly why they improve results so much over \S.

The tests:

if ($self->{html_text}[-1] =~ /\S$/s && $text =~ /^\S/s) {
    $self->{html}{obfuscation}++;
}
if ($self->{html_text}[-1] =~ /\S\z/s &&
    $text =~ /^\S/s)
{
    $self->{html}{t_obfuscation1}++;
}
if ($self->{html_text}[-1] =~ /\S*[A-Za-z]\S*\z/ &&
    $text =~ /^\S*[A-Za-z]\S*/)
{
    $self->{html}{t_obfuscation2}++;
}
if ($self->{html_text}[-1] =~ /\S*\w\S*\z/ &&
    $text =~ /^\S*\w\S*/)
{
    $self->{html}{t_obfuscation3}++;
}

Results:

9.504 16.1123 3.1377 0.837 0.55 1.00 HTML_OBFUSCATE_00_10
1.697 3.4165 0.0401 0.988 0.96 1.00 HTML_OBFUSCATE_10_20
1.731 3.5204 0.0067 0.998 0.99 1.00 HTML_OBFUSCATE_20_30
1.952 3.9709 0.0067 0.998 0.99 1.00 HTML_OBFUSCATE_30_40
1.755 3.5759 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_40_50
1.211 2.4671 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_50_60
0.677 1.3791 0.0000 1.000 1.00 1.00 HTML_OBFUSCATE_60_70
0.527 1.0742 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_70_80
0.071 0.1455 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_80_90
0.037 0.0762 0.0000 1.000 0.99 1.00 HTML_OBFUSCATE_90_100

8.956 15.0243 3.1110 0.828 0.53 0.01 T_HTML_OBFUSCATE1_00_10
1.659 3.3611 0.0200 0.994 0.98 0.01 T_HTML_OBFUSCATE1_10_20
1.727 3.5066 0.0134 0.996 0.99 0.01 T_HTML_OBFUSCATE1_20_30
1.938 3.9501 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE1_30_40
1.761 3.5897 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE1_40_50
1.211 2.4671 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE1_50_60
0.673 1.3721 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE1_60_70
0.527 1.0742 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE1_70_80
0.071 0.1455 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE1_80_90
0.034 0.0693 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE1_90_100

2.941 5.1421 0.8211 0.862 0.59 0.01 T_HTML_OBFUSCATE2_00_10
1.496 3.0492 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE2_10_20
1.884 3.8392 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE2_20_30
2.105 4.2897 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE2_30_40
1.381 2.8136 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE2_40_50
1.224 2.4948 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE2_50_60
0.677 1.3791 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE2_60_70
0.394 0.8039 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE2_70_80
0.051 0.1040 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE2_80_90
0.340 0.6930 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE2_90_100

3.207 5.5925 0.9079 0.860 0.59 0.01 T_HTML_OBFUSCATE3_00_10
1.517 3.0908 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE3_10_20
1.789 3.6452 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE3_20_30
1.982 4.0402 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE3_30_40
1.605 3.2710 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE3_40_50
1.238 2.5225 0.0000 1.000 1.00 0.01 T_HTML_OBFUSCATE3_50_60
0.524 1.0672 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE3_60_70
0.524 1.0672 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE3_70_80
0.068 0.1386 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE3_80_90
0.034 0.0693 0.0000 1.000 0.99 0.01 T_HTML_OBFUSCATE3_90_100

and 00_05 and 05_10 results:

(this is using the current rule)
8.202 13.6105 2.9909 0.820 0.51 0.01 T_HTML_OBFUSCATEX00_05
1.302 2.5017 0.1469 0.945 0.82 0.01 T_HTML_OBFUSCATEX05_10

7.760 12.7443 2.9575 0.812 0.49 0.01 T_HTML_OBFUSCATE1X00_05
1.197 2.2800 0.1535 0.937 0.80 0.01 T_HTML_OBFUSCATE1X05_10

2.268 3.7769 0.8145 0.823 0.50 0.01 T_HTML_OBFUSCATE2X00_05
0.673 1.3652 0.0067 0.995 0.98 0.01 T_HTML_OBFUSCATE2X05_10

2.537 4.2412 0.8946 0.826 0.50 0.01 T_HTML_OBFUSCATE3X00_05
0.670 1.3514 0.0134 0.990 0.96 0.01 T_HTML_OBFUSCATE3X05_10










------- Additional Comments From koppel@ece.lsu.edu 2004-03-12 15:10 -------
> I'm concerned that [A-Za-z] and \w are too locale-specific, so I'd like
> to figure out exactly why they improve results so much over \S.

HTML like this, where punctuation follows something in an anchor, is
probably fairly common:

<a href="mailto:userid@sld.info"><u>userid@sld.info</a></u>;

so that might be why it reduces false positives (\S would match the
semicolon). I guess the use of [A-Za-z] instead of \S would reduce
the number of true positives in non-Roman messages, but most spam
(that I receive) uses Roman characters so I'm not sure why you are
surprised at the improvement it does get. Maybe I misunderstood the
comment.
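
A small sketch of that case, assuming the text inside the anchor and the text after the closing tags reach the check as two plain strings. The sub names nonspace_test and alpha_test are hypothetical; the regexps are the \S and [A-Za-z] variants from the tests quoted earlier in this thread:

```perl
use strict;
use warnings;

# \S variant: any non-whitespace character on both sides of the tag.
sub nonspace_test {
    my ($before, $after) = @_;
    return ($before =~ /\S\z/ && $after =~ /^\S/) ? 1 : 0;
}

# [A-Za-z] variant: a letter in the run on both sides of the tag.
sub alpha_test {
    my ($before, $after) = @_;
    return ($before =~ /\S*[A-Za-z]\S*\z/ && $after =~ /^\S*[A-Za-z]\S*/) ? 1 : 0;
}

my $before = 'userid@sld.info';   # text inside the anchor
my $after  = ';';                 # punctuation following the closing tags
printf "nonspace=%d alpha=%d\n", nonspace_test($before, $after), alpha_test($before, $after);
```

The \S variant counts the semicolon as the start of a word and flags the address; the [A-Za-z] variant does not.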

The reduction in the number of hits on spam messages in the tables
above is probably due to false positives in those spam messages that
do not contain obfuscation. Perhaps the false positives in ham
messages can be reduced further. I'm going to look for ham that the
rules hit so they can be tweaked.









------- Additional Comments From koppel@ece.lsu.edu 2004-03-13 08:50 -------
Thanks to Daniel Quinlan and Theo Van Dinter for quickly trying out
the variations and posting the data yesterday.

One minor suggestion: remove the unnecessary \S* from each regexp:

$self->{html_text}[-1] =~ /[A-Za-z]\S*\z/ && $text =~ /^\S*[A-Za-z]/

Also, the "$" in the backhair regexp might also be changed to a \z.
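
Both points can be checked with a short sketch. The sub names and sample strings here are made up for illustration, and a few samples only suggest, rather than prove, that the trimmed regexps behave the same:

```perl
use strict;
use warnings;

# Long and trimmed forms of the "before the tag" and "after the tag" tests.
sub before_long  { return ($_[0] =~ /\S*[A-Za-z]\S*\z/) ? 1 : 0 }
sub before_short { return ($_[0] =~ /[A-Za-z]\S*\z/)    ? 1 : 0 }
sub after_long   { return ($_[0] =~ /^\S*[A-Za-z]\S*/)  ? 1 : 0 }
sub after_short  { return ($_[0] =~ /^\S*[A-Za-z]/)     ? 1 : 0 }

for my $s ('foo', 'foo;', '(', 'a1 ', "end\n", '') {
    (my $shown = $s) =~ s/\n/\\n/g;   # make the newline visible when printing
    printf "%-7s before: %d/%d  after: %d/%d\n", "'$shown'",
        before_long($s), before_short($s), after_long($s), after_short($s);
}

# "$" also matches just before a trailing newline; "\z" matches only at the very end.
my $dollar = ("x\n" =~ /\S$/)  ? 1 : 0;
my $z      = ("x\n" =~ /\S\z/) ? 1 : 0;
printf "dollar=%d z=%d\n", $dollar, $z;
```

The last line shows why the choice between "$" and \z matters when the text before the tag ends in a newline.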

Below is the data from last night's run
(http://www.pathname.com/~corpus/HTML.1day). (The accuracy is harder
to see in the DETAILS data, which includes html and non-html messages,
because HTML messages are more likely spam.)

Here are the additional cases that Daniel Quinlan added last night:

if ($self->{html_text}[-1] =~ m{[^\s\(\)\<\>\[\]\$\,\"\;\/\#]\z}s &&
    $text =~ m{^[^\s\(\)\<\>\[\]\$\,\"\;\/\#]}s)
{
    $self->{html}{t_obfuscation4}++;
}
if ($self->{html_text}[-1] =~ /[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]\z/s &&
    $text =~ /^[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/s)
{
    $self->{html}{t_obfuscation5}++;
}

(For obfuscation4, a "." may have been left out of the
[^\s\(\)\<\>\[\]\$\,\"\;\/\#] )
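
To see how the two character classes differ, the sketch below tests a few single characters against each. The variable and sub names are hypothetical; the classes are copied from the two cases above. The hex ranges in the second class cover the ASCII punctuation characters, so it rejects "." and "_", while the hand-written list in the first class accepts both:

```perl
use strict;
use warnings;

# Character classes copied from the t_obfuscation4 and t_obfuscation5 cases.
my $class4 = qr{[^\s\(\)\<\>\[\]\$\,\"\;\/\#]};
my $class5 = qr/[^\s\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]/;

# Return 1 if the single character $ch is accepted by the class, else 0.
sub accepts {
    my ($class, $ch) = @_;
    return ($ch =~ /^$class\z/) ? 1 : 0;
}

for my $ch ('.', ';', '(', '_', 'a', '7') {
    printf "'%s' class4=%d class5=%d\n", $ch, accepts($class4, $ch), accepts($class5, $ch);
}
```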


OVERALL% SPAM% HAM% S/O RANK SCORE NAME
63467 60908 2559 0.960 0.00 0.00 (all messages)
100.000 95.9680 4.0320 0.960 0.00 0.00 (all messages as %)

2.929 2.9241 3.0481 0.490 0.11 0.01 00_05_T_HTML_OBFUSCATE5
6.567 6.3309 12.1923 0.342 0.04 0.01 00_05_T_HTML_OBFUSCATE4
3.547 3.4166 6.6432 0.340 0.04 0.01 00_05_T_HTML_OBFUSCATE3
3.107 2.9914 5.8617 0.338 0.04 0.01 00_05_T_HTML_OBFUSCATE2
7.777 7.0845 24.2673 0.226 0.02 0.01 00_05_T_HTML_OBFUSCATE
7.429 6.7282 24.1110 0.218 0.02 0.01 00_05_T_HTML_OBFUSCATE1

23.675 23.1677 35.7562 0.393 0.09 1.00 00_10_HTML_OBFUSCATE
3.561 3.5759 3.2044 0.527 0.14 0.01 00_10_T_HTML_OBFUSCATE5
4.012 3.9305 5.9398 0.398 0.06 0.01 00_10_T_HTML_OBFUSCATE2
7.796 7.5983 12.5049 0.378 0.06 0.01 00_10_T_HTML_OBFUSCATE4
4.462 4.3541 7.0340 0.382 0.06 0.01 00_10_T_HTML_OBFUSCATE3
8.874 8.1582 25.9086 0.239 0.02 0.01 00_10_T_HTML_OBFUSCATE1

0.904 0.9391 0.0782 0.923 0.71 0.01 05_10_T_HTML_OBFUSCATE2
0.632 0.6518 0.1563 0.807 0.48 0.01 05_10_T_HTML_OBFUSCATE5
1.229 1.2675 0.3126 0.802 0.47 0.01 05_10_T_HTML_OBFUSCATE4
0.915 0.9375 0.3908 0.706 0.32 0.01 05_10_T_HTML_OBFUSCATE3
1.445 1.4300 1.7976 0.443 0.08 0.01 05_10_T_HTML_OBFUSCATE1
1.492 1.4744 1.9148 0.435 0.08 0.01 05_10_T_HTML_OBFUSCATE

5.311 5.5034 0.7425 0.881 0.63 1.00 10_20_HTML_OBFUSCATE
1.788 1.8635 0.0000 1.000 0.91 0.01 10_20_T_HTML_OBFUSCATE3
1.760 1.8339 0.0000 1.000 0.91 0.01 10_20_T_HTML_OBFUSCATE2
1.803 1.8766 0.0391 0.980 0.85 0.01 10_20_T_HTML_OBFUSCATE5
1.910 1.9833 0.1563 0.927 0.72 0.01 10_20_T_HTML_OBFUSCATE4
1.963 2.0342 0.2735 0.881 0.62 0.01 10_20_T_HTML_OBFUSCATE1

3.657 3.8090 0.0391 0.990 0.88 1.00 20_30_HTML_OBFUSCATE
1.654 1.7239 0.0000 1.000 0.91 0.01 20_30_T_HTML_OBFUSCATE2
1.541 1.6057 0.0000 1.000 0.91 0.01 20_30_T_HTML_OBFUSCATE3
1.538 1.6024 0.0000 1.000 0.91 0.01 20_30_T_HTML_OBFUSCATE4
1.525 1.5893 0.0000 1.000 0.91 0.01 20_30_T_HTML_OBFUSCATE5
1.480 1.5384 0.0782 0.952 0.78 0.01 20_30_T_HTML_OBFUSCATE1

4.492 4.6792 0.0391 0.992 0.89 1.00 30_40_HTML_OBFUSCATE
2.184 2.2756 0.0000 1.000 0.91 0.01 30_40_T_HTML_OBFUSCATE4
2.138 2.2280 0.0000 1.000 0.91 0.01 30_40_T_HTML_OBFUSCATE5
2.074 2.1606 0.0000 1.000 0.91 0.01 30_40_T_HTML_OBFUSCATE2
2.069 2.1557 0.0000 1.000 0.91 0.01 30_40_T_HTML_OBFUSCATE3
2.058 2.1442 0.0000 1.000 0.91 0.01 30_40_T_HTML_OBFUSCATE1









------- Additional Comments From quinlan@pathname.com 2004-03-16 11:19 -------
Subject: Re: Obfuscation FP when obfuscating tag starts line or punctuation follows tag.

You want to use HTML.new, not HTML.1day. HTML.1day contains all results
uploaded in the last day, without regard to SVN version. HTML.new
contains the latest results for the current revision.










------- Additional Comments From koppel@ece.lsu.edu 2004-03-16 11:21 -------
Thanks for pointing that out.








------- Additional Comments From koppel@ece.lsu.edu 2004-03-16 11:34 -------
This data is from http://www.pathname.com/~corpus/HTML.new today.

OVERALL% SPAM% HAM% S/O RANK SCORE NAME

118520 114767 3753 0.968 0.00 0.00 (all messages)
100.000 96.8334 3.1666 0.968 0.00 0.00 (all messages as %)

5.927 5.7717 10.6848 0.351 0.05 0.01 00_05 T_HTML_OBFUSCATE5
8.334 8.2524 10.8447 0.432 0.09 0.01 00_05 T_HTML_OBFUSCATE2
13.448 13.5021 11.7772 0.534 0.16 0.01 00_05 T_HTML_OBFUSCATE3
17.130 16.3584 40.7407 0.286 0.05 0.01 00_05 T_HTML_OBFUSCATE4
23.185 22.3122 49.8801 0.309 0.06 0.01 00_05 T_HTML_OBFUSCATE1
23.620 22.7565 50.0133 0.313 0.06 0.01 00_05 T_HTML_OBFUSCATE

9.664 9.6212 10.9779 0.467 0.11 0.01 00_10 T_HTML_OBFUSCATE2
7.147 7.0194 11.0578 0.388 0.07 0.01 00_10 T_HTML_OBFUSCATE5
14.812 14.8971 12.2036 0.550 0.18 0.01 00_10 T_HTML_OBFUSCATE3
19.387 18.6508 41.9131 0.308 0.05 0.01 00_10 T_HTML_OBFUSCATE4
25.967 25.0978 52.5446 0.323 0.07 0.01 00_10 T_HTML_OBFUSCATE1
26.463 25.6014 52.8111 0.326 0.07 1.00 00_10 HTML_OBFUSCATE

1.330 1.3689 0.1332 0.911 0.72 0.01 05_10 T_HTML_OBFUSCATE2
1.220 1.2477 0.3730 0.770 0.44 0.01 05_10 T_HTML_OBFUSCATE5
1.364 1.3950 0.4263 0.766 0.43 0.01 05_10 T_HTML_OBFUSCATE3
2.257 2.2925 1.1724 0.662 0.28 0.01 05_10 T_HTML_OBFUSCATE4
2.782 2.7856 2.6645 0.511 0.13 0.01 05_10 T_HTML_OBFUSCATE1
2.843 2.8449 2.7978 0.504 0.13 0.01 05_10 T_HTML_OBFUSCATE

4.808 4.9657 0.0000 1.000 0.96 0.01 10_20 T_HTML_OBFUSCATE2
4.966 5.1278 0.0266 0.995 0.95 0.01 10_20 T_HTML_OBFUSCATE5
4.857 5.0145 0.0266 0.995 0.95 0.01 10_20 T_HTML_OBFUSCATE3
6.126 6.3163 0.2931 0.956 0.84 0.01 10_20 T_HTML_OBFUSCATE4
6.076 6.2492 0.7727 0.890 0.68 0.01 10_20 T_HTML_OBFUSCATE1
7.273 7.4821 0.8793 0.895 0.70 1.00 10_20 HTML_OBFUSCATE

2.616 2.7011 0.0000 1.000 0.96 0.01 20_30 T_HTML_OBFUSCATE2
2.420 2.4964 0.0799 0.969 0.87 0.01 20_30 T_HTML_OBFUSCATE5
2.551 2.6314 0.0799 0.971 0.88 0.01 20_30 T_HTML_OBFUSCATE3
2.766 2.8536 0.0799 0.973 0.88 0.01 20_30 T_HTML_OBFUSCATE4
2.606 2.6872 0.1332 0.953 0.83 1.00 20_30 HTML_OBFUSCATE
2.597 2.6767 0.1599 0.944 0.81 0.01 20_30 T_HTML_OBFUSCATE1





quinlan@pathname.com changed:

What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|spamassassin- |quinlan@pathname.com
|dev@incubator.apache.org |



------- Additional Comments From quinlan@pathname.com 2004-03-19 00:31 -------
assigning



