[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268

felicity@kluge.net changed:

What    |Removed                     |Added
----------------------------------------------------------------------------
Severity|normal                      |enhancement
Summary |BIZ_TLD does not catch on a |RFE: Strip off redirectors
        |redirected URL              |for URI tests



------- Additional Comments From felicity@kluge.net 2004-04-18 14:34 -------
Since this isn't a bug but an enhancement, I'm changing the severity and summary
to match. Also adding it to 3208.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268

felicity@kluge.net changed:

What                    |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis|        |3208





[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From johnml@michaweb.net 2004-04-18 14:48 -------
This RFE is closely related to 3261.





[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From felicity@kluge.net 2004-04-18 15:24 -------
Just to note:

uri T_REDIRECT m@^https?://.+https?://@i

3.523 3.5407 0.7149 0.832 1.00 0.01 T_REDIRECT

the FPs were from:

* AMEX
* Fidelity
* EFF (about anonymizer.com)
* Yahoo (they do use rd.yahoo.com ... and here's a fun one:
"http://pa.yahoo.com/*http://us.rd.yahoo.com/evt=23765/*http://photos.yahoo.c
om/ph/print_splash")

So IMO, the general rule isn't great. FYI.
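
A minimal sketch of what the T_REDIRECT pattern matches, written in Python rather than as a SpamAssassin rule. The function name is hypothetical; only the regular expression comes from the rule above. It shows why legitimate mail trips the rule: any URI carrying a second http:// or https:// anywhere after the first one matches, including same-site redirectors such as Yahoo's.

```python
import re

# Python equivalent of the rule's pattern: m@^https?://.+https?://@i
T_REDIRECT = re.compile(r"^https?://.+https?://", re.IGNORECASE)


def looks_like_redirect(uri: str) -> bool:
    """Return True when a second http(s):// appears inside the URI."""
    return T_REDIRECT.search(uri) is not None


if __name__ == "__main__":
    for uri in (
        "http://pa.yahoo.com/*http://us.rd.yahoo.com/evt=23765/"
        "*http://photos.yahoo.com/ph/print_splash",
        "http://photos.yahoo.com/ph/print_splash",
    ):
        print(looks_like_redirect(uri), uri)
```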





[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From quinlan@pathname.com 2004-04-18 18:03 -------
Subject: Re: RFE: Strip off redirectors for URI tests

> Just to note:
>
> uri T_REDIRECT m@^https?://.+https?://@i

What about an eval that doesn't hit if LHS domain matches the RHS
domain?

Daniel
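
One way such an eval could look, as a Python sketch under stated assumptions: "domain" is approximated by the last two labels of the host name (which is wrong for suffixes like co.uk), and every function name here is hypothetical. The check hits only when some embedded host falls outside the domain of the left-most host.

```python
import re

# Captures the host part following each http:// or https:// in the string.
HOST_RE = re.compile(r"https?://([^/?#:\s]+)", re.IGNORECASE)


def base_domain(host: str) -> str:
    """Naive registered-domain guess: the last two dot-separated labels."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_cross_domain_redirect(uri: str) -> bool:
    """Hit only if an embedded host is outside the first host's domain."""
    hosts = HOST_RE.findall(uri)
    if len(hosts) < 2:
        return False  # no embedded URI, so nothing is being redirected
    first = base_domain(hosts[0])
    return any(base_domain(h) != first for h in hosts[1:])
```

With this check, a chain of yahoo.com hosts is ignored, while http://foo.com/?http://bar.com still hits.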





[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From felicity@kluge.net 2004-04-19 09:12 -------
OK, I added code to the URI array generator to loop through redirectors and deal with them, e.g.:

http://foo.com/?http://bar.com?http://baz.com

will add to the URI list:

http://foo.com/?http://bar.com?http://baz.com
http://bar.com?http://baz.com
http://baz.com

so you can still detect the redirection attempt, and all the URI-related rules will see everything as well.
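
The expansion described above can be sketched in Python as follows. This is an illustration of the idea, not the code that was checked in; the function name is hypothetical. Each pass looks for the next embedded http:// or https:// and appends the remainder of the string as a URI of its own.

```python
import re

EMBEDDED = re.compile(r"https?://", re.IGNORECASE)


def expand_redirectors(uri: str) -> list[str]:
    """Return the URI followed by every URI embedded inside it."""
    results = [uri]
    current = uri
    while True:
        # Start at offset 1 so the URI's own leading scheme is skipped.
        match = EMBEDDED.search(current, 1)
        if match is None:
            break
        current = current[match.start():]
        results.append(current)
    return results


if __name__ == "__main__":
    for item in expand_redirectors("http://foo.com/?http://bar.com?http://baz.com"):
        print(item)
```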



[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From felicity@kluge.net 2004-04-19 11:27 -------
I tried an initial eval rule, ignoring when the redirector was redirecting to a site in the same domain:

3.628 3.6472 0.4575 0.889 1.00 0.01 T_REDIRECTOR

The main issue is that I have a bunch of Yahoo Groups mail that ends up producing multiple URLs, a la:

http://pa.yahoo.com/*http://us.rd.yahoo.com/evt=23765/*http://photos.yahoo.c

then

http://pa.yahoo.com/*http://us.rd.yahoo.com/evt=23765/*http://photos.yahoo.com/ph/print_splash

The latter one is fine (they're all yahoo.com), but the former one gets yahoo.com != yahoo.c, so the
same-domain check fails.

I need to go through and find out where that yahoo.c parsing is happening (I'm fairly positive I know
where). I can see why it does it though:

<a href="http://pa.yahoo.com/*http://us.rd.yahoo.com/evt=23765/*http://photos.yahoo.c
om/ph/print_splash">

there's a hard newline in there, so part of the parsing sees it as EOL, and the HTML parsing successfully
sees it in an href and takes the whole thing, including the newline -- which is why I added in code to
strip the newlines out.
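
A small Python sketch that reproduces the two views of the same href. The regular expressions and function names are assumptions made for illustration, not SpamAssassin's own. A whitespace-delimited URI scan stops at the hard newline and yields the truncated yahoo.c form, while taking the whole attribute value and stripping newlines yields the full URL.

```python
import re

# Generic URI scan: stops at whitespace, quotes, or angle brackets.
URI_RE = re.compile(r"https?://[^\s\"<>]+", re.IGNORECASE)
# Takes a double-quoted href value whole, newline included.
HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)

RAW = ('<a href="http://pa.yahoo.com/*http://us.rd.yahoo.com/evt=23765/'
       '*http://photos.yahoo.c\nom/ph/print_splash">')


def uris_from_text(text: str) -> list[str]:
    """URIs as a plain regex scan sees them (the newline ends the match)."""
    return URI_RE.findall(text)


def href_without_newlines(html: str):
    """First href value with hard newlines removed, or None if absent."""
    match = HREF_RE.search(html)
    if match is None:
        return None
    return re.sub(r"[\r\n]+", "", match.group(1))


if __name__ == "__main__":
    print(uris_from_text(RAW))
    print(href_without_newlines(RAW))
```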



[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From felicity@kluge.net 2004-04-19 12:48 -------
Subject: Re: RFE: Strip off redirectors for URI tests

On Mon, Apr 19, 2004 at 11:27:49AM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I need to go through and find out where that yahoo.c parsing is happening (I'm fairly positive I know
> where). I can see why it does it though:
>
> <a href="http://pa.yahoo.com/*http://us.rd.yahoo.com/evt=23765/*http://photos.yahoo.c
> om/ph/print_splash">
>
> there's a hard newline in there, so part of the parsing sees it as EOL, and the HTML parsing successfully
> sees it in an href and takes the whole thing, including the newline -- which is why I added in code to
> strip the newlines out.

Yeah, the problem is that the get_uri_list() was using the decoded
body and parsing for URIs using REs, then using the URI results of the
HTML parser.

I modified the code to use the rendered body for generic RE parsing,
then let the HTML parser do its thing for the HTML sections...
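
A rough Python sketch of that split, using the standard library's html.parser in place of SpamAssassin's HTML parser. The class and function names are hypothetical, and "rendered body" is approximated here as the concatenated text nodes. The regex only ever sees visible text, and hrefs come from the parser with hard newlines stripped, so the truncated yahoo.c form never enters the list.

```python
import re
from html.parser import HTMLParser

URI_RE = re.compile(r"https?://[^\s\"<>]+", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects href values and the visible text of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                # Drop hard newlines that were wrapped into the attribute.
                self.hrefs.append(re.sub(r"[\r\n]+", "", value))

    def handle_data(self, data):
        self.text.append(data)


def uri_list(html: str) -> list[str]:
    """Regex over the rendered text, then the hrefs found by the parser."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    rendered = "".join(parser.text)
    found = URI_RE.findall(rendered) + parser.hrefs
    return list(dict.fromkeys(found))  # de-duplicate, keeping order
```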





[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From felicity@kluge.net 2004-04-19 14:25 -------
Subject: Re: RFE: Strip off redirectors for URI tests

On Mon, Apr 19, 2004 at 12:48:39PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I modified the code to use the rendered body for generic RE parsing,
> then let the HTML parser do its thing for the HTML sections...

Ok, after reworking that stuff, the rule is now the best it'll get for me:

3.614 3.6344 0.1830 0.952 1.00 0.01 T_REDIRECTOR

There are 2 FPs, both valid hits.
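
For reading the three result lines in this thread: assuming the second and third columns are the percentage of spam and of ham hit, and the fourth is the S/O ratio (an interpretation, though the arithmetic agrees with all three lines), S/O works out as spam% / (spam% + ham%). The function name in this Python sketch is hypothetical.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that land on spam, from per-corpus hit rates."""
    return spam_pct / (spam_pct + ham_pct)


# (spam%, ham%) pairs copied from the three result lines in this thread.
RUNS = {
    "T_REDIRECT, plain regex": (3.5407, 0.7149),
    "T_REDIRECTOR, first eval": (3.6472, 0.4575),
    "T_REDIRECTOR, after get_uri_list() rework": (3.6344, 0.1830),
}

if __name__ == "__main__":
    for name, (spam, ham) in RUNS.items():
        print(f"{name}: {s_over_o(spam, ham):.3f}")
```

The ham hit rate dropping from 0.7149 to 0.1830 while the spam hit rate stays near 3.6 is what moves the ratio from 0.832 to 0.952.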





[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268

felicity@kluge.net changed:

What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|        |FIXED



------- Additional Comments From felicity@kluge.net 2004-04-19 16:39 -------
ok, I checked in the redirector and get_uri_list() updates. should be all good, so closing the ticket as
fixed. :)



[Bug 3268] RFE: Strip off redirectors for URI tests
http://bugzilla.spamassassin.org/show_bug.cgi?id=3268





------- Additional Comments From marc@perkel.com 2004-04-20 11:29 -------
Subject: Re: RFE: Strip off redirectors for URI tests

For what it's worth - I'd add a separate rule for additional points when
they use a redirector to conceal a blacklisted link.
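
A sketch of what such a rule might test, in Python. The blacklist contents, the domain names, and the function name are all hypothetical; a real rule would consult whatever blacklist the installation uses. It hits only when a host embedded behind the first one is on the list, so a direct link to a listed site is left to the ordinary blacklist rules.

```python
import re

HOST_RE = re.compile(r"https?://([^/?#:\s]+)", re.IGNORECASE)

# Hypothetical blacklist used only for this illustration.
BLACKLIST = frozenset({"bad.example"})


def conceals_blacklisted(uri: str, blacklist=BLACKLIST) -> bool:
    """True if a host behind the redirector is a listed domain or subdomain."""
    hosts = [h.lower() for h in HOST_RE.findall(uri)]
    for host in hosts[1:]:  # skip the redirector itself
        for domain in blacklist:
            if host == domain or host.endswith("." + domain):
                return True
    return False
```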




