Mailing List Archive

update on unparsed url types
This is an update to yesterday's post on URL types that
SpamAssassin 2.63 does not currently parse.

Further cases:

6. msn redirection services g.msn.com

workaround for PerMsgStatus.pm
$uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http\:(.*)$/http\:$1/g;

7. use of html escape sequences in the url
http://toform.net/mcp/879/1352/cap112.html
To translate these into the equivalent ASCII characters,
I have used HTML::Entities rather than reinvent the wheel.

workaround for PerMsgStatus.pm
use HTML::Entities;
$uri = HTML::Entities::decode($uri);
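
To illustrate what the decode step does to a URI, here is a minimal
sketch that handles only numeric character references (&#NNN; and
&#xHH;) with plain regexes. The sub name decode_numeric_refs and the
sample URL are made up for this example. It is a stand-in, not a
replacement: HTML::Entities also handles named entities such as &amp;.

```perl
use strict;
use warnings;

# Hypothetical helper: decode only numeric character references.
# HTML::Entities covers named entities too; this is just a sketch.
sub decode_numeric_refs {
    my ($uri) = @_;
    $uri =~ s/&#[xX]([0-9A-Fa-f]+);/chr(hex($1))/ge;   # hex form, e.g. &#x69;
    $uri =~ s/&#([0-9]+);/chr($1)/ge;                  # decimal form, e.g. &#46;
    return $uri;
}

# Made-up example: some characters written as character references.
my $obfuscated = 'http://&#101;xample&#46;com/&#x69;ndex.html';
print decode_numeric_refs($obfuscated), "\n";   # prints http://example.com/index.html
```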

Here is a cumulative diff containing the workarounds for these
and the previous cases. The diff is against PerMsgStatus.pm
2.63 already patched with SpamCopUri 0.09

Hopefully someone can include these
in version 3 and more elegantly....

diff PerMsgStatus.pm.orig PerMsgStatus.pm
----cut-------
45a47
> use HTML::Entities;
1777a1780,1789
> dbg("Got URI: $uri");
> $uri =~ s/\%68/h/g;
> $uri =~ s/\%74/t/g;
> $uri =~ s/\%70/p/g;
> $uri =~ s/http:\/([^\/])/http:\/\/$1/g;
> $uri =~ s/http:\/\/http:\/\//http:\/\//g;
> $uri =~ s/^http:\/\/(?:drs|rd)\.yahoo\.com\/[^\*]+\*(.*)$/$1/g;
> $uri =~ s/^http:\/\/g\.msn\.com\/[^\*]+\?http\:(.*)$/http\:$1/g;
> $uri = HTML::Entities::decode($uri);
> dbg("URI after filter: $uri");
----cut-------
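
For anyone who wants to try the substitutions outside SpamAssassin,
the sketch below wraps them in a standalone sub and runs them on a few
sample URIs. The name filter_uri and the sample URIs are assumptions
made for illustration; they are not part of PerMsgStatus.pm. The
HTML::Entities step is left out so that the sketch needs only core
Perl, and the dots in the host names are escaped.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitutions from the diff.
# The HTML::Entities step is omitted to avoid a non-core module.
sub filter_uri {
    my ($uri) = @_;
    $uri =~ s/%68/h/g;                       # %68 -> h
    $uri =~ s/%74/t/g;                       # %74 -> t
    $uri =~ s/%70/p/g;                       # %70 -> p
    $uri =~ s{http:/([^/])}{http://$1}g;     # restore a missing slash after "http:/"
    $uri =~ s{http://http://}{http://}g;     # collapse a doubled scheme
    # Yahoo redirector: keep everything after the "*"
    $uri =~ s{^http://(?:drs|rd)\.yahoo\.com/[^*]+\*(.*)$}{$1};
    # MSN redirector: keep the URL that follows "?"
    $uri =~ s{^http://g\.msn\.com/[^*]+\?http:(.*)$}{http:$1};
    return $uri;
}

# Made-up sample URIs, for illustration only.
for my $sample (
    '%68%74%74%70://example.com/',
    'http:/example.com/page',
    'http://rd.yahoo.com/some/path/*http://example.com/',
    'http://g.msn.com/0US!s5/x?http://example.com/',
) {
    printf "%s\n  -> %s\n", $sample, filter_uri($sample);
}
```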
Re: update on unparsed url types
On Sun, Apr 18, 2004 at 02:29:56PM +0200, John Fawcett wrote:
> 6. msn redirection services g.msn.com

I'd rather have generic redirection handling. Something like:

$uri =~ s@^https?://.+(https?://.+)$@$1@;

anybody can start a redirection service, I don't want to have code for
each one individually unless they're doing something wonky.
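
A minimal sketch of how such a generic rule might behave, assuming the
substitution form of the pattern above. The sub name strip_redirector
and the sample URIs are made up for illustration. Note that the greedy
".+" means the last embedded http(s):// URI is the one that is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: if a URI embeds another http(s) URI, keep the embedded one.
sub strip_redirector {
    my ($uri) = @_;
    # Greedy ".+" backtracks from the end, so the LAST embedded URI wins.
    $uri =~ s{^https?://.+(https?://.+)$}{$1};
    return $uri;
}

# Made-up sample URIs, for illustration only.
print strip_redirector('http://g.msn.com/0US!s5/x?http://example.com/'), "\n";
print strip_redirector('http://example.com/plain'), "\n";   # no embedded URI: unchanged
```

Because the rule does not name any host, it also rewrites legitimate
URIs that happen to carry another URI as a parameter, which may or may
not be what is wanted.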

come to think of this: have spammers used things like tinyurl yet?

> 7. use of html escape sequences in the url
> http://toform.net/mcp/879/1352/cap112.html
> To translate these into the equivalent ASCII characters,
> I have used HTML::Entities rather than reinvent the wheel.

this was added a long time ago in 3.0, right after dealing with %## encodings. :)
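
For reference, a generic %## decode can be sketched in a single
substitution, as opposed to one substitution per character. This is an
illustration only and is not taken from the 3.0 source; the sub name
decode_percent and the sample URI are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: decode every %XX escape (two hex digits) in a URI.
sub decode_percent {
    my ($uri) = @_;
    $uri =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $uri;
}

# Made-up sample: "http" and "~" written as percent escapes.
print decode_percent('%68%74%74%70://example.com/%7Euser'), "\n";   # prints http://example.com/~user
```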


and btw, we pretty much ignore any patches that don't come as an
attachment to bugzilla. fyi.

--
Randomly Generated Tagline:
"We all know engineers are way better than doctors because doctors can
only kill people one at a time while engineers can send 600 to a 1000
people screaming into Mt. Rushmore at 300 to 400 knots." - Unknown
Re: update on unparsed url types
On Sunday 18 April 2004 04:55 pm, Theo Van Dinter wrote:

> come to think of this: have spammers used things like tinyurl yet?

No instances of TinyUrl in any of the some 8,000 spam I've received in April,
but looking in the news.admin.net-abuse.sightings newsgroup
(http://groups.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&oe=UTF-8&safe=off&q=tinyurl&btnG=Search&meta=group%3Dnews.admin.net-abuse.sightings)
shows over 300 examples for spam with TinyUrl links.

--
Give a man a match, and he'll be warm for a minute, but set him on
fire, and he'll be warm for the rest of his life.

Advanced SPAM filtering software: http://spamassassin.org
Re: update on unparsed url types
----- Original Message -----
From: "Matthew Cline"

> On Sunday 18 April 2004 04:55 pm, Theo Van Dinter wrote:
>
> > come to think of this: have spammers used things like tinyurl yet?
>
> No instances of TinyUrl in any of the some 8,000 spam I've received in April,
> but looking in the news.admin.net-abuse.sightings newsgroup
> (http://groups.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&oe=UTF-8&safe=off&q=tinyurl&btnG=Search&meta=group%3Dnews.admin.net-abuse.sightings)
> shows over 300 examples for spam with TinyUrl links.

not all occurrences are from the original spams: sometimes posters to nanas
(news.admin.net-abuse.sightings) have made the tinyurl links themselves.

I clicked on a couple of tinyurl links which were from the original spams
and got "terminated for spamming" notices. tinyurl seems to be responsive to
spam complaints, which will deter spammers from signing up in the first
place.

John

Re: update on unparsed url types
On Sunday, April 18, 2004, 11:50:17 PM, John Fawcett wrote:
> I clicked on a couple of tinyurl links which were from the original spams
> and got "terminated for spamming" notices. tinyurl seems to be responsive to
> spam complaints, which will deter spammers from signing up in the first
> place.

That's excellent news! Thanks for sharing it, John. Now
if we can get other redirection services to also block
spammers....

Jeff C.