Mailing List Archive

URL Extractor?
Just so I don't have to reinvent the wheel, does anyone have a Perl script
to extract all URLs from a mailbox/message? I'd like to automatically
generate a blacklist from spamtrap mailboxes and such. The desired output
would be similar to the BigEvil script.

Sincerely,
Kirk Ismay
System Administrator
http://www.netidea.biz/

There are 10 types of people in this world...
Those who can read binary, and those who can't.
Re: URL Extractor?
On Fri, 12 Mar 2004, Kirk Ismay wrote:

> Just so I don't have to reinvent the wheel, does anyone have a Perl script
> to extract all URLs from a mailbox/message? I'd like to automatically
> generate a blacklist from spamtrap mailboxes and such. The desired output
> would be similar to the BigEvil script.
>
> Sincerely,
> Kirk Ismay

Danger Will Robinson, doing this without human supervision is a -BAD-
idea. Spammers have been known to toss in URLs to legit sites to
obfuscate spam, to add apparent legitimacy, to mess up auto-collectors
(such as what you are proposing).

For example, stock scams have been known to toss in links pointing
to SEC sites to make their claims look legit. Ditto for Nigerian spams
pointing to news sites to support their claims about events
in Africa.
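
One way to keep such decoy links out of an auto-generated list is to set aside anything on a hand-maintained list of known-legitimate domains before a human looks at the rest. The following is only a rough sketch in Python (not Perl, and not part of BigEvil); the names `KNOWN_LEGIT` and `split_candidates` and the example domains are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; a real one would be curated by hand.
KNOWN_LEGIT = {"sec.gov", "bbc.co.uk"}


def split_candidates(urls):
    """Split URLs into (candidates, skipped).

    A URL is skipped when its host is a known-legitimate domain or a
    subdomain of one. Everything else stays a candidate for human review;
    nothing here is blacklisted automatically.
    """
    candidates, skipped = [], []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in KNOWN_LEGIT):
            skipped.append(url)
        else:
            candidates.append(url)
    return candidates, skipped
```

Matching on the full domain or a dot-prefixed suffix means a host such as `notsec.gov.example.com` is not mistaken for `sec.gov`.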

Check with Chris Santerre (author of BigEvil) about the fun he's had due
to this kind of shenanigans (even humans can be occasionally misled ;).

--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
RE: URL Extractor?
> From: Kirk Ismay
> Sent: Friday, March 12, 2004 3:52 PM
[...]
>
> Just so I don't have to reinvent the wheel, does anyone have a Perl script
> to extract all URLs from a mailbox/message? I'd like to automatically
> generate a blacklist from spamtrap mailboxes and such. The desired output
> would be similar to the BigEvil script.
>

Here ya go:
http://www.exit0.us/index.php/ExtractURLs
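
For anyone who just wants the general idea, here is a minimal sketch of URL extraction from a single raw message. It is written in Python rather than Perl and is not the script at the link above; the function name `extract_urls` and the regular expression are assumptions chosen for illustration, and the pattern only catches plain `http://` and `https://` links, not obfuscated ones.

```python
import email
import re
from email import policy

# Deliberately simple pattern: scheme, then everything up to whitespace,
# a quote, an angle bracket, or a closing bracket.
URL_RE = re.compile(r"""https?://[^\s<>"')\]]+""", re.IGNORECASE)


def extract_urls(raw_message):
    """Return the unique URLs found in the text parts of one raw message."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    urls = []
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        try:
            body = part.get_content()
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or broken charset; ignore this part
        for url in URL_RE.findall(body):
            url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
    return urls
```

To run it over a whole spamtrap mailbox, the standard-library `mailbox.mbox` class can iterate the messages, and each message's `as_string()` can be passed to `extract_urls`. The output is raw material for review, not a ready-made blacklist.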
RE: URL Extractor?
> -----Original Message-----
> From: David B Funk [mailto:dbfunk@engineering.uiowa.edu]
> Sent: Friday, March 12, 2004 7:18 PM
> To: Kirk Ismay
> Cc: spamassassin-users@incubator.apache.org
> Subject: Re: URL Extractor?
>
>
> On Fri, 12 Mar 2004, Kirk Ismay wrote:
>
> > Just so I don't have to reinvent the wheel, does anyone have a Perl script
> > to extract all URLs from a mailbox/message? I'd like to automatically
> > generate a blacklist from spamtrap mailboxes and such. The desired output
> > would be similar to the BigEvil script.
> >
> > Sincerely,
> > Kirk Ismay
>
> Danger Will Robinson, doing this without human supervision is a -BAD-
> idea. Spammers have been known to toss in URLs to legit sites to
> obfuscate spam, to add apparent legitimacy, to mess up auto-collectors
> (such as what you are proposing).
>
> For example, stock scams have been known to toss in links pointing
> to SEC sites to make their claims look legit. Ditto for Nigerian spams
> pointing to news sites to support their claims about events
> in Africa.
>
> Check with Chris Santerre (author of BigEvil) about the fun he's had due
> to this kind of shenanigans (even humans can be occasionally misled ;).
>
> --
> Dave Funk University of Iowa
*snip*

Oh yeah! Automating this is a baaad idea! I still have some reservations on
the new URL checking in SA 3.0, but I'll wait and see. The only real slow
^H^H^H^H^H safe way to do this is by hand :) Every URL submitted is checked
for 3 TLDs (biz, net, and com). Then any URL that isn't blatantly obvious
like (I.sell.generic.drugs.while.molesting.midgets.biz) is checked in:

1) RBL listings
2) NANAS searched
3) Site actually looked at! (Yeah a reason to view pron at work! *sigh*)

Then comes the fun of finding the best way to incorporate the name into
BigEvil!
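
The bookkeeping side of that workflow can be sketched roughly as below, in Python. This does not automate any of the checks; it only builds a per-host checklist for a human to tick off. Reading the TLD step as a filter on biz, net, and com is an assumption, and `TRACKED_TLDS`, `MANUAL_CHECKS`, and `build_review_queue` are hypothetical names, not anything from BigEvil.

```python
from urllib.parse import urlsplit

# Assumption: only hosts under these TLDs are queued for review.
TRACKED_TLDS = {"biz", "net", "com"}

# The three manual steps from the list above; all start unchecked.
MANUAL_CHECKS = ("RBL listings", "NANAS search", "site viewed")


def build_review_queue(urls):
    """Map each unique host under a tracked TLD to an unchecked checklist."""
    queue = {}
    for url in urls:
        host = (urlsplit(url).hostname or "").lower().rstrip(".")
        tld = host.rpartition(".")[2]
        if tld in TRACKED_TLDS and host not in queue:
            queue[host] = {check: False for check in MANUAL_CHECKS}
    return queue
```

Keying the queue on the host rather than the full URL means a batch of submissions pointing at the same site produces a single entry to review.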

I'm so far behind in updates it's not even funny. I think I'm going to have
to limit people's submissions to no more than 15 at a time. Those huge lists
kill me!

--Chris
True skill is effortless. -Li Mu Bai
(Guess what DVD I watched last night?) ;-)