Mailing List Archive

Mini Rex: a tool for creating regexes from a list of words
http://www.exit0.us/index.php/MiniRex

Mini_Rex reads a list of strings (one per line; blanks and other characters
are permitted) and generates a Perl regular expression that matches an
occurrence of any of the input strings.
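
As a rough illustration of the idea, here is a minimal Python sketch that
merges a word list into one regex by sharing common prefixes through a trie.
It is not Mini_Rex's actual algorithm (the sample output further down also
factors common suffixes and builds character classes, which this sketch does
not), and the function names are hypothetical.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        if not word:
            continue  # skip empty lines
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root

def trie_to_regex(node):
    """Return a regex fragment matching every suffix stored below this node."""
    end_here = "" in node
    branches = [re.escape(ch) + trie_to_regex(node[ch])
                for ch in sorted(k for k in node if k)]
    if not branches:
        return ""
    if len(branches) == 1 and not end_here:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word ends at this node, everything below it is optional.
    return body + "?" if end_here else body

def words_to_regex(words):
    """Hypothetical helper: one regex matching any of the given words."""
    return trie_to_regex(build_trie(words))

pattern = words_to_regex(["viagra", "v1agra", "vi@gra", "viagrax"])
print(pattern)
```

Every input word shares the leading "v", so it appears once in the pattern
and the alternation only starts where the spellings diverge.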

Mini_Rex is invoked as follows:

    mini_rex [-help] [-l <num>] [-i] [-prefix="prefix string"]

        -i                       ignore character case
        -l <num>                 maximum pattern length
                                 (default: no pattern length limit)
        -prefix="prefix_string"  prepends "prefix" to each line
                                 (default: no prefix)
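
To make the interplay of the three options concrete, here is a small Python
sketch of how output lines could be assembled. The layout (prefix, a space,
then /pattern/ with a trailing i) is taken from the sample output in this
post. How -l behaves when the limit is exceeded is not stated here; the
sketch assumes the word list is simply split across several output lines.
The function name and that splitting rule are assumptions, not Mini_Rex's
documented behaviour.

```python
import re

def emit_rules(words, prefix="", ignore_case=False, max_len=None):
    """Return output lines of the form '<prefix> /(?:w1|w2|...)/[i]'.

    ASSUMPTION: when max_len is set, words are greedily packed into
    several patterns so that each pattern body stays within max_len.
    """
    flag = "i" if ignore_case else ""
    lead = prefix + " " if prefix else ""

    def render(group):
        return lead + "/(?:" + "|".join(group) + ")/" + flag

    rules, group = [], []
    for word in words:
        candidate = group + [re.escape(word)]
        body_len = len("(?:" + "|".join(candidate) + ")")
        if max_len is not None and group and body_len > max_len:
            rules.append(render(group))      # close the current pattern
            group = [re.escape(word)]        # start a new one
        else:
            group = candidate
    if group:
        rules.append(render(group))
    return rules

for line in emit_rules(["viagra", "v1agra"], prefix="body Viagra_OBFU",
                       ignore_case=True):
    print(line)
```

With no length limit this yields a single line; with a small max_len the
same words are spread over several lines, each carrying the prefix.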

As an example, let's try feeding Mini_Rex a text file with 80 or so spellings
of the ever-popular V-word:

% mini_rex.pl -prefix='body Viagra_OBFU' -i < viagra.txt
body Viagra_OBFU /(?:\\\/(?:1agr4|\ \|\ a\ g\ r\ a)|v(?:1\@gra
|\ (?:1\ (?:\@\ g\ r\ |a\ g\ r\ ?)a|\!\ a\ g\ r\ a|\|\ a\ g\ r\ a
|i(?:\ (?:\@\ g\ (?:\.r\.\ a|r\ [\@a])|a(?:\ g\ r\ [\@a]|\.g\ r\ a))
|\.\@\.g\.r\.a)|l\ (?:\@\ g\ r\ \@|a\ g\ r\
[\@a]))|\&iuml\;\&agrave\;gr\&aring\;
|\'(?:1\'a\'g\'r\'a|\|\'a\'g\'r\'a|i\'a\'g\'r\'[\@a]|l\'a\'g\'r\'a)
|\*(?:\*i\*a\*\*g\*r\*|a\@g\@r|i\*a\*g\*r)\*a|\,(?:1\,a\,g\,r\,
|\,i(?:\"a\"g\*r\+|\.a\+g\"r\")|i\,a\,g\,r\,)a|\-(?:1\-\@\-g\-r\-a
|\-(?:1\.\@\-\-g\.r\.|i\-\-a\-\-g\-\-r\-\-)a|\|\-a\-g\-r\-a|i\-(?:\@\-g\-r\-
[\@a]
|a(?:\-g(?:\'\'r\'\'|\-(?:\,\,r\,\,|r\-))|_g_r\-)a)|l\-a\-g\-r\-a)
|\.(?:1\.[\@a]\.g\.r\.|\*i\.\*a\.\*g\.\*r\.\*|\,i\,\.a\.\,g\.\,r\.\.
|i(?:\,a[\.\:]g\,r\.|\.a\.g\.r\.(?:\.\.)?|\^a\^g\.r\.|agr)|l\.a\.g\.r\,)a
|\:i\:a\:g\:r\:a|\=i\=a\=g\=r\=a|\^i\^a\^g\^[gr]\^a|\`i\`a\`g\`r\`a
|\|i\|a\|g\|r\|a|\~(?:\ i\ \~\ a\~\ g\~\ r\~\ |i(?:\~a\~g\~r\~
|ag\~r))a|_(?:1_\@_g_|i(?:\ a\ g\ |\.a\.g\.|_a_g_))r_a|i(?:\ (?:\ ag
|ag\ )ra|\&acirc\;gra|\-(?:\-a\,g\-r\-\-r\,|ag\.r)a|\.(?:a\.)?g\.r\.a
|\@raga|a(?:g(?:\-g?ra|gra|rax)|rgg?ra)|magera)|jiagmra|ye\ agrah))/i
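
A pattern this size is hard to read by eye, so it can help to pull out one
piece and check what it matches. The Python sketch below takes a fragment
copied verbatim from the last line of the output above and tests a few
candidate strings against it. The candidate list is made up for
illustration; Python's re module is used here only because it accepts the
same escapes for this particular fragment.

```python
import re

# Fragment copied verbatim from the last line of the sample output.
fragment = r"a(?:g(?:\-g?ra|gra|rax)|rgg?ra)"
rx = re.compile(fragment, re.IGNORECASE)  # mirrors the trailing /i

# Hypothetical candidates, chosen to exercise each branch.
candidates = ["ag-ra", "ag-gra", "aggra", "agrax",
              "argra", "arggra", "agra", "AGGRA"]

# Keep only the candidates the fragment matches in full.
matches = [s for s in candidates if rx.fullmatch(s)]
print(matches)
```

Note that plain "agra" is not matched by this fragment on its own: the
fragment only covers the misspelled tails, and other branches of the full
pattern handle the remaining variants.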

Other tools:

Expand_regex, a tool that assists in understanding and debugging complicated
regexes.
http://www.exit0.us/index.php/ExpandRegex

Split_mail, a tool for splitting up large mboxes into smaller ones, each
containing at most a specified number of messages.
http://www.exit0.us/index.php/SplitMail

Extract_urls, a tool that extracts URLs from the messages in an mbox file
and validates them by attempting to fetch the referenced page.
http://www.exit0.us/index.php/ExtractURLs