Mailing List Archive

Old bug with URLs and ) character still present
Please change regular expressions used to match URLs so
that final `)' isn't matched.

Probably some other characters should be disallowed too,
like comma and dot.
Re: Old bug with URLs and ) character still present
On Sun, 2003-02-16 at 04:26, Tomasz Wegrzanowski wrote:
> Please change regular expressions used to match URLs so
> that final `)' isn't matched.

The trouble with ) is of course that the URLs of various pages on
Wikipedia (and perhaps elsewhere) _do_ end with a close-paren. However,
unless there is an open-paren in the URL, we can likely leave a trailing
close-paren out of the link. It should be possible to come up with a fun
regexp for that. :)
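
To make the heuristic above concrete, here is a rough Python sketch (not
the PHP used in the wiki software). The names URL_RE and find_urls and
the simple "scheme plus non-space run" pattern are assumptions made for
illustration only; a real URL matcher would be stricter.

```python
import re

# Hypothetical pattern: http(s) scheme followed by any run of non-space characters.
URL_RE = re.compile(r"https?://\S+")


def find_urls(text):
    """Return the URLs found in text.

    A final ')' is dropped unless the URL also contains a '(',
    which is the heuristic suggested in the message above.
    """
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        if url.endswith(")") and "(" not in url:
            url = url[:-1]  # the ')' belongs to the surrounding prose
        urls.append(url)
    return urls
```

With this sketch, "(see http://example.org/foo)" yields
http://example.org/foo, while a URL such as
http://en.wikipedia.org/wiki/Mercury_(planet) keeps its close-paren. A
URL that contains "(" and is itself wrapped in parentheses would still
pick up the extra ")"; comparing the counts of "(" and ")" would be one
way to handle that case.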

> Probably some other characters should be disallowed too,
> like comma and dot.

OutputPage::subReplaceExternalLinks():
# this is the list of separators that should be ignored if they
# are the last character of an URL but that should be included
# if they occur within the URL, e.g. "go to www.foo.com, where .."
# in this case, the last comma should not become part of the URL,
# but in "www.foo.com/123,2342,32.htm" it should.
$sep = ",;\.:";
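
The effect described in that comment can be sketched in Python as
follows. This is only an illustration of the idea, not the code in
OutputPage; SEPARATORS mirrors the characters in $sep, the function name
split_trailing_separators is hypothetical, and stripping a whole run of
trailing separators (rather than exactly one character) is an assumption.

```python
# Same characters as $sep in the quoted code: comma, semicolon, dot, colon.
SEPARATORS = ",;.:"


def split_trailing_separators(candidate):
    """Split a matched URL candidate into (link, trailing).

    Separator characters at the very end are moved into `trailing`;
    separators inside the URL are left untouched.
    """
    link = candidate.rstrip(SEPARATORS)
    return link, candidate[len(link):]
```

For "www.foo.com," this gives the link "www.foo.com" and the trailing
text ",", while "www.foo.com/123,2342,32.htm" comes back unchanged with
an empty trailing part.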

-- brion vibber (brion @ pobox.com)