[Bug 8214] __HAS_ANY_URI matches non-URI
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8214

Bill Cole <billcole@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |billcole@apache.org
         Resolution|---                         |WORKSFORME
             Status|NEW                         |RESOLVED

--- Comment #2 from Bill Cole <billcole@apache.org> ---

The hypothetical message seemingly quoted in this bug report DOES NOT match
BODY_URI_ONLY. It hits the unscored __HAS_ANY_URI, but that hit is not
meaningful (or even visible unless you scan the message with debug output
enabled). It is also formatted as a locally submitted message that has not been
transported by SMTP.

The attached message is NOT at all similar to that message. Rather, it is a
DMARC report in multipart/mixed format with a one-line text part and a gzipped
XML file. It does hit BODY_URI_ONLY, but because it hits nothing else, it comes
nowhere near the default threshold of 5.
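
To make the threshold argument concrete, here is a minimal Python sketch of how
rule hits add up to a verdict. It is an illustration only, not SpamAssassin's
code: the score given to BODY_URI_ONLY is a placeholder rather than the shipped
value, and total_score() and is_spam() are hypothetical helper names.

```python
REQUIRED_SCORE = 5.0  # the default threshold mentioned above

# Placeholder score for illustration; NOT the value shipped with SpamAssassin.
SCORES = {"BODY_URI_ONLY": 1.0}


def total_score(hits):
    """Sum the scores of the rules that hit.

    Rules whose names start with "__" are unscored sub-rules, so they are
    skipped. In this sketch, rules missing from SCORES also count as 0.
    """
    return sum(SCORES.get(name, 0.0) for name in hits if not name.startswith("__"))


def is_spam(hits):
    """A message is flagged only when its total reaches the threshold."""
    return total_score(hits) >= REQUIRED_SCORE


# The attached DMARC report: one scored hit plus one unscored sub-rule.
hits = ["BODY_URI_ONLY", "__HAS_ANY_URI"]
print(total_score(hits), is_spam(hits))
```

With only one scored rule hitting, the total stays far below 5 no matter how
many unscored sub-rules also match.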

SpamAssassin only examines text parts. As far as SA is concerned, the attached
message has one line, which includes a domain name. SA does not limit what it
detects as 'URIs' to strings that meet the RFC 3986 standard, because many
end-user mail clients will make a wide range of non-URIs "clickable" as links,
even in plain text mail. SA aims to detect spam without regard to whether the
spammer was formally correct, so it tries to detect anything that any MUA
might "make clickable" when displaying messages to users.
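
The difference between strict URI parsing and "anything an MUA might make
clickable" can be sketched in Python as follows. This is not how SpamAssassin
implements URI detection. The regular expression, the tiny TLD list, the sample
line, and the function names are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny TLD list; a real scanner would need a full one.
KNOWN_TLDS = {"com", "net", "org"}

# Loose pattern: dot-separated labels ending in a TLD-like label, no scheme needed.
BARE_DOMAIN = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b", re.IGNORECASE)


def strict_uris(text):
    """Return whitespace-separated tokens that have both a scheme and a host."""
    found = []
    for token in text.split():
        parts = urlsplit(token)
        if parts.scheme and parts.netloc:
            found.append(token)
    return found


def clickable_candidates(text):
    """Return bare domain-like strings that a mail client might turn into links."""
    return [
        m.group(1)
        for m in BARE_DOMAIN.finditer(text)
        if m.group(2).lower() in KNOWN_TLDS
    ]


# Hypothetical one-line text part, similar in spirit to a DMARC report body.
line = "This is a DMARC aggregate report for example.com"
print(strict_uris(line))           # []
print(clickable_candidates(line))  # ['example.com']
```

The strict check finds nothing in the sample line, while the loose check flags
the bare domain name, which is the kind of hit described above.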

One effect of that is that some messages hit *_URI_* rules even though, in
principle, they include no URIs in their displayed bodies and, in most cases,
what SA has detected is not made clickable. If this were actually causing
scores greater than 5 (or really, anywhere near 5) on real-world messages, it
would be important to fix. I am not convinced that this report includes any
evidence of that.

--
You are receiving this mail because:
You are the assignee for the bug.