[Bug 8206] uri_list_canonicalize adds more domains then it should
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8206

Kris Deugau <kdeugau@vianet.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kdeugau@vianet.ca

--- Comment #2 from Kris Deugau <kdeugau@vianet.ca> ---
I don't have any specific examples right at hand, but I posted to the users
list about essentially this issue in April last year, with another specific
case. See https://lists.apache.org/thread/gf3kyq2y3j1v1lj37g5tpngmk82wgmcz. I
don't recall if any patches were committed as a result of that thread.

Looking at your patch, I think it is too narrow (if only because it omits
.png, .webp, and who knows what other image types some sender might use) and
comes far too late in the process to fix the root cause. There is a long list
of other HTML elements that get filed in the "URI" bin and can trigger this
problem. To solve it properly, I think potential URIs from HTML elements need
to be preprocessed more tightly (and discarded) ahead of the rest of the
canonicalization process.
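
As a rough illustration of that kind of early filtering, here is a minimal
sketch in Python. It is not SpamAssassin code (SpamAssassin is written in
Perl), and the function name keep_for_canonicalization and the set of accepted
schemes are assumptions made only for this example. The idea is to drop any
HTML attribute value that does not already name a host before the
canonicalizer gets a chance to guess one.

```python
from urllib.parse import urlsplit

# Hypothetical pre-filter, NOT part of SpamAssassin: decide whether a raw
# value taken from an HTML attribute should be passed on to canonicalization.
# Assumption: only absolute or protocol-relative references are kept.
def keep_for_canonicalization(raw: str) -> bool:
    value = raw.strip()
    # Protocol-relative references ("//host/path") still name a host.
    if value.startswith("//"):
        value = "http:" + value
    try:
        parts = urlsplit(value)
    except ValueError:
        # Malformed input (for example a broken IPv6 literal) is discarded.
        return False
    scheme = parts.scheme.lower()
    if scheme in ("http", "https", "ftp"):
        return bool(parts.hostname)
    if scheme == "mailto":
        return "@" in parts.path
    # No scheme at all: a relative reference such as "none" or
    # "assets/css/styles.css". Discard it rather than guess a domain.
    return False
```

With this sketch, "none" and "assets/css/styles.css" are discarded, while
"https://example.com/a.png" and "//cdn.example.com/x.css" are kept. A real fix
would also have to decide what to do with schemeless host names such as
"www.example.com/x", which this sketch simply drops.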

I have docucomments in my local configuration with this:

dbg: uri: canonicalizing html uri: none
dbg: uri: cleaned uri: http://www.none.com
dbg: uri: added host: www.none.com domain: none.com
dbg: uri: cleaned uri: none
dbg: uri: cleaned uri: http://none

(likely from that particular case I posted about)

and:

dbg: uri: canonicalizing html uri: assets/css/styles.css
dbg: uri: cleaned uri: http://www.assets.com/css/styles.css

along with matching uridnsbl_skip_domain entries for

none.com
assets.com
(I also have "none" listed, but that doesn't seem to work to suppress the
entry)

and

background.com
www.com

I don't have debug detail recorded for the latter two, but both originated
from essentially the same source: HTML/CSS elements (not content/text!) that
specify a relative URI in some context or form. None of these were in text
that a mail program *would* typically turn into a clickable link.
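
The rewrite visible in the debug lines quoted above can be mimicked with a few
lines of Python. This is only a sketch of the observed input/output pairs, not
the actual SpamAssassin logic, and guess_absolute is a hypothetical name used
for illustration. It treats the first path segment of a schemeless value as a
bare name and wraps it in "www." and ".com".

```python
# Hypothetical illustration, NOT SpamAssassin's implementation: reproduce the
# guess seen in the debug output for schemeless values.
def guess_absolute(raw: str) -> str:
    # Split off the first path segment; sep is "" when there is no slash.
    head, sep, rest = raw.partition("/")
    return f"http://www.{head}.com{sep}{rest}"

print(guess_absolute("none"))
print(guess_absolute("assets/css/styles.css"))
```

The two calls print http://www.none.com and
http://www.assets.com/css/styles.css, matching the "cleaned uri" lines in the
debug output. The debug output also shows other candidates (such as
"http://none") that this sketch does not try to reproduce.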
