https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8206
Kris Deugau <kdeugau@vianet.ca> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kdeugau@vianet.ca
--- Comment #2 from Kris Deugau <kdeugau@vianet.ca> ---
I don't have any specific examples right at hand, but I've posted on the users
list about essentially this issue in April last year with another specific
case. See https://lists.apache.org/thread/gf3kyq2y3j1v1lj37g5tpngmk82wgmcz. I
don't recall if any patches were committed as a result of that thread.
Looking at your patch, I think this is too narrow (even if only because it
omits .png, .webp, and who knows what other image types some sender might use)
and far too late in the process to fix the root cause. There is a long list of
other HTML elements that get filed in the "URI" bin and can trigger this
problem. I think to properly solve it, potential URIs from HTML elements need
to be more tightly preprocessed (and discarded) ahead of the rest of the
canonicalization process.
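
A minimal sketch of that kind of early filtering, written in Python for brevity rather than SpamAssassin's Perl. The function names and the scheme allow-list are hypothetical and are not taken from the SpamAssassin code base; the idea is only that values pulled from HTML attributes are dropped unless they are absolute (or protocol-relative) URIs with a host, before any "add www. and .com" guessing can happen:

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: schemes worth handing on to canonicalization.
ALLOWED_SCHEMES = {"http", "https"}


def is_candidate_uri(raw: str) -> bool:
    """Return True only for absolute or protocol-relative URIs with a host."""
    value = raw.strip()
    try:
        parts = urlsplit(value)
    except ValueError:
        # Malformed input (e.g. a broken IPv6 literal) is not a usable URI.
        return False
    if value.startswith("//"):
        # Protocol-relative reference: keep it only if a host is present.
        return bool(parts.netloc)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        # Bare words ("none") and relative paths have no scheme at all.
        return False
    return bool(parts.netloc)


def filter_html_uris(raw_values):
    """Discard relative references before they reach canonicalization."""
    return [v for v in raw_values if is_candidate_uri(v)]


if __name__ == "__main__":
    samples = ["none", "assets/css/styles.css", "http://example.com/a.png"]
    print(filter_html_uris(samples))
```

With the two values from the debug output below, both "none" and "assets/css/styles.css" are discarded, so no none.com or assets.com lookup would ever be generated. A schemeless "www.example.com" in an attribute is also dropped here, which matches how a browser treats it (as a relative reference) but is a design choice that may need revisiting.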
I have docucomments in my local configuration with this:
dbg: uri: canonicalizing html uri: none
dbg: uri: cleaned uri: http://www.none.com
dbg: uri: added host: www.none.com domain: none.com
dbg: uri: cleaned uri: none
dbg: uri: cleaned uri: http://none
(likely from that particular case I posted about)
and:
dbg: uri: canonicalizing html uri: assets/css/styles.css
dbg: uri: cleaned uri: http://www.assets.com/css/styles.css
along with matching uridnsbl_skip_domain entries for
none.com
assets.com
(I also have "none" listed, but that doesn't seem to work to suppress the
entry)
and
background.com
www.com
I don't have debug detail recorded for the latter two, but both originated
from essentially the same source: HTML/CSS elements (not content/text!) that
specify a relative URI in some context or form. None of these were in text
that a mail program *would* often turn into a clickable link.
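
For reference, the workaround entries listed above would look roughly like this in a local .cf file, assuming the URIDNSBL plugin is loaded (uridnsbl_skip_domain accepts one or more domains per line). The comments are illustrative and not copied from the actual configuration:

```
# False "domains" produced from relative URIs in HTML/CSS elements
uridnsbl_skip_domain none.com assets.com
uridnsbl_skip_domain background.com www.com
# A bare "none" entry does not seem to suppress the "http://none" result
```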
--
You are receiving this mail because:
You are the assignee for the bug.