[Bug 3191] New: Word boundaries are lost after HTML processing
http://bugzilla.spamassassin.org/show_bug.cgi?id=3191

Summary: Word boundaries are lost after HTML processing
Product: Spamassassin
Version: 2.63
Platform: All
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P5
Component: Rules
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: michaelb@opentext.com


This one's tricky. :)

HTML-escaped characters cause the word boundaries to be lost after the HTML processor
unescapes them. Hard to describe, easy to show. :)

The attached sample message matches:
body ACCZZAGRA /\bZZagrá/i
but not
body ACCZZAGRA /\bZZagrá\b/i

It seems as though the trailing word boundary (\b) is lost after translation.
Putting the accented character in the middle of the word avoids the problem:
body ACCZZALIS /\bZZális/i
and
body ACCZZALIS /\bZZális\b/i
both work properly.
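
One possible explanation, offered only as a guess and not confirmed by anything in this report: if the decoded body is matched as raw bytes, the byte for "á" (0xE1 in Latin-1) may not count as a word character, so no \b exists between a trailing "á" and the space after it. The sketch below reproduces that pattern of results with Python's re module in byte mode. It is not SpamAssassin code; the sample body text and the names `body`, `hits`, and `results` are made up for illustration.

```python
import re

# Hypothetical body text as it might look once the HTML entity has been
# decoded to a single Latin-1 byte (0xE1 = "á"). Not the attached sample.
body = b"buy ZZagr\xe1 now and ZZ\xe1lis too"

def hits(pattern: bytes) -> bool:
    """Return True if the byte-mode, case-insensitive pattern matches body."""
    return re.search(pattern, body, re.IGNORECASE) is not None

# In byte mode, \w covers only [a-zA-Z0-9_], so 0xE1 is a non-word byte.
results = {
    "agra_no_trailing_b": hits(rb"\bZZagr\xe1"),
    "agra_trailing_b": hits(rb"\bZZagr\xe1\b"),    # 0xE1 then space: no boundary
    "alis_no_trailing_b": hits(rb"\bZZ\xe1lis"),
    "alis_trailing_b": hits(rb"\bZZ\xe1lis\b"),    # "s" then space: boundary
}

# Contrast: on decoded text, "á" is a word character and \b behaves as expected.
results["agra_trailing_b_unicode"] = (
    re.search(r"\bZZagrá\b", body.decode("latin-1"), re.IGNORECASE) is not None
)

for name, matched in results.items():
    print(f"{name}: {matched}")
```

If this is what is happening, only the rule that ends in the accented character followed by \b would fail, which is the behavior described above. Whether SpamAssassin 2.63 actually matches body rules this way is an assumption that would need checking.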


