I'm trying to come up with a way to detect bogus end tags, and so far I'm
not having much luck.
What I'm specifically trying to catch are things like
</table>
</belch></huntsville></delusion></wilma></boswell></attune>
</vasectomy></centum></surf></yeasty></molt></autocollimate>
</acrobat></harvest></gage></flagrant></fumble></nowadays>
</BODY>
</HTML>
Now, it looks like there is an html_tag_balance eval that would catch the
fact that there is no "<belch" to match the "</belch>" in the above hunk of
spam, if only there were some way that I could feed "belch" into the eval.
I can detect end tags easily enough with a regexp, but I can't find any way
that works to pull the found tag out and feed it to the eval routine within
an SA rule definition.
Alternately, is there a way to write a regexp that will let me look backward
for <belch once I have found </belch>? I can't seem to figure this one out
either.
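
To make concrete what I'm after, here is a rough sketch of the check done
outside of SA, in Python. The function name unmatched_end_tags is made up
for illustration and is not anything SA provides; it just walks the tags in
order and reports every end tag whose name has not been opened earlier. It
deliberately ignores nesting, comments, and quoted ">" inside attributes.

```python
import re

# Matches a start or end tag and captures the optional slash and the tag name.
# Simplification: assumes no ">" characters appear inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")

def unmatched_end_tags(html):
    """Return names of end tags with no earlier start tag of the same name.

    Hypothetical helper for illustration only; not part of SpamAssassin.
    """
    seen = set()   # tag names opened so far (lowercased)
    bogus = []     # end tags that were never opened
    for match in TAG.finditer(html):
        is_end = match.group(1) == "/"
        name = match.group(2).lower()
        if is_end:
            if name not in seen:
                bogus.append(name)
        else:
            seen.add(name)
    return bogus

if __name__ == "__main__":
    sample = "<html><body><table></table>\n</belch></huntsville>\n</BODY>\n</HTML>"
    print(unmatched_end_tags(sample))
```

What I can't work out is how to express that "remember the name, then look
back for it" step inside a single SA rule.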
Thanks,
Loren