Mailing List Archive

Seeing "check: exceeded time limit in ..." and need to resolve it
Hi,

I got an email from netdev@vger.kernel.org that was a lengthy (422K) regression test report from a patch someone had submitted.

I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < netdev.eml" and saw:

...
Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC ======> got hit (1)
...
Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ======> got hit (1)
Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping further tests
...

Am I correct that __ANY_TEXT_ATTACH alone took 4:30s? Looking at the rule, I don't understand why it's taking so long... unless that's not the smoking gun. Could there be rules that *aren't* matching but are taking a while?

72_active.cf: mimeheader __ANY_TEXT_ATTACH Content-Type =~ /text\/\w+/i

And how do I dig into why I'm getting that last message?

I can't even find type11_pri0_set1 as a string in /usr/share/perl5/vendor_perl/Mail/SpamAssassin/

Also, why are there multiple runs of:

Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ======> got hit: "e"


Should this be capped to a maximum number of matches the way __HIGHBITS is?

And I'm not sure I want messages that haven't been fully scanned being delivered. Should I crank TIME_LIMIT_EXCEEDED to 20.0?

Thanks,

-Philip
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Fri, 12 Nov 2021, Philip Prindeville wrote:

> I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < netdev.eml" and saw:
>
> ...
> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC ======> got hit (1)
> ...
> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ======> got hit (1)
> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping further tests
> ...
>
> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?

"ran ... got hit" is past tense. And it needs to complete the rule to know
whether it got a hit.

11:45:38.048 -> 11:45:38.063 = less than 20 msec.

The next rule, whatever that was, is the one that timed out after 4m20s.

> Could there be rules that *aren't* matching but are taking a while?

It's timing out on a rule that's running away. The timeout triggers before
"hit/no hit" is known.

What would be helpful here would be logging of when a rule *starts*
evaluation. Normally that would be painful, but for tracking a runaway it
would be useful. Perhaps I can code up something to capture that and log
it on a timeout...

If you want to send me that message zipped up I can try it here with those
changes and see if it's a base rule running away.



--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The most glaring example of the cognitive dissonance on the left
is the concept that human beings are inherently good, yet at the
same time cannot be trusted with any kind of weapon, unless the
magic fairy dust of government authority gets sprinkled upon them.
-- Moshe Ben-David
-----------------------------------------------------------------------
531 days since the first private commercial manned orbital mission (SpaceX)
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
>
> What would be helpful here would be logging of when a rule *starts*
> evaluation. Normally that would be painful, but for tracking a runaway it
> would be useful. Perhaps I can code up something to capture that and log it
> on a timeout...

It already exists

spamassassin -D all,rules-all < msg
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> What would be helpful here would be logging of when a rule *starts*
> evaluation. Normally that would be painful, but for tracking a runaway it
> would be useful. Perhaps I can code up something to capture that and log
> it on a timeout...

Actually what sounds like it would be useful would be knowing the name of
the rule that timed out. I'm presuming when the timeout occurs that there is
still some indication of the current rule being processed so it can be
killed. I'd think that should be enough to backtrack to the rule name. A
modification to the timeout message could display the name of the rule and
even how long it took to that point.

I guess there might be multiple rules running when the timeout occurs and
not know which one really timed out, but that would still be a small number
of rule names.

Loren
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Sat, 13 Nov 2021, Henrik K wrote:

> On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
>>
>> What would be helpful here would be logging of when a rule *starts*
>> evaluation. Normally that would be painful, but for tracking a runaway it
>> would be useful. Perhaps I can code up something to capture that and log it
>> on a timeout...
>
> It already exists
>
> spamassassin -D all,rules-all < msg

Ugh, yeah, I remember doing that now.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Running away is the coward's way out of a war;
appeasement is the coward's way into a war. -- Thorax
-----------------------------------------------------------------------
532 days since the first private commercial manned orbital mission (SpaceX)
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Sat, 13 Nov 2021, Loren Wilton wrote:

>> What would be helpful here would be logging of when a rule *starts*
>> evaluation. Normally that would be painful, but for tracking a runaway it
>> would be useful. Perhaps I can code up something to capture that and log it
>> on a timeout...
>
> Actually what sounds like it would be useful would be knowing the name of the
> rule that timed out. I'm presuming when the timeout occurs that there is
> still some indication of the current rule being processed so it can be
> killed. I'd think that should be enough to backtrack to the rule name. A
> modification to the timeout message could display the name of the rule and
> even how long it took to that point.

That's what I was thinking when I said "capture and log".

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Running away is the coward's way out of a war;
appeasement is the coward's way into a war. -- Thorax
-----------------------------------------------------------------------
532 days since the first private commercial manned orbital mission (SpaceX)
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> On Nov 12, 2021, at 8:49 PM, John Hardin <jhardin@impsec.org> wrote:
>
> On Fri, 12 Nov 2021, Philip Prindeville wrote:
>
>> I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < netdev.eml" and saw:
>>
>> ...
>> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC ======> got hit (1)
>> ...
>> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ======> got hit (1)
>> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping further tests
>> ...
>>
>> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
>
> "ran ... got hit" is past tense. And it needs to complete the rule to know whether it got a hit.
>
> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
>
> The next rule, whatever that was, is the one that timed out after 4m20s.


Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.

Why can't I even find the rule?


>
>> Could there be rules that *aren't* matching but are taking a while?
>
> It's timing out on a rule that's running away. The timeout triggers before "hit/no hit" is known.
>
> What would be helpful here would be logging of when a rule *starts* evaluation. Normally that would be painful, but for tracking a runaway it would be useful. Perhaps I can code up something to capture that and log it on a timeout...


Whenever a rule gets started, you could save the name and start time, and then burp that during timeout handling, right?


>
> If you want to send me that message zipped up I can try it here with those changes and see if it's a base rule running away.
>


Sent out-of-band.

Doh. Forgot to zip it.
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> On Nov 12, 2021, at 10:35 PM, Henrik K <hege@hege.li> wrote:
>
> On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
>>
>> What would be helpful here would be logging of when a rule *starts*
>> evaluation. Normally that would be painful, but for tracking a runaway it
>> would be useful. Perhaps I can code up something to capture that and log it
>> on a timeout...
>
> It already exists
>
> spamassassin -D all,rules-all < msg
>


Ah, that was useful.

Seeing:

...
Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 194.213.125.57-194.213.125.57
Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs
...
Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . dns:A:100.136.134.134.psbl.surriel.com
Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . dns:A:100.136.134.134.dnsbl.sorbs.net
Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
...


Why would resolving http.sh take this long? And can we bring down the timeout?

Hard to imagine DNS requests taking more than a couple of seconds.

-Philip
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
Philip Prindeville <philipp_subx@redfish-solutions.com> writes:

> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>
> Why can't I even find the rule?

That looks very familiar. I was having timeouts, and saw that in the
logs, on certain messages. I ended up nuking and rebuilding my TXREP
database and then things were ok.

That doesn't explain why we can't find the rule, which is a good
question.
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> On Nov 15, 2021, at 5:06 PM, Greg Troxel <gdt@lexort.com> wrote:
>
>
> Philip Prindeville <philipp_subx@redfish-solutions.com> writes:
>
>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>>
>> Why can't I even find the rule?
>
> That looks very familiar. I was having timeouts, and saw that in the
> logs, on certain messages. I ended up nuking and rebuilding my TXREP
> database and then things were ok.
>
> That doesn't explain why we can't find the rule, which is a good
> question.
>


Where is the TXREP database?

Also, is it possible that the name is generated through some sort of mangling, the way that C function names can be generated from macro expansions, etc?
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
Philip Prindeville <philipp_subx@redfish-solutions.com> writes:

>> That looks very familiar. I was having timeouts, and saw that in the
>> logs, on certain messages. I ended up nuking and rebuilding my TXREP
>> database and then things were ok.
>>
>> That doesn't explain why we can't find the rule, which is a good
>> question.
>
> Where is the TXREP database?

For me, it's in:

$HOME/.spamassassin/tx-reputation

> Also, is it possible that the name is generated through some sort of
> mangling, the way that C function names can be generated from macro
> expansions, etc?

Almost anything is possible...
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On 2021-11-15 at 18:08:20 UTC-0500 (Mon, 15 Nov 2021 16:08:20 -0700)
Philip Prindeville <philipp_subx@redfish-solutions.com>
is rumored to have said:

>> On Nov 12, 2021, at 8:49 PM, John Hardin <jhardin@impsec.org> wrote:
>>
>> On Fri, 12 Nov 2021, Philip Prindeville wrote:
>>
>>> I got the message, saved it to a flat file, and ran "spamassassin -t
>>> -D rules < netdev.eml" and saw:
>>>
>>> ...
>>> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule
>>> __ANY_TEXT_ATTACH_DOC ======> got hit (1)
>>> ...
>>> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule
>>> __ANY_TEXT_ATTACH ======> got hit (1)
>>> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in
>>> Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1,
>>> skipping further tests
>>> ...
>>>
>>> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
>>
>> "ran ... got hit" is past tense. And it needs to complete the rule to
>> know whether it got a hit.
>>
>> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
>>
>> The next rule, whatever that was, is the one that timed out after
>> 4m20s.
>
>
> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>
> Why can't I even find the rule?

That "rule" is actually a subroutine that is assembled and named at
runtime in M:SA:Check from set 1 (probably the only set) of the "body
eval" (type 11) rules running at priority 0.

Which suggests a tough bit of troubleshooting.

>>> Could there be rules that *aren't* matching but are taking a while?
>>
>> It's timing out on a rule that's running away. The timeout triggers
>> before "hit/no hit" is known.
>>
>> What would be helpful here would be logging of when a rule *starts*
>> evaluation. Normally that would be painful, but for tracking a
>> runaway it would be useful. Perhaps I can code up something to
>> capture that and log it on a timeout...
>
>
> Whenever a rule gets started, you could save the name and start time,
> and then burp that during timeout handling, right?

I like that idea. I have no idea how feasible it is.




--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Mon, Nov 15, 2021 at 04:25:55PM -0700, Philip Prindeville wrote:
>
>
> > On Nov 12, 2021, at 10:35 PM, Henrik K <hege@hege.li> wrote:
> >
> > On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
> >>
> >> What would be helpful here would be logging of when a rule *starts*
> >> evaluation. Normally that would be painful, but for tracking a runaway it
> >> would be useful. Perhaps I can code up something to capture that and log it
> >> on a timeout...
> >
> > It already exists
> >
> > spamassassin -D all,rules-all < msg
> >
>
>
> Ah, that was useful.
>
> Seeing:
>
> ...
> Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 194.213.125.57-194.213.125.57
> Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs
> ...
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . dns:A:100.136.134.134.psbl.surriel.com
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . dns:A:100.136.134.134.dnsbl.sorbs.net
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
> ...
>
>
> Why would resolving http.sh take this long? And can we bring down the timeout?
>
> Hard to imagine DNS requests taking more than a couple of seconds.
>
> -Philip

SA 3.4 has old version of URILocalBL without async lookups, probably that's
why. Not sure why it would take minutes though, unless your system DNS is
configured strangely and/or is not caching..
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Mon, 2021-11-15 at 17:12 -0700, Philip Prindeville wrote:
>
>
> > On Nov 15, 2021, at 5:06 PM, Greg Troxel <gdt@lexort.com> wrote:
> >
> >
> > Philip Prindeville <philipp_subx@redfish-solutions.com> writes:
> >
> > > Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
> > >
> > > Why can't I even find the rule?
> >
try "locate txrep"

On my Fedora system 'locate' says that TxRep is a plugin, enabled by
installing: /usr/share/spamassassin/60_txrep.cf

and, using "locate" again, that the plugin's manpage is
/usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz

So, "man 3 Mail::SpamAssassin::Plugin::TxRep" describes the TxRep plugin
and 'locate' says it is installed as
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm

Of course, other Linux distros may put it somewhere else, so use
'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
again.

Not trying to teach you to suck eggs, but, incredible as it may sound,
there are still some people who don't know about the 'locate' command.

Martin

>
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On 2021-11-16 at 05:30:25 UTC-0500 (Tue, 16 Nov 2021 10:30:25 +0000)
Martin Gregorie <martin@gregorie.org>
is rumored to have said:

> Of course, other Linux distros may put it somewhere else, so use
> 'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
> again.
>
> Not trying to teach you to suck eggs, but, incredible as it may sound,
> there are still some people who don't know about the 'locate' command.

Worth noting: locate & updatedb aren't always installed.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Tue, Nov 16, 2021 at 08:33:56AM -0500, Bill Cole wrote:
> On 2021-11-16 at 05:30:25 UTC-0500 (Tue, 16 Nov 2021 10:30:25 +0000)
> Martin Gregorie <martin@gregorie.org>
> is rumored to have said:
>
> > Of course, other Linux distros may put it somewhere else, so use
> > 'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
> > again.
> >
> > Not trying to teach you to suck eggs, but, incredible as it may sound,
> > there are still some people who don't know about the 'locate' command.
>
> Worth noting: locate & updatedb aren't always installed.

Offtopic, but there's even very little reason to install it, it just eats
resources, not to mention one more point of vulnerability. If you need to
search something, "find /" takes some seconds on any remotely modern system
and always returns fresh results.
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Tue, 2021-11-16 at 08:33 -0500, Bill Cole wrote:
>
> Worth noting: locate & updatedb aren't always installed.
>
Fair comment: they're a standard part of Fedora. IIRC they are also part
of the RaspberryPi OS distro, so are likely to be included in Debian and
most of its clones.

But: how many "intro to Linux" books mention 'updatedb' or 'locate'?
They're not mentioned in 'UNIX in a Nutshell' (so maybe not in 'Linux in
a Nutshell' either) or in the 'RaspberryPi User Guide', though they are
in Michael Kopfler's 1999 Linux sysadmin book.

You don't see them mentioned much in newsgroups either, so I'm left
wondering how many Linux users who either taught themselves to use Linux
or picked it up at work from a colleague would ever know about 'locate'
and 'updatedb'?  

For that matter how many know about 'apropos'? And, even if they do,
they may not discover 'locate' because 'apropos search' doesn't find
either 'updatedb' or 'locate'. You have to enter 'apropos find' to
discover that 'locate' exists, and even then you could get side tracked
into trying to use the much more complex 'find' utility.

Martin
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> For that matter how many know about 'apropos'? And, even if they do,
> they may not discover 'locate' because 'apropos search' doesn't find
> either 'updatedb' or 'locate'. You have to enter 'apropos find' to
> discover that 'locate' exists, and even then you could get side tracked
> into trying to use the much more complex 'find' utility.

Or the old-school rpm:

$ rpm -ql spamassassin|grep TxRep
/usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm

Dave

>
> Martin
>
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Mon, 15 Nov 2021, Philip Prindeville wrote:

>
>
>> On Nov 12, 2021, at 8:49 PM, John Hardin <jhardin@impsec.org> wrote:
>>
>> On Fri, 12 Nov 2021, Philip Prindeville wrote:
>>
>>> I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < netdev.eml" and saw:
>>>
>>> ...
>>> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC ======> got hit (1)
>>> ...
>>> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ======> got hit (1)
>>> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping further tests
>>> ...
>>>
>>> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
>>
>> "ran ... got hit" is past tense. And it needs to complete the rule to know whether it got a hit.
>>
>> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
>>
>> The next rule, whatever that was, is the one that timed out after 4m20s.
>
>
> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>
> Why can't I even find the rule?

Run it with "-D rules,rules-all" and it should list each rule as it
starts executing.

>
>
>>
>>> Could there be rules that *aren't* matching but are taking a while?
>>
>> It's timing out on a rule that's running away. The timeout triggers before "hit/no hit" is known.
>>
>> What would be helpful here would be logging of when a rule *starts* evaluation. Normally that would be painful, but for tracking a runaway it would be useful. Perhaps I can code up something to capture that and log it on a timeout...
>
>
> Whenever a rule gets started, you could save the name and start time, and then burp that during timeout handling, right?

The rule name at least.

>> If you want to send me that message zipped up I can try it here with those changes and see if it's a base rule running away.
>
> Sent out-of-band.
>
> Doh. Forgot to zip it.

I'll be happy to take a look, but running with rules-all at your end would
be faster... I forgot I'd already added that. But I will still take a look
at capturing the rule name for the timeout message.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Never forget, even for an instant, that the one and only reason
anyone has for taking your gun away is to make you weaker than
he is, so he can do something to you that you wouldn?t let him
do if you were equipped to prevent it. This goes for burglars,
muggers, and rapists, and even more so for policemen,
bureaucrats, and politicians. -- Alexander Pope
-----------------------------------------------------------------------
535 days since the first private commercial manned orbital mission (SpaceX)
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
Replies... some duplication of conversation on "mimedefang".



> On Nov 15, 2021, at 10:34 PM, Bill Cole <sausers-20150205@billmail.scconsult.com> wrote:
>
> On 2021-11-15 at 18:08:20 UTC-0500 (Mon, 15 Nov 2021 16:08:20 -0700)
> Philip Prindeville <philipp_subx@redfish-solutions.com>
> is rumored to have said:
>
>>> On Nov 12, 2021, at 8:49 PM, John Hardin <jhardin@impsec.org> wrote:
>>>
>>> On Fri, 12 Nov 2021, Philip Prindeville wrote:
>>>
>>>> I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < netdev.eml" and saw:
>>>>
>>>> ...
>>>> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC ======> got hit (1)
>>>> ...
>>>> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ======> got hit (1)
>>>> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping further tests
>>>> ...
>>>>
>>>> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
>>>
>>> "ran ... got hit" is past tense. And it needs to complete the rule to know whether it got a hit.
>>>
>>> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
>>>
>>> The next rule, whatever that was, is the one that timed out after 4m20s.
>>
>>
>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>>
>> Why can't I even find the rule?
>
> That "rule" is actually a subroutine that is assembled and named at runtime in M:SA:Check from set 1 (probably the only set) of the "body eval" (type 11) rules running at priority 0.


Yeah, if we can get the source file and line # that's almost as good as a function name (since there isn't one, if it's an anonymous sub block).


>
> Which suggests a tough bit of troubleshooting.
>
>>>> Could there be rules that *aren't* matching but are taking a while?
>>>
>>> It's timing out on a rule that's running away. The timeout triggers before "hit/no hit" is known.
>>>
>>> What would be helpful here would be logging of when a rule *starts* evaluation. Normally that would be painful, but for tracking a runaway it would be useful. Perhaps I can code up something to capture that and log it on a timeout...
>>
>>
>> Whenever a rule gets started, you could save the name and start time, and then burp that during timeout handling, right?
>
> I like that idea. I have no idea how feasible it is.


Me neither. I use Perl less and less everyday, and my fu is fading almost as fast as new features are coming in.

-Philip
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> On Nov 15, 2021, at 11:12 PM, Henrik K <hege@hege.li> wrote:
>
> On Mon, Nov 15, 2021 at 04:25:55PM -0700, Philip Prindeville wrote:
>>
>>
>>> On Nov 12, 2021, at 10:35 PM, Henrik K <hege@hege.li> wrote:
>>>
>>> On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
>>>>
>>>> What would be helpful here would be logging of when a rule *starts*
>>>> evaluation. Normally that would be painful, but for tracking a runaway it
>>>> would be useful. Perhaps I can code up something to capture that and log it
>>>> on a timeout...
>>>
>>> It already exists
>>>
>>> spamassassin -D all,rules-all < msg
>>>
>>
>>
>> Ah, that was useful.
>>
>> Seeing:
>>
>> ...
>> Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 194.213.125.57-194.213.125.57
>> Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs
>> ...
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . dns:A:100.136.134.134.psbl.surriel.com
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . dns:A:100.136.134.134.dnsbl.sorbs.net
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
>> ...
>>
>>
>> Why would resolving http.sh take this long? And can we bring down the timeout?
>>
>> Hard to imagine DNS requests taking more than a couple of seconds.
>>
>> -Philip
>
> SA 3.4 has old version of URILocalBL without async lookups, probably that's
> why. Not sure why it would take minutes though, unless your system DNS is
> configured strangely and/or is not caching..
>



Or http.sh points to an NS that's offline...

Can the async lookup be back-ported?
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> On Nov 16, 2021, at 3:30 AM, Martin Gregorie <martin@gregorie.org> wrote:
>
> On Mon, 2021-11-15 at 17:12 -0700, Philip Prindeville wrote:
>>
>>
>>> On Nov 15, 2021, at 5:06 PM, Greg Troxel <gdt@lexort.com> wrote:
>>>
>>>
>>> Philip Prindeville <philipp_subx@redfish-solutions.com> writes:
>>>
>>>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>>>>
>>>> Why can't I even find the rule?
>>>
> try "locate txrep"
>
> On my Fedora system 'locate' says that TxRep is a plugin, enabled by
> installing: /usr/share/spamassassin/60_txrep.cf
>
> and, using "locate" again, that the plugin's manpage is
> /usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz
>
> So, "man 3 Mail::SpamAssassin::Plugin::TxRep" describes the TxRep plugin
> and 'locate' says it is installed as
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm
>
> Of course, other Linux distros may put it somewhere else, so use
> 'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
> again.
>
> Not trying to teach you to suck eggs, but, incredible as it may sound,
> there are still some people who don't know about the 'locate' command.
>
> Martin
>



I know how to find the code modules. I wasn't sure where the database was kept. And I invoke SA from within Mimedefang, not via procmail on individual delivery, so only global rules get run. Nothing is per-user.

-Philip
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
On Tue, Nov 16, 2021 at 01:08:16PM -0700, Philip Prindeville wrote:
>
> Or http.sh points to an NS that's offline...

Your resolver shoukd time out _way_ sooner than some minutes.

> Can the async lookup be back-ported?

No, and there will be no new 3.4 releases.
Re: Seeing "check: exceeded time limit in ..." and need to resolve it [ In reply to ]
> On Nov 16, 2021, at 8:03 PM, Henrik K <hege@hege.li> wrote:
>
> On Tue, Nov 16, 2021 at 01:08:16PM -0700, Philip Prindeville wrote:
>>
>> Or http.sh points to an NS that's offline...
>
> Your resolver shoukd time out _way_ sooner than some minutes.
>
>> Can the async lookup be back-ported?
>
> No, and there will be no new 3.4 releases.
>


Yeah, I still need to figure that out...

When I run "dig -t any http.sh" it times out after a few seconds. But SpamAssassin is doing something very different. Not sure why.

In any case, the workaround seems to be:

uri_block_exclude __L_BLOCK_ISP ... http.sh shlom.in


Where not resolving these last two domains makes the timeouts go away. Note that the pathology is the same in both cases:

philipp@macbook3 ~ % dig @8.8.8.8 -tns shlom.in.

; <<>> DiG 9.10.6 <<>> @8.8.8.8 -tns shlom.in.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38665
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;shlom.in. IN NS

;; ANSWER SECTION:
shlom.in. 300 IN NS ns1gmz.name.com.
shlom.in. 300 IN NS ns2jrt.name.com.
shlom.in. 300 IN NS ns3qtx.name.com.
shlom.in. 300 IN NS ns4blx.name.com.

;; Query time: 84 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Dec 26 15:25:44 MST 2021
;; MSG SIZE rcvd: 129

philipp@macbook3 ~ %
philipp@macbook3 ~ % dig @8.8.8.8 -tns http.sh.

; <<>> DiG 9.10.6 <<>> @8.8.8.8 -tns http.sh.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10013
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;http.sh. IN NS

;; ANSWER SECTION:
http.sh. 60 IN CNAME park.io.
park.io. 14797 IN NS ns-1348.awsdns-40.org.
park.io. 14797 IN NS ns-1624.awsdns-11.co.uk.
park.io. 14797 IN NS ns-441.awsdns-55.com.
park.io. 14797 IN NS ns-672.awsdns-20.net.

;; Query time: 245 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Dec 26 15:25:03 MST 2021
;; MSG SIZE rcvd: 197

philipp@macbook3 ~ %


Seems a little broken that the NS records aren't accompanied by 'A' glue records, but that's not catastrophic... normally a 2nd query would be done.

Should the resolver code in SpamAssassin be more robust when it comes to such failures?


philipp@macbook3 ~ % dig -ta ns-1348.awsdns-40.org.

; <<>> DiG 9.10.6 <<>> -ta ns-1348.awsdns-40.org.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37011
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;ns-1348.awsdns-40.org. IN A

;; ANSWER SECTION:
ns-1348.awsdns-40.org. 78740 IN A 205.251.197.68

;; Query time: 51 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sun Dec 26 15:27:16 MST 2021
;; MSG SIZE rcvd: 66

philipp@macbook3 ~ % dig @205.251.197.68 -ta http.sh

; <<>> DiG 9.10.6 <<>> @205.251.197.68 -ta http.sh
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 28411
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;http.sh. IN A

;; Query time: 67 msec
;; SERVER: 205.251.197.68#53(205.251.197.68)
;; WHEN: Sun Dec 26 15:27:32 MST 2021
;; MSG SIZE rcvd: 25

philipp@macbook3 ~ %


I'm not exactly sure what's falling down or why.

Is there anyone with more BIND-fu than me that's willing to venture a guess?

-Philip