Mailing List Archive: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready

[Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready

bugzilla-daemon at spamassassin

May 7, 2022, 10:25 AM

Post #1 of 13 (428 views)

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987

Michael Storz <sa-mst@lrz.de> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |sa-mst@lrz.de

--- Comment #1 from Michael Storz <sa-mst@lrz.de> ---
Created attachment 5779
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5779&action=edit
deletion of rule_pending, rule_ready

--
You are receiving this mail because:
You are the assignee for the bug.

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 8, 2022, 8:33 AM

Post #2 of 13 (428 views)

Am 2022-05-08 06:43, schrieb bugzilla-daemon@spamassassin.apache.org:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
>
> --- Comment #4 from Henrik Krohns <apache@hege.li> ---
> I mostly agree with everything, great to have extra eyeballs. Can you
> comment
> if my previous list comment and Revision 1900667 additions cleared any
> things
> up for you and changes anything?
>

Unfortunately not, I'm still struggling to understand the various
aspects of processing. However, I understand, that a call of rule_ready
for a sync rule makes sense. It does not need a call to rule_pending in
this case.

Example for a problem: a rule_pending must be followed by a rule_ready
or a got_hit. But how a got_hit should work instead of a rule_ready is a
mystery to me. In sub do_meta_tests the query whether a rule dependency
exists is defined as:

next RULE unless exists $h->{$deprule} && !$tp->{$deprule} &&
!$pl{$deprule};

(condition rearranged by me to make it easier to understand). If we have
an asynchronous rule $deprule other than a DNS lookup, a call to
rule_pending makes an entry in $tp. If the rule is true, the call to
got_hit makes an entry in $h, but it does not delete the entry in $tp
unlike rule_ready. Thus, the $deprule dependency is still present and
the rule cannot be evaluated. Or did I miss something?

Regards,
Michael

PS: Unfortunately, I do not have a running 4.0 system. Therefore I
cannot test it myself.

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

May 8, 2022, 9:13 AM

Post #3 of 13 (428 views)

On Sun, May 08, 2022 at 05:33:24PM +0200, Michael Storz wrote:
> Am 2022-05-08 06:43, schrieb bugzilla-daemon@spamassassin.apache.org:
> > https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
> >
> > --- Comment #4 from Henrik Krohns <apache@hege.li> ---
> > I mostly agree with everything, great to have extra eyeballs. Can you
> > comment
> > if my previous list comment and Revision 1900667 additions cleared any
> > things
> > up for you and changes anything?
> >
>
> Unfortunately not, I'm still struggling to understand the various aspects of
> processing. However, I understand, that a call of rule_ready for a sync rule
> makes sense. It does not need a call to rule_pending in this case.
>
> Example for a problem: a rule_pending must be followed by a rule_ready or a
> got_hit. But how a got_hit should work instead of a rule_ready is a mystery
> to me. In sub do_meta_tests the query whether a rule dependency exists is
> defined as:

I noticed that got_hit doesn't do everything that it can. But I also found
many other cases that need fixing and better testing. Spent 8 hours today
trying different things and planning how to do things, trying to make bgsend
automatically mark things ready is hard, as it's so complex in itself..
this is going to take few days..

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 8, 2022, 10:03 AM

Post #4 of 13 (428 views)

Am 2022-05-08 18:13, schrieb Henrik K:
> On Sun, May 08, 2022 at 05:33:24PM +0200, Michael Storz wrote:
>> Am 2022-05-08 06:43, schrieb bugzilla-daemon@spamassassin.apache.org:
>> > https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
>> >
>> > --- Comment #4 from Henrik Krohns <apache@hege.li> ---
>> > I mostly agree with everything, great to have extra eyeballs. Can you
>> > comment
>> > if my previous list comment and Revision 1900667 additions cleared any
>> > things
>> > up for you and changes anything?
>> >
>>
>> Unfortunately not, I'm still struggling to understand the various
>> aspects of
>> processing. However, I understand, that a call of rule_ready for a
>> sync rule
>> makes sense. It does not need a call to rule_pending in this case.
>>
>> Example for a problem: a rule_pending must be followed by a rule_ready
>> or a
>> got_hit. But how a got_hit should work instead of a rule_ready is a
>> mystery
>> to me. In sub do_meta_tests the query whether a rule dependency exists
>> is
>> defined as:
>
> I noticed that got_hit doesn't do everything that it can. But I also
> found
> many other cases that need fixing and better testing. Spent 8 hours
> today
> trying different things and planning how to do things, trying to make
> bgsend
> automatically mark things ready is hard, as it's so complex in itself..
> this is going to take few days..

Wow, maybe it would help when you try to explain what is going wrong? I
have made the experience that I have often already found the solution
when I tried to describe a problem to someone else, even though they
didn't have much of an idea about the problem.

The only other problem I found with bgsend_and_start_lookup is the
handling of end cases. When abort_remaining_lookups is called, it
triggers a callback for the lookups that are still waiting. They in turn
can register lookups with bgsend_and_start_lookup again. Therefore,
bgsend_and_start_lookup must check whether it is in the aborting state,
and instead of queuing the lookup, it should immediately abort the
lookup in the same way as abort_remaining_lookups. However, it should be
prepared to break a loop if the plugin misbehaves and schedules lookup
after lookup.

Michael

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 14, 2022, 10:15 AM

Post #5 of 13 (428 views)

Am 2022-05-13 08:37, schrieb bugzilla-daemon@spamassassin.apache.org:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
>
> Henrik Krohns <apache@hege.li> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> Status|NEW |RESOLVED
> Resolution|--- |FIXED
>
> --- Comment #9 from Henrik Krohns <apache@hege.li> ---
> Feel free to discuss here or on dev-list if something is still unclear.

First of all, I would like to thank Henrik for all the work he has done
on SpamAssassin. The further I get with my code review, the more I see
all the places where he has made minor or major changes that have made
the code faster, more readable and thus more maintainable.

Despite all these changes, however, I still see room for improvement. At
the moment I'm not very happy about how the do_meta_tests subroutine is
implemented. It seems to work now, but the query for the dependency of
the meta rules looks too complicated to me:

foreach my $r (@{$md->{$rulename}|[]}) {
next RULE unless exists $h->{$r} && !$tp->{$r} && !$pl{$r};
}

Instead of three dependencies, there should really only be one query at
this point for "is the rule ready or not". If we knew exactly the state
of a rule at each point in time, then it would also be possible to move
from a brute force algorithm to a deterministic algorithm, where a rule
that becomes ready automatically sets all meta rules to ready if that
rule was the last dependency analogous to the algorithm for tags.
do_meta_tests would then simply process a queue of ready meta rules. If
a new meta rule is ready it will be added to the end of the queue. If
the queue is empty, all rules of a priority class are processed.

The next possible step would be the implementation of short-circuit
behavior of && and || analogous to Perl itself. E.g. the rule

meta BITCOIN_SPAM_10 __BITCOIN_ID && ( HTML_IMAGE_ONLY_04 ||
HTML_IMAGE_ONLY_08 )

would immediately evaluate to false if __BITCOIN_ID is false.
HTML_IMAGE_ONLY_04 and HTML_IMAGE_ONLY_08 would then no longer need to
be evaluated.

Recently the question was asked why Check.pm is a plugin if it is not
optional. Check.pm is a plugin so that you can implement more than one
check plugin. Currently SpamAssassin still assumes that filtering
happens postqueue, where you can respond to the different wishes of
individual users. If you use SpamAssassin in a prequeue filtering, then
only the decision rejection or acceptance is possible for ALL
recipients. A consideration of different recipient wishes is not
possible.

Due to the possibility to switch between admin and user rules per
evaluation of an email, one accepts that the evaluation of the rules
must be rebuilt for each email. What we need instead is the possibility
to build the rules once at the start of the daemon and then use it to
evaluate any number of emails.

You can go one step further and first run a SpamAssassin instance with a
lightweight ruleset in prequeue mode to be able to decide as quickly as
possible whether to reject an email and then run another instance with a
heavyweight ruleset that makes a more precise distinction between
marking the email as spam or ham. The second instance can then also
evaluate user rules. There should be a feedback loop between the
instances so that the first instance can use the results of the second
instance, e.g. to quickly reject emails via a local blocklist using the
HashBL.pm plugin. Currently we use a feedback loop between the
SpamAssassin instance in prequeue mode and Postfix to (temporarily)
reject emails for 24 hours.

In any case, SpamAssassin should arrive in 2022 and offer a highly
optimized version of the analysis for MTAs that process millions of
emails per day.

These are my general remarks about the evaluation of rules. An email
with some minor cosmetic changes will follow.

Michael

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

May 14, 2022, 11:01 AM

Post #6 of 13 (428 views)

On Sat, May 14, 2022 at 07:15:45PM +0200, Michael Storz wrote:
>
> Despite all these changes, however, I still see room for improvement. At the
> moment I'm not very happy about how the do_meta_tests subroutine is
> implemented. It seems to work now, but the query for the dependency of the
> meta rules looks too complicated to me:
>
> foreach my $r (@{$md->{$rulename}|[]}) {
> next RULE unless exists $h->{$r} && !$tp->{$r} && !$pl{$r};
> }
>
> Instead of three dependencies, there should really only be one query at this
> point for "is the rule ready or not". If we knew exactly the state of a rule
> at each point in time, then it would also be possible to move from a brute
> force algorithm to a deterministic algorithm, where a rule that becomes
> ready automatically sets all meta rules to ready if that rule was the last
> dependency analogous to the algorithm for tags. do_meta_tests would then
> simply process a queue of ready meta rules. If a new meta rule is ready it
> will be added to the end of the queue. If the queue is empty, all rules of a
> priority class are processed.

I heartily agree with everything. The problem is that I don't have any
formal programming training or knowledge of fancy algorithms or
deterministic stuff. Frankly I'm not even interested in most of that stuff,
I just script and try to make things do stuff correctly with some common
sense and logic, would be happy if someone showed how to make it simpler.

Glad I was able to make it work with maybe 10% performance loss, but it's
balanced back by dumb metas not needing to wait for async stuff for no
reason etc.

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 15, 2022, 9:58 AM

Post #7 of 13 (428 views)

Am 2022-05-14 20:01, schrieb Henrik K:
> On Sat, May 14, 2022 at 07:15:45PM +0200, Michael Storz wrote:
>>
>> Despite all these changes, however, I still see room for improvement.
>> At the
>> moment I'm not very happy about how the do_meta_tests subroutine is
>> implemented. It seems to work now, but the query for the dependency of
>> the
>> meta rules looks too complicated to me:
>>
>> foreach my $r (@{$md->{$rulename}|[]}) {
>> next RULE unless exists $h->{$r} && !$tp->{$r} && !$pl{$r};
>> }
>>
>> Instead of three dependencies, there should really only be one query
>> at this
>> point for "is the rule ready or not". If we knew exactly the state of
>> a rule
>> at each point in time, then it would also be possible to move from a
>> brute
>> force algorithm to a deterministic algorithm, where a rule that
>> becomes
>> ready automatically sets all meta rules to ready if that rule was the
>> last
>> dependency analogous to the algorithm for tags. do_meta_tests would
>> then
>> simply process a queue of ready meta rules. If a new meta rule is
>> ready it
>> will be added to the end of the queue. If the queue is empty, all
>> rules of a
>> priority class are processed.
>
> I heartily agree with everything. The problem is that I don't have any
> formal programming training or knowledge of fancy algorithms or
> deterministic stuff. Frankly I'm not even interested in most of that
> stuff,
> I just script and try to make things do stuff correctly with some
> common
> sense and logic, would be happy if someone showed how to make it
> simpler.
>
> Glad I was able to make it work with maybe 10% performance loss, but
> it's
> balanced back by dumb metas not needing to wait for async stuff for no
> reason etc.

I'm glad you agree with my analysis. Then let's have one last try. I
think I found the logical error in the evaluation that makes everything
so complicated. The error already exists before 3.4.6.

Let's recap:

At the beginning of the evaluation, ALL rules are in the pending state
by definition. When rules are run, they change from state pending to
state finished. How is this state change accomplished?

- The normal case is the indirect change for the synchronous standard
rules. After they are run, they return either 0, which means it was a
miss, or a positive integer, which means it was a hit.

- Then there is the case of synchronous rules with explicit state
change. These rules call got_hit for a hit and should call got_miss if
it was a miss. got_miss is similar to rule_ready, it is just the
statement

$self->{tests_already_hit}->{$rule} ||= 0;

BTW, a better name for tests_already_hit would be
tests_already_finished or tests_finished.

- And then we have the asynchronous rules, which must be divided into
two parts: the synchronous part and the asynchronous part. The
synchronous part is the call of the eval function, which in turn submits
the asynchronous part to a queue function. At the end, the synchronous
part is supposed to return a result. And here comes the error: it
returns the value 0, which means that the rule is finished and the
result is a miss.

THIS IS WRONG.

The rule is not yet finished and the result is not yet known. Therefore,
it must NOT return 0, it must return undef. This means that it is still
pending (no state change) and the asynchronous part will decide if it is
a hit or a miss.

Now, let's test a correction:

- First, add a debug statement to the foreach loop of sub do_meta_tests
to see the values of the three conditions. The idea is to find out how
many and which rules depend not only on $h->{$deprule}, but also on
$tp->{$deprule} and $pl{$deprule}. If there are none left, we can
eliminate the two conditions.

The states of the three conditions are

result $h $tp $pl
OK 0 0 0
OK 0 1 0
OK 0 0 1
OK 0 1 1
OK 1 0 0
NO 1 1 0
NO 1 0 1
NO 1 1 1

NO is the state we do not want to have: there is a hit/the rule is
finished, but $tp and/or $pl are true which means the rule has not yet
finished.

# Meta is not ready if some dependency has not run yet
if (exists $h->{$deprule} && ($tp->{$deprule} || $pl{$deprule})) {
dbg("rules: NO: \$h exists, \tp=" . ($tp->{$deprule} ? 'true' :
'false') . "\$pl=" . ($pl->{$deprule} ? 'true' : 'false'));
} else {
dbg("rules: OK: \$h" . (exists $h->{$deprule} ? '' : ' not') . "
exists, \tp=" .
($tp->{$deprule} ? 'true' : 'false') . "\$pl=" .
($pl->{$deprule} ? 'true' : 'false'));
}
foreach my $deprule (@{$md->{$rulename}||[]}) {
if (!exists $h->{$deprule} || $tp->{$deprule} || $pl{$deprule}) {
next RULE;
}
}

Then run a test and see how many NO we get.

- Second, change sub run_eval_tests

in
# Make sure rule is marked ready for meta rules using $hitsptr
$evalstr .= '
if ($scoresptr->{q{'.$rulename.'}}) {
$hitsptr->{q{'.$rulename.'}} ||= 0;
$rulename = q#'.$rulename.'#;

delete $hitsptr->{q{'.$rulename.'}} ||= 0;

change

$evalstr .= '
if ($result) {
$self->got_hit($rulename, $prepend2desc, ruletype => "eval",
value => $result);
'.$dbgstr.'
}
}

to

$evalstr .= '
if (defined $result) {
if ($result) {
$self->got_hit($rulename, $prepend2desc, ruletype => "eval",
value => $result);
'.$dbgstr.'
} else { # got a miss, if not set by an explizit got_hit
$hitsptr->{q{'.$rulename.'}} ||= 0;
}
}

- Third, change all registered eval functions in DNSEval.pm to return
undef instead of 0

- Then make a new test and check if the eval functions have disappeared
from the debug statements with NO. If this is the case, the error is
correctly identified.

Regards,
Michael

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

May 15, 2022, 12:19 PM

Post #8 of 13 (427 views)

On Sun, May 15, 2022 at 06:58:27PM +0200, Michael Storz wrote:
>
> THIS IS WRONG.
>

LOL.

You could have just summarized it in one or two lines without all the
"error" hyperbole, and I would have understood it much faster.

Summary:

- Modify eval functions to return undef if they are async, after that
rule_pending() and {tests_pending} are redundant and can be removed,
tests_already_hit is enough to check status, as rule_ready() does not mark
it until last DNS lookup is finished.

Quite elegant solution. Will post patch to Bug 7987 soon for verifying.

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 15, 2022, 12:38 PM

Post #9 of 13 (427 views)

Am 2022-05-15 21:19, schrieb Henrik K:
> On Sun, May 15, 2022 at 06:58:27PM +0200, Michael Storz wrote:
>>
>> THIS IS WRONG.
>>
>
> LOL.
>
> You could have just summarized it in one or two lines without all the
> "error" hyperbole, and I would have understood it much faster.
>
> Summary:
>
> - Modify eval functions to return undef if they are async, after that
> rule_pending() and {tests_pending} are redundant and can be removed,
> tests_already_hit is enough to check status, as rule_ready() does not
> mark
> it until last DNS lookup is finished.
>
> Quite elegant solution. Will post patch to Bug 7987 soon for
> verifying.

Ok, ok, I will try to be more brief in the future ;-)

Michael

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 18, 2022, 2:03 AM

Post #10 of 13 (427 views)

Am 2022-05-17 17:09, schrieb bugzilla-daemon@spamassassin.apache.org:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
>
> --- Comment #27 from Michael Storz <sa-mst@lrz.de> ---
> (In reply to Henrik Krohns from comment #24)
>> (In reply to Michael Storz from comment #23)
>> > Great work! Are you now ready to switch from the brute force algorithm to a
>> > deterministic algorithm? I am not sure which of the two algorithms would be
>> > faster. That needs to be tested.
>>
>> It should not matter if I'm ready or not. If you have something, then
>> share
>> it for everyone. :-)
>
> Ok, let's see if the change leads to better performance or not. The
> idea is
> really simple, no big surprise, no fancy algorithm :-)
>
> Each meta rule needs a counter with the number of dependent rules. For
> each
> rule, an array is created with all meta rules that depend on that rule.
> When a
> rule has finished, got_hit/got_mis/rule_ready, it decrements the
> counter of
> each dependent meta rule. If the counter is 0, then the meta rule is
> moved from
> meta_pending to meta_ready queue. The meta_ready queue is then
> processed by
> do_meta_tests.
>
> Alternative: Instead of moving the meta rule, it is executed
> immediately. The
> do_meta_tests subroutine then becomes redundant (what to do with the
> reuse part
> then, I have no idea).
>
> In the end, the leftover meta-rules must then be handled with
> finish_meta_tests.
>
> A few code fragments for clarification:
>
> sub compile_meta_rules
>
> foreach my $name (keys %{$conf->{tests}}) {
> ...
> $conf->{meta_dep_count}->{$name} = scalar @{$rule_deps{$name}};
> foreach my $rulename (@{$rule_deps{$name}}) {
> push @{$conf->{rule_dependencies}->{$rulename}}, $name;
> }
> }
>
> ---------------------------------------
>
> got_hit/got_miss/rule_ready
>
> foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
> move_meta_p2r($meta_rule) unless
> --$conf->{meta_dep_count}->{$meta_rule};
> }
>
> foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
> run_meta_rule($meta_rule) unless
> --$conf->{meta_dep_count}->{$meta_rule};
> }
>
> run_meta_rule like do_meta_tests + delete
> $pms->{meta_pending}->{$meta_rule}
>
> sub do_meta_tests {
> my ($self, $pms, $priority) = @_;
> ...
> return if $self->{am_compiling}; # nothing to compile here
>
> my $mr = $pms->{meta_ready};
> my $mt = $pms->{conf}->{meta_tests};
> my $h = $pms->{tests_already_hit};
>
> while (my $rulename = shift @$mr) {
> # Metasubs look like ($_[1]->{$rulename}||($_[2]->{$rulename}?1:0))
> ...
> my $result = $mt->{$rulename}->($pms, $h, {});
> if ($result) {
> dbg("rules: ran meta rule $rulename ======> got hit ($result)");
> $pms->got_hit($rulename, '', ruletype => 'meta', value =>
> $result);
> } else {
> dbg("rules-all: ran meta rule $rulename, no hit") if
> $would_log_rules_all;
> $pms->got_miss($rulename); # mark meta done
> }
> }
> }

Performance: The standard ruleset of SpamAssassin includes about 3,000
rules. One third of these are meta rules. The performance of the
evaluation of the meta rules therefore matters.

My code fragments with move_meta_p2r and run_meta_rule would therefore
degrade performance if these were actually implemented as subroutine
calls, since about 1,000 additional subroutine calls would then be made.

The immediate execution of meta rules by run_meta_rule makes little
sense, since we do not short-circuit the boolean expressions of the meta
rules. Short-circuit would only bring something, if then in consequence
large parts of the rules would not be evaluated any more.

Therefore only move_meta_p2r comes into question. move_meta_p2r can be
implemented as the two statements add ready + delete pending.

In do_meta_tests we then have the set of ready meta rules. Instead of
processing them individually, which would mean executing 1,000
subroutine calls, we can now generate subroutines that execute many meta
rules at once, analogous to the procedure for the header tests. This
could reduce the number of calls to a handful and would boost
performance.

Michael

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 19, 2022, 7:11 AM

Post #11 of 13 (426 views)

Am 2022-05-19 11:53, schrieb bugzilla-daemon@spamassassin.apache.org:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
>
> --- Comment #28 from Henrik Krohns <apache@hege.li> ---
> A quick cleanup to tidy things:
>
> - Use rule_ready() everywhere instead of direct tests_already_hit
> modify

Is this really necessary? It introduces thousands of additional
subroutine calls.

> - Simple tracking of meta dependency hits, run do_meta_tests only when
> needed
> - Do not run do_meta_tests on last priority, as finish_meta_tests will
> run
> anyway
>
> Committed revision 1901060.
>
> It already reduces unnecessary do_meta_tests calls a lot.
>
> Before:
>
> do_meta_tests -1000 ready 2
> do_meta_tests -950 ready 0
> do_meta_tests -900 ready 0
> do_meta_tests -100 ready 5
> do_meta_tests -90 ready 0
> do_meta_tests 0 ready 1234
> do_meta_tests 500 ready 0
>
> After:
>
> do_meta_tests -950 ready 2
> do_meta_tests -100 ready 5
> do_meta_tests 0 ready 1234
>
> Though in the grand total runtimes, it makes very little difference.
> It's hard
> to optimize things, as most metas will end up ready at priority 0
> anyway.
>
> Will look if there's further to improve, but I'm quite sceptical about
> the
> --$conf->{meta_dep_count} stuff, as it's really hard to guarantee that
> someone
> doesn't call rule_ready() multiple times, obviously then the counts
> will end up
> wrong and metas could be run prematurely. It's possible to track
> readiness
> perfectly with a hash from which you delete dependencies as they occur.
> I
> already tried something like that, but it gets so complex that it
> actually
> increases runtimes.

No, this is absolutely not a problem. After going through the array with
the dependencies the array should be deleted. Since it is only used once
at the first time the rule is declared finished either by a call to
got_hit or rule_ready, it can be deleted. Any further call of got_hit or
rule_ready makes no change of the counters of the meta rules.

The algorithm you have now implemented looks too complicated and does
not convince me.

Try my algorithm and do it even for the last priority. Then look which
rules must be handled by finish_meta_tests and if your algorithm is
needed for a better evaluation.

Michael

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

May 19, 2022, 8:01 AM

Post #12 of 13 (426 views)

On Thu, May 19, 2022 at 04:11:53PM +0200, Michael Storz wrote:
> Am 2022-05-19 11:53, schrieb bugzilla-daemon@spamassassin.apache.org:
> > https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
> >
> > --- Comment #28 from Henrik Krohns <apache@hege.li> ---
> > A quick cleanup to tidy things:
> >
> > - Use rule_ready() everywhere instead of direct tests_already_hit modify
>
> Is this really necessary? It introduces thousands of additional subroutine
> calls.

Not sure why you even ask. Why do functions exist? Do you now prefer to
duplicate same code blocks everywhere?

> The algorithm you have now implemented looks too complicated and does not
> convince me.

Maybe for once you could post an actual tested and benchmarked patch,
instead of some random code snippets and comments? :-) There's only so much
time I can spare myself.

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready [ In reply to ]

Michael.Storz at lrz

May 20, 2022, 5:19 AM

Post #13 of 13 (423 views)

Am 2022-05-19 17:01, schrieb Henrik K:
> On Thu, May 19, 2022 at 04:11:53PM +0200, Michael Storz wrote:
>> Am 2022-05-19 11:53, schrieb bugzilla-daemon@spamassassin.apache.org:
>> > https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
>> >
>> > --- Comment #28 from Henrik Krohns <apache@hege.li> ---
>> > A quick cleanup to tidy things:
>> >
>> > - Use rule_ready() everywhere instead of direct tests_already_hit modify
>>
>> Is this really necessary? It introduces thousands of additional
>> subroutine
>> calls.
>
> Not sure why you even ask. Why do functions exist? Do you now prefer
> to
> duplicate same code blocks everywhere?
>
>> The algorithm you have now implemented looks too complicated and does
>> not
>> convince me.
>
> Maybe for once you could post an actual tested and benchmarked patch,
> instead of some random code snippets and comments? :-) There's only so
> much
> time I can spare myself.

Ok, I will see what I can do. However, it will take until next week, as
I am busy with something else at the moment.

Michael