Mailing List Archive

[Bug 7735] Meta rules need to handle missing/unrun dependencies
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Loren Wilton <lwilton@earthlink.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |lwilton@earthlink.net

--- Comment #8 from Loren Wilton <lwilton@earthlink.net> ---
Just thinking off the top of my head, I think I'd consider something along the
following lines.

1. Conceptually build four prioritized tree-lists of rules:
a. non-net, non-meta
b. non-net, meta
c. net, non-meta
d. net, meta

The tree nodes are the priority order, for each priority where rules exist
there is a list of rules of that priority. The rules are evaluated in tree
order and branch order, in the order given above.

While I've described this as four trees of lists, it could equally be done with
one partitioned tree, where each partition is assigned a base rule priority
higher than any rule priority that can be manually assigned, for instance 0-99,
100-199, etc. The goal is to create the above-described rule evaluation order
constraints.

Non-meta rules are trivially assigned a priority, as it is a given for each
rule.

Meta rules may have been given priorities, but that needs to be considered as a
minimum priority, and can only be used to delay the meta evaluation.

As an implementation I think I'd initially throw all meta rules into a
bucket/list of some sort, remembering their priorities, but not assigning them
to the trees until all non-meta rules are assigned to the correct tree nodes.

Then I would pull meta rules out of the bucket and look up each referenced
rule. I would remember the highest tree node of any referenced rule.

If all referenced rules are found for a meta, it does not depend on any other
meta rules, and can be assigned to the meta tree at either the level of the
highest dependency rule found, or the level given for the meta, only if that is
higher than the highest dependency rule.

If not all dependencies are found, I would throw it at the back of the
unprocessed meta list, with a flag saying it had been seen once. (Alternately,
into a reprocessing meta list separate from the current list.)

When the current meta list is depleted, it can be replaced with the postponed
meta list, and that list again processed exactly as above. A flag can be set if
at least one meta is removed from the list into a tree. Any metas still with
unresolved dependencies can again be thrown back into an alternate pool. As
long as at least one meta is removed from the unprocessed list the reprocessing
can be repeated. If all remaining metas are passed and none are placed into the
tree, the remaining metas are either circular or depend on undefined rules. I
would throw them out as errors at that point.

Obviously there are faster ways to perform the above evaluation, but since it
only needs to be done once, simplicity may be a virtue in the implementation.

In any case, the end result is an ordered list if rules to evaluate in the
order they are found in the list. The non-net stuff gets done quickly, and the
net rules queued quickly.

Metas that depend on async net rules are a pain because the net results are
returned out of order. I can't think of a clever way to deal with this, but a
simple brute-force approach is easy. Each net rule needs a list of the metas
that depend on it. Each net meta needs a count of dependencies required to
complete, and a counter initialized to zero. As each net rule completes, it
walks it's dependent list and increments the count for the dependency. If the
incremented count matches the total dependency count the meta rule can be run.

Overall I think that will handle getting things into the right evaluation order
fairly easily and also evaluating them in the correct order without too much
pain.

(Of course I don't recall what the short-circuit flag does exactly. It probably
throws a wrench in the above evaluation mechanism.)

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #9 from Loren Wilton <lwilton@earthlink.net> ---
Minor correction to the previous post. I said there that there was a sync rule
list and a sync meta list, and the sync rules were processed in priority order,
then the sync meta rules. This isn't quite correct.

The two sync lists need to be interleaved. The sync rules of priority N are
run, then the sync meta rules of the same priority. The list assignment
processing will have seen to it that a meta rule is at a higher (later
evaluation) priority than any rules that it depends on.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Henrik Krohns <apache@hege.li> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |quinlan@pathname.com

--- Comment #10 from Henrik Krohns <apache@hege.li> ---
*** Bug 4549 has been marked as a duplicate of this bug. ***

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #11 from Henrik Krohns <apache@hege.li> ---
Created attachment 5742
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5742&action=edit
Meta patch for trunk v1

Took too many days, but here's patch for a brute force approach.

Things done:

- Rules now keep track of when they are ready/done in $pms->{tests_already_hit}
(0 == done, but didn't hit). This is of obviously required feature for metas to
know when evaluating can be done.

- $pms->rule_pending() must be called from eval-rule function, if the result
can arrive later than when exiting the function.

- $pms->rule_done() need be called when above result has arrived. Note that if
a rule is waiting for multiple DNS queries, it is considered done when either
1) a positive hit from lookup 2) all queries are returned. ($pms->rule_done()
does automatic check with has_pending_lookups() and ignores the call if there
are still pending queries)

- This might break backwards compatibility for third party async plugins, which
do not use above functions. In some cases metas using such rules might not hit
then. Still TODO to check if it's a problem and if it can be helped (maybe try
calling rule_pending() automatically from AsyncLoop, requires some kludging as
it's independent from PMS..)

- Check.pm do_meta_tests() basically runs at end of every priority and brute
force iterates through all non-finished metas, checking if their dependent
rules are ready. A final do_meta_tests() at scan end will also accept "lookup
pending" rules as ready (not sure yet if this is desired or not?)

- Duplicate rule merging code has been removed, since it would require new
bloaty extra logic to work with all the above. It saved very little resources
anyway, makes no difference.

Some 5% overall scan time penalty from the brute force meta iteration, probably
could be improved with some extra lookup tables. But that's without network
lookups, there might be some speed up too as metas now don't need to wait for
any pending lookups, they just evaluate when it's ready.

All tests run ok, testing on my live server..

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #12 from Henrik Krohns <apache@hege.li> ---
(In reply to Henrik Krohns from comment #11)
> - Duplicate rule merging code has been removed, since it would require new
> bloaty extra logic to work with all the above. It saved very little
> resources anyway, makes no difference.

PS. While doing some tests I found a bug in merging too.. if you have duplicate
rules but set some score to 0, it will disable the other rule too. Hit my head
for 15 minutes until I realized why my test was acting strange. *groan*

body FOO1 /./
body FOO2 /./
score FOO1 0 # FOO2 will never hit

Farewell kludgy merging code, won't be missing you.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Henrik Krohns <apache@hege.li> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #5742|0 |1
is obsolete| |

--- Comment #13 from Henrik Krohns <apache@hege.li> ---
Created attachment 5743
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5743&action=edit
Meta patch for r1889731 v2

Updated patch for current trunk.

- do_meta_tests() updated a bit, now also uses get_pending_lookups (renamed
from has_pending_lookups) to check if rule is pending, this should fix most 3rd
party async plugins (like my iXhash2)

Currently both of these are considered ready/done rules evaluating to 0:

- non-existing rules
- disabled rules (score 0)

Still considering if it's manageable to do the proposed "evaluate un-run as
both true and false".

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #14 from Henrik Krohns <apache@hege.li> ---
Ok I've finished the dual evaluation code, but I'm still wondering whether
timed out lookups should count as unrun rules.

Let's take this stock rule as example.

meta DIGEST_MULTIPLE RAZOR2_CHECK + DCC_CHECK + PYZOR_CHECK > 1

This gets problematic even when considering disabled rules. If simply one of
these is not used/loaded, or lookup times out, it won't hit even if two of them
hit.

Maybe a tflag that allows unrun rules for that rule?

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #15 from Henrik Krohns <apache@hege.li> ---
Argh never mind the previous comment, brain freeze.. that specific example
works, as both evals are true, let's say DCC_CHECK is disabled

1 + 0 + 1 > 1
1 + 1 + 1 > 1

Any other scenario to consider?

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Henrik Krohns <apache@hege.li> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #5743|0 |1
is obsolete| |

--- Comment #16 from Henrik Krohns <apache@hege.li> ---
Created attachment 5744
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5744&action=edit
Meta patch for r1889916 v3

Hopefully the final patch version. Dual meta evaluation as described in Bug
4549, nonexistent, disabled, timed out lookups are all considered unrun.

Some examples of passed tests from basic_meta2.t:

# Non-existing rule
# Should not hit, meta is evaled twice: (!0) && (!1)
meta TEST_META_2 !NONEXISTINGRULE
# Should hit, meta is evaled twice: (!0 || 0) && (!1 || 1)
meta TEST_META_3 !NONEXISTINGRULE || NONEXISTINGRULE

# Disabled rule, same as above
body TEST_DISABLED /a/
score TEST_DISABLED 0
# Should not hit
meta TEST_META_4 !TEST_DISABLED
# Should hit
meta TEST_META_5 !TEST_DISABLED || TEST_DISABLED

# Unrun rule (due to local tests only), same as above
askdns TEST_DISABLED2 spamassassin.org TXT /./
# Should not hit
meta TEST_META_6 !TEST_DISABLED2
# Should hit
meta TEST_META_7 !TEST_DISABLED2 || TEST_DISABLED2

# Should not hit
meta TEST_META_8 __FOO_1 + NONEXISTINGRULE == 2
# Should not hit
meta TEST_META_9 __FOO_1 + NONEXISTINGRULE + __FOO_2 == 2
# Should hit (both eval checks are true thanks to >1)
meta TEST_META_A __FOO_1 + NONEXISTINGRULE + __FOO_2 > 1

For reference some repeatable benchmarks checking 100 messages:

- 3.4 55s 155MB
- trunk 50s 168MB
- trunk.meta 56s 162MB

Slight speed loss from current trunk, but on par with 3.4. Memory usage did
drop a bit from ditching merging and other stuff.

Will commit in few days unless hear otherwise.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #17 from Henrik Krohns <apache@hege.li> ---
Doing some more testing, mass check 4000 messages with -j4

trunk 138s
trunk.meta 190s

Seems the performance gap is a bit too unacceptable here. Still looking for
ways to optimize.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Henrik Krohns <apache@hege.li> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #5744|0 |1
is obsolete| |

--- Comment #18 from Henrik Krohns <apache@hege.li> ---
Created attachment 5746
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5746&action=edit
Meta patch for r1890037 v4


Managed to close the gap to 10% as tested properly with stock ruleset (which
has much less rules than my custom setup and makes meta checks runtime
proportionally longer).

Calling eval 'foobar' anytime is very expensive, especially in such a loop. Now
the meta checks are pre-compiled as subs. A bit similarly as is done on other
ruletypes.

Will commit when it gets a bit of runtime on my live system.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #19 from Henrik Krohns <apache@hege.li> ---
Example: BITCOIN_SPAM_05 will not run in daily masschecks, because
__SPOOFED_FREEMAIL -> __NOT_SPOOFED depends on DKIM/SPF checks, which obviously
don't run with local checks only.

Solutions?

- Ignore ifplugin SPF/DKIM without network checks ? Requires hardcoding, not
nice
- "if (local_tests_only)" in .cf ?
- New rule __LOCAL_TESTS_ONLY which can help due to false/true meta-evaluation
?
- Don't run BITCOIN_SPAM_05 in daily, since it doesn't tell the whole truth ?

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #20 from John Hardin <jhardin@impsec.org> ---
(In reply to Henrik Krohns from comment #19)
> Example: BITCOIN_SPAM_05 will not run in daily masschecks, because
> __SPOOFED_FREEMAIL -> __NOT_SPOOFED depends on DKIM/SPF checks, which
> obviously don't run with local checks only.

That's in there solely as a FP avoidance. In that use case it's reasonable to
run the rule assuming __SPOOFED_FREEMAIL = 0. I don't think that would be a
valid assumption in all cases, though.

> Solutions?
>
> - Ignore ifplugin SPF/DKIM without network checks ? Requires hardcoding, not
> nice
> - "if (local_tests_only)" in .cf ?

That would be preferable, because then you could do something like:

ifplugin SPF/DKIM && !local_tests_only
{actual __SPOOFED_FREEMAIL definition}
else
meta __SPOOFED_FREEMAIL 0
endif

> - New rule __LOCAL_TESTS_ONLY which can help due to false/true
> meta-evaluation ?
> - Don't run BITCOIN_SPAM_05 in daily, since it doesn't tell the whole truth ?

Is that in there to include what we're currently doing? Or is it not inheriting
tflags net and thus it is running with the assumption I stated above?

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #21 from Henrik Krohns <apache@hege.li> ---
(In reply to John Hardin from comment #20)
> Is that in there to include what we're currently doing? Or is it not
> inheriting tflags net and thus it is running with the assumption I stated
> above?

Not sure I understand the questions. But net-flag makes no difference on when
rules run or are considered unrun.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #22 from Henrik Krohns <apache@hege.li> ---
(In reply to Henrik Krohns from comment #21)
> But net-flag makes no difference on when rules run or are considered unrun.

Ok correction, eval-rules marked as net are not run if local rules only. But
metas or any other types don't have such a check (should they?).

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #23 from Loren Wilton <lwilton@earthlink.net> ---
It seems to me that if net rules are turned off, they are unrun and the result
is indeterminate: that is, we don't know how they would have evaluated if they
had been run.

In that sense, they are then no different than a net rule that was fired but
the result didn't arrive in time. With the recent changes, meta rules are (I
believe) run twice with the result both true and false for an unreceived net
result. I would say that with net rules disabled, we would treat all meta rules
that depend on net rule results the same way: the net results weren't received
in time, so they need to be evaluated twice.

This should extend to any missing result a meta rule depends on, but I don't
know if it makes more sense to do the double evaluation based on not having the
net rule result, or based on checking tflags net for the dependency and some
flag that says net rules are disabled. Given that the case:

#body SOME_DISABLED_RULE /./
meta COMPOUND_TEST SOME_DISABLED_RULE && OTHER_RULE

should evaluate the meta based solely on OTHER_RULE, it seems to me that
disabled net rules are the same case as the undefined disabled rule, so
possibly the same logic works to ignore the results if disabled net rules.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #24 from Henrik Krohns <apache@hege.li> ---
Loren, I might be a bit tired from long work day, but I'm not sure what you are
implying/suggesting.. tried reading it thrice. :-)

Unrun rule currently has simple definition: either it was not run at all, or
result was not received (timeout). Unrun rules in metas are evaluated twice as
before mentioned. I don't think tflags should matter here, unrun is unrun.

Let me know in clear terms, if you think something should be changed there.

I will look if there is a some 3.4 compatible way to implement "if
(local_tests_only)".

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #25 from Henrik Krohns <apache@hege.li> ---
(In reply to Henrik Krohns from comment #24)
> I will look if there is a some 3.4 compatible way to implement "if
> (local_tests_only)".

Ugh it gets a bit wonky. Current conf parser considers any unknown string as
true.

# Will always apply on old versions
if (local_rules_only)
meta __SPOOFED 0
endif

# Something like this would be required, mandatory feature check
if can(Mail::SpamAssassin::Conf::feature_local_rules_only) && local_rules_only
meta __SPOOFED 0
else
meta __SPOOFED DKIM_INVALID || STUFF
endif

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #26 from Loren Wilton <lwilton@earthlink.net> ---
> Loren, I might be a bit tired from long work day, but I'm not sure what you are implying/suggesting.. tried reading it thrice. :-)

Sorry, I'm too verbose because I don't know the actual code at all well.

> Unrun rule currently has simple definition: either it was not run at all, or result was not received (timeout). Unrun rules in metas are evaluated twice as

That is what I think should happen.
I don't understand the concern expressed in Comment 19.

> Ugh it gets a bit wonky. Current conf parser considers any unknown string as true.

Does this mean that an undefined rule appearing in a meta evaluates to true
rather than to "unrun"? If so I see that as a possible problem; I'd prefer an
undefined rule to be treated as unrun, but I can see the logic of treating it
as a constant value (but I'd think false makes more sense).

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

--- Comment #27 from Henrik Krohns <apache@hege.li> ---
(In reply to Loren Wilton from comment #26)
> I don't understand the concern expressed in Comment 19.

It's been exaplained a few times. Rules that are simply intended for reducing
FPs are not run in daily masschecks.

meta __NOT_SPOOFED SPF_PASS || DKIM_VALID || !__LAST_EXTERNAL_RELAY_NO_AUTH ||
ALL_TRUSTED

SPF_PASS or DKIM_VALID are unrun. Double eval will not make the meta hit.
(0||0||0||0) && (1||1||0||0)

It makes it harder to tune rules as you need to wait a week for results. There
should be a way to bypass it, if the dependencies are not considered critical.


> Does this mean that an undefined rule appearing in a meta evaluates to true
> rather than to "unrun"? If so I see that as a possible problem; I'd prefer
> an undefined rule to be treated as unrun, but I can see the logic of
> treating it as a constant value (but I'd think false makes more sense).

Sorry, to clarify, this only applies to "if" clauses.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Henrik Krohns <apache@hege.li> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #5746|0 |1
is obsolete| |

--- Comment #28 from Henrik Krohns <apache@hege.li> ---
Created attachment 5747
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5747&action=edit
Meta patch for r1890185 v5

Latest version now supports feature check:

if can(Mail::SpamAssassin::Conf::feature_local_rules_only) && local_rules_only
endif

... AND pseudo-rule type check:

meta __SPOOFED DKIM_INVALID || local_tests_only

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7735] Meta rules need to handle missing/unrun dependencies [ In reply to ]
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7735

Henrik Krohns <apache@hege.li> changed:

What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED

--- Comment #29 from Henrik Krohns <apache@hege.li> ---
Considering this ready for trunk commit and usage.

Please check and fix affected mass-check rules.

Transmitting file data .......................done
Committing transaction...
Committed revision 1890274.

--
You are receiving this mail because:
You are the assignee for the bug.