Mailing List Archive

[Bug 3096] [review] improve mass-check so that it can be run from a different directory
http://bugzilla.spamassassin.org/show_bug.cgi?id=3096

duncf@debian.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|RFE: improve mass-check so  |[review] improve mass-check
                   |that it can be run from a   |so that it can be run from a
                   |different directory         |different directory



------- Additional Comments From duncf@debian.org 2004-03-27 14:02 -------
Is nobody going to comment on this??

If so, I'm going to have to commit.... ;-)



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.





------- Additional Comments From quinlan@pathname.com 2004-03-27 15:55 -------
> Is nobody going to comment on this??

-0.5 not quite ready yet

The patch really needs to fix all of the scripts in masses to handle the
new format, including the ones under rule-qa.

I agree with using a single output file, but I'd like the new format
(the format, not the code itself) to be sufficient to handle:

- message classes other than spam and ham (both for the manual
  classification and for the result determined by SpamAssassin).

- using "unknown" or "none" for unclassified checks (like
  "./mass-check --mbox file.mbox" with no --spam or --ham
  argument or class specification).

I propose the following format:

<manual class> <result class> <score> <id> <rules> <value pairs>

where, for our current code, <manual class> is:

"spam" | "ham" | "none"

and <result class> is:

"spam" | "ham"

I'd also change the score to be the precise floating point score while
we're at it (that, at least, should be an easy change).
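
A minimal sketch of how a script might read one line in the proposed
format, written in Python for illustration only. The names LogLine and
parse_line are hypothetical, and two details are assumptions that the
proposal does not specify: that <rules> is a single comma-separated
token, and that <value pairs> are whitespace-separated key=value tokens.

```python
from typing import Dict, List, NamedTuple

MANUAL_CLASSES = {"spam", "ham", "none"}
RESULT_CLASSES = {"spam", "ham"}


class LogLine(NamedTuple):
    manual_class: str
    result_class: str
    score: float
    msg_id: str
    rules: List[str]
    values: Dict[str, str]


def parse_line(line: str) -> LogLine:
    """Parse '<manual class> <result class> <score> <id> <rules> <value pairs>'."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("expected at least five fields: %r" % line)
    manual, result, score, msg_id, rules = fields[:5]
    if manual not in MANUAL_CLASSES:
        raise ValueError("unknown manual class: %r" % manual)
    if result not in RESULT_CLASSES:
        raise ValueError("unknown result class: %r" % result)
    values = {}
    for pair in fields[5:]:
        # Assumption: every trailing token has the form key=value.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("malformed value pair: %r" % pair)
        values[key] = value
    # Assumption: rule names are joined by commas in a single token.
    return LogLine(manual, result, float(score), msg_id, rules.split(","), values)


record = parse_line("none spam 7.35 /tmp/msg.1 RULE_A,RULE_B bayes=0.999")
print(record.manual_class, record.result_class, record.score)
```

Keeping the score as a float here matches the suggestion to log the
precise floating point score rather than an integer.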









------- Additional Comments From duncf@debian.org 2004-03-27 18:54 -------
Subject: Re: [review] improve mass-check so that it can be run from a different directory

> -0.5 not quite ready yet

True.

> The patch really needs to fix all of the scripts in masses to handle the
> new format, including the ones under rule-qa.

I plan on re-writing most of the other scripts (I'm part way
there). But as far as rule-qa, I'll probably need some help sorting
through it.

> I propose the following format:
>
> <manual class> <result class> <score> <id> <rules> <value pairs>
>
> where, for our current code, <manual class> is:
>
> "spam" | "ham" | "none"
>
> and <result class> is:
>
> "spam" | "ham"

I'd prefer that we stick with single characters, since that is what
ArchiveIterator does. (It passes "s" or "h" around instead of "spam"
or "ham") Furthermore, having it fixed width is a good thing imho.

> I'd also change the score to be the precise floating point score while
> we're at it (that, at least, should be an easy change).

Agreed, but all the scripts need to change too. Since I'm rewriting
them to share code, that won't be hard.









------- Additional Comments From jm@jmason.org 2004-03-28 20:49 -------
+1 on duncan's latest comments.

> > I propose the following format:
> > <manual class> <result class> <score> <id> <rules> <value pairs>
> > where, for our current code, <manual class> is:
> > "spam" | "ham" | "none"
> > and <result class> is:
> > "spam" | "ham"
>
> I'd prefer that we stick with single characters, since that is what
> ArchiveIterator does. (It passes "s" or "h" around instead of "spam"
> or "ham") Furthermore, having it fixed width is a good thing imho.

what about:

<manual class><result class> <score> <id> <rules> <value pairs>

with one-letter classes. That gives us:

hh: manually ham, classed as ham
hs: false positive
sh: false negative
ss: manually spam, classed as spam

That's handy because (a) it's closer to what the academic lit uses (TCR
calculation in particular uses just those classes with pretty much that
nomenclature in its computation), (b) it's very logical and obvious, (c) it fits
in 2 bytes, so fixed width, (d) it fits in one non-whitespace "token" so very
little script modification will be required in rule-qa et al. where
/\S+\s+\S+etc./ is used.

The "no manual classification" type would then be

us: unknown, marked as spam
uh: unknown, marked as ham

like this:

hh 0 ...path... RULES bayes=0.001
hh 0 ...path... RULES bayes=0.001
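
As a rough illustration of why the two-letter token is convenient, the
sketch below tallies the first token of each line and derives false
positive and false negative rates. It is written in Python purely as an
example; the function names tally and error_rates are hypothetical, and
the sample lines are made up for the demonstration.

```python
from collections import Counter


def tally(lines):
    """Count lines by their leading two-letter class token (hh, hs, sh, ss, uh, us)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


def error_rates(counts):
    """Return (false positive rate, false negative rate).

    Lines without a manual classification (uh, us) are ignored, since
    there is no ground truth to compare against.
    """
    ham_total = counts["hh"] + counts["hs"]
    spam_total = counts["sh"] + counts["ss"]
    fp_rate = counts["hs"] / ham_total if ham_total else 0.0
    fn_rate = counts["sh"] / spam_total if spam_total else 0.0
    return fp_rate, fn_rate


# Made-up sample lines in the "<manual><result> <score> <id> <rules>" shape.
sample = [
    "hh 0 msg1 RULE_A",
    "hs 6 msg2 RULE_B",
    "ss 9 msg3 RULE_C",
    "sh 1 msg4 RULE_D",
    "ss 8 msg5 RULE_E",
    "us 7 msg6 RULE_F",
]
counts = tally(sample)
print(counts["hs"], counts["sh"], error_rates(counts))
```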








------- Additional Comments From duncf@debian.org 2004-03-28 22:58 -------
Subject: Re: [review] improve mass-check so that it can be run from a different directory

> what about:
>
> <manual class><result class> <score> <id> <rules> <value pairs>
>
> with one-letter classes. That gives us:
>
> hh: manually ham, classed as ham
> hs: false positive
> sh: false negative
> ss: manually spam, classed as spam

The only problem with that is it's slightly harder for backward compat
if we ever need to have multiple character classes. (With a space we
can change a (\w)\s+(\w) to a (\w+)\s+(\w+) and still support the old
format). Not really a big deal, though.
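
The backward-compatibility point can be checked with a small Python
sketch (the original scripts are not shown here, so this is only an
illustration). The two patterns are the ones quoted above, anchored at
the start of the line; the names OLD, NEW and split_classes are
hypothetical.

```python
import re

OLD = re.compile(r"(\w)\s+(\w)")    # single-character classes, space separated
NEW = re.compile(r"(\w+)\s+(\w+)")  # widened to multi-character classes


def split_classes(pattern, line):
    """Return (manual, result) matched at the start of the line, or None."""
    m = pattern.match(line)
    return m.groups() if m else None


# The widened pattern still reads the old space-separated format.
print(split_classes(OLD, "h s 5.20 msg1"))       # ('h', 's')
print(split_classes(NEW, "h s 5.20 msg1"))       # ('h', 's')
print(split_classes(NEW, "ham spam 5.20 msg1"))  # ('ham', 'spam')

# With the classes run together, the widened pattern takes "hs" as one
# class and the start of the score as the other.
print(split_classes(NEW, "hs 5.20 msg1"))        # ('hs', '5')
```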









------- Additional Comments From quinlan@pathname.com 2004-03-28 23:23 -------
Subject: Re: [review] improve mass-check so that it can be run from a different directory

bugzilla-daemon <bugzilla-daemon@bugzilla.spamassassin.org> writes:

> I'd prefer that we stick with single characters, since that is what
> ArchiveIterator does. (It passes "s" or "h" around instead of "spam"
> or "ham") Furthermore, having it fixed width is a good thing imho.'

Changing ArchiveIterator is trivial, so I don't think this is a reason
worth considering. If I recall correctly, the single character thing is
my fault; I just think it was a bad design decision in retrospect.

> with one-letter classes. That gives us:
>
> hh: manually ham, classed as ham
> hs: false positive
> sh: false negative
> ss: manually spam, classed as spam

I think the classes at least need to be separated by some character to
allow multiple-character classes. If not whitespace, then a ',' or a
'-' character.

I'm certain the single character stuff is hack crap. We should drop it.

Daniel










------- Additional Comments From quinlan@pathname.com 2004-03-28 23:25 -------
Subject: Re: [review] improve mass-check so that it can be run from a different directory

> The only problem with that is it's slightly harder for backward compat
> if we ever need to have multiple character classes. (With a space we
> can change a (\w)\s+(\w) to a (\w+)\s+(\w+) and still support the old
> format). Not really a big deal, though.

I agree 100% and I think it is a moderately-sized deal.










------- Additional Comments From jm@jmason.org 2004-03-29 11:14 -------
ok, I'll let you two decide on it then, I don't really care that much either way ;)








------- Additional Comments From duncf@debian.org 2004-03-29 20:24 -------
I'm going to leave it as single characters but in the form
<manual class> <result class> <score> <id> <rules> ...

I'll also fix the score to be %05.2f rather than %2d.
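
For comparison, here is how a two-decimal float format and %2d render
the same scores. This is a sketch using Python's printf-style operator
rather than the mass-check code itself, and it assumes the float
specifier meant is %05.2f; the helper names are hypothetical.

```python
def format_float(score):
    """Zero-padded to width 5 with two decimals, e.g. 5.2 becomes '05.20'."""
    return "%05.2f" % score


def format_int(score):
    """Width-2 integer; the fractional part is dropped, e.g. 5.9 becomes ' 5'."""
    return "%2d" % score


for score in (0.75, 5.2, 5.9):
    print(repr(format_float(score)), repr(format_int(score)))
```

Scores of 100 or more, or negative scores below -9.99, need more than
five characters, so the column is fixed width only within that range.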


