Hello!
I've decided to compare how the dspam and SpamAssassin
Bayes implementations perform, because speed is very
important for large installations and the author of dspam
says that his pure C implementation is much faster than
Perl. I've written two Perl scripts that run the dspam
agent and spamc over my ham corpus and measure the total,
min, max and average time to process a message. Dspam
was configured with MySQL storage optimized for
speed. For the SpamAssassin benchmark I used spamd with
only the Bayes rules enabled. Both were trained on exactly
the same spam and ham corpus. Here are the results:
# DSPAM
1630 messages processed.
Total time: 230.084 wallclock secs (30.42 cusr + 17.97 csys = 48.39 CPU)
Max message processing time: 13.3091468811035
Avg message processing time: 0.140900963947086
Min message processing time: 0.0444350242614746
# SpamAssassin
1630 messages processed.
Total time: 254.895 wallclock secs ( 3.54 cusr + 10.46 csys = 14.00 CPU)
Max message processing time: 3.65092492103577
Avg message processing time: 0.156147952606342
Min message processing time: 0.0727198123931885
It seems that SpamAssassin is not much slower than
dspam (about 11% in total wallclock time), although the
results are biased because:
1) dspam was configured with default settings, which
enable two algorithms (bayes and altbayes);
2) dspam was configured to attach signatures with
tokens for re-learning;
3) dspam uses chained tokens, which increase the volume of
data to be processed.
I'm also very surprised that dspam's max message processing
time is higher (13.3 secs versus 3.65 secs).
This is mostly a toy benchmark, but I would like to hear
suggestions on how the results can be improved.
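For anyone who wants to repeat the measurement, here is a
minimal sketch of the timing loop described above. It is
written in Python rather than Perl and is not the script that
produced the numbers in this post. The names time_filter and
summarize are made up for illustration, and the command in the
example is a do-nothing placeholder: substitute the real dspam
or spamc invocation for your own setup.

```python
import subprocess
import sys
import time


def time_filter(command, messages):
    """Pipe each message to `command` on stdin; return wall-clock times in seconds."""
    timings = []
    for message in messages:
        start = time.perf_counter()
        # The filter's output is discarded; only the elapsed time is of interest.
        subprocess.run(
            command,
            input=message,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return the count, total, max, average and min of the measured times."""
    if not timings:
        raise ValueError("no messages were processed")
    total = sum(timings)
    return {
        "count": len(timings),
        "total": total,
        "max": max(timings),
        "avg": total / len(timings),
        "min": min(timings),
    }


if __name__ == "__main__":
    # Placeholder command (an assumption): a trivial process that just reads stdin.
    command = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    messages = [b"Subject: test\n\nbody\n"] * 3
    stats = summarize(time_filter(command, messages))
    print(f"{stats['count']} messages processed.")
    print(f"Total time: {stats['total']:.3f} wallclock secs")
    print(f"Max message processing time: {stats['max']}")
    print(f"Avg message processing time: {stats['avg']}")
    print(f"Min message processing time: {stats['min']}")
```

Note that timing each message this way includes the cost of
starting the client process, so it measures the whole
invocation and not only the Bayes computation.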
Eugene
--
Email: jmv /at/ online.ru