[Spamassassin Wiki] Update of "RocGraphs" by JustinMason
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RocGraphs

The comment on the change is:
add a page about ROC graphs

New page:
= ROC Graphs =

[http://en.wikipedia.org/wiki/Receiver-Operator_Characteristic Wikipedia says]:

In signal detection theory, a receiver operating characteristic (ROC) is a graphical plot of the sensitivity vs. 1-specificity for a binary classifier system as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the fraction of true positives (TP) vs. the fraction of false positives (FP). The usage receiver operator characteristic is also common.

ROC curves are used to evaluate the results of a prediction and were first employed in the study of discriminator systems for the detection of radio signals in the presence of noise in the 1940s. In the 1960s they began to be used in psychophysics, to assess human (and occasionally animal) detection of weak signals. They also proved to be useful for the evaluation of machine learning results, such as the evaluation of Internet search engines. They are also used extensively in epidemiology and medical research.
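The two descriptions in the quoted definition (sensitivity vs. 1-specificity, and true-positive fraction vs. false-positive fraction) are the same plot. A short derivation from the standard confusion-matrix definitions shows why; TN and FN (true negatives, false negatives) and the TPR/FPR abbreviations are not in the quoted text and are introduced here only for illustration. All counts are taken at a single threshold:

```latex
\begin{align*}
\text{sensitivity} &= \text{TPR} = \frac{TP}{TP + FN} \\
\text{specificity} &= \frac{TN}{TN + FP} \\
1 - \text{specificity} &= \frac{FP}{FP + TN} = \text{FPR}
\end{align*}
```

Each threshold therefore contributes one point (FPR, TPR), and sweeping the threshold traces out the curve.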

In SpamAssassin terms, a ROC graph plots the FpFnPercentages values (false-positive and false-negative percentages) across multiple thresholds. There's a script in SVN used to measure this at {{{masses/mk-roc-graphs}}}; here's sample output:

http://taint.org/xfer/2005/roc_curves_with_3045.png
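To make the idea of "FP/FN percentages over multiple thresholds" concrete, here is a minimal Python sketch. It is not the {{{masses/mk-roc-graphs}}} script and does not reproduce its input or output format. The function names, the sample data, and the choice of denominators (FP% relative to the ham count, FN% relative to the spam count) are all assumptions made for illustration, as is treating a message as spam when its score is greater than or equal to the threshold.

```python
def fp_fn_percentages(results, threshold):
    """Return (fp_pct, fn_pct) for one threshold.

    results: list of (score, is_spam) pairs (hypothetical input format).
    ASSUMPTION: a message is flagged as spam when score >= threshold.
    ASSUMPTION: FP% is relative to the number of ham messages and
    FN% is relative to the number of spam messages.
    """
    ham_scores = [s for s, is_spam in results if not is_spam]
    spam_scores = [s for s, is_spam in results if is_spam]

    # Ham flagged as spam is a false positive; spam let through is a false negative.
    fp = sum(1 for s in ham_scores if s >= threshold)
    fn = sum(1 for s in spam_scores if s < threshold)

    fp_pct = 100.0 * fp / len(ham_scores) if ham_scores else 0.0
    fn_pct = 100.0 * fn / len(spam_scores) if spam_scores else 0.0
    return fp_pct, fn_pct


def roc_points(results, thresholds):
    """Return one (threshold, fp_pct, fn_pct) tuple per threshold."""
    return [(t,) + fp_fn_percentages(results, t) for t in thresholds]


# Hypothetical scored corpus: (score, is_spam). Not real measurement data.
SAMPLE = [
    (0.1, False), (1.2, False), (3.9, False), (5.4, False),
    (2.5, True), (4.8, True), (7.3, True), (12.0, True),
]

if __name__ == "__main__":
    for t, fp, fn in roc_points(SAMPLE, [3.0, 4.0, 5.0, 6.0]):
        print(f"threshold {t:4.1f}: FP% = {fp:5.1f}, FN% = {fn:5.1f}")
```

Raising the threshold trades false positives for false negatives; plotting the resulting pairs for many thresholds gives a curve of the kind shown in the sample image.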

See also MeasuringAccuracy for other methods used to measure spam filter accuracy.