This question may seem trivial, perhaps even meaningless, so I put it up at the risk of embarrassing myself. But since it is a genuine question, I am asking it anyway. All thoughts are welcome.
Question: When quantifying the performance of a classifier, what advantages does RMSE offer over the area under the ROC curve (AUC)? And what does the AUC offer that RMSE does not? I find the AUC very intuitive and prefer using it for classification tasks, but can I give a theoretical reason for using it over RMSE, or vice versa? Review committees have different preferences: some journals prefer reporting the RMSE, some prefer the AUC, and some ask for both. As another example, the 2010 KDD Cup used RMSE, while the 2010 UCSD data mining competition used AUC.
Or is this a bad question to ask?
To paraphrase my question: in what instances is a classifier deemed “good” by the AUC measure but “not so good” by the RMSE measure? What would be the exact reason for such a difference of “opinion”? And in which situations should I use AUC, and in which RMSE?
Some background: if the two measures were equivalent, you would expect a strong linear relationship between them (with a negative correlation), since for a perfect classifier the RMSE is zero and the AUC is 1.
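To make the perfect-classifier case concrete, here is a minimal sketch (with made-up labels and scores) that computes both measures directly: RMSE from the squared differences between labels and predicted probabilities, and AUC as the fraction of (positive, negative) pairs ranked correctly.

```python
from math import sqrt

def rmse(y_true, scores):
    # root mean squared error between labels and predicted probabilities
    return sqrt(sum((y - s) ** 2 for y, s in zip(y_true, scores)) / len(y_true))

def auc(y_true, scores):
    # fraction of (positive, negative) pairs ranked correctly, ties count 0.5
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
perfect = [0.0, 0.0, 1.0, 1.0]   # a perfect probabilistic classifier
print(auc(y_true, perfect), rmse(y_true, perfect))  # 1.0 0.0
```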
I always use both for all purposes. Here is a sample graph.
This is actually a very typical graph, and there are no surprises in it. If you leave out some “bad examples” such as those at (0.4, 0.65) and (0.38, 0.7), the graph shows a good negative correlation (as measured by the line fit).
So, the question remains for me. What are the advantages and disadvantages of both?
Recommendations:
1. ROC Graphs: Notes and Practical Considerations for Researchers – Tom Fawcett
This paper (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.73.587) looks at the relationship between AUC and 0-1 loss; perhaps some of it is relevant to RMSE. As for which one to use, IMHO it should always come down to the “real loss” behind the learning task, i.e., the actual amount of time, money, or energy wasted due to poor performance. For instance, in the Netflix challenge, RMSE was the real loss (of money) because it corresponded directly to the odds of winning the prize. 0-1 loss corresponds to a real loss (of time) when classifier errors need to be manually corrected by a human. To be honest, I can’t think of any real-life loss that corresponds to AUC.
Yaroslav,
Thank you so much for your comment. I can now say I have the “terms/keywords” for my little problem. :)
I have just returned to school after working at a data mining startup for a while. At neither place did I really get an answer to this question. I always thought it was too stupid to ask, since RMSE is a measure of some kind of loss, while AUC is just a measure for ranking classifiers based on that loss. I never really understood how to judge how good a metric is, or the pros and cons of the common ones.
And thank you for the paper by C. Cortes (the SVM superstar!). I actually went through ALL the papers that cite it.
I found these papers very interesting and will take time next weekend to read them:
1. Generalization Bounds for the Area under the ROC Curve – Shivani Agarwal et al.
2. A Data-Dependent Generalization Error Bound for the AUC – Nicolas Usunier et al.
I also found this paper by Mark Reid very interesting: Information, Divergence and Risk for Binary Experiments. And there are two rule-of-thumb articles by John Langford titled ROC vs Accuracy vs AROC and Choice of Metrics.
Thanks Again,
Shubhendu
It may help to consider situations in which the two measures disagree. Within the range of values one usually encounters, I’d expect there to be some correlation between RMSE and AUC. The question is: under what circumstances does a change cause one to improve, while the other deteriorates?
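One concrete source of disagreement, sketched below with made-up numbers: AUC depends only on how the scores rank the examples, so it is unchanged by any order-preserving transform of the scores, while RMSE is sensitive to calibration. Squaring well-calibrated probabilities keeps the ranking (and the AUC) identical but drags the positives’ scores away from 1, so the RMSE deteriorates.

```python
from math import sqrt

def rmse(y_true, scores):
    return sqrt(sum((y - s) ** 2 for y, s in zip(y_true, scores)) / len(y_true))

def auc(y_true, scores):
    # probability that a random positive outranks a random negative
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1]
well_calibrated = [0.1, 0.2, 0.8, 0.9]
# squaring preserves the order of the scores but worsens calibration
squashed = [s ** 2 for s in well_calibrated]   # [0.01, 0.04, 0.64, 0.81]

print(auc(y, well_calibrated), auc(y, squashed))    # both 1.0
print(rmse(y, well_calibrated), rmse(y, squashed))  # second is larger
```

So a classifier can look identical under AUC yet markedly worse under RMSE whenever its scores are poorly calibrated.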
Yaroslav,
Funny that you speak of “real” loss.
As a commercial marketing data miner, I would rather speak of “real gains” when using a model. For that you take into account the fixed cost of a campaign, the variable cost (contact cost), which is a function of the number of prospects targeted, and of course the cumulative net gains (= buy probability × unit profit), as compared with a random (or experience-guided) selection of prospects.
See my blog post on this subject for a more elaborate discussion (http://bit.ly/3F99A6).
Zyxo.