Third IEEE International Conference on Data Mining

Abstract

This paper investigates why learning algorithms can produce suboptimal models even when the training and test class distributions (or misclassification costs) are matched. Our results show that model stability plays a key role in determining whether an algorithm produces an optimal model from a matching distribution (cost). The performance difference between a model trained from the matching distribution (cost) and the optimal model generally increases as the degree of model stability decreases. The practical implication is that one should follow the conventional wisdom of training a classifier with a class distribution (cost) that matches the test class distribution (cost) only if the learning algorithm is known to be stable.