Abstract
Diversity among the members of a team of classifiers is deemed to be a key issue in classifier combination. However, measuring diversity is not straightforward because there is no generally accepted formal definition. We have found and studied ten statistics that can measure diversity among binary classifier outputs (correct or incorrect vote for the class label): four averaged pairwise measures (the Q statistic, the correlation, the disagreement, and the double fault) and six non-pairwise measures (the entropy of the votes, the difficulty index, the Kohavi-Wolpert variance, the interrater agreement, the generalized diversity, and the coincident failure diversity). Four experiments have been designed to examine the relationship between the accuracy of the team and the measures of diversity, and among the measures themselves. Although there are proven connections between diversity and accuracy in some special cases, our results raise some doubts about the usefulness of diversity measures in building classifier ensembles in real-life pattern recognition problems.
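To make the four averaged pairwise measures named above concrete, the following is a minimal sketch of how they are commonly computed for a single pair of classifiers from their oracle outputs (1 if the classifier labels a sample correctly, 0 otherwise), using the standard 2x2 contingency-table definitions of Yule's Q, the correlation coefficient, the disagreement measure, and the double fault. The function name and interface are illustrative, not taken from the article.

```python
import numpy as np

def pairwise_diversity(y1, y2):
    """Pairwise diversity between two classifiers from their oracle outputs.

    y1, y2: binary arrays, 1 where the classifier is correct, 0 where it is wrong.
    Returns (Q statistic, correlation, disagreement, double fault), all derived
    from the 2x2 table of joint correct/incorrect counts.
    """
    y1 = np.asarray(y1, dtype=bool)
    y2 = np.asarray(y2, dtype=bool)
    n = len(y1)
    n11 = np.sum(y1 & y2)        # both correct
    n00 = np.sum(~y1 & ~y2)      # both wrong
    n10 = np.sum(y1 & ~y2)       # only the first correct
    n01 = np.sum(~y1 & y2)       # only the second correct

    # Yule's Q statistic and the correlation coefficient share the same numerator.
    q = (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)
    rho = (n11 * n00 - n01 * n10) / np.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    disagreement = (n01 + n10) / n   # fraction of samples on which the pair disagrees
    double_fault = n00 / n           # fraction of samples both classifiers get wrong
    return q, rho, disagreement, double_fault

# Illustrative usage: correctness of two classifiers on five samples.
q, rho, dis, df = pairwise_diversity([1, 1, 0, 1, 0], [1, 0, 1, 1, 0])
```

The averaged pairwise measures for a team would then be obtained by averaging these quantities over all pairs of classifiers; the non-pairwise measures listed in the abstract are defined directly on the whole team and are not covered by this sketch.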
Cite this article
Kuncheva, L.I., Whitaker, C.J. Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy. Machine Learning 51, 181–207 (2003). https://doi.org/10.1023/A:1022859003006