Statistical Entropy Measures in C4.5 Trees | IGI Global Scientific Publishing

Aldo Ramirez Arellano, Juan Bory-Reyes, Luis Manuel Hernandez-Simon
Copyright: © 2018 | Volume: 14 | Issue: 1 | Pages: 14
ISSN: 1548-3924 | EISSN: 1548-3932 | EISBN13: 9781522542643 | DOI: 10.4018/IJDWM.2018010101

MLA

Arellano, Aldo Ramirez, et al. "Statistical Entropy Measures in C4.5 Trees." IJDWM vol.14, no.1 2018: pp.1-14. https://doi.org/10.4018/IJDWM.2018010101

APA

Arellano, A. R., Bory-Reyes, J., & Hernandez-Simon, L. M. (2018). Statistical Entropy Measures in C4.5 Trees. International Journal of Data Warehousing and Mining (IJDWM), 14(1), 1-14. https://doi.org/10.4018/IJDWM.2018010101

Chicago

Arellano, Aldo Ramirez, Juan Bory-Reyes, and Luis Manuel Hernandez-Simon. "Statistical Entropy Measures in C4.5 Trees," International Journal of Data Warehousing and Mining (IJDWM) 14, no.1: 1-14. https://doi.org/10.4018/IJDWM.2018010101

Abstract

The main goal of this article is to present a statistical study of decision tree learning algorithms based on different parametric entropy measures. Partial empirical evidence is presented to support the conjecture that adjusting the parameters of different entropy measures might bias the classification. Here, receiver operating characteristic (ROC) curve analysis, specifically the area under the ROC curve (AURC), gives the best criterion for evaluating decision trees based on parametric entropies. The authors emphasize that the improvement in the AURC depends on the type of each dataset. The results support the hypothesis that parametric algorithms are useful for datasets with numeric or nominal attributes, but not for datasets with mixed attributes; thus, four hybrid approaches are proposed. The hybrid algorithm based on Renyi entropy is suitable for nominal, numeric, and mixed datasets. Moreover, it requires less time when the number of nodes is reduced while the AURC is maintained or increased; thus, it is preferable for large datasets.
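
To make the idea of a parametric entropy concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition of Renyi entropy, H_alpha = log2(sum_i p_i^alpha) / (1 - alpha), which reduces to Shannon entropy as alpha approaches 1, and it uses that entropy to score a split on a nominal attribute. The function names `renyi_entropy` and `entropy_gain` are hypothetical, and the gain-ratio normalization that C4.5 normally applies is omitted; how the article plugs the parameter into C4.5 is not stated in the abstract.

```python
import math
from collections import Counter


def renyi_entropy(labels, alpha):
    """Renyi entropy (base 2) of a list of class labels.

    Assumption: standard definition; alpha == 1 falls back to Shannon entropy.
    """
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    if abs(alpha - 1.0) < 1e-9:
        # Limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def entropy_gain(labels, attribute_values, alpha):
    """Entropy reduction obtained by partitioning labels on a nominal attribute.

    Hypothetical helper: plain gain, without C4.5's split-info normalization.
    """
    n = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted average entropy of the child partitions.
    remainder = sum(
        len(part) / n * renyi_entropy(part, alpha)
        for part in partitions.values()
    )
    return renyi_entropy(labels, alpha) - remainder
```

Varying `alpha` changes how strongly dominant classes weigh in the split score, which is the kind of parameter adjustment the abstract suggests might bias the resulting classification.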
