Ordinal Decision-Tree-Based Ensemble Approaches: The Case of Controlling the Daily Local Growth Rate of the COVID-19 Epidemic
Abstract
1. Introduction
1.1. Conventional Approaches for Predicting the Dynamic Spread of an Epidemic
1.2. Classification Methods for the Evaluation of Different Factors Affecting the Spread of an Epidemic
2. Materials and Methods
2.1. An Objective-Based Information Gain Measure for Ordinal Decision-Tree-Based Algorithms
2.2. Incorporating the Objective-Based Information Gain Measure into Ensemble Methods
2.2.1. Ordinal Random Forest
2.2.2. Ordinal AdaBoost
2.2.3. Ensemble Approach Based on Decision-Tree-Based Algorithms
2.3. The Dataset and Data Preparation
3. Results
3.1. A Comparison between Ordinal CART Classifiers and the Popular Non-Ordinal CART Classifier
3.2. A Comparison between Ordinal and Non-Ordinal Ensemble-Based Classifiers
3.3. A Comparison between the Predictions of Ordinal Classifiers and Non-Ordinal Classifiers
3.4. A Comparison between Ordinal Classifiers and Non-Ordinal Classifiers
3.5. Ensemble Majority Voting Approach Based on Ordinal Classifiers
4. Conclusions and Discussion
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A. Full Dataset Used as an Input to the Classification Models
Category | Fields | Field Type(s) |
---|---|---|
Key fields | date, region | Date, Categorical |
Italy’s COVID-19 patient data | latitude, longitude, hospitalized with symptoms, intensive care patients, total hospitalized patients, people in home isolation, current positive cases, change in total positive, new currently positive, discharged people healed, deceased people, total positive cases, tests performed | Numerical |
Weather | max temperature (°F), average (avg) temperature (°F), min temperature (°F), max dew point (°F), avg dew point (°F), min dew point (°F), max humidity (%), avg humidity (%), min humidity (%), max wind speed (mph), avg wind speed (mph), min wind speed (mph), max pressure (inHg), avg pressure (inHg), min pressure (inHg) | Numerical |
Containment | change in guidelines for reporting cases, basketball league suspension, public transport cleaning, proactively checking symptomatic patients, decree to lock down, outdoor gatherings banned, religious activity cancellation, massive cluster isolation, massive school closure, domestic travel limitation, sports cancellation, university closure, social distancing, limiting overload of supermarkets by customers, massive police patrol | Boolean |
Target | daily growth rate | Categorical (ordinal) |
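
The appendix table lists `date` and `region` as the key fields shared by the patient, weather, and containment data. As a minimal sketch of how such sources could be combined into one input row per (date, region) pair, the example below performs an inner join on that key. The function name `join_on_key`, the snake_case field names, and all values are illustrative placeholders, not the authors' pipeline or real data.

```python
def join_on_key(*tables):
    """Inner-join dict tables that are keyed by a (date, region) tuple."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)          # keep only keys present in every source
    merged = {}
    for key in sorted(common):
        row = {}
        for table in tables:
            row.update(table[key])    # concatenate the fields of each source
        merged[key] = row
    return merged

# Placeholder records (hypothetical values, for illustration only).
patients = {
    ("2020-03-10", "Lombardia"): {"intensive_care_patients": 1},
    ("2020-03-11", "Lombardia"): {"intensive_care_patients": 2},
}
weather = {
    ("2020-03-10", "Lombardia"): {"avg_temperature_f": 50.0},
}
containment = {
    ("2020-03-10", "Lombardia"): {"decree_to_lock_down": True},
    ("2020-03-11", "Lombardia"): {"decree_to_lock_down": True},
}

merged = join_on_key(patients, weather, containment)
```

Only the key present in all three sources survives the join, so a missing weather record drops that day for that region; an outer join with explicit missing-value handling would be the alternative design.
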
Classifier | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
---|---|---|---|---|---|
| | F-Score | Accuracy | AUC | MSE | |
Non-ordinal classifier | |||||
CART | 0.361 | 0.379 | 0.493 | 1.537 | −0.079 |
Ordinal classifiers | |||||
Ordinal CART–OBE() | 0.391 | 0.389 | 0.567 | 1.684 | −0.011 |
Ordinal CART–OBE() | 0.366 | 0.358 | 0.518 | 1.211 | 0.068 |
Ordinal CART–OBE() | 0.385 | 0.389 | 0.526 | 1.274 | 0.090 |
Ordinal CART–OBE() | 0.409 | 0.442 | 0.535 | 1.316 | 0.016 |
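
The table separates measures that ignore class order (F-score, accuracy, AUC) from measures for ordinal classification such as MSE, which penalize a prediction by its distance from the true class. As a minimal sketch of that distinction, assuming the ordinal classes are encoded as consecutive integers (the encoding is not stated here), accuracy and MSE can be computed as follows. The function names and toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of exact matches; blind to how far off a wrong prediction is."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def ordinal_mse(y_true, y_pred):
    """Mean squared distance between integer-encoded ordinal classes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0, 1, 2, 3]
near_miss = [1, 1, 2, 3]   # one error, off by one class
far_miss = [3, 1, 2, 3]    # one error, off by three classes

acc_near = accuracy(y_true, near_miss)
acc_far = accuracy(y_true, far_miss)
mse_near = ordinal_mse(y_true, near_miss)
mse_far = ordinal_mse(y_true, far_miss)
```

Both prediction vectors have the same accuracy, while the MSE of the far miss is nine times that of the near miss. This is why two classifiers with similar accuracy in the table can differ noticeably in MSE.
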
Classifier | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
---|---|---|---|---|---|
| | F-Score | Accuracy | AUC | MSE | |
Non-ordinal ensemble classifier | |||||
AdaBoost | 0.380 | 0.411 | 0.507 | 1.253 | 0.006 |
Ordinal AdaBoost classifiers | |||||
Ordinal AdaBoost–OBE() | 0.377 | 0.381 | 0.504 | 1.560 | −0.093 |
Ordinal AdaBoost–OBE() | 0.475 | 0.526 | 0.578 | 1.137 | 0.163 |
Ordinal AdaBoost–OBE() | 0.405 | 0.484 | 0.538 | 1.242 | 0.072 |
Ordinal AdaBoost–OBE() | 0.447 | 0.474 | 0.563 | 1.221 | 0.122 |
Classifier | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
---|---|---|---|---|---|
| | F-Score | Accuracy | AUC | MSE | |
Non-ordinal ensemble classifier | |||||
Random forest (RF) | 0.405 | 0.400 | 0.540 | 1.421 | 0.035 |
Ordinal random forest classifiers | |||||
Ordinal RF–OBE() | 0.439 | 0.484 | 0.570 | 1.147 | 0.185 |
Ordinal RF–OBE() | 0.407 | 0.453 | 0.559 | 1.400 | 0.126 |
Ordinal RF–OBE() | 0.425 | 0.484 | 0.557 | 1.211 | 0.131 |
Ordinal RF–OBE() | 0.437 | 0.442 | 0.555 | 1.411 | 0.026 |
Paired t-Test p-Value | Ordinal CART–OBE | Ordinal AdaBoost–OBE() | Ordinal RF–OBE() |
---|---|---|---|
Non-ordinal counterpart | 0.14 | 0.0057 | 0.0015 |
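
The table reports paired t-test p-values for each ordinal classifier against its non-ordinal counterpart. Which paired observations were compared is not stated here; the sketch below assumes matched scores from the same evaluation splits, and the score lists are hypothetical. It computes only the t statistic and degrees of freedom with the standard library. Turning these into a p-value requires the Student t distribution, which a statistics package (for example `scipy.stats.ttest_rel`) provides.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))    # mean difference over its standard error
    return t, n - 1

# Hypothetical matched scores, for illustration only.
ordinal_scores = [0.52, 0.55, 0.50, 0.53, 0.54]
baseline_scores = [0.41, 0.43, 0.40, 0.40, 0.42]
t_stat, dof = paired_t_statistic(ordinal_scores, baseline_scores)

# Small hand-checkable case: differences 1, 2, 3 give t = 2 * sqrt(3).
t_check, dof_check = paired_t_statistic([2, 4, 6], [1, 2, 3])
```
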
Classifier | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
---|---|---|---|---|---|
| | F-Score | Accuracy | AUC | MSE | |
Non-ordinal classifiers | |||||
Naïve Bayes | 0.246 | 0.305 | 0.478 | 0.916 | −0.065 |
Logistic regression | 0.453 | 0.505 | 0.560 | 1.189 | 0.060 |
Gradient boosting | 0.347 | 0.356 | 0.481 | 1.611 | −0.117 |
XGBoost | 0.378 | 0.389 | 0.506 | 1.558 | −0.077 |
K-nearest neighbor | 0.433 | 0.453 | 0.543 | 1.305 | 0.013 |
AdaBoost | 0.380 | 0.411 | 0.507 | 1.253 | 0.006 |
Random forest | 0.405 | 0.400 | 0.540 | 1.421 | 0.035 |
CART | 0.361 | 0.379 | 0.493 | 1.537 | −0.079 |
Ordinal classifiers | |||||
Ordinal CART–OBE() | 0.409 | 0.442 | 0.535 | 1.316 | 0.016 |
Ordinal AdaBoost–OBE() | 0.475 | 0.526 | 0.578 | 1.137 | 0.163 |
Ordinal RF–OBE() | 0.439 | 0.484 | 0.570 | 1.147 | 0.185 |
Performance Measure | Majority Voting Model Based on Ordinal and Non-Ordinal Classifiers | Best Non-Ordinal Classifier: Logistic Regression | Best Ordinal Classifier: Ordinal AdaBoost Based on OBE |
---|---|---|---|
F-score | 0.501 | 0.453 | 0.475 |
Accuracy | 0.558 | 0.505 | 0.526 |
AUC | 0.596 | 0.560 | 0.578 |
MSE | 1.074 | 1.189 | 1.137 |
| | 0.200 | 0.060 | 0.163 |
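
The table compares a majority voting model with the best individual classifiers. As a minimal sketch of majority voting over class predictions, the example below combines the per-sample labels of several classifiers and picks the most frequent one. Which classifiers take part and how ties are resolved are not stated here; the tie-break rule (smallest label among the most frequent) and the toy predictions are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote."""
    combined = []
    for votes in zip(*predictions):       # votes for one sample across classifiers
        counts = Counter(votes)
        top = max(counts.values())
        # Assumed tie-break: take the smallest label among the most frequent ones.
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined

# Toy integer-encoded predictions from three hypothetical classifiers.
clf_a = [0, 1, 2, 2]
clf_b = [0, 2, 2, 1]
clf_c = [1, 1, 3, 0]
voted = majority_vote([clf_a, clf_b, clf_c])
```

With an ordinal target, other tie-break rules are possible, such as taking the median vote, which keeps the combined label close to all individual predictions.
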
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Singer, G.; Marudi, M. Ordinal Decision-Tree-Based Ensemble Approaches: The Case of Controlling the Daily Local Growth Rate of the COVID-19 Epidemic. Entropy 2020, 22, 871. https://doi.org/10.3390/e22080871