Abstract
Machine learning can help predict critical educational outcomes, but its “black-box” nature remains a significant obstacle to broad adoption in educational settings. This study applies a range of supervised learning algorithms to data from Burkina Faso’s 2019 Program for the Analysis of CONFEMEN Education Systems (PASEC) and then uses Shapley Additive Explanations (SHAP) on the selected algorithms to identify the factors that most influence student learning outcomes. The objectives of the study are (1) to apply and evaluate supervised learning models (classification and regression) and select those with the highest performance in predicting student learning outcomes, and (2) to apply SHAP to extract the features with the greatest predictive power for students’ learning outcomes. Results show that K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) offer the best predictive performance for the classification task, while the Random Forest Regressor achieves the best accuracy for the regression task. SHAP values are then used to quantify each feature’s contribution to the predictions. The key predictive features identified are “local development,” “community involvement,” “school infrastructure,” and “teacher years of experience.” These findings suggest that learning outcomes are strongly shaped by community and infrastructural factors as well as by teacher experience. The implications for educational policymakers and practitioners are substantial: the prominence of “local development” and “community involvement” underscores the need for community engagement programs and partnerships; prioritizing investment in school infrastructure can improve the learning environment; and the influence of teacher years of experience highlights the importance of professional development and retention strategies for educators. Together, these insights support a comprehensive approach to improving educational outcomes through targeted investment and strategic community collaboration.
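To make the workflow in the abstract concrete, the following is a minimal sketch of the two-step pipeline it describes (fit a supervised model, then explain it with SHAP), assuming a Python setup with scikit-learn and the shap package; the data, feature names, and model settings below are illustrative placeholders, not the study’s actual PASEC variables, preprocessing, or hyperparameters.

```python
# Minimal sketch of the fit-then-explain workflow described in the abstract.
# Assumes scikit-learn and the shap package; all data here are synthetic.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: rows are students, columns are contextual factors.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(500, 4)),
    columns=["local_development", "community_involvement",
             "school_infrastructure", "teacher_experience"],
)
# Hypothetical continuous learning-outcome score for the regression task.
y = (0.5 * X["local_development"] + 0.3 * X["teacher_experience"]
     + rng.normal(scale=0.1, size=500))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the regression model (the abstract reports Random Forest as best for regression).
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Explain held-out predictions with SHAP and rank features globally.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # shape: (n_samples, n_features)
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```

Averaging the absolute SHAP values over the test set is a common way to turn per-prediction attributions into a global feature ranking, which is the kind of ranking from which features such as “local development” or “teacher years of experience” would emerge as most influential.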
Data availability
The data that support the findings of this study are available from the Program for the Analysis of CONFEMEN Education Systems (PASEC). Restrictions apply to the availability of these data, which were used under license for the current study, so they are not publicly available. The data are, however, available from the author upon reasonable request and with the permission of PASEC.
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The author declares no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
SANFO, JB.M. Application of explainable artificial intelligence approach to predict student learning outcomes. J Comput Soc Sc 8, 9 (2025). https://doi.org/10.1007/s42001-024-00344-w