Abstract
Learning analytics offers promising capabilities and opportunities for many aspects of academic research and higher education. Data-driven insights can contribute significantly to solutions for curbing costs and improving education quality. This paper adopts a two-phase machine learning approach that combines unsupervised and supervised learning techniques to predict the outcomes of students enrolled in higher education programs of study. The approach was applied in a case study conducted in the context of an undergraduate Computer Science curriculum offered by the University of Thessaly in Greece. Students involved in the case study were first grouped according to the similarity of specific education-related factors and metrics. Using the K-Means algorithm, our clustering experiments revealed three coherent clusters of students. The discovered clusters were then used to train prediction models that address each cluster of students individually: two machine learning models were trained for every cluster in order to predict the time to degree completion and student enrollment in the offered educational programs. The developed models are shown to produce predictions with relatively high accuracy. Finally, the paper discusses the potential usefulness of the clustering-aided approach for learning analytics in higher education.
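The two-phase idea described in the abstract can be illustrated with a minimal sketch: students are first clustered with K-Means, and a separate predictor is then fitted for each cluster. Everything concrete here is an assumption made for illustration only. The two features (grade average, credits per semester), the synthetic data, and the per-cluster mean used as a stand-in model are hypothetical and are not taken from the paper, which trains two machine learning models per cluster on its own education-related factors. Only the time-to-degree target is sketched.

```python
import math
import random
from statistics import mean


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(mean(dim) for dim in zip(*members))
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


def fit_cluster_models(labels, targets, k):
    """Phase 2 placeholder: one 'model' per cluster, here just the cluster mean."""
    return {
        j: mean(t for t, lab in zip(targets, labels) if lab == j)
        for j in range(k)
    }


def predict_years(student, centroids, models):
    """Route a student to the nearest cluster, then use that cluster's model."""
    return models[nearest(student, centroids)]


# Hypothetical synthetic cohort: (grade average, credits per semester) and
# years to degree. Real features would normally be scaled before clustering.
rng = random.Random(0)
profiles = [((8.5, 30.0), 4.0), ((6.5, 22.0), 5.5), ((5.2, 12.0), 7.5)]
students, years = [], []
for (grade, credits), typical_years in profiles:
    for _ in range(20):
        students.append((grade + rng.gauss(0, 0.2), credits + rng.gauss(0, 0.8)))
        years.append(typical_years + rng.gauss(0, 0.3))

K = 3
centroids, labels = kmeans(students, K)           # phase 1: unsupervised
models = fit_cluster_models(labels, years, K)     # phase 2: supervised, per cluster

strong = predict_years((8.4, 29.0), centroids, models)
weak = predict_years((5.0, 11.0), centroids, models)
print("cluster sizes:", sorted(labels.count(j) for j in range(K)))
print("predicted years (strong profile):", round(strong, 2))
print("predicted years (weak profile):", round(weak, 2))
```

In a fuller version, the per-cluster mean would be replaced by a trained regressor or classifier. The routing step in `predict_years` would stay the same: a new student is assigned to a cluster first, and only that cluster's model is consulted.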
Cite this article
Iatrellis, O., Savvas, I.K., Fitsilis, P. et al. A two-phase machine learning approach for predicting student outcomes. Educ Inf Technol 26, 69–88 (2021). https://doi.org/10.1007/s10639-020-10260-x
DOI: https://doi.org/10.1007/s10639-020-10260-x