Abstract
Ph.D. dropout is a persistent and challenging issue in higher education, with significant implications for individual students, academic institutions, and the broader society. This research paper explores the factors that contribute to Ph.D. students' decision to drop out and how these factors are interrelated. For this purpose, we employed the hybrid topic-modeling algorithm BERT-LDA, which combines Bidirectional Encoder Representations from Transformers (BERT) with Latent Dirichlet Allocation (LDA), together with Numerical Association Rule Mining (NARM) using a genetic algorithm in QuantMiner. We identified and analyzed the individual, institutional, and social factors that lead Ph.D. students to leave their current degree programs. The results suggest that financial constraints, inadequate academic preparation, poor mentoring, social isolation, lack of social support, family responsibilities, and work-life balance issues are significant contributors to dropout. The findings also reveal that these factors are interrelated and that their effects can be mitigated by an academic institution's policies and culture. The outcomes of the study have implications for academic institutions, policymakers, and researchers, who can use them to develop evidence-based strategies and interventions that improve Ph.D. students' retention and success.
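In numerical association rule mining, a rule relates intervals of numeric attributes, for example "if attribute A lies in [l1, h1] then attribute C lies in [l2, h2]". A genetic algorithm such as the one in QuantMiner searches over candidate interval bounds, and each candidate must be scored on the data. The minimal Python sketch below shows only that scoring step for one fixed rule, using the standard definitions of support (fraction of all records satisfying both sides) and confidence (fraction of antecedent-satisfying records that also satisfy the consequent). The attribute names `financial_stress` and `dropout_intent`, the 1 to 5 scale, and the five records are hypothetical illustrations, not the study's data or QuantMiner's actual fitness function.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, float]
Interval = Tuple[float, float]  # closed interval [low, high]


@dataclass
class QuantRule:
    """A rule of the form (attr in [low, high] and ...) => (attr in [low, high] and ...)."""
    antecedent: Dict[str, Interval]
    consequent: Dict[str, Interval]


def satisfies(record: Record, conditions: Dict[str, Interval]) -> bool:
    # A record satisfies a conjunction of interval conditions when every
    # listed attribute value falls inside its closed interval.
    return all(low <= record[attr] <= high for attr, (low, high) in conditions.items())


def support_confidence(rule: QuantRule, records: List[Record]) -> Tuple[float, float]:
    """Return (support, confidence) of a quantitative rule over the records."""
    n = len(records)
    if n == 0:
        return 0.0, 0.0
    matches_antecedent = [r for r in records if satisfies(r, rule.antecedent)]
    matches_both = [r for r in matches_antecedent if satisfies(r, rule.consequent)]
    support = len(matches_both) / n
    confidence = len(matches_both) / len(matches_antecedent) if matches_antecedent else 0.0
    return support, confidence


# Hypothetical survey-style scores on a 1-5 scale (illustration only).
example_records: List[Record] = [
    {"financial_stress": 4, "dropout_intent": 5},
    {"financial_stress": 5, "dropout_intent": 4},
    {"financial_stress": 2, "dropout_intent": 2},
    {"financial_stress": 1, "dropout_intent": 1},
    {"financial_stress": 4, "dropout_intent": 2},
]

# Hypothetical candidate rule: financial_stress in [4, 5] => dropout_intent in [4, 5].
example_rule = QuantRule(
    antecedent={"financial_stress": (4, 5)},
    consequent={"dropout_intent": (4, 5)},
)

s, c = support_confidence(example_rule, example_records)
print(f"support={s:.2f}, confidence={c:.2f}")  # support=0.40, confidence=0.67
```

Three records satisfy the antecedent and two of them also satisfy the consequent, giving support 2/5 and confidence 2/3. A genetic algorithm would repeatedly mutate and recombine the interval bounds and keep the candidates that score best under its chosen quality measure; that search loop is deliberately left out of this sketch.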




Data availability
All the datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.
Code availability
Not applicable.
Change history
06 December 2023
Springer Nature’s version of this paper was updated to rectify the incorrect version of the article PDF proof.
Author information
Contributions
All authors contributed to the data collection, analysis, and writing of the manuscript.
Ethics declarations
Ethics approval
The research meets all applicable standards concerning the ethics of experimentation and research integrity, and the authors certify and declare the following to be true. The authors, as experts in the field concerned, have submitted the paper with full responsibility, following due ethical procedure. There is no duplicate publication, fraud, or plagiarism, and there are no concerns regarding animal or human experimentation.
Competing interests
On behalf of all authors, it is specifically stated that no competing interests are at stake and that there is no conflict of interest with other people or organizations that could inappropriately influence or bias the content of the paper.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kaur, M., Singh, M. & Saini, M. Analyzing the relation among different factors leading to Ph.D. dropout using numerical association rule mining. Educ Inf Technol 29, 375–399 (2024). https://doi.org/10.1007/s10639-023-12260-z
DOI: https://doi.org/10.1007/s10639-023-12260-z