Analyzing the relation among different factors leading to Ph.D. dropout using numerical association rule mining | Education and Information Technologies

This article has been updated

Abstract

Ph.D. dropout is a persistent and challenging issue in higher education, with significant implications for individual students, academic institutions, and society at large. This paper explores the factors that contribute to Ph.D. students' decisions to drop out and the interrelationships among those factors. For this purpose, we employed a hybrid topic modeling algorithm, Bidirectional Encoder Representations from Transformers – Latent Dirichlet Allocation (BERT-LDA), together with Numerical Association Rule Mining (NARM) using a genetic algorithm in QuantMiner. We identified and analyzed individual, institutional, and social factors that lead Ph.D. students to leave their degree programs. The results suggest that financial constraints, inadequate academic preparation, poor mentoring, social isolation, lack of social support, family responsibilities, and work-life balance are significant factors in dropout. The findings also reveal that these factors are interrelated and that their effects can be mitigated by an academic institution's policies and culture. The outcomes of the study have implications for academic institutions, policymakers, and researchers, who can use them to develop evidence-based strategies and interventions that enhance Ph.D. students' retention and success.
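The abstract names NARM without showing what a numerical association rule looks like, so the following is a minimal illustrative sketch rather than the authors' method. It assumes a rule of the form "attributes in these intervals imply another attribute in that interval" and computes the standard support and confidence measures for it. The column names (`funding`, `isolation`, `dropout_intent`), the toy rows, and the interval bounds are all hypothetical and are not taken from the study's data; the genetic search that QuantMiner uses to find good interval bounds is not shown.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # closed interval [low, high]
Row = Dict[str, float]           # one record: attribute name -> numeric value


def matches(row: Row, conditions: Dict[str, Interval]) -> bool:
    """Return True if every listed attribute lies inside its closed interval."""
    return all(low <= row[attr] <= high for attr, (low, high) in conditions.items())


def support_confidence(
    rows: List[Row],
    antecedent: Dict[str, Interval],
    consequent: Dict[str, Interval],
) -> Tuple[float, float]:
    """Support and confidence of the rule antecedent -> consequent."""
    n_antecedent = sum(matches(r, antecedent) for r in rows)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    support = n_both / len(rows)
    # Confidence is undefined when no row satisfies the antecedent; report 0.0.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy records with scores scaled to [0, 1]; not the study's data.
ROWS: List[Row] = [
    {"funding": 0.2, "isolation": 0.9, "dropout_intent": 0.8},
    {"funding": 0.3, "isolation": 0.7, "dropout_intent": 0.9},
    {"funding": 0.8, "isolation": 0.2, "dropout_intent": 0.1},
    {"funding": 0.1, "isolation": 0.8, "dropout_intent": 0.3},
]

# Hypothetical rule: low funding AND high isolation -> high dropout intent.
ANTECEDENT = {"funding": (0.0, 0.4), "isolation": (0.6, 1.0)}
CONSEQUENT = {"dropout_intent": (0.7, 1.0)}

support, confidence = support_confidence(ROWS, ANTECEDENT, CONSEQUENT)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

In this sketch the interval bounds are fixed by hand; a genetic algorithm would instead treat the bounds as the quantities to optimize, scoring each candidate rule with measures such as these.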

Data availability

All the datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.

Code availability

Not applicable.

Change history

  • 06 December 2023

    Springer Nature’s version of this paper was updated to rectify the incorrect version of the article PDF proof.

Notes

  1. https://figshare.com/s/74a5ea79d76ad66a8af8

  2. https://pypi.org/project/googletrans/

Author information

Contributions

All authors contributed to the data collection, analysis, and writing of the manuscript.

Corresponding author

Correspondence to Madanjit Singh.

Ethics declarations

Ethics approval

The research meets all applicable standards concerning the ethics of experimentation and research integrity, and the following is certified to be true. The authors, as experts in the field concerned, submitted the paper with full responsibility and following due ethical procedure; there is no duplicate publication, fraud, plagiarism, or concern about animal or human experimentation.

Competing interests

On behalf of all authors, it is stated that there are no competing interests and no conflict of interest with other people or organizations that could inappropriately influence or bias the content of the paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kaur, M., Singh, M. & Saini, M. Analyzing the relation among different factors leading to Ph.D. dropout using numerical association rule mining. Educ Inf Technol 29, 375–399 (2024). https://doi.org/10.1007/s10639-023-12260-z

  • DOI: https://doi.org/10.1007/s10639-023-12260-z
