Imputation techniques on missing values in breast cancer treatment and fertility data

  • Research
Health Information Science and Systems

Abstract

Clinical decision support based on data mining techniques has, in the last few years, offered a more intelligent way to reduce decision errors. However, clinical datasets often suffer from high missingness, which adversely impacts the quality of modelling if handled improperly. Imputing missing values provides an opportunity to resolve the issue. Conventional approaches rely on simple statistical treatments, such as mean imputation or discarding incomplete cases, which have many limitations and thus degrade learning performance. This study examines a series of machine learning based imputation methods and suggests an efficient approach to preparing a good-quality breast cancer (BC) dataset for finding the relationship between BC treatment and chemotherapy-related amenorrhoea, where performance is evaluated by prediction accuracy. To this end, the reliability and robustness of six well-known imputation methods are evaluated. Our results show that imputation leads to a significant boost in classification performance compared to model prediction based on listwise deletion. Furthermore, the results reveal that most methods retain strong robustness and discriminant power even when the dataset has a high missing rate (> 50%).
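To make the contrast between listwise deletion and imputation concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not name the six methods evaluated, so mean imputation and a simple k-nearest-neighbour fill are used here purely as illustrations. The function names (`listwise_delete`, `mean_impute`, `knn_impute`), the toy `rows` data, and the use of `None` as the missing-value marker are all assumptions made for this example.

```python
import math

# Toy data (hypothetical): two numeric features, None marks a missing value.
rows = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, None],
    [10.0, 20.0],
    [None, 22.0],
]


def listwise_delete(data):
    """Keep only the rows that have no missing value at all."""
    return [r for r in data if all(v is not None for v in r)]


def column_means(data):
    """Mean of the observed values in each column (assumes each column has one)."""
    means = []
    for j in range(len(data[0])):
        observed = [r[j] for r in data if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def mean_impute(data):
    """Replace each missing value with the mean of its column."""
    means = column_means(data)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in data]


def _distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common columns, so the rows cannot be compared
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(data, k=3):
    """Fill each missing value with the mean of that column over the k
    nearest rows that have the column observed; fall back to the column
    mean when no comparable neighbour exists."""
    means = column_means(data)
    filled = []
    for i, row in enumerate(data):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not None
            )
            nearest = [val for d, val in donors[:k] if d < math.inf]
            new_row[j] = sum(nearest) / len(nearest) if nearest else means[j]
        filled.append(new_row)
    return filled


print(len(listwise_delete(rows)), "of", len(rows), "rows survive deletion")
print(mean_impute(rows))
print(knn_impute(rows, k=2))
```

On this toy data, listwise deletion discards two of the five rows, whereas both imputation variants keep every row. In a real evaluation of the kind the abstract describes, each completed dataset would then be passed to a classifier and the methods compared on prediction accuracy.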


Funding

This work is fully funded by Melbourne Research Scholarships (MRS), Grant No. 385545, and partially supported by the Fertility After Cancer Predictor (FoRECAsT) Study. Michelle Peate is currently supported by an MDHS Fellowship, University of Melbourne. The FoRECAsT study is supported by the FoRECAsT consortium and the Victorian Government through a Victorian Cancer Agency (Early Career Seed Grant) awarded to Michelle Peate.

Author information

Corresponding author

Correspondence to Xuetong Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, X., Akbarzadeh Khorshidi, H., Aickelin, U. et al. Imputation techniques on missing values in breast cancer treatment and fertility data. Health Inf Sci Syst 7, 19 (2019). https://doi.org/10.1007/s13755-019-0082-4

  • DOI: https://doi.org/10.1007/s13755-019-0082-4
