Abstract
Predicting event occurrence at an early stage of a longitudinal study is an important problem with high practical value. Unlike standard classification and regression problems, where a domain expert can label the data in a reasonably short period of time, training data in longitudinal studies can be obtained only by waiting for a sufficient number of events to occur. The main objective of this work is to predict whether an event will occur in the future for a particular subject, using only the data collected at the initial stages of the study. We propose a novel Early Stage Prediction (ESP) framework for building event prediction models that are trained at early stages of longitudinal studies. More specifically, we develop two probabilistic algorithms based on Naive Bayes and Tree-Augmented Naive Bayes (TAN), called ESP-NB and ESP-TAN, respectively. Both perform early-stage event prediction by modifying the posterior probability of event occurrence using extrapolations based on the Weibull and Lognormal distributions. The proposed framework is evaluated on a wide range of synthetic and real-world benchmark datasets. Our extensive experiments show that the ESP framework predicts future event occurrences more accurately than alternative approaches when only a limited amount of training data is available.
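The extrapolation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ESP-NB implementation: it assumes one plausible reading, in which a Weibull curve is fitted to the event times seen before an early cutoff, the fitted curve supplies the event prior at a later time, and that prior is combined with class-conditional likelihoods by Bayes' rule. The function names, the least-squares fitting method, and all numbers below are hypothetical.

```python
import math

def fit_weibull(event_times, n_subjects):
    """Least-squares fit of F(t) = 1 - exp(-(t/scale)**shape) to the
    empirical fraction of subjects with an event by each observed time.
    (Assumed fitting method; the abstract does not specify one.)"""
    xs, ys = [], []
    for i, t in enumerate(sorted(event_times), start=1):
        f = i / (n_subjects + 1)                 # empirical event fraction at t
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - f)))  # linearised Weibull CDF
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    shape = sxy / sxx                            # slope of the fitted line
    scale = math.exp(-(my - shape * mx) / shape) # intercept = -shape*log(scale)
    return shape, scale

def weibull_cdf(t, shape, scale):
    """Probability that the event has occurred by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def extrapolated_posterior(lik_event, lik_no_event, prior_event):
    """Bayes' rule with the event prior replaced by an extrapolated value."""
    num = lik_event * prior_event
    return num / (num + lik_no_event * (1.0 - prior_event))

# Hypothetical early-stage data: 100 subjects, 6 events before a 12-month cutoff.
early_event_times = [2.0, 4.5, 6.0, 8.0, 9.5, 11.0]
shape, scale = fit_weibull(early_event_times, n_subjects=100)
prior_at_60 = weibull_cdf(60.0, shape, scale)    # extrapolated event prior at 60 months
# Hypothetical likelihoods of one subject's features under each class.
posterior = extrapolated_posterior(0.30, 0.10, prior_at_60)
print(f"shape={shape:.3f} scale={scale:.3f} prior={prior_at_60:.3f} posterior={posterior:.3f}")
```

A Lognormal variant would follow the same pattern with a different CDF; the choice between the two would depend on which curve fits the early-stage event times better.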
Acknowledgements
This work was supported in part by the National Science Foundation grants IIS-1527827 and IIS-1231742.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Fard, M.J., Chawla, S., Reddy, C.K. (2016). Early-Stage Event Prediction for Longitudinal Data. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham. https://doi.org/10.1007/978-3-319-31753-3_12
DOI: https://doi.org/10.1007/978-3-319-31753-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31752-6
Online ISBN: 978-3-319-31753-3
eBook Packages: Computer Science, Computer Science (R0)