Abstract
We uncover latent topics embedded in the management discussion and analysis (MD&A) sections of financial reports filed by companies listed in the US, and we examine how these topics evolve using a dynamic topic modelling method, the Dynamic Embedded Topic Model. Using more than 203k reports containing 40M sentences from 1997 to 2017, we find 30 interpretable topics. The evolution of these topics follows economic cycles and major industrial events. We validate the significance of the latent topics through the state-of-the-art performance of a simple ensemble bankruptcy classifier trained on two kinds of features: novel topical distributed representations of the MD&A, and accounting features.
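The classifier input described in the abstract (textual topic features combined with accounting features, scored by an ensemble) can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the chapter's pipeline. It assumes that each report is summarised by the share of its sentences assigned to each topic, which is a hypothetical stand-in for the topical distributed representation. It also assumes that the accounting features are plain numeric ratios and that the ensemble averages the base learners' predicted probabilities (soft voting). The names `topic_proportions`, `build_features`, `logistic` and `soft_vote`, and all weights and feature values, are hypothetical.

```python
from math import exp
from typing import Callable, Sequence

# A base model maps a feature vector to an estimated P(bankruptcy).
Model = Callable[[Sequence[float]], float]


def topic_proportions(sentence_topics: Sequence[int], num_topics: int) -> list[float]:
    """Share of a report's sentences assigned to each topic.

    Assumes a hard (dominant-topic) assignment per sentence; this is a
    hypothetical simplification, not the chapter's representation.
    """
    counts = [0] * num_topics
    for k in sentence_topics:
        counts[k] += 1
    total = len(sentence_topics)
    if total == 0:
        return [0.0] * num_topics
    return [c / total for c in counts]


def build_features(topic_props: Sequence[float], accounting: Sequence[float]) -> list[float]:
    """Concatenate the textual (topic) features with the accounting features."""
    return list(topic_props) + list(accounting)


def logistic(weights: Sequence[float], bias: float) -> Model:
    """Fixed-weight logistic scorer standing in for a trained base learner."""
    def predict(x: Sequence[float]) -> float:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + exp(-z))
    return predict


def soft_vote(models: Sequence[Model], x: Sequence[float]) -> float:
    """Soft-voting ensemble: average the base models' probabilities."""
    return sum(m(x) for m in models) / len(models)


# Hypothetical report: 5 sentences over 3 topics, plus two made-up
# accounting ratios (e.g. leverage and return on assets).
props = topic_proportions([0, 0, 2, 1, 0], num_topics=3)  # → [0.6, 0.2, 0.2]
x = build_features(props, accounting=[0.35, -0.10])
models = [
    logistic([0.0, 0.0, 2.0, 1.5, -3.0], bias=-1.0),
    logistic([0.5, 0.0, 0.0, 0.0, -4.0], bias=-0.5),
]
risk = soft_vote(models, x)  # a probability strictly between 0 and 1
```

Soft voting is only one way to combine base learners. In practice the base models would be trained rather than hand-weighted, and the chapter's ensemble may combine its learners differently.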
Notes
- 1.
Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system of the U.S. Securities and Exchange Commission (SEC).
- 2.
- 3.
- 4.
We did not use the weak modal word list because the uncertainty word list already contains all of its words. We also excluded the moderate modal list because it contains no words with strong sentiment modification.
- 5.
In Statement of Recommended Accounting Standards No. 15, Financial Accounting Standards Board (FASB).
- 6.
“U.S. coal production dropped by more than 10% in 2015 to 897 million short tons, the lowest production level since 1986”, US Energy Information Administration, https://www.eia.gov/todayinenergy/detail.php?id=28732, retrieved 3 March 2020.
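The word-list choice in note 4 can be illustrated with a small sketch of dictionary-based tone features. We assume these are the Loughran-McDonald financial sentiment word lists and that the strong modal list is the one retained. The note does not state either point here. The few words in each list below are hypothetical placeholders, and `tone_features` is an illustrative helper, not the chapter's code. Each feature is the share of MD&A tokens that fall in a retained list. The weak and moderate modal lists are deliberately absent, mirroring the note.

```python
import re
from typing import Mapping

# Hypothetical, heavily truncated word lists; the real dictionaries are much larger.
# Weak-modal and moderate-modal lists are deliberately omitted, as in note 4:
# weak modals are assumed to be covered by the uncertainty list already.
WORD_LISTS: dict[str, frozenset[str]] = {
    "negative": frozenset({"loss", "decline", "impairment"}),
    "positive": frozenset({"gain", "improve", "strong"}),
    "uncertainty": frozenset({"may", "could", "might", "uncertain"}),
    "modal_strong": frozenset({"will", "must"}),
}


def tone_features(text: str, lists: Mapping[str, frozenset[str]] = WORD_LISTS) -> dict[str, float]:
    """Proportion of tokens in `text` that belong to each retained word list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {name: 0.0 for name in lists}
    n = len(tokens)
    return {name: sum(t in words for t in tokens) / n for name, words in lists.items()}


features = tone_features("Revenue may decline; we will improve margins.")
# Each of the four lists matches exactly one of the 7 tokens, so every value is 1/7.
```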
Acknowledgments
We appreciate the fruitful discussions with Professor Jonathan Crook, Professor Galina Andreeva, Professor Raffaella Calabrese and other researchers at the Business School, University of Edinburgh, while Hung Ba was a visiting scholar there, supported by the JAIST Research Grant and DRF Grant No. 238003. Any remaining errors are our own.
Electronic supplementary material
The electronic supplementary material is available with the online version of this chapter.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, BH., Kiyoaki, S., Huynh, VN. (2021). Topics in Financial Filings and Bankruptcy Prediction with Distributed Representations of Textual Data. In: Dong, Y., Ifrim, G., Mladenić, D., Saunders, C., Van Hoecke, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12461. Springer, Cham. https://doi.org/10.1007/978-3-030-67670-4_19
DOI: https://doi.org/10.1007/978-3-030-67670-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67669-8
Online ISBN: 978-3-030-67670-4
eBook Packages: Computer Science, Computer Science (R0)