Multi-task Learning for Features Extraction in Financial Annual Reports

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

For assessing various performance indicators of companies, the focus is shifting from strictly financial (quantitative) publicly disclosed information to qualitative (textual) information. This textual data can provide valuable weak signals, for example through stylistic features, which can complement the quantitative data on financial performance or on Environmental, Social and Governance (ESG) criteria. In this work, we use various multi-task learning methods for financial text classification, with a focus on financial sentiment, objectivity, forward-looking sentence prediction and ESG-content detection. We propose different methods to combine the information extracted from training jointly on different tasks; our best-performing method highlights the positive effect of explicitly adding auxiliary task predictions as features for the final target task during the multi-task training. Next, we use these classifiers to extract textual features from annual reports of FTSE350 companies and investigate the link between ESG quantitative scores and these features.
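The idea of adding auxiliary task predictions as features for the final target task can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the hidden size, the number of classes per task, the choice of ESG-content detection as the target task, and the helper names (`init_layer`, `predict_target`) are all assumptions made for illustration, and no training loop is shown.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply a dense layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Hypothetical helper: small random weights and zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_target(sentence_vec, aux_heads, target_head):
    """Feed auxiliary predictions to the target head as extra features."""
    aux_probs = []
    for weights, bias in aux_heads.values():
        # Each auxiliary head reads the shared sentence representation.
        aux_probs.extend(softmax(linear(weights, bias, sentence_vec)))
    # The target head sees the shared representation AND the auxiliary outputs.
    target_input = list(sentence_vec) + aux_probs
    weights, bias = target_head
    return softmax(linear(weights, bias, target_input)), aux_probs


rng = random.Random(0)
hidden = 8  # assumed size of the shared sentence representation

# Assumed class counts: 3 for sentiment, 2 for the other auxiliary tasks.
aux_heads = {
    "sentiment": init_layer(3, hidden, rng),
    "objectivity": init_layer(2, hidden, rng),
    "forward_looking": init_layer(2, hidden, rng),
}
n_aux = 3 + 2 + 2
target_head = init_layer(2, hidden + n_aux, rng)  # assumed target: ESG vs. non-ESG

# Stand-in for the output of a shared pretrained encoder.
sentence_vec = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
target_probs, aux_probs = predict_target(sentence_vec, aux_heads, target_head)
```

In a trained model, the encoder and all heads would be optimised jointly; the sketch only shows how the target head's input is widened by the auxiliary predictions.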

Notes

  1. https://ec.europa.eu/info/business-economy-euro/company-reporting-and-auditing/company-reporting/corporate-sustainability-reporting_en

  2. For a sample of 19,426 PDF annual reports published by 3,252 firms listed on the London Stock Exchange.

  3. https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-2022/shared-task-finsim4-esg

  4. Code and details to re-create the dataset are available at osf.io/rqgp4.

  5. Note that the N/As can only appear in the joint and weighted settings, where there is no explicit final target task.

  6. Note that the results for each method reported in Table 3 are lower than the results reported in Table 2, since here we report the average method’s performance across all task combinations, while in Table 2 we only report results for the best ranked task combinations for each specific method.

Acknowledgements

This work was supported by Slovenian Research Agency (ARRS) grants for the core programme Knowledge Technologies (P2-0103) and the project Quantitative and Qualitative Analysis of the Unregulated Corporate Financial Reporting (J5-2554). We also thank the students of the SBE for their effort in data annotation.

Corresponding author

Correspondence to Syrielle Montariol.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Montariol, S. et al. (2023). Multi-task Learning for Features Extraction in Financial Annual Reports. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1753. Springer, Cham. https://doi.org/10.1007/978-3-031-23633-4_1

  • DOI: https://doi.org/10.1007/978-3-031-23633-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23632-7

  • Online ISBN: 978-3-031-23633-4

  • eBook Packages: Computer Science (R0)
