Head Tail Open: Open Tailed Classification of Imbalanced Document Data | SpringerLink
Skip to main content

Head Tail Open: Open Tailed Classification of Imbalanced Document Data

  • Conference paper
  • First Online:
Intelligent Computing (SAI 2024)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 1016))

Included in the following conference series:

  • 248 Accesses

Abstract

Deep learning models for scanned document image classification and form understanding have made significant progress in the last few years. High accuracy can be achieved by a model with the help of copious amounts of labelled training data for closed-world classification. However, very little work has been done in the domain of fine-grained and head-tailed (class imbalance with some classes having high numbers of data points and some having a low number of data points) open-world classification for documents. Our proposed method achieves a better classification results than the baseline of the head-tail-novel/open dataset. Our techniques include separating the head-tail classes and transferring the knowledge from head data to the tail data. This transfer of knowledge also improves the capability of recognizing a novel category by 15% as compared to the baseline.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 20591
Price includes VAT (Japan)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
JPY 31459
Price includes VAT (Japan)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Afzal, M.Z., Kölsch, A., Ahmed, S., Liwicki, M.: Cutting the error by half: investigation of very deep CNN and advanced training strategies for document image classification. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 883–888. IEEE (2017)

    Google Scholar 

  2. Bendale, A., Boult, T.: Towards open world recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1893–1902 (2015)

    Google Scholar 

  3. Bendale, A., Boult, T.E.: Towards open set deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1563–1572 (2016)

    Google Scholar 

  4. Claassen, L., et al.: Using family history information to promote healthy lifestyles and prevent diseases; a discussion of the evidence. BMC Public Health 10(1), 1–7 (2010)

    Article  Google Scholar 

  5. Das, A., Roy, S., Bhattacharya, U., Parui, S.K.: Document image classification with intra-domain transfer learning and stacked generalization of deep convolutional neural networks. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3180–3185. IEEE (2018)

    Google Scholar 

  6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  7. Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 991–995. IEEE (2015)

    Google Scholar 

  8. Huang, Y., Lv, T., Cui, L., Lu, Y., Wei, F.: Layoutlmv3: pre-training for document ai with unified text and image masking. arXiv preprint arXiv:2204.08387 (2022)

  9. Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)

  10. Kölsch, A., Afzal, M.Z., Ebbecke, M., Liwicki, M.: Real-time document image classification using deep CNN and extreme learning machines. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1318–1323. IEEE (2017)

    Google Scholar 

  11. Lewis, D., Agam, G., Argamon, S., Frieder, O., Grossman, D., Heard, J.: Building a test collection for complex document information processing. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 665–666 (2006)

    Google Scholar 

  12. Lima, A.: Family history and genealogy: the benefits for the listener, the storyteller and the community. J. Cape Verdean Stud. 4(1), 5 (2019)

    Google Scholar 

  13. Scheirer, W.J., de Rezende Rocha, A., Sapkota, A., Boult, T.E.: Toward open set recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(7), 1757–1772 (2012)

    Article  Google Scholar 

  14. Scheirer, W.J., Jain, L.P., Boult, T.E.: Probability models for open set recognition. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2317–2324 (2014)

    Article  Google Scholar 

  15. Shu, L., Xu, H., Liu, B.: Doc: deep open classification of text documents. arXiv preprint arXiv:1709.08716 (2017)

  16. Shu, L., Xu, H., Liu, B.: Unseen class discovery in open-world classification. arXiv preprint arXiv:1801.05609 (2018)

  17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  18. Vaswani, A., et al.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30, 1–11 (2017)

    Google Scholar 

  19. Xu, Y., et al.: Layoutlmv2: multi-modal pre-training for visually-rich document understanding. arXiv preprint arXiv:2012.14740 (2020)

  20. Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: Layoutlm: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1192–1200 (2020)

    Google Scholar 

  21. Xu, Y., et al.: Layoutxlm: multimodal pre-training for multilingual visually-rich document understanding. arXiv preprint arXiv:2104.08836 (2021)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chetan Joshi .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Joshi, C. et al. (2024). Head Tail Open: Open Tailed Classification of Imbalanced Document Data. In: Arai, K. (eds) Intelligent Computing. SAI 2024. Lecture Notes in Networks and Systems, vol 1016. Springer, Cham. https://doi.org/10.1007/978-3-031-62281-6_14

Download citation

Publish with us

Policies and ethics