GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

  • Conference paper
Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

Object detection in documents is a key step in automating the identification of structural elements in a digital or scanned document, by understanding the hierarchical structure of its elements and the relationships between them. Large and complex models achieve high accuracy but can be computationally expensive and memory-intensive, which makes them impractical to deploy on resource-constrained devices. Knowledge distillation allows us to create smaller, more efficient models that retain much of the performance of their larger counterparts. We present a graph-based knowledge distillation framework to correctly identify and localize the objects in a document image. We design a structured graph whose nodes contain proposal-level features and whose edges represent the relationships between the different proposal regions. To reduce text bias, an adaptive node sampling strategy prunes the weight distribution and puts more weight on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss, which captures local and global information concurrently. Extensive experiments on competitive benchmarks demonstrate that the proposed framework outperforms the current state-of-the-art approaches. The code will be available at: github.com/ayanban011/GraphKD.
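
To make the idea of distilling a proposal graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes that teacher and student proposals are already aligned one-to-one and share a feature dimension, that edges are pairwise cosine similarities, and that text nodes are down-weighted by a fixed factor. The names `build_graph`, `node_weights`, `graph_distillation_loss`, `text_weight`, and `edge_coeff` are all hypothetical, and the squared-error terms stand in for whatever loss the paper actually defines.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(features):
    """Nodes are proposal-level features; edges are pairwise similarities (assumed)."""
    n = len(features)
    edges = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    return features, edges


def node_weights(is_text, text_weight=0.25):
    """Hypothetical weighting: text nodes count less, weights sum to 1."""
    raw = [text_weight if flag else 1.0 for flag in is_text]
    total = sum(raw)
    return [w / total for w in raw]


def graph_distillation_loss(teacher_feats, student_feats, is_text, edge_coeff=0.5):
    """Weighted node term (local) plus edge term (global), both squared errors."""
    t_nodes, t_edges = build_graph(teacher_feats)
    s_nodes, s_edges = build_graph(student_feats)
    weights = node_weights(is_text)
    n = len(weights)

    # Local term: per-node mean squared difference, weighted towards non-text nodes.
    node_loss = 0.0
    for w, t, s in zip(weights, t_nodes, s_nodes):
        node_loss += w * sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)

    # Global term: mean squared difference between the two edge matrices.
    edge_loss = sum(
        (t_edges[i][j] - s_edges[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)

    return node_loss + edge_coeff * edge_loss


# Toy data: three aligned proposals, the first two labelled as text.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[0.9, 0.1], [0.2, 0.8], [1.0, 1.0]]
is_text = [True, True, False]
print(graph_distillation_loss(teacher, student, is_text))
```

In this sketch the node term plays the role of local information and the edge term the role of global, relational information. A real detector would also need a way to match teacher and student proposals and to project features of different widths, neither of which is covered here.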

Acknowledgment

This work has been partially supported by the Spanish project PID2021-126808OB-I00, the Catalan project 2021 SGR 01559 and the PhD Scholarship from AGAUR (2021FIB-10010). The Computer Vision Center is part of the CERCA Program/Generalitat de Catalunya.

Corresponding author

Correspondence to Ayan Banerjee.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 50408 KB)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Banerjee, A., Biswas, S., Lladós, J., Pal, U. (2024). GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14806. Springer, Cham. https://doi.org/10.1007/978-3-031-70543-4_21

  • DOI: https://doi.org/10.1007/978-3-031-70543-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70542-7

  • Online ISBN: 978-3-031-70543-4

  • eBook Packages: Computer Science (R0)
