Graphical-character-based shredded Chinese document reconstruction


Multimedia Tools and Applications


Abstract

Shredding paper documents is currently a common means of protecting the information they contain. Because shredding produces a large number of pieces that are difficult to tell apart, reversing the process to reconstruct the document is challenging. Nevertheless, recovering shredded documents is an important research topic in digital forensics, with broad applications in information security and judicial investigations. Researchers have proposed various algorithms for restoring shredded documents, but most of them target documents written in Western languages. Because languages differ greatly, these algorithms are difficult to apply directly to documents in other languages. Chinese is used worldwide and Chinese documents are correspondingly common, so there is considerable demand for Chinese document reconstruction. This paper presents a complete algorithm for reconstructing shredded Chinese documents. Guided by the structural features of Chinese characters, we apply graphics processing to the text on each piece, match the pieces by graph assembling, and restore the shredded document. We test the algorithm on a real shredded sample, and the experimental results show that the proposed method restores the shredded document effectively, with an average accuracy of 85.78%. Moreover, the algorithm is highly automated: human involvement is limited to scanning the pieces, and all subsequent computation is performed by the computer.
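The abstract does not state the matching criterion used in the graph-assembling step. For comparison only, the following is a minimal sketch of a common baseline for strip-shredded pages, not the authors' character-based method. It scores each ordered pair of binarized strips by the number of rows in which the right edge of one strip disagrees with the left edge of the other, chains strips greedily from every possible starting strip, and keeps the chain with the lowest total cost. The function names, the 0/1 list-of-rows image representation, and the equal-height assumption are illustrative choices, not details from the paper.

```python
from typing import List

# Illustrative representation (an assumption, not the paper's): a binarized
# strip is a list of rows, each row a list of 0/1 pixels (1 = ink).
# All strips are assumed to have the same height.
Strip = List[List[int]]


def edge_cost(left: Strip, right: Strip) -> int:
    """Rows where left's last pixel column disagrees with right's first column."""
    return sum(1 for row_l, row_r in zip(left, right) if row_l[-1] != row_r[0])


def greedy_order(strips: List[Strip], start: int) -> List[int]:
    """Chain strips left to right from `start`, always taking the cheapest unused neighbor."""
    order = [start]
    unused = set(range(len(strips))) - {start}
    while unused:
        last = strips[order[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        best = min(unused, key=lambda j: (edge_cost(last, strips[j]), j))
        order.append(best)
        unused.remove(best)
    return order


def total_cost(strips: List[Strip], order: List[int]) -> int:
    """Sum of edge costs between consecutive strips in `order`."""
    return sum(edge_cost(strips[a], strips[b]) for a, b in zip(order, order[1:]))


def reconstruct(strips: List[Strip]) -> List[int]:
    """Try every strip as the leftmost one and keep the cheapest greedy chain."""
    candidates = [greedy_order(strips, s) for s in range(len(strips))]
    return min(candidates, key=lambda order: total_cost(strips, order))


def cut_strips(image: Strip, width: int) -> List[Strip]:
    """Simulate a strip-cut shredder by slicing the image into vertical strips."""
    return [[row[i:i + width] for row in image] for i in range(0, len(image[0]), width)]


def join_strips(parts: List[Strip]) -> Strip:
    """Concatenate strips side by side back into one image."""
    return [[px for part in parts for px in part[r]] for r in range(len(parts[0]))]


# Toy 4x6 "page" whose ink continues across both cut lines.
image = [
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
]
strips = cut_strips(image, 2)
shuffled = [strips[1], strips[2], strips[0]]  # strip order lost by shredding

order = reconstruct(shuffled)
print(order)  # [2, 0, 1]
print(join_strips([shuffled[i] for i in order]) == image)  # True
```

When a cut falls in the white space between characters, both edges are blank and this pixel-level cost cannot tell candidate neighbors apart. That weakness is one reason such baselines struggle with the low-discrimination pieces the abstract describes.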




Notes

  1. The punctuation marks described in this paper are as defined in General Rules for Punctuation (GB/T 15834–2011), promulgated in 2011 by the General Administration of Quality Supervision, Inspection and Quarantine of the People’s Republic of China.


Acknowledgments

This work was supported by Xidian University, with additional support from Xi’an University of Technology.

I thank Professor Hong Zhu for her valuable suggestions and Dr. Pei Liu for his comments on the manuscript. I would also like to thank Yi Zhou and Jing Zhang for their technical assistance in the experiments.

Author information


Corresponding author

Correspondence to Nan Xing.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Xing, N., Zhang, J. Graphical-character-based shredded Chinese document reconstruction. Multimed Tools Appl 76, 12871–12891 (2017). https://doi.org/10.1007/s11042-016-3685-7



  • DOI: https://doi.org/10.1007/s11042-016-3685-7

Keywords