Abstract
Shredding paper documents is a common way to protect the information they contain. Because shredding produces a large number of pieces that are hard to tell apart, reconstructing a shredded document is challenging. Nevertheless, recovering shredded documents is an important research topic in digital forensics, with broad applicability in information security and judicial investigations. Researchers have proposed various feasible algorithms for restoring shredded documents; however, most of them target Western-language documents. Because languages differ greatly, these algorithms are difficult to apply directly to documents in other languages. Chinese is used worldwide and Chinese documents are widespread, so there is great demand for Chinese document reconstruction. This paper presents a complete reconstruction algorithm for shredded Chinese documents. Based on the structural features of Chinese characters, we apply graphics processing to the text on each piece, match the pieces by graph assembling, and restore the shredded document. We test the algorithm’s performance on an actual sample, and the experimental results show that the proposed method can effectively restore the shredded document, with an average accuracy of 85.78%. Moreover, the algorithm is highly automated: a human participates only in scanning the pieces, and all other calculation steps are completed automatically by the computer.












Notes
Punctuation symbols described in this paper are defined by General Rules for Punctuation (GB/T 15834–2011), promulgated by the General Administration of Quality Supervision, Inspection and Quarantine of the People’s Republic of China in 2011.
Acknowledgments
Support for this program is provided by Xidian University. Additional support has been provided by Xi’an University of Technology.
I thank Professor Hong Zhu for her valuable suggestions and Dr. Pei Liu for his comments on the manuscript. I would also like to thank Yi Zhou and Jing Zhang for their technical assistance in the experiments.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Xing, N., Zhang, J. Graphical-character-based shredded Chinese document reconstruction. Multimed Tools Appl 76, 12871–12891 (2017). https://doi.org/10.1007/s11042-016-3685-7
DOI: https://doi.org/10.1007/s11042-016-3685-7