{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T08:36:56Z","timestamp":1724575016246},"reference-count":48,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2020,10,13]],"date-time":"2020-10-13T00:00:00Z","timestamp":1602547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"Digital libraries offer access to a large number of handwritten historical documents. These documents are available as raw images and therefore their content is not searchable. A fully manual transcription is time-consuming and expensive while a fully automatic transcription is cheaper but not comparable in terms of accuracy. The performance of automatic transcription systems is strictly related to the composition of the training set. We propose a multi-step procedure that exploits a Keyword Spotting system and human validation for building up a training set in a time shorter than the one required by a fully manual procedure. The multi-step procedure was tested on a data set made up of 50 pages extracted from the Bentham collection. The palaeographer that transcribed the data set with the multi-step procedure instead of the fully manual procedure had a time gain of 52.54%. 
Moreover, a small size training set that allowed the keyword spotting system to show a precision value greater than the recall value was built with the multi-step procedure in a time equal to 35.25% of the time required for annotating the whole data set.","DOI":"10.3390\/jimaging6100109","type":"journal-article","created":{"date-parts":[[2020,10,14]],"date-time":"2020-10-14T01:48:38Z","timestamp":1602640118000},"page":"109","source":"Crossref","is-referenced-by-count":5,"title":["One Step Is Not Enough: A Multi-Step Procedure for Building the Training Set of a Query by String Keyword Spotting System to Assist the Transcription of Historical Document"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-2911-9737","authenticated-orcid":false,"given":"Antonio","family":"Parziale","sequence":"first","affiliation":[{"name":"Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy"}]},{"given":"Giuliana","family":"Capriolo","sequence":"additional","affiliation":[{"name":"Department of Cultural Heritage, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2019-2826","authenticated-orcid":false,"given":"Angelo","family":"Marcelli","sequence":"additional","affiliation":[{"name":"Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy"}]}],"member":"1968","published-online":{"date-parts":[[2020,10,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1016\/j.patcog.2017.02.023","article-title":"A survey of document image word spotting techniques","volume":"68","author":"Giotis","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Pratikakis, I., Zagoris, K., Gatos, B., 
Louloudis, G., and Stamatopoulos, N. (2014, January 1\u20134). ICFHR 2014 Competition on Handwritten Keyword Spotting (H-KWS 2014). Proceedings of the 2014 14th International Conference on Frontiers in Handwriting Recognition, Heraklion, Greece.","DOI":"10.1109\/ICFHR.2014.142"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"S\u00e1nchez, J.A., Romero, V., Toselli, A.H., and Vidal, E. (2014, January 1\u20134). ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS). Proceedings of the 2014 14th International Conference on Frontiers in Handwriting Recognition, Heraklion, Greece.","DOI":"10.1109\/ICFHR.2014.137"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Puigcerver, J., Toselli, A.H., and Vidal, E. (2015, January 23\u201326). ICDAR2015 Competition on Keyword Spotting for Handwritten Documents. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia.","DOI":"10.1109\/ICDAR.2015.7333946"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2066","DOI":"10.1109\/TPAMI.2011.22","article-title":"Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition","volume":"33","author":"Menasri","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"855","DOI":"10.1109\/TPAMI.2008.137","article-title":"A novel connectionist system for unconstrained handwriting recognition","volume":"31","author":"Graves","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ahmad, A.R., Viard-Gaudin, C., and Khalid, M. (2009, January 26\u201329). Lexicon-Based Word Recognition Using Support Vector Machine and Hidden Markov Model. 
Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain.","DOI":"10.1109\/ICDAR.2009.248"},{"key":"ref_8","first-page":"767","article-title":"Improving offline handwritten text recognition with hybrid HMM\/ANN models","volume":"33","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Toselli, A.H., Vidal, E., and Casacuberta, F. (2011). Multimodal Interactive Pattern Recognition and Applications, Springer.","DOI":"10.1007\/978-0-85729-479-1"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"S\u00e1nchez, J.A., M\u00fchlberger, G., Gatos, B., Schofield, P., Depuydt, K., Davis, R.M., Vidal, E., and de Does, J. (2013, January 10\u201313). tranScriptorium: A european project on handwritten text recognition. Proceedings of the 2013 ACM Symposium on Document Engineering, Florence, Italy.","DOI":"10.1145\/2494266.2494294"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Manmatha, R., Han, C., and Riseman, E. (1996, January 18\u201320). Word spotting: A new approach to indexing handwriting. Proceedings of the CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.1996.517139"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Rath, T.M., Manmatha, R., and Lavrenko, V. (2004, January 25\u201329). A search engine for historical manuscript images. Proceedings of the 27th annual international ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield South Yorkshire, UK.","DOI":"10.1145\/1008992.1009056"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Vitri\u00e0, J., Sanches, J.M., and Hern\u00e1ndez, M. (2011). Handwritten Word Spotting in Old Manuscript Images Using a Pseudo-structural Descriptor Organized in a Hash Structure. 
Iberian Conference on Pattern Recognition and Image Analysis, Proceedings of the IbPRIA 2011: Pattern Recognition and Image Analysis, Las Palmas de Gran Canaria, Spain, 8\u201310 June 2011, Springer.","DOI":"10.1007\/978-3-642-21257-4"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Forn\u00e9s, A., Frinken, V., Fischer, A., Almaz\u00e1n, J., Jackson, G., and Bunke, H. (2011, January 16\u201317). A keyword spotting approach using blurred shape model-based descriptors. Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, Beijing, China.","DOI":"10.1145\/2037342.2037356"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Vidal, E., Toselli, A.H., and Puigcerver, J. (2015, January 23\u201326). High performance Query-by-Example keyword spotting using Query-by-String techniques. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia.","DOI":"10.1109\/ICDAR.2015.7333860"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Almaz\u00e1n, J., Gordo, A., Forn\u00e9s, A., and Valveny, E. (2013, January 1\u20138). Handwritten Word Spotting with Corrected Attributes. Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.130"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kumar, G., and Govindaraju, V. (2014, January 24\u201328). Bayesian Active Learning for Keyword Spotting in Handwritten Documents. Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden.","DOI":"10.1109\/ICPR.2014.356"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Rothacker, L., and Fink, G.A. (2015, January 23\u201326). Segmentation-free query-by-string word spotting with bag-of-features HMMs. 
Proceedings of the 2015 13th International conference on document analysis and recognition (ICDAR), Tunis, Tunisia.","DOI":"10.1109\/ICDAR.2015.7333844"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Santoro, A., Parziale, A., and Marcelli, A. (2016, January 23\u201326). A human in the loop approach to historical handwritten documents transcription. Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China.","DOI":"10.1109\/ICFHR.2016.0051"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/j.patcog.2018.11.017","article-title":"Word spotting and recognition via a joint deep embedding of image and text","volume":"88","author":"Mhiri","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Retsinas, G., Louloudis, G., Stamatopoulos, N., Sfikas, G., and Gatos, B. (2019, January 16\u201320). An alternative deep feature approach to line level keyword spotting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01294"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wolf, F., and Fink, G.A. (2020). Annotation-free Learning of Deep Representations for Word Spotting using Synthetic Data and Self Labeling. arXiv.","DOI":"10.1007\/978-3-030-57058-3_21"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/s10032-018-0295-0","article-title":"Attribute CNNs for word spotting in handwritten documents","volume":"21","author":"Sudholt","year":"2018","journal-title":"Int. J. Doc. Anal. Recognit. (IJDAR)"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Krishnan, P., Dutta, K., and Jawahar, C. (2018, January 24\u201327). Word spotting and recognition using deep embedding. 
Proceedings of the 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria.","DOI":"10.1109\/DAS.2018.70"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Westphal, F., Grahn, H., and Lavesson, N. (2020, January 27\u201329). Representative Image Selection for Data Efficient Word Spotting. Proceedings of the 14th IAPR International Workshop on Document Analysis Systems (DAS), Wuhan, China.","DOI":"10.1007\/978-3-030-57058-3_27"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Benabdelaziz, R., Gaceb, D., and Haddad, M. (2020, January 16\u201317). Word-Spotting approach using transfer deep learning of a CNN network. Proceedings of the 2020 1st International Conference on Communications, Control Systems and Signal Processing (CCSSP), EL OUED, Algeria.","DOI":"10.1109\/CCSSP49278.2020.9151583"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.patrec.2018.03.030","article-title":"Filters for graph-based keyword spotting in historical handwritten documents","volume":"134","author":"Stauffer","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Gurjar, N., Sudholt, S., and Fink, G.A. (2018, January 24\u201327). Learning deep representations for word spotting under weak supervision. Proceedings of the 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria.","DOI":"10.1109\/DAS.2018.35"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Vats, E., Hast, A., and Forn\u00e9s, A. (2019, January 20\u201325). Training-free and segmentation-free word spotting using feature matching and query expansion. 
Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia.","DOI":"10.1109\/ICDAR.2019.00209"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1007\/s10032-019-00344-x","article-title":"Document analysis systems that improve with use","volume":"23","author":"Nagy","year":"2020","journal-title":"Int. J. Doc. Anal. Recognit. (IJDAR)"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mas, J., Forn\u00e9s, A., and Llad\u00f3s, J. (2016, January 11\u201314). An interactive transcription system of census records using word-spotting based information transfer. Proceedings of the 2016 12th IAPR Workshop on Document Analysis Systems (DAS), Santorini, Greece.","DOI":"10.1109\/DAS.2016.47"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chen, J., Riba, P., Forn\u00e9s, A., Mas, J., Llad\u00f3s, J., and Pujadas-Mora, J.M. (2018, January 5\u20138). Word-hunter: A gamesourcing experience to validate the transcription of historical manuscripts. Proceedings of the 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, USA.","DOI":"10.1109\/ICFHR-2018.2018.00098"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/j.patrec.2020.01.007","article-title":"Using keyword spotting systems as tools for the transcription of historical handwritten documents: Models and procedures for performance evaluation","volume":"131","author":"Santoro","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Clausner, C., Pletschacher, S., and Antonacopoulos, A. (2011, January 18\u201321). Aletheia-an advanced document layout and text ground-truthing system for production environments. 
Proceedings of the 2011 International Conference on Document Analysis and Recognition, Beijing, China.","DOI":"10.1109\/ICDAR.2011.19"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Colutto, S., Kahle, P., Guenter, H., and Muehlberger, G. (2019, January 24\u201327). Transkribus. A Platform for Automated Text Recognition and Searching of Historical Documents. Proceedings of the 2019 15th International Conference on eScience (eScience), San Diego, CA, USA.","DOI":"10.1109\/eScience.2019.00060"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1016\/j.patcog.2009.05.007","article-title":"Handwritten document image segmentation into text lines and words","volume":"43","author":"Papavassiliou","year":"2010","journal-title":"Pattern Recognit."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/TSMC.1979.4310076","article-title":"A Threshold Selection Method from Gray-Level Histograms","volume":"9","author":"Otsu","year":"1979","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Cordella, L.P., De Stefano, C., Marcelli, A., and Santoro, A. (2010, January 23\u201326). Writing Order Recovery from Off-Line Handwriting by Graph Traversal. Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey.","DOI":"10.1109\/ICPR.2010.467"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1139","DOI":"10.1142\/S021800140400368X","article-title":"A saliency-based segmentation method for online cursive handwriting","volume":"18","author":"Guadagno","year":"2004","journal-title":"Int. J. Pattern Recognit. Artif. Intell."},{"key":"ref_40","unstructured":"Senatore, R., and Marcelli, A. (2013, January 11\u201313). Where are the characters? Characters segmentation in annotated cursive handwriting. Proceedings of the 16th IGS Conference, Nara, Japan."},{"key":"ref_41","unstructured":"Marcelli, A., and Stefano, C.D. (2005). 
Detecting Handwriting Primitives in Cursive Words by Stroke Sequence Matching. Advances in Graphonomics, Zona Editrice."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"De Stefano, C., Marcelli, A., Parziale, A., and Senatore, R. (2010, January 16\u201318). Reading cursive handwriting. Proceedings of the 2010 12th International Conference on Frontiers in Handwriting Recognition, Kolkata, India.","DOI":"10.1109\/ICFHR.2010.21"},{"key":"ref_43","unstructured":"Long, D.G., and Milne, A.T. (1981). The Manuscripts of Jeremy Bentham: A Chronological Index to the Collection in the Library of University College London, The Bentham Committee, University College."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Santoro, A., and Marcelli, A. (2019, January 20\u201325). A Novel Procedure to Speed up the Transcription of Historical Handwritten Documents by Interleaving Keyword Spotting and user Validation. Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia.","DOI":"10.1109\/ICDAR.2019.00198"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1080\/00949658608810963","article-title":"An omnibus test for the two-sample problem using the empirical characteristic function","volume":"26","author":"Epps","year":"1986","journal-title":"J. Stat. Comput. Simul."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Romero, V., and S\u00e1nchez, J.A. (2013, January 25\u201328). Human Evaluation of the Transcription Process of a Marriage License Book. 
Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA.","DOI":"10.1109\/ICDAR.2013.254"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1093\/llc\/fqw064","article-title":"Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription","volume":"33","author":"Toselli","year":"2018","journal-title":"Digit. Scholarsh. Humanit."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zagoris, K., Pratikakis, I., and Gatos, B. (2015, January 22). A framework for efficient transcription of historical documents using keyword spotting. Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing, Nancy, France.","DOI":"10.1145\/2809544.2809557"}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/6\/10\/109\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,4]],"date-time":"2024-07-04T05:52:08Z","timestamp":1720072328000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/6\/10\/109"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,13]]},"references-count":48,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2020,10]]}},"alternative-id":["jimaging6100109"],"URL":"https:\/\/doi.org\/10.3390\/jimaging6100109","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,13]]}}}