Application of modified Levenshtein distance for classification of noisy business document images
4 March 2022
Oleg Slavin, Elena Andreeva, Dmitry Putincev
Proceedings Volume 12084, Fourteenth International Conference on Machine Vision (ICMV 2021); 120840B (2022) https://doi.org/10.1117/12.2623437
Event: Fourteenth International Conference on Machine Vision (ICMV 2021), 2021, Rome, Italy
Abstract
This paper considers methods for classifying business document images based on the text data extracted by recognition. It points out the peculiarities of analyzing recognized text and describes the mechanism used to identify recognized words. The advantages and disadvantages of the Levenshtein distance are listed, and the standard Levenshtein distance is compared with other string distance metrics: Jaro–Winkler similarity, the multiset metric, and the Most Frequent K Characters (MFKC) metric. A modification of the Levenshtein distance is proposed that accounts for the peculiarities of recognized characters. Experimental results illustrate the application of the proposed distance.
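The abstract names the Levenshtein distance and a modification aimed at recognized characters, but it does not define that modification. As a rough illustration only, the Python sketch below implements the standard Levenshtein distance with a pluggable substitution cost, and then shows one plausible way such a cost could be lowered for character pairs that OCR often confuses. The function names, the CONFUSABLE table, and the 0.2 cost are all assumptions made for this example. They are not taken from the paper and should not be read as the authors' method.

```python
def levenshtein(a, b, sub_cost=None):
    """Edit distance between strings a and b (row-by-row dynamic programming).

    sub_cost(x, y) returns the cost of substituting character x with y.
    With the default cost (0 if equal, 1 otherwise) this is the standard
    Levenshtein distance. Insertions and deletions always cost 1.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0

    # prev[j] holds the distance between the empty prefix of a and b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                   # delete ca
                curr[j - 1] + 1.0,               # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]


# Hypothetical table of visually similar character pairs (an assumption
# for illustration, not data from the paper).
CONFUSABLE = {
    frozenset("O0"), frozenset("l1"), frozenset("I1"),
    frozenset("S5"), frozenset("B8"),
}


def ocr_sub_cost(x, y, confusable_cost=0.2):
    """Substitution cost that is cheaper for OCR-confusable pairs (assumed value)."""
    if x == y:
        return 0.0
    if frozenset((x, y)) in CONFUSABLE:
        return confusable_cost
    return 1.0


if __name__ == "__main__":
    print(levenshtein("INVOICE", "INV0ICE"))                         # standard cost
    print(levenshtein("INVOICE", "INV0ICE", sub_cost=ocr_sub_cost))  # OCR-aware cost
```

With the default cost, a keyword misread as "INV0ICE" is one full edit away from "INVOICE". With the assumed OCR-aware cost, the same misreading is penalized less, so a noisy word stays closer to the dictionary word it most likely represents.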
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Oleg Slavin, Elena Andreeva, and Dmitry Putincev "Application of modified Levenshtein distance for classification of noisy business document images", Proc. SPIE 12084, Fourteenth International Conference on Machine Vision (ICMV 2021), 120840B (4 March 2022); https://doi.org/10.1117/12.2623437
KEYWORDS
Image classification

Optical character recognition

Image segmentation

Image processing
