Abstract
String matching is one of the essential problems in computer science. The language used in Pakistan is Urdu. For Urdu language texts, its characters are encoded by utf-8, and the utf-8 is a length-variable encoding. If we implement string matching algorithms for Urdu language texts by ASCII encoding, the correct matched positions may not be obtained. This paper analyzes the characteristics of Urdu language texts and studies the character encoding presentation for Urdu language texts and recognizes that the correct matched positions can be obtained when the wchar_t type and Unicode encoding is used to process Urdu language texts, then, this paper implements parallel algorithms for Boyer-Moore string matching, Knuth-Morris-Pratt string matching, and Sunday string matching for Urdu language texts and evaluate the execution performance of these four string matching algorithms on a large number of Urdu language patterns and text strings via experimental testing.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Al-Dabbagh, S.S., Barnouti, N.H., Naser, M.A., Ali, Z.G.: Parallel quick search algorithm for the exact string matching problem using openMP. J. Comput. Commun. 4(13), 1–11 (2016)
Ďurian, B., Holub, J., Peltola, H., Tarhio, J.: Improving practical exact string matching. Inf. Process. Lett. 110(4), 148–152 (2010)
Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
Horspool, R.N.: Practical fast searching in strings. Softw. Pract. Exp. 10(6), 501–506 (1980)
Hume, A., Sunday, D.: Fast string searching. Softw. Pract. Exp. 21(11), 1221–1248 (1991)
Ajwani, D., Dementiev, R., Meyer, U.: A computational study of external-memory BFS algorithms. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, pp. 601–610. Society for Industrial and Applied Mathematics (2006)
Korf, R.E.: Depth-first iterative-deepening: an optimal admissible tree search. Artif. Intell. 27(1), 97–109 (1985)
Sunday, D.M.: A very fast substring search algorithm. Commun. ACM 33(8), 132–142 (1990)
Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Hussain, S., Afzal, M.: Urdu computing standards: urdu zabta takhti (UZT) 1.01. In: Proceedings of IEEE International 2001 Multi Topic Conference, pp. 223–228. IEEE (2001)
ISO GENEVA: ISO/IEC 14651: Information Technology - International String Ordering and Comparison- Method for Comparing Character Strings and Description of Common Template Tailorable Ordering. ISO, Geneva, Switzerland, MuqtadraQaumizuban, Jadeed Urdu Lulghat (جديداردولغت), MuqtadraQaumiZuban, Islamabad, Pakistan (2001). www.iso.ch
Afzal, M.: Urdu Software Industry: Prospects, Problems and Need for Standards. Science Vision Corp. (1999)
Hussain, S., Gul, S., Waseem, A.: Developing lexicographic sorting: an example for Urdu. ACM Trans. Asian Lang. Inf. Process. 6(3) (2007). Article no. 10
Claude, F., Navarro, G., Peltola, H., et al.: String matching with alphabet sampling. J. Discret. Algorithms 1, 37–50 (2012)
Erdem, O.: Tree-based string pattern matching on FPGAs. Comput. Electr. Eng. 49, 117–133 (2016)
Al-Ssulami, A.M., Mathkour, H.: Faster string matching based on hashing and bit-parallelism. Inf. Process. Lett. 123, 51–55 (2017)
Chung, K.-L.: O(1)-time parallel string-matching algorithm with VLDCs. Pattern Recogn. Lett. 17(5), 475–479 (1996)
Park, J.H., George, K.M.: Efficient parallel hardware algorithms for string matching. Microprocess. Microsyst. 23(3), 155–168 (1999)
Pungila, C.-P., Reja, M., Negru, V.: Efficient parallel automata construction for hybrid resource-impelled data-matching. Future Gener. Comput. Syst. 36, 31–41 (2014)
Qu, J., Zhang, G., Fang, Z., et al.: A parallel Aho-Corasick algorithm with non-deterministic finite automaton based on OpenMP. In: Proceedings of 2015 Seventh International Conference on Advanced Communication and Networking, pp. 52–55. IEEE (2015)
Trana, T.T., Liu, Y., Schmidta, B.: Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi. Parallel Comput. 54, 128–138 (2016)
Yoon, J.M., Choi, K.-I., Kim, H.J.: A memory accessing method for the parallel Aho-Corasick algorithm on GPU. In: Proceedings of 2016 International Conference on Information Science and Security, pp. 1–3. IEEE (2016)
Lin, C.-H., Li, J.-C., Liu, C.-H., et al.: Perfect hashing based parallel algorithms for multiple string matching on graphic processing units. IEEE Trans. Parallel Distrib. Syst. 28(9), 2639–2650 (2017)
Zengin, S., Schmidt, E.G.: A fast and accurate hardware string matching module with Bloom filters. IEEE Trans. Parallel Distrib. Syst. 28(2), 305–317 (2017)
Baúto, J., Canelas, A., Neves, R., Horta, N.: Parallel SAX/GA for financial pattern matching using NVIDIA’s GPU. Expert Syst. Appl. 105, 77–88 (2018)
Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20, 762–772 (1977)
Lee, C.Y.: An algorithm for path connections and its applications. IRE Trans. Electron. Comput. EC-10(3), 346–365 (1961)
Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(1), 323–350 (1977)
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China (No. 61762010).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Baig, M.B., Li, T.S. (2019). Parallel String Matching for Urdu Language Text. In: Bajwa, I., Kamareddine, F., Costa, A. (eds) Intelligent Technologies and Applications. INTAP 2018. Communications in Computer and Information Science, vol 932. Springer, Singapore. https://doi.org/10.1007/978-981-13-6052-7_32
Download citation
DOI: https://doi.org/10.1007/978-981-13-6052-7_32
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6051-0
Online ISBN: 978-981-13-6052-7
eBook Packages: Computer ScienceComputer Science (R0)