Abstract
In this paper, we present an algorithm to solve the problem of multi-pattern matching with a fixed number of wildcards. The method requires each pattern in a pattern set P to be partitioned into l length blocks, and then, the Euclidean distance is calculated between each first block b y,1 of patterns and every possible alignment of the text t. If the Euclidean distance at position i is 0, then the block b y,1 of pattern p y matches the text t at position i. The Euclidean distance values are used as hash values to check the matches of the remaining blocks of the partially matched pattern. The complexity of our algorithm is O(k n log l + o + d) time. Where n is the length of the text, l is the length of the blocks, k is the number of patterns, o is the number of blocks that match using the hash values and d is the number of wildcard symbols in the blocks that match using the hash values. The major advantages of our algorithm are that (a) it can find the matches of long patterns efficiently, (b) if the alphabet size σ is large, the algorithm is still efficient and (c) it supports non-interrupted pattern updates.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)
Fischer, M., Paterson, M.: String matching and other products. In: Karp, R., (ed.) Proceedings of the 7th SIAMAMS Complexity of Computation, pp. 113–125 (1974)
Muthukrishan, S., Palem, K.: Non-standard stringology: algorithms and complexity. In: Proceedings of the 26th Symposium on the Theory of Computing, Canada (1994)
Barton, C., Iliopoulos, C.S.: On the average-case complexity of pattern matching with wildcards. CoRR, abs/1407.0950 (2014)
Indyk, P.: Faster algorithms for string matching problems: matching the convolution bound. In: Proceedings of the 38th Annual Symposium on Foundations of Computer Science, pp. 166–173 (1998)
Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: [7] Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 592–601 (2002)
Rahman, M., Iliopoulos, C.: Pattern matching algorithms with don’t cares. SOFSEM 2, 116–126 (2007)
Linhart, C., Shamir, R.: Faster pattern matching with character classes using prime number encoding. J. Comput. Syst. Sci. 75(3), 155–162 (2009)
Kucherov, G., Rusinowitch, M.: Matching a set of strings with variable length don’t cares. Theor. Comput. Sci. 178(1–2), 129–154 (1997)
Zhang, M., Zhang, Y., Hu, L.: A faster algorithm for matching a set of patterns with variable length don’t cares. Inf. Process. Lett. 110(6), 216–220 (2010)
Zhang, M., Zhang, Y., Tang, J.: Multi-pattern matching with wildcards. In: Proceeding of Third International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010) (2010)
Chan, H.-L., Hon, W.-K., Lam, T.W., Sadakane, K.: Compressed indexes for dynamic text collections. ACM Trans. Algorithms 3(2) (2007)
Bertossi, A.A., Logi, F.: Parallel string matching with variable length don’t cares. J. Parallel Distrib. Comput. 22, 229–234 (1994)
Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M.T., Seiferas, J.: The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci. 40, 31–55 (1985)
Ding, B., Lo, D., Han, J., Khoo, S.: Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of the 25th IEEE International Conference on Data Engineering, pp. 1024–1035 (2009)
Wu, X., Zhu, X., He, Y., Arslan, A.N.: PMBC: pattern mining from biological sequences with wildcard constraints. Comput. Biol. Med. 43(5), 481–492 (2013)
Qiang, J., Guo, D., Fang, Y., Tian, W., Hu, X.: Multiple pattern matching with wildcards and one-off condition. J. Comput. Inf. Syst. 9, 14 (2013)
Guo, D., Hu, X., Xie, F., Wu, X.: Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph. Appl. Intell. 39(1), 57–74 (2013)
Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with application to protein searching. In: Proceedings of the 5th Annual International Conference on Computational Biology, pp. 231–240 (2001)
Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Biol. 12(8), 1065–1082 (2005)
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th Annual ACM Symposium on the Theory of Computing, pp. 91–100 (2004)
Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online dictionary matching with variable-length gaps. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 76–87. Springer, Heidelberg (2011)
Arslan, A.N., He, D., He, Y., Wu, X.: Pattern matching with wildcards and length constraints using maximum network flow. J. Discrete Algorithms 35(C), 9–16 (2015)
Kalai,A.: Efficient pattern-matching with don’t cares. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 655–656 (2002)
Acknowledgment
I would like to express my gratitude to Jilin University for providing me with a full scholarship and an excellent environment for my Ph.D. study and research. Furthermore, my appreciation extends to Dr. Meng Zhang for his continuous support and discussions.
Conflict of Interest. The author declares no conflict of interest.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Saif, A.A.F., Hu, L. (2016). Multi-pattern Matching Algorithm with Wildcards Based on Euclidean Distance and Hash Function. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science(), vol 9786. Springer, Cham. https://doi.org/10.1007/978-3-319-42085-1_26
Download citation
DOI: https://doi.org/10.1007/978-3-319-42085-1_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42084-4
Online ISBN: 978-3-319-42085-1
eBook Packages: Computer ScienceComputer Science (R0)