Multi-pattern Matching Algorithm with Wildcards Based on Euclidean Distance and Hash Function | SpringerLink
Skip to main content

Multi-pattern Matching Algorithm with Wildcards Based on Euclidean Distance and Hash Function

  • Conference paper
  • First Online:
Computational Science and Its Applications – ICCSA 2016 (ICCSA 2016)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9786))

Included in the following conference series:

  • 1298 Accesses

Abstract

In this paper, we present an algorithm to solve the problem of multi-pattern matching with a fixed number of wildcards. The method requires each pattern in a pattern set P to be partitioned into l length blocks, and then, the Euclidean distance is calculated between each first block b y,1 of patterns and every possible alignment of the text t. If the Euclidean distance at position i is 0, then the block b y,1 of pattern p y matches the text t at position i. The Euclidean distance values are used as hash values to check the matches of the remaining blocks of the partially matched pattern. The complexity of our algorithm is O(k n log l + o + d) time. Where n is the length of the text, l is the length of the blocks, k is the number of patterns, o is the number of blocks that match using the hash values and d is the number of wildcard symbols in the blocks that match using the hash values. The major advantages of our algorithm are that (a) it can find the matches of long patterns efficiently, (b) if the alphabet size σ is large, the algorithm is still efficient and (c) it supports non-interrupted pattern updates.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 5719
Price includes VAT (Japan)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 7149
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  2. Fischer, M., Paterson, M.: String matching and other products. In: Karp, R., (ed.) Proceedings of the 7th SIAMAMS Complexity of Computation, pp. 113–125 (1974)

    Google Scholar 

  3. Muthukrishan, S., Palem, K.: Non-standard stringology: algorithms and complexity. In: Proceedings of the 26th Symposium on the Theory of Computing, Canada (1994)

    Google Scholar 

  4. Barton, C., Iliopoulos, C.S.: On the average-case complexity of pattern matching with wildcards. CoRR, abs/1407.0950 (2014)

    Google Scholar 

  5. Indyk, P.: Faster algorithms for string matching problems: matching the convolution bound. In: Proceedings of the 38th Annual Symposium on Foundations of Computer Science, pp. 166–173 (1998)

    Google Scholar 

  6. Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: [7] Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 592–601 (2002)

    Google Scholar 

  7. Rahman, M., Iliopoulos, C.: Pattern matching algorithms with don’t cares. SOFSEM 2, 116–126 (2007)

    Google Scholar 

  8. Linhart, C., Shamir, R.: Faster pattern matching with character classes using prime number encoding. J. Comput. Syst. Sci. 75(3), 155–162 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  9. Kucherov, G., Rusinowitch, M.: Matching a set of strings with variable length don’t cares. Theor. Comput. Sci. 178(1–2), 129–154 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  10. Zhang, M., Zhang, Y., Hu, L.: A faster algorithm for matching a set of patterns with variable length don’t cares. Inf. Process. Lett. 110(6), 216–220 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  11. Zhang, M., Zhang, Y., Tang, J.: Multi-pattern matching with wildcards. In: Proceeding of Third International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010) (2010)

    Google Scholar 

  12. Chan, H.-L., Hon, W.-K., Lam, T.W., Sadakane, K.: Compressed indexes for dynamic text collections. ACM Trans. Algorithms 3(2) (2007)

    Google Scholar 

  13. Bertossi, A.A., Logi, F.: Parallel string matching with variable length don’t cares. J. Parallel Distrib. Comput. 22, 229–234 (1994)

    Article  MATH  Google Scholar 

  14. Blumer, A., Blumer, J., Haussler, D., Ehrenfeucht, A., Chen, M.T., Seiferas, J.: The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci. 40, 31–55 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  15. Ding, B., Lo, D., Han, J., Khoo, S.: Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of the 25th IEEE International Conference on Data Engineering, pp. 1024–1035 (2009)

    Google Scholar 

  16. Wu, X., Zhu, X., He, Y., Arslan, A.N.: PMBC: pattern mining from biological sequences with wildcard constraints. Comput. Biol. Med. 43(5), 481–492 (2013)

    Article  Google Scholar 

  17. Qiang, J., Guo, D., Fang, Y., Tian, W., Hu, X.: Multiple pattern matching with wildcards and one-off condition. J. Comput. Inf. Syst. 9, 14 (2013)

    Google Scholar 

  18. Guo, D., Hu, X., Xie, F., Wu, X.: Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph. Appl. Intell. 39(1), 57–74 (2013)

    Article  Google Scholar 

  19. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with application to protein searching. In: Proceedings of the 5th Annual International Conference on Computational Biology, pp. 231–240 (2001)

    Google Scholar 

  20. Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Biol. 12(8), 1065–1082 (2005)

    Article  Google Scholar 

  21. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th Annual ACM Symposium on the Theory of Computing, pp. 91–100 (2004)

    Google Scholar 

  22. Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online dictionary matching with variable-length gaps. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 76–87. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  23. Arslan, A.N., He, D., He, Y., Wu, X.: Pattern matching with wildcards and length constraints using maximum network flow. J. Discrete Algorithms 35(C), 9–16 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  24. Kalai,A.: Efficient pattern-matching with don’t cares. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 655–656 (2002)

    Google Scholar 

Download references

Acknowledgment

I would like to express my gratitude to Jilin University for providing me with a full scholarship and an excellent environment for my Ph.D. study and research. Furthermore, my appreciation extends to Dr. Meng Zhang for his continuous support and discussions.

Conflict of Interest. The author declares no conflict of interest.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ahmed Abdo Farhan Saif .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Saif, A.A.F., Hu, L. (2016). Multi-pattern Matching Algorithm with Wildcards Based on Euclidean Distance and Hash Function. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science(), vol 9786. Springer, Cham. https://doi.org/10.1007/978-3-319-42085-1_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-42085-1_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42084-4

  • Online ISBN: 978-3-319-42085-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics