{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,2]],"date-time":"2023-01-02T05:54:31Z","timestamp":1672638871817},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"crossref","award":["319454."],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Science Foundation (NSF) IIS","award":["1618814"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM J. Exp. Algorithmics"],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"In this article, we consider the problem of identifying patterns of interest in colored strings. A colored string is a string where each position is assigned one of a finite set of colors. Our task is to find substrings of the colored string that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, assertion mining. The goal there is to automatically find properties of the embedded system from the analysis of its simulation traces.<\/jats:p>\n \n We show that, in our setting, the number of patterns of interest is upper-bounded by\n O<\/jats:italic>\n (\n n<\/jats:italic>\n 2<\/jats:sup>\n ), where\n n<\/jats:italic>\n is the length of the string. We introduce a baseline algorithm, running in\n O<\/jats:italic>\n (\n n<\/jats:italic>\n 2<\/jats:sup>\n ) time, which identifies all patterns of interest satisfying certain minimality conditions for all colors in the string. For the case where one is interested in patterns related to one color only, we also provide a second algorithm that runs in\n O<\/jats:italic>\n (\n n<\/jats:italic>\n 2<\/jats:sup>\n log\n n<\/jats:italic>\n ) time in the worst case but is faster than the baseline algorithm in practice. Both solutions use suffix trees, and the second algorithm also uses an appropriately defined priority queue, which allows us to reduce the number of computations. We performed an experimental evaluation of the proposed approaches over both synthetic and real-world datasets, and found that the second algorithm outperforms the first algorithm on all simulated data, while on the real-world data, the performance varies between a slight slowdown (on half of the datasets) and a speedup by a factor of up to 11.\n <\/jats:p>","DOI":"10.1145\/3429280","type":"journal-article","created":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T13:08:41Z","timestamp":1609333721000},"page":"1-26","source":"Crossref","is-referenced-by-count":0,"title":["Pattern Discovery in Colored Strings"],"prefix":"10.1145","volume":"26","author":[{"given":"Zsuzsanna","family":"Lipt\u00e1k","sequence":"first","affiliation":[{"name":"University of Verona, Department of Computer Science, Verona, Italy"}]},{"given":"Simon J.","family":"Puglisi","sequence":"additional","affiliation":[{"name":"Helsinki Institute of Information Technology (HIIT) and Department of Computer Science, University of Helsinki, Finland"}]},{"given":"Massimiliano","family":"Rossi","sequence":"additional","affiliation":[{"name":"University of Florida, Department of Computer and Information Science and Engineering, Gainesville, FL, United States"}]}],"member":"320","published-online":{"date-parts":[[2020,12,30]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 20th International Conference on Very Large Data Bases (VLDB\u201994)","volume":"1215","author":"Agrawal Rakesh","year":"1994","unstructured":"Rakesh Agrawal and Ramakrishnan Srikant . 1994 . Fast algorithms for mining association rules . In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB\u201994) , Vol. 1215 . Morgan Kaufmann, 487--499. Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB\u201994), Vol. 1215. Morgan Kaufmann, 487--499."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.1995.380415"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl453"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.1989.100747"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/DASFAA.2003.1192375"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Conference on Database and Expert Systems Applications (DEXA\u201908)","volume":"5181","author":"Cho Chung-Wen","unstructured":"Chung-Wen Cho , Ying Zheng , Yi-Hung Wu , and Arbee L. P. Chen . 2008. A tree-based approach for event prediction using episode rules over event streams . In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA\u201908) (LNCS), Vol. 5181 . Springer, 225--240. Chung-Wen Cho, Ying Zheng, Yi-Hung Wu, and Arbee L. P. Chen. 2008. A tree-based approach for event prediction using episode rules over event streams. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA\u201908) (LNCS), Vol. 5181. Springer, 225--240."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/54.867894"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062206"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.7873\/DATE.2015.0110"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.242"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2017.10.035"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/090779759"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2005.62"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/11871637_17"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.32"},{"key":"e_1_2_1_17_1","volume-title":"Lacey","author":"Foster Harry D.","year":"2004","unstructured":"Harry D. Foster , Adam C. Krolnik , and David J . Lacey . 2004 . Assertion-based Design. Springer Science 8 Business Media. Harry D. Foster, Adam C. Krolnik, and David J. Lacey. 2004. Assertion-based Design. Springer Science 8 Business Media."},{"key":"e_1_2_1_18_1","first-page":"54","article-title":"A survey of sequential pattern mining. Data Sci","volume":"1","author":"Fournier-Viger Philippe","year":"2017","unstructured":"Philippe Fournier-Viger , Jerry Chun-Wei Lin , Rage Uday Kiran , Yun Sing Koh , and Rincy Thomas . 2017 . A survey of sequential pattern mining. Data Sci . Pattern Recog. 1 , 1 (2017), 54 -- 77 . Philippe Fournier-Viger, Jerry Chun-Wei Lin, Rage Uday Kiran, Yun Sing Koh, and Rincy Thomas. 2017. A survey of sequential pattern mining. Data Sci. Pattern Recog. 1, 1 (2017), 54--77.","journal-title":"Pattern Recog."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07959-2_28"},{"key":"e_1_2_1_20_1","volume-title":"Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology","author":"Gusfield Dan","unstructured":"Dan Gusfield . 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology . Cambridge University Press , Cambridge, United Kingdom. Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, United Kingdom."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching (CPM\u201992)","volume":"644","author":"Kwong Hui Lucas Chi","year":"1992","unstructured":"Lucas Chi Kwong Hui . 1992 . Color set size problem with application to string matching . In Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching (CPM\u201992) (LNCS), Vol. 644 . Springer, 230--243. Lucas Chi Kwong Hui. 1992. Color set size problem with application to string matching. In Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching (CPM\u201992) (LNCS), Vol. 644. Springer, 230--243."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2005.60"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-006-0038-2"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201908)","author":"Laxman Srivatsan","unstructured":"Srivatsan Laxman , Vikram Tankasali , and Ryen W. White . 2008. Stream prediction using a generative model based on frequent episodes in event sequences . In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201908) . ACM, 453--461. Srivatsan Laxman, Vikram Tankasali, and Ryen W. White. 2008. Stream prediction using a generative model based on frequent episodes in event sequences. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201908). ACM, 453--461."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 18th Symposium on Experimental Algorithms (SEA\u201920) (Leibniz International Proceedings in Informatics)","volume":"160","author":"Lipt\u00e1k Zsuzsanna","year":"2020","unstructured":"Zsuzsanna Lipt\u00e1k , Simon J. Puglisi , and Massimiliano Rossi . 2020 . Pattern discovery in colored strings . In Proceedings of the 18th Symposium on Experimental Algorithms (SEA\u201920) (Leibniz International Proceedings in Informatics) , Vol. 160 . 12:1--12:14. Zsuzsanna Lipt\u00e1k, Simon J. Puglisi, and Massimiliano Rossi. 2020. Pattern discovery in colored strings. In Proceedings of the 18th Symposium on Experimental Algorithms (SEA\u201920) (Leibniz International Proceedings in Informatics), Vol. 160. 12:1--12:14."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/MEMCOD.2011.5970526"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1824795.1824798"},{"key":"e_1_2_1_28_1","volume-title":"Tomescu","author":"M\u00e4kinen Veli","year":"2015","unstructured":"Veli M\u00e4kinen , Djamal Belazzougui , Fabio Cunial , and Alexandru I . Tomescu . 2015 . Genome-Scale Algorithm\u00a0Design. CUP. Veli M\u00e4kinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. 2015. Genome-Scale Algorithm\u00a0Design. CUP."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009748302351"},{"key":"e_1_2_1_30_1","unstructured":"OpenCores. 1999. Retrieved from https:\/\/opencores.org\/. OpenCores. 1999. Retrieved from https:\/\/opencores.org\/."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSC.2017.18"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-006-0006-z"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-019-01378-7"},{"key":"e_1_2_1_34_1","volume-title":"Algorithms","author":"Sedgewick Robert","unstructured":"Robert Sedgewick and Kevin Wayne . 2011. Algorithms . Addison-Wesley Professional . Robert Sedgewick and Kevin Wayne. 2011. Algorithms. Addison-Wesley Professional."},{"key":"e_1_2_1_35_1","volume-title":"Computing Patterns in Strings","author":"Smyth Bill","unstructured":"Bill Smyth . 2003. Computing Patterns in Strings . Pearson Addison-Wesley , Essex, England . Bill Smyth. 2003. Computing Patterns in Strings. Pearson Addison-Wesley, Essex, England."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 12th International Workshop on Algorithms in Bioinformatics (WABI\u201912)","volume":"7534","author":"V\u00e4lim\u00e4ki Niko","unstructured":"Niko V\u00e4lim\u00e4ki and Simon J. Puglisi . 2012. Distributed string mining for high-throughput sequencing data . In Proceedings of the 12th International Workshop on Algorithms in Bioinformatics (WABI\u201912) (LNCS), Vol. 7534 . Springer, 441--452. Niko V\u00e4lim\u00e4ki and Simon J. Puglisi. 2012. Distributed string mining for high-throughput sequencing data. In Proceedings of the 12th International Workshop on Algorithms in Bioinformatics (WABI\u201912) (LNCS), Vol. 7534. Springer, 441--452."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2010.5457129"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-05810-8_25"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2013.02.006"}],"container-title":["ACM Journal of Experimental Algorithmics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3429280","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T20:20:34Z","timestamp":1672604434000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3429280"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,30]]},"references-count":38,"alternative-id":["10.1145\/3429280"],"URL":"https:\/\/doi.org\/10.1145\/3429280","relation":{},"ISSN":["1084-6654","1084-6654"],"issn-type":[{"value":"1084-6654","type":"print"},{"value":"1084-6654","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,30]]}}}