Abstract
Kärkkainen et al. (CPM, 2009) defined the concept of \(\phi \) that later became key to the construction of the r-index. Given a string S[1..n], its suffix array \(\textrm{SA}\) and its inverse suffix array \(\textrm{ISA}\), we define \(\phi \) as the permutation of \(\{1,\ldots ,n\}\) such that \(\phi (i) = \textrm{SA}[\textrm{ISA}[i]-1]\) if \(\textrm{ISA}[i] > 1\), and \(\phi (i) = \textrm{SA}[n]\) otherwise. Gagie et al. (JACM, 2020) showed that it is possible to store \(\mathcal {O}(r)\) words such that the permutations \(\phi \) and \(\phi ^{-1}\) are evaluated in \(\mathcal {O}(\log \log _w(n/r))\)-time, which was improved to \(\mathcal {O}(1)\)-time by Nishimoto and Tabei (ICALP, 2021). In this paper, we introduce the concept of \(\phi ^{-1}\)-forest, which is a data structure using sampled \(\textrm{SA}\) values to speed up random access to \(\textrm{SA}\). We implemented our approach and compared its performance with respect to the r-index.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Pizza & Chili Repetitive Corpus. http://pizzachili.dcc.uchile.cl/repcorpus.html.. Accessed June 2022
Brown, N.K., Gagie, T., Rossi, M.: RLBWT tricks. In Schulz, C., Uçar, B. (eds.) 20th International Symposium on Experimental Algorithms (SEA 2022), Volume 233 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 16:1–16:16, Dagstuhl, Germany. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2022)
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation (1994)
Cáceres, M., Navarro, G.: Faster repetition-aware compressed suffix trees based on block trees. Inf. Comput. 285, 104749 (2021)
Cobas, D., Gagie, T., Navarro, G.: A fast and small subsampled R-Index. In: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2021)
Gagie, T., Navarro, G., Prezza, N.: Optimal-time text indexing in BWT-runs bounded space. In: Proceedings SODA, pp. 1459–1477 (2018)
Gagie, T., Navarro, G., Prezza, N.: Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J. ACM 67(1), 21–254 (2020)
Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted longest-common-prefix array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02441-2_17
Langmead, B., Trapnell, C., Pop, M., Salzberg, S.L.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10(3), 1–10 (2009)
Li, H., Durbin, R.: Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26(5), 589–595 (2010)
Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comput. Biol. 17(3), 281–308 (2010)
Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Nishimoto, T., Tabei, Y.: Optimal-time queries on BWT-runs compressed indexes. In: Proceedings of the 48th International Colloquium on Automata, Languages, and Programming, (ICALP 2021), volume 198 of LIPIcs, pp. 101:1–101:15 (2021)
Puglisi, S.J., Zhukova, B.: Relative Lempel-Ziv compression of suffix arrays. In: Boucher, C., Thankachan, S.V. (eds.) SPIRE 2020. LNCS, vol. 12303, pp. 89–96. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59212-7_7
Puglisi, S.J., Zhukova, B.: Smaller RLZ-compressed suffix arrays. In: Proceedings of the 31st Data Compression Conference, (DCC 2021), pp. 213–222. IEEE (2021)
Rossi, M., Oliva, M., Langmead, B., Gagie, T., Boucher, C.: MONI: a pangenomic index for finding maximal exact matches. J. Comput. Biol. 29(2), 169–187 (2022)
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526 68–74 (2015)
Acknowledgments
DK was supported by JSPS KAKENHI (Grant No. JP21K17701, JP21H05847, and JP22H03551). CB and MR were supported by NIH NHGRI (R01HG011392) and NSF EAGER (Grant No. 2118251). CB and HP were funded by NSF SCH:INT (Grant No. 2013998).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Boucher, C., Köppl, D., Perera, H., Rossi, M. (2022). Accessing the Suffix Array via \(\phi ^{-1}\)-Forest. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_7
Download citation
DOI: https://doi.org/10.1007/978-3-031-20643-6_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer ScienceComputer Science (R0)