String Factorization via Prefix Free Families

String Factorization via Prefix Free Families

Authors Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, Yonathan Sadia



PDF
Thumbnail PDF

File

LIPIcs.CPM.2023.19.pdf
  • Filesize: 0.66 MB
  • 10 pages

Document Identifiers

Author Details

Matan Kraus
  • Bar-Ilan University, Ramat-Gan, Israel
Moshe Lewenstein
  • Bar-Ilan University, Ramat-Gan, Israel
Alexandru Popa
  • Faculty of Mathematics and Computer Science, University of Bucharest, Romania
Ely Porat
  • Bar-Ilan University, Ramat-Gan, Israel
Yonathan Sadia
  • Bar-Ilan University, Ramat-Gan, Israel

Cite As Get BibTex

Matan Kraus, Moshe Lewenstein, Alexandru Popa, Ely Porat, and Yonathan Sadia. String Factorization via Prefix Free Families. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 19:1-19:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.CPM.2023.19

Abstract

A factorization of a string S is a partition of w into substrings u_1,… ,u_k such that S = u_1 u_2 ⋯ u_k. Such a partition is called equality-free if no two factors are equal: u_i ≠ u_j, ∀ i,j with i ≠ j. The maximum equality-free factorization problem is to find for a given string S, the largest integer k for which S admits an equality-free factorization with k factors. 
Equality-free factorizations have lately received attention because of their applications in DNA self-assembly. The best approximation algorithm known for the problem is the natural greedy algorithm, that chooses iteratively from left to right the shortest factor that does not appear before. This algorithm has a √n approximation ratio (SOFSEM 2020) and it is an open problem whether there is a better solution. 
Our main result is to show that the natural greedy algorithm is a Θ(n^{1/4}) approximation algorithm for the maximum equality-free factorization problem. Thus, we disprove one of the conjectures of Mincu and Popa (SOFSEM 2020) according to which the greedy algorithm is a Θ(√n) approximation.
The most challenging part of the proof is to show that the greedy algorithm is an O(n^{1/4}) approximation. We obtain this algorithm via prefix free factor families, i.e. a set of non-overlapping factors of the string which are pairwise non-prefixes of each other. In the paper we show the relation between prefix free factor families and the maximum equality-free factorization. Moreover, as a byproduct we present another approximation algorithm that achieves an approximation ratio of O(n^{1/4}) that we believe is of independent interest and may lead to improved algorithms. We then show that the natural greedy algorithm has an approximation ratio that is Ω(n^{1/4}) via a clever analysis which shows that the greedy algorithm is Θ(n^{1/4}) for the maximum equality-free factorization problem.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • string factorization
  • NP-hard problem
  • approximation algorithm

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amihood Amir, Yonatan Aumann, Moshe Lewenstein, and Ely Porat. Function matching. SIAM Journal on Computing, 35(5):1007-1022, 2006. Google Scholar
  2. Brenda S. Baker. Parameterized pattern matching: Algorithms and applications. Journal of Computer and System Sciences, 52(1):28-42, 1996. Google Scholar
  3. Richard Cole, Carmit Hazay, Moshe Lewenstein, and Dekel Tsur. Two-dimensional parameterized matching. ACM Transactions on Algorithms, 11(2):12:1-12:30, 2014. Google Scholar
  4. Anne Condon, Ján Maňuch, and Chris Thachuk. The complexity of string partitioning. Journal of Discrete Algorithms, 32:24-43, 2015. Google Scholar
  5. Henning Fernau, Florin Manea, Robert Mercas, and Markus L. Schmid. Pattern matching with variables: Fast algorithms and new hardness results. In 32nd International Symposium on Theoretical Aspects of Computer Science, 2015, March 4-7, 2015, Garching, Germany, pages 302-315, 2015. Google Scholar
  6. Avraham Goldstein, Petr Kolman, and Jie Zheng. Minimum common string partition problem: Hardness and approximations. The Electronic Journal of Combinatorics, 12, 2005. Google Scholar
  7. Isaac Goldstein and Moshe Lewenstein. Quick greedy computation for minimum common string partition. Theoretical Computer Science, 542:98-107, 2014. Google Scholar
  8. Niels Grüttemeier, Christian Komusiewicz, Nils Morawietz, and Frank Sommer. String factorizations under various collision constraints. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. Google Scholar
  9. Markus L. Schmid. Computing equality-free and repetitive string factorisations. Theoretical Computer Science, 618:42-51, 2016. Google Scholar
  10. Moshe Lewenstein. Parameterized pattern matching. In Encyclopedia of Algorithms, pages 1525-1530. Springer, 2016. Google Scholar
  11. Radu Stefan Mincu and Alexandru Popa. The maximum equality-free string factorization problem: Gaps vs. no gaps. In International Conference on Current Trends in Theory and Practice of Informatics, pages 531-543. Springer, 2020. Google Scholar
  12. Sebastian Ordyniak and Alexandru Popa. A parameterized study of maximum generalized pattern matching problems. Algorithmica, 75(1):1-26, 2016. Google Scholar
  13. Wojciech Plandowski. Satisfiability of word equations with constants is in PSPACE. Journal of the ACM (JACM), 51(3):483-496, 2004. Google Scholar
  14. Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24:530-536, 1978. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail