Abstract
Detecting historical recombination is an important computational problem which has received great attention recently. Due to recombination, input sequences form a mosaic, where each input sequence is composed of segments from founder sequences. In this paper, we present improved algorithms for the problem of finding the minimum mosaic (a mosaic containing the fewest ancestral segments) of a set of recombinant sequences. This problem was first formulated in [15], where an exponential-time algorithm was described. It is also known that a restricted version of this problem (assuming recombination occurs only at predefined block boundaries) can be solved in polynomial time [15, 11]. We give a polynomial-time algorithm for a special case of the minimum (blockless) mosaic problem, and a practical algorithm for the general case. Experiments with our method show that it is practical in a range of data much larger than could be handled by the algorithm described in [15].
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
El-Mabrouk, N.: Deriving haplotypes through recombination and gene conversion. J. of Bioinformatics and Computational Biology 2, 241–256 (2004)
El-Mabrouk, N., Labuda, D.: Haplotype histories as pathways of recombinations. Bioinformatics 20, 1836–1841 (2004)
Kimmel, G., Shamir, R.: A block-free hidden markov model for genotypes and its application to disease association. J. of Comp. Bio. 12, 1243–1260 (2005)
Kreitman, M.: Nucleotide polymorphism at the alcohol dehydrogenase locus of drosophila melanogaster. Nature 304, 412–417 (1983)
McMillan, L., Moore, K.: http://www.unc.edu/courses/2007spring/comp/790/087/?p=11
Myers, S.: The detection of recombination events using DNA sequence data, PhD dissertation, Dept. of Statistics, University of Oxford, Oxford, England (2003)
Myers, S.R., Griffiths, R.C.: Bounds on the minimum number of recombination events in a sample history. Genetics 163, 375–394 (2003)
Nickerson, D., Taylor, S., Weiss, K., Clark, A., et al.: DNA Sequence Diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nature Genetics 19, 233–240 (1998)
Rastas, P., Koivisto, M., Mannila, H., Ukkonen, E.: A Hidden Markov Technique for Haplotype Reconstruction. In: Proceedings of Workshop on Algorithm of Bioinformatics (WABI 2005) pp. 140–151 (2005)
Scheet, P., Stephens, M.: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Human Genetics 78, 629–644 (2006)
Schwartz, R., Clark, A., Istrail, S.: Methods for Inferring Block-Wise Ancestral History from Haploid Sequences. In: Proceedings of Workshop on Algorithm of Bioinformatics (WABI 2002), vol. 2452, pp. 44–59 (2002)
Song, Y.S., Wu, Y., Gusfield, D.: Efficient computation of close lower and upper bounds on the minimum number of needed recombinations in the evolution of biological sequences. Bioinformatics, vol. 421, pp. i413–i422. In: Proceedings of ISMB (2005)
Tan, Y., Fu, Y.: A Novel Method for Estimating Linkage Maps. Genetics 173, 2383–2390 (2006)
Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E., Ram, R., Richardson, P., Solovyev, V., Rubin, E., Rokhsar, D., Banfield, J.: Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004)
Ukkonen, E.: Finding Founder Sequences from a Set of Recombinants. In: Proceedings of Workshop on Algorithm of Bioinformatics (WABI 2002), vol. 2452, pp. 277–286 (2002)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, Y., Gusfield, D. (2007). Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_17
Download citation
DOI: https://doi.org/10.1007/978-3-540-73437-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73436-9
Online ISBN: 978-3-540-73437-6
eBook Packages: Computer ScienceComputer Science (R0)