Abstract
In this paper, we study several versions of optimization problems related to haplotype reconstruction/identification. The input to the first problem is a set C 1 of haplotypes, a set C 2 of haplotypes, and a set G of genotypes. The objective is to select the minimum number of haplotypes from C 2 so that together with haplotypes in C 1 they resolve all (or the maximum number of) genotypes in G. We show that this problem has a factor-O(logn) polynomial time approximation. We also show that this problem does not admit any approximation with a factor better than O(logn) unless P=NP. For the corresponding reconstruction problem, i.e., when C 2 is not given, the same approximability results hold.
The other versions of the haplotype identification problem are based on single individual haplotyping, including the well-known Minimum Fragment Removal (MFR) and Minimum SNP Removal (MSR), which have both shown to be APX-hard previously. We show in this paper that MFR has a polynomial time O(logn)-factor approximation. We also consider Maximum Fragment Identification (MFI), which is the complementary version of MFR; and Maximum SNP Identification (MSI), which is the complementary version of MSR. We show that, for any positive constant ε< 1, neither MFI nor MSI has a factor-n 1 − ε polynomial time approximation algorithm unless P=NP.
This research is partially supported by NSF Career Award 0845376.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bafna, V., Istrail, S., Lancia, G., Rizzi, R.: Polynomial and APX-hard cases of the individual haplotyping problem. Theoretical Computer Science 335, 109–125 (2005)
Chvátal, V.: A greedy heuristic for the set-covering problem. Math. Oper. Res. 4, 233–235 (1979)
Chen, Z., Fu, B., Sweller, R., Yang, B., Zhao, Z., Zhu, B.: Linear probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments. J. Computational Biology 15, 535–546 (2008)
Cilibrasi, R., van Iersel, L., Kelk, S., Tromp, J.: The complexity of the single individual SNP haplotyping problem. Algorithmica 49, 13–36 (2007)
Clark, A.: Inference of haplotypes from PCR-amplified samples of diploid populations. Molecular Biology Evolution 7, 111–122 (1990)
Douglas, J., Boehnke, M., Gillanders, E., Trent, J., Gruber, S.: Experimentally-driven haplotypes substantially increase the efficiency of linkage disequillibrium studies. Nat. Genetics 28, 361–364 (2001)
Duh, R.-c., Fürer, M.: Approximation of k-set cover by semi-local optimization. In: Proc. 29th ACM Symp. on Theory of Comput (STOC 1997), pp. 256–264 (1997)
Garg, N., Vazirani, V.V., Yannakakis, M.: Multiway cuts in directed and node weighted graphs. In: Shamir, E., Abiteboul, S. (eds.) ICALP 1994. LNCS, vol. 820, pp. 103–111. Springer, Heidelberg (1994)
Gusfield, D.: A practical algorithm for optimal inference of haplotype from diploid populations. In: ISMB 2000, pp. 183–189 (2000)
Gusfield, D.: Inference of haplotypes from samples of diploid populations: complexity and algorithms. J. Computational Biology 8, 305–323 (2001)
Gusfield, D.: Haplotyping as perfect phylogeny: Conceptual framework and efficient solutions. In: RECOMB 2002, pp. 166–175 (2002)
Gusfield, D.: Haplotype inference by pure parsimony. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 144–155. Springer, Heidelberg (2003)
Hästad, J.: Clique is hard to approximate within n 1 − ε. Acta Mathematica 182, 105–142 (1999)
Huang, Y.-T., Chao, K.-M., Chen, T.: An approximation algorithm for haplotype inference by maximum parsimony. J. Computational Biology 12, 1261–1274 (2005)
Johnson, D.: Approximation algorithms for combinatorial problems. J. Comput. System Sci. 9, 256–278 (1974)
Lancia, G., Pinotti, M.C., Rizzi, R.: Haplotyping populations by pure parsimony: complexity and algorithms. INFORMS Journal on computing 16, 348–359 (2004)
Lancia, G., Bafna, V., Istrail, S., Lippert, R., Schwartz, R.: SNPs Problems, Complexity and Algorithms. In: Meyer auf der Heide, F. (ed.) ESA 2001. LNCS, vol. 2161, pp. 182–193. Springer, Heidelberg (2001)
Lancia, G., Rizzi, R.: A polynomial solution to a special case of the parsimony haplotyping problem. Operations Research letters 34, 289–295 (2006)
Lippert, R., Schwartz, R., Lancia, G., Istrail, S.: Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Briefings in bioinformatics 3, 23–31 (2002)
Lóvasz, L.: On the ratio of optimal integral and fractional covers. Discrete Mathematics 13, 383–390 (1975)
Lund, C., Yannakakis, M.: On the hardness of approximating minimization problems. J. ACM 41, 960–981 (1994)
Panconesi, A., Sozio, M.: Fast Hare: A fast heuristic for single individual SNP haplotype reconstruction. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 266–277. Springer, Heidelberg (2004)
Raz, R., Safra, S.: A sub-constant error-probability low-degree test, and sub-constant error-probability PCP characterization of NP. In: Proc. 29th ACM Symp. on Theory of Comput. (STOC 1997), pp. 475–484 (1997)
Rizzi, R., Bafna, V., Istrail, S., Lancia, G.: Practical algorithms and fixed-parameter tractability for the single individual SNP haplotyping problem. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 29–43. Springer, Heidelberg (2002)
Wang, R.S., Wu, L.Y., Li, Z.P., Zhang, X.S.: Haplotype reconstruction from SNP fragments by minimum error correction. Bioinformatics 21, 2456–2462 (2005)
Wang, L., Xu, Y.: Haplotype inference by maximum parsimony. Bioinformatics 19, 1773–1780 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abraham, J., Chen, Z., Fowler, R., Fu, B., Zhu, B. (2009). On the Approximability of Some Haplotyping Problems. In: Goldberg, A.V., Zhou, Y. (eds) Algorithmic Aspects in Information and Management. AAIM 2009. Lecture Notes in Computer Science, vol 5564. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02158-9_3
Download citation
DOI: https://doi.org/10.1007/978-3-642-02158-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02157-2
Online ISBN: 978-3-642-02158-9
eBook Packages: Computer ScienceComputer Science (R0)