Abstract
In this paper, we develop a new algorithm to construct Multiple and Global Alignments (MGA) of primary structures, i.e., strings coding biological macromolecules. The construction of such alignments is based on the one of the (longest) Approximate Common Subsequences (ACS), made up by longer approximate substrings appearing, approximately, in the same positions in all the strings. This ACS represents a MGA. Constructing such alignments is a way to find homologies between biological macromolecules. Our algorithm is of complexity O(N 2*L 2*(log(L))2) in computing time, where N is the number of the strings and L is the length of the longest string.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Sagot, M.F.: Ressemblance Lexicale et Structurale Entre Macromolécules -Formalisation et Approches Combinatoires, Thèse de Doctorat, Université de Marne-La-Vallée, France (1996)
Hannenhalli, S.: Transforming Men Into Mice a Computationnal Theory of Genome Rearrangements, PhD Thesis, The Pennsylvania State University (1995)
Hannenhalli, S., Pevzner, P.: Transforming Cabbage Into Turnip (Polynomial Algorithm for Srting Signed Permutations By Reversals). In: Proc. 27th Annual ACM Symposium on the Theory of Computing, pp. 178–189 (1995)
Christie, D.: Genome Rearrangement Problems Ph.D Thesis, University of Glasgow (1998)
Corpet, F.: Multiple Sequence Alignment With Hierarchical Clustering. Nucleic Acids Research 16(22), 10881–10890 (1988)
Depiereux, E., Feytmans, E.: MATCH-BOX - A fundamentally new Algorithm for The Simultaneous Alignment of Several Protein Sequences. Comput. Appl. Biosci. 8(5), 501–509 (1992)
Delcher, A.L., Phillippy, A., Carlton, J., Salzberg, S.L.: Fast Algorithms for Large-ScaleGenome Alignment and Comparison. Nucleic Acids Research 30(11), 2478–2483 (2002)
Lee, C., Grasso, C., Sharlow, M.: Multiple Sequence Alignment UingPpartial Order Graphs. Bioinformatics (18), 452–464 (2002)
Brudno, M., Chapman, M., Göttgens, B., Batzoglou, S., Morgenstern, B.: Fast and Sensitive Multiple Alignment of Large Genomic Sequences. BMC Bioinformatics 4(66), 1–11 (2003)
Bray, N., Pachter, L.: MAVID: Constrained Ancestral Alignment of Multiple Sequences. Genome Research (14), 693–699 (2004)
Lassmann, T., Sonnhammer, E.: Kalign: An Accurate and Fast Multiple Sequence Alignment Algorithm. BMC Bioinformatics (6), 298 (2005)
Schwartz, A., Pachter, L.: Multiple Alignment by Sequence Annealing. Bioinformatics (2006)
Morgenstern, B., Dress, A., Werner, T.: Multiple DNA and Protein Sequence Alignment Based on Segment-to-Segment Comparison. Proc. Natl. Acad. Sci. U.S.A. (93), 2098–12103 (1996)
Morgenstern, B., Frech, K., Dress, A., Werner, T.: DIALIGN: Finding Local Similarities by Multiple Sequence Alignment. Bioinformatics 14(3), 290–294 (1998)
Lenhof, H.P., Morgenstern, B., Reinert, K.: An Exact Solution for The Segment-to-Segment Multiple Sequence Alignment Problem. Bioinformatics (15), 203–210 (1999)
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z.: Human-Mouse Alignments with Blastz. Genome Research, 103–107 (2003)
Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S.: LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA. Genome Research (13), 721–731 (2003)
Frith, M.C., Hansen, U., Spouge, J.L.: Finding Functional Sequence Elements by Multiple Local Alignments. Nucleic Acids Research 32(1), 189–200 (2004)
Morgenstern, B.: DIALIGN: Multiple DNA and Protein Sequence Alignment at BiBiServ. Nucleic Acids Research 32(Web Server Issue) (2004)
Ovcharenko, I., Loots, G.G., Giardine, B.M., Hou, M., Ma, J., Hardison, R.C., Stubbs, L., Millers, W.: Mulan: Multiple-Sequence Local Alignment and Visualization for Studying Function and Evolution. Genome Research (15), 184–194 (2005)
Needleman, S.B., Wunsch, C.D.: A General Method Applicable to the Search for Similarities in The Amino-Acid Sequence of two Proteins. Journal of Molecular Biolog (48), 443–453 (1970)
Byers, T.H., Waterman, M.S.: Determining All Optimal and Near-Optimal Solutions when Solving Shortest Path Problems by Dynamic Programming. Operations Research 32(6), 1381–1384 (1984) Operations Research Society of America (Eds.)
Waterman, M.S., Byers, T.H.: A Dynamic Programming Algorithm to Find All Solutions in a Neighborhood of The Optimum. Mathematical Biosciences (77), 179–188 (1985)
Zuker, M.: Suboptimal Sequence Alignment in Molecular biology: 1nalysis with Errors. J. Mol. Biol. 221, 403–420 (1991)
Naor, D., Brutlag, D.L.: On Near-Optimal Alignments of Biological Sequences. J. Comp. Biol. (4), 349–366 (1994)
Kurtz, S., Ohlebusch, E., Schleiermacher, C., Stoye, J.: Reputer: The Manifold Applications of Repeat Analysis. Nucleic Acids Research 29(22), 4633–4642 (2001)
Noé, L.: Recherche de Similarités dans Les Séquences d’ADN: Modèles et Algorithmes pour la Conception de Graines Efficaces, Thése de Doctorat, Université Henri Poincaré (2005)
Huang, W., Umbach, D.M., Leping, L.: Accurate Anchoring Alignment of Divergent Sequences. Bioinformatics, 22(1), 29–34 (2006)
Zhang, X., Kahveci, T.: A New Approach for Alignment of Multiple Proteins. In: Pacific Symposium on Biocomputing, vol. 11, pp. 339–350 (2006)
Edgar, R.C.: MUSCLE: Multiple Sequence Alignment with High Accuracy and High throughput. Nucleic Acids Research 32(5), 1792–1797 (2004)
Liang Ye, Y., Huang, X.: MAP2: Multiple Alignments of Syntenic Genomic sequences. Nucleic Acids Research 33(1), 162–170 (2005)
Levenshtein, V.I.: Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Cybernetics and Control Theory 10(8), 707–710 (1966)
Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms, pp. 60–65. Addison-Wesley Publishing Company, Reading (1974)
Karp, R., Miller, R.E., Rosenberg, A.L.: Rapid Identification of Repeated Patterns in Strings, Trees and Arrays. In: 4th symposium of theory of Computing, pp. 125–136 (1972)
RNA Families Database of Alignments and CMs http://www.sanger.ac.uk/Software/Rfam
Protein Families Database, http://www.sanger.ac.uk/Software/Pfam
National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov
Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C.: A Model for Evolutionary Change. Atlas of Protein Sequence and Structure 5(3), 345–352 (1979)
Henikoff, S., Henikoff, J.G.: Amino Acid Substitution Matrices From Protein Blocks. Proc. Natl. Acad. Sci. U.S.A. 89, 10915–10919 (1992)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elloumi, M., Mokaddem, A. (2008). An Algorithm for Multiple and Global Alignments. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_37
Download citation
DOI: https://doi.org/10.1007/978-3-540-70600-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer ScienceComputer Science (R0)