Abstract
Computational DNA motif discovery is one of the major research areas in bioinformatics and helps in understanding the mechanisms of gene regulation. Recently, we developed a GA-based motif discovery algorithm, named GAPK, which makes use of previously identified transcription factor binding sites extracted from orthologs in the design of the algorithm. Within the GAPK framework, technical improvements to background filtering, evolutionary computation, or model refinement can each contribute to better performance. This paper aims to improve the GAPK framework by introducing a new fitness function, termed the relative model mismatch score (RMMS), which characterizes the conservation and rareness properties of DNA motifs simultaneously. Other technical contributions include a rule-based system for filtering background data and a “most one-in-out” (MOIO) strategy for motif model refinement. Comparative studies are carried out on eight benchmark datasets against the original GAPK and two other GA-based motif discovery algorithms, GAME and GALF-P. The results show that our improved GAPK method outperforms the other methods on the test datasets.
References
Hu, J., Li, B., Kihara, D.: Limitations and potentials of current motif discovery algorithms. Nucleic Acids Res. 33, 4899–4913 (2005)
Neuwald, A.F., Liu, J.S., Lawrence, C.E.: Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Science 4, 1618–1632 (1995)
Bailey, T.L., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using EM. Machine Learning 21, 51–80 (1995)
Tompa, M., Li, N., Bailey, T.L., et al.: Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology 23, 137–144 (2005)
Bailey, T.L., Elkan, C.P.: The value of prior knowledge in discovering motifs with MEME. Intell. Sys. Mol. Biol. 3, 21–29 (1995)
Li, L.P., Liang, Y., Bass, R.L.L.: GAPWM: a genetic algorithm method for optimizing a position weight matrix. Bioinformatics 23, 1188–1194 (2007)
Wang, T., Stormo, G.D.: Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 19, 2369–2380 (2003)
Narang, V., Mittal, A., Sung, W.-K.: Localized motif discovery in gene regulatory sequences. Bioinformatics 26, 1152–1159 (2010)
Wei, Z., Jensen, S.T.: GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics 22, 1577–1584 (2006)
Chan, T.-M., Leung, K.-S., Lee, K.-H.: TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics 24, 341–349 (2008)
Wang, D.H., Li, X.: GAPK: Genetic algorithms with prior knowledge for motif discovery in DNA sequences. In: CEC 2009: IEEE Congress on Evolutionary Computation 2009, Trondheim, Norway, pp. 277–284 (2009)
Wang, D.H., Lee, N.K.: MISCORE: mismatch-based matrix similarity scores for DNA motif detection. In: Köppen, M., Kasabov, N., Coghill, G. (eds.) ICONIP 2008. LNCS, vol. 5506, pp. 478–485. Springer, Heidelberg (2009)
Wang, D.H.: Characterization of regulatory motif models. Technical Report, La Trobe University, Australia (October 2009)
Stormo, G.D., Fields, D.S.: Specificity, free energy and information content in protein-DNA interactions. Trends in Biochemical Sciences 23, 109–113 (1998)
Thijs, G., Lescot, M., Marchal, K., Rombauts, S., De Moor, B., Rouzé, P., Moreau, Y.: A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17, 1113–1122 (2001)
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, D., Li, X. (2010). iGAPK: Improved GAPK Algorithm for Regulatory DNA Motif Discovery. In: Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (eds) Neural Information Processing. Models and Applications. ICONIP 2010. Lecture Notes in Computer Science, vol 6444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17534-3_27
DOI: https://doi.org/10.1007/978-3-642-17534-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17533-6
Online ISBN: 978-3-642-17534-3
eBook Packages: Computer Science (R0)