Abstract
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is the position-specific scoring matrix (PSSM), which assumes independence between binding positions. In many cases this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on Markov networks. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our models and devise an algorithm for learning their structural features from binding site data. We evaluate our approach on synthetic data, and then apply it to binding site and ChIP-chip data from yeast. We reveal sequence features that are present in the binding specificities of yeast TFs, and show that FMMs explain the binding data significantly better than PSSMs.
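The contrast between the two model families can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-position site, the probabilities, and the feature weights are all assumptions chosen for illustration, and the normalization is done by brute-force enumeration, which is only feasible for very short sites. The PSSM scores each position independently, while the feature-based model is log-linear, with each feature allowed to constrain several positions at once.

```python
import itertools
import math

ALPHABET = "ACGT"

def pssm_log_prob(seq, pssm):
    """Log-probability under a PSSM: each position contributes independently."""
    return sum(math.log(pssm[i][base]) for i, base in enumerate(seq))

def feature_score(seq, features):
    """Unnormalized log-score: sum of the weights of all features present in seq.

    Each feature is (weight, {position: base, ...}) and fires only when every
    listed position carries the listed base, so it may span several positions.
    """
    return sum(weight for weight, pattern in features
               if all(seq[pos] == base for pos, base in pattern.items()))

def feature_log_prob(seq, features):
    """Normalized log-probability under the log-linear feature model.

    The partition function is computed by enumerating all 4^L sequences
    (an illustrative shortcut, not a scalable inference method).
    """
    log_z = math.log(sum(math.exp(feature_score("".join(s), features))
                         for s in itertools.product(ALPHABET, repeat=len(seq))))
    return feature_score(seq, features) - log_z

# Hypothetical toy site: bound sequences are mostly "AA" or "CC".
# The best a PSSM can do is give A and C equal mass at each position.
column = {"A": 0.49, "C": 0.49, "G": 0.01, "T": 0.01}
pssm = [column, column]

# Two hypothetical pairwise features, each spanning both positions.
features = [(3.0, {0: "A", 1: "A"}), (3.0, {0: "C", 1: "C"})]

for site in ("AA", "AC"):
    print(site, pssm_log_prob(site, pssm), feature_log_prob(site, features))
```

In this toy setting the PSSM assigns "AA" and "AC" the same probability, because it cannot express that the two positions co-vary, whereas the pairwise features let the feature-based model prefer "AA" over "AC".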
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Sharon, E., Segal, E. (2007). A Feature-Based Approach to Modeling Protein-DNA Interactions. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_6
DOI: https://doi.org/10.1007/978-3-540-71681-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71680-8
Online ISBN: 978-3-540-71681-5
eBook Packages: Computer Science, Computer Science (R0)