Abstract
RNA interference, also called gene knockdown, is a biological technique that inhibits a targeted gene in a cell. It allows one to identify statistical dependencies between a gene and a cell phenotype. However, the inhibition process may also modify additional genes. This is called the “off-target effect”, and it results in additional phenotype perturbations that are unrelated to the targeted gene. In this paper, we study new machine learning tools that both model the cell phenotypes and remove the off-target effect. We propose two new automatic methods to remove the off-target components from a data sample. The first method is based on vector quantization (VQ); the second relies on a classification forest. Both methods analyze the homogeneity of several repetitions of a gene knockdown. The baseline we consider is a Gaussian mixture model whose parameters are learned under constraints with a standard Expectation–Maximization algorithm. We evaluate these methods on a real data set, a semi-synthetic data set, and a synthetic toy data set. The real and semi-synthetic data sets are composed of quantities describing cell growth dynamics, measured in time-lapse movies. The main result is that the probabilistic version of the VQ-based method achieves the best recognition performance.
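The idea of checking homogeneity across repetitions of a knockdown can be illustrated with a small sketch. It is not the method evaluated in the paper: the fixed codebook, the `min_frac` threshold, the function names, and the toy data are all assumptions made for illustration. Each sample is assigned to its nearest codeword, and only codewords that are populated in every repetition are treated as on-target.

```python
from math import dist

def nearest_codeword(x, codebook):
    """Index of the codeword closest to x (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: dist(x, codebook[k]))

def codeword_histogram(samples, codebook):
    """Fraction of samples assigned to each codeword (samples must be non-empty)."""
    counts = [0] * len(codebook)
    for x in samples:
        counts[nearest_codeword(x, codebook)] += 1
    return [c / len(samples) for c in counts]

def shared_codewords(repetitions, codebook, min_frac=0.1):
    """Codewords holding at least min_frac of the samples in every repetition.

    Assumption: a phenotype component seen in all repetitions is on-target.
    """
    hists = [codeword_histogram(rep, codebook) for rep in repetitions]
    return {k for k in range(len(codebook))
            if all(h[k] >= min_frac for h in hists)}

def keep_on_target(repetitions, codebook, min_frac=0.1):
    """Drop samples whose codeword is not shared by all repetitions."""
    keep = shared_codewords(repetitions, codebook, min_frac)
    return [[x for x in rep if nearest_codeword(x, codebook) in keep]
            for rep in repetitions]

# Hypothetical toy data: three repetitions of one knockdown, 2-D features.
codebook = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
repetitions = [
    [(0.1, 0.2), (4.9, 5.1), (5.2, 4.8), (9.8, 0.1)],  # last sample is isolated
    [(-0.1, 0.0), (5.0, 5.3), (4.7, 5.0)],
    [(0.2, -0.1), (5.1, 4.9), (0.0, 0.3)],
]
cleaned = keep_on_target(repetitions, codebook)
```

In this toy setting the third codeword is populated by a single repetition only, so the sample near it is discarded while the other two components are kept. In practice the codebook would be learned from the data, for example with k-means, rather than fixed by hand.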






Acknowledgments
This work was supported by the Swiss National Science Foundation under Sinergia grant 127456 “Understanding Brain morphogenesis” and by a Human Frontier Science Program grant.
Cite this article
Lefort, R., Fusco, L., Pertz, O. et al. Machine learning-based tools to model and to remove the off-target effect. Pattern Anal Applic 20, 87–100 (2017). https://doi.org/10.1007/s10044-015-0469-z
DOI: https://doi.org/10.1007/s10044-015-0469-z