Abstract
Motif discovery is the problem of finding subgraphs of a network that appear surprisingly often. Each such subgraph may indicate a small-scale interaction feature in applications ranging from a genomic interaction network, a significant relationship involving rock musicians, or any other application that can be represented as a network. We look at the problem of constrained search for motifs based on labels (e.g. gene ontology, musician type to continue our example from above). This chapter presents a brief review of the state of the art in motif finding and then extends the gTrie data structure from Ribeiro and Silva (Data Min Knowl Discov 28(2):337–377, 2014b) to support labels. Experiments validate the usefulness of our structure for small subgraphs, showing that we recoup the cost of the index after only a handful of queries.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Adami C, Qian J, Rupp M, Hintze A (2011) Information content of colored motifs in complex networks. Artif Life 17(4):375–390
Adamic LA, Glance N (2005) The political blogosphere and the 2004 u.s. election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery, LinkKDD ’05. ACM, New York, pp 36–43
Adelson RM (1966) Compound Poisson distributions. Oper Res Q 17(1):73–75
Alon U (2007) Network motifs: theory and experimental approaches. Nat Rev Genet 8(6):450–461
Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, Harris M, Hill D, Issel-Tarver L, Kasarskis A, Lewis S, Matese J, Richardson J, Ringwald M, Rubin G, Sherlock G (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25–29
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Batagelj V, Mrvar A, Zaversnik M (2002) Network analysis of dictionaries. In: Language technologies, pp 135–142
Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, Fridman WH, Pages F, Trajanoski Z, Galon J (2009) ClueGO: a cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25(8):1091–1093
Dimitropoulos X, Krioukov D, Huffaker B, Claffy K, Riley G (2005) Inferring AS relationships: dead end or lively beginning? In: Nikoletseas SE (ed) Experimental and efficient algorithms. Springer, Berlin, pp 113–125
Dimitropoulos XA, Krioukov DV, Riley GF, Claffy KC (2006) Revealing the autonomous system taxonomy: the machine learning approach. CoRR abs/cs/0604015
Grochow JA, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Speed T, Huang H (eds) Research in computational molecular biology. Springer, Berlin, pp 92–106
Kashtan N, Alon U (2005) Spontaneous evolution of modularity and network motifs. Proc Natl Acad Sci 102(39):13773–13778
Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A (2009) Human protein reference database–2009 update. Nucleic Acids Res 37(Database issue):D767–772
Kurata H, Maeda K, Onaka T, Takata T (2014) BioFNet: biological functional network database for analysis and synthesis of biological systems. Brief Bioinform 15(5):699–709
Ley M (2002) The DBLP computer science bibliography: evolution, research issues, perspectives. In: Laender AHF, Oliveira AL (eds) String processing and information retrieval. Springer, Berlin, pp 1–10
Maere S, Heymans K, Kuiper M (2005) BiNGO: a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21(16):3448–3449
McKay BD (1981) Practical graph isomorphism. Congressus numerantium 30:45–87
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827
Milo R, Kashtan N, Itzkovitz S, Newman MEJ, Alon U (2003) On the uniform generation of random graphs with prescribed degree sequences. eprint arXiv:cond-mat/0312028
Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E 64:026118
Opsahl T (2011) Why anchorage is not (that) important: binary ties and sample selection. https://toreopsahl.com/2011/08/12/
Picard F, Daudin JJ, Koskas M, Schbath S, Robin S (2008) Assessing the exceptionality of network motifs. J Comput Biol 15(1):1–20
Prill RJ, Iglesias PA, Levchenko A (2005) Dynamic properties of network motifs contribute to biological network organization. PLOS Biol 3(11):e343
Ribeiro P, Silva F (2012) Querying subgraph sets with g-tries. In: Proceedings of the 2Nd ACM SIGMOD workshop on databases and social networks, DBSocial ’12. ACM, New York, pp 25–30
Ribeiro P, Silva F (2014a) Discovering colored network motifs. In: Contucci P, Menezes R, Omicini A, Poncela-Casasnovas J (eds) Complex networks V. Springer International Publishing, Cham, pp 107–118
Ribeiro P, Silva F (2014b) G-Tries: a data structure for storing and finding subgraphs. Data Min Knowl Discov 28(2):337–377
Schbath S, Lacroix V, Sagot MF (2009) Assessing the exceptionality of coloured motifs in networks. EURASIP J Bioinform Syst Biol 2009:3:1–3:9
Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64–68
Solé RV, Pastor-Satorras R, Smith E, Kepler TB (2002) A model of large-scale proteome evolution. Adv Complex Syst 05(01):43–54
Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3(4):347–359
Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings - 2002 IEEE international conference on data mining. ICDM 2002, pp 721–724
Acknowledgements
Shasha’s work has been partially supported by an INRIA International Chair and the U.S. National Science Foundation under grants MCB-1412232, IOS-1339362, MCB-1355462, MCB-1158273, IOS-0922738, and MCB-0929339. This support is greatly appreciated.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Mongioví, M., Micale, G., Ferro, A., Giugno, R., Pulvirenti, A., Shasha, D. (2018). gLabTrie: A Data Structure for Motif Discovery with Constraints. In: Fletcher, G., Hidders, J., Larriba-Pey, J. (eds) Graph Data Management. Data-Centric Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-96193-4_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-96193-4_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96192-7
Online ISBN: 978-3-319-96193-4
eBook Packages: Computer ScienceComputer Science (R0)