Abstract
The area of graph embeddings is currently dominated by contrastive learning methods, which demand formulation of an explicit objective function and sampling of positive and negative examples. One of the leading classes of models is graph convolutional networks (GCNs), which suffer from numerous performance issues. In this paper we present Cleora: a purely unsupervised and highly scalable graph embedding scheme. Cleora can be likened to a GCN stripped down to its most effective core operation: repeated neighborhood aggregation. Cleora does not require a GPU and can embed massive graphs on CPU only, beating other state-of-the-art CPU algorithms in terms of speed and quality as measured on downstream tasks. Cleora has been applied in top machine learning competitions involving recommendations and graph processing, taking the podium in KDD Cup 2021, WSDM Challenge 2021, and SIGIR eCom Challenge 2020. We open-source Cleora under the MIT license, which allows commercial use, at https://github.com/Synerise/cleora.
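The core idea of repeated neighborhood aggregation can be illustrated with a minimal sketch. This is not the Cleora implementation: the function name `aggregate_embeddings`, the random initialization, the plain neighbor averaging, and the per-step L2 normalization are all assumptions made for illustration, and the abstract does not specify these details.

```python
import math
import random


def aggregate_embeddings(edges, num_nodes, dim=4, iterations=3, seed=0):
    """Illustrative repeated neighborhood aggregation (hypothetical helper)."""
    rng = random.Random(seed)

    # Assumption: the graph is undirected and stored as adjacency lists.
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Assumption: each node starts from a random vector in [-1, 1).
    emb = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(num_nodes)]

    for _ in range(iterations):
        new_emb = []
        for node in range(num_nodes):
            nbrs = neighbors[node]
            if nbrs:
                # Replace the node's vector with the mean of its neighbors' vectors.
                vec = [sum(emb[n][d] for n in nbrs) / len(nbrs)
                       for d in range(dim)]
            else:
                # Isolated node: keep its current vector.
                vec = list(emb[node])
            # Assumption: L2-normalize after every aggregation step.
            norm = math.sqrt(sum(x * x for x in vec))
            new_emb.append([x / norm for x in vec] if norm > 0 else vec)
        emb = new_emb
    return emb


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
embeddings = aggregate_embeddings(edges, num_nodes=6)
for node, vec in enumerate(embeddings):
    print(node, [round(x, 3) for x in vec])
```

The sketch has no loss function and no negative sampling: each iteration only averages neighbor vectors and rescales them, which is why such a scheme can run on CPU without gradient-based training.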
Acknowledgements
Barbara Rychalska was supported by grant no. 2018/31/N/ST6/02273, funded by the National Science Centre, Poland.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rychalska, B., Bąbel, P., Gołuchowski, K., Michałowski, A., Dąbrowski, J., Biecek, P. (2021). Cleora: A Simple, Strong and Scalable Graph Embedding Scheme. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13111. Springer, Cham. https://doi.org/10.1007/978-3-030-92273-3_28
DOI: https://doi.org/10.1007/978-3-030-92273-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92272-6
Online ISBN: 978-3-030-92273-3
eBook Packages: Computer Science (R0)