Abstract
Data centers are a crucial component of modern IT ecosystems. Their size and complexity make it challenging to maintain and understand knowledge about them. In this work, we propose a novel methodology to create a semantic representation of a data center, leveraging graph-based data, external semantic knowledge, and continuous input and refinement captured through human-in-the-loop interaction. Additionally, we specifically demonstrate the advantage of leveraging external knowledge to bootstrap the process. The main motivation behind this work is to support the migration of data centers, logically and/or physically. In this task, the subject matter expert needs to identify the function of each node in the data center (a server, a virtual machine, a printer, etc.), which is not necessarily directly available in the data, in order to plan a safe switch-off and relocation of a cluster of nodes. We test our method on two real-world datasets and show that it identifies the function of each node in a data center with high performance.
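The abstract describes bootstrapping node labels from external knowledge, i.e. distant supervision. The following is a minimal sketch only, not the authors' pipeline. It assumes that each node is described by the names of software observed on it in the logs, and that the external knowledge has been reduced to a hypothetical mapping from software names to node functions. The similarity measure (`difflib`'s `SequenceMatcher` ratio), the example mapping and the helper `weak_label` are illustrative assumptions; only the 0.6 threshold value echoes note 3.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Threshold value taken from note 3; the similarity measure it is applied to
# here (difflib ratio on lower-cased names) is an assumption.
SIMILARITY_THRESHOLD = 0.6


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weak_label(node_software: List[str], knowledge: Dict[str, str]) -> Optional[str]:
    """Assign a function to a node by matching the software names seen on it
    against a (hypothetical) external name -> function mapping.

    Each observed name votes for the function of its closest known name,
    provided the similarity reaches the threshold. Returns None when nothing
    matches, leaving the node for human-in-the-loop labelling."""
    votes: Counter = Counter()
    for observed in node_software:
        best_name, best_score = None, 0.0
        for name in knowledge:
            score = similarity(observed, name)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= SIMILARITY_THRESHOLD:
            votes[knowledge[best_name]] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical stand-in for knowledge retrieved from an external source.
knowledge = {
    "MySQL": "database server",
    "Apache HTTP Server": "web server",
    "CUPS": "print server",
}

print(weak_label(["mysqld", "sshd"], knowledge))  # database server
```

Here `mysqld` matches `MySQL` (similarity about 0.91), while `sshd` matches nothing above 0.6 and casts no vote. Majority voting keeps a single noisy match from dominating a node that runs many processes, and leaving unmatched nodes unlabelled is where expert refinement would take over.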
Notes
- 1.
- 2.
For example, the DBpedia endpoint http://dbpedia.org/sparql.
- 3.
For this work the similarity threshold has been set to 0.6.
- 4.
The words host and node are used interchangeably.
- 5.
The threshold was selected based on empirical observation.
- 6.
- 7.
- 8.
- 9.
The threshold was selected based on empirical evaluation.
- 10.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
DeLuca, C., Gentile, A.L., Ristoski, P., Welch, S. (2020). Understanding Data Centers from Logs: Leveraging External Knowledge for Distant Supervision. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_37
DOI: https://doi.org/10.1007/978-3-030-62466-8_37
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62465-1
Online ISBN: 978-3-030-62466-8
eBook Packages: Computer Science, Computer Science (R0)