Abstract
Machine Learning problems have changed in recent years, especially regarding the volume, diversity and speed of data, and new approaches are needed to deal with the associated challenges. In this paper we describe CEDEs, a distributed learning system that runs on top of a Hadoop cluster and takes advantage of its blocks, replication and balancing. CEDEs trains models in a distributed manner following the principle of data locality, and can change parts of a model through an optimization module, allowing the model to evolve over time as the data changes. This paper describes the generic architecture of CEDEs, details the implementation of its first modules, and provides a first validation.
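The idea of training one base model per data block and later changing only part of the overall model can be illustrated with a minimal sketch. It is not the CEDEs implementation: the function names, the mean-predictor base model, and the "replace the worst base model" rule are all assumptions made for illustration, and the blocks are plain in-memory lists rather than blocks stored on a Hadoop cluster.

```python
from statistics import mean

# Hypothetical sketch: every name here is illustrative, not part of CEDEs.
# A "block" is a list of (x, y) pairs standing in for one data block.

def train_block_model(block):
    """Fit a trivial base model (a mean predictor) on a single block."""
    value = mean(y for _, y in block)
    return lambda x: value

def block_error(model, data):
    """Mean absolute error of one base model on a list of (x, y) pairs."""
    return mean(abs(model(x) - y) for x, y in data)

def train_ensemble(blocks):
    """Train one base model per block; each model only sees its own block."""
    return [train_block_model(block) for block in blocks]

def predict(ensemble, x):
    """Combine the base models by averaging their predictions."""
    return mean(model(x) for model in ensemble)

def replace_worst(ensemble, validation, new_block):
    """Assumed optimization step: swap the base model with the highest
    validation error for one trained on a newer block."""
    errors = [block_error(model, validation) for model in ensemble]
    worst = errors.index(max(errors))
    updated = list(ensemble)
    updated[worst] = train_block_model(new_block)
    return updated, worst

blocks = [
    [(0, 1.0), (1, 1.0)],
    [(0, 2.0), (1, 2.0)],
    [(0, 9.0), (1, 9.0)],  # block whose data no longer matches the others
]
ensemble = train_ensemble(blocks)
validation = [(0, 2.0), (1, 2.0)]
new_block = [(0, 3.0), (1, 3.0)]
updated, replaced_index = replace_worst(ensemble, validation, new_block)
```

In this sketch only one base model is retrained when the data changes, while the others are kept as they are.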
Acknowledgments
This work was supported by FCT - Fundação para a Ciência e Tecnologia within projects UIDB/04728/2020 and EXPL/CCI-COM/0706/2021.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Oliveira, F. et al. (2023). Dynamic Management of Distributed Machine Learning Projects. In: Braubach, L., Jander, K., Bădică, C. (eds) Intelligent Distributed Computing XV. IDC 2022. Studies in Computational Intelligence, vol 1089. Springer, Cham. https://doi.org/10.1007/978-3-031-29104-3_3
DOI: https://doi.org/10.1007/978-3-031-29104-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29103-6
Online ISBN: 978-3-031-29104-3
eBook Packages: Intelligent Technologies and Robotics (R0)