A Multiobjective Metaheuristic-Based Container Consolidation Model for Cloud Application Performance Improvement

Bracke, Vincent; Santos, José; Wauters, Tim; De Turck, Filip; Volckaert, Bruno

doi:10.1007/s10922-024-09835-7

Published: 18 June 2024

Volume 32, article number 61, (2024)

Journal of Network and Systems Management

Abstract

This work describes an approach to enhance container orchestration platforms with an autonomous, dynamic rescheduling system that improves application service time by co-locating highly interdependent containers to reduce network delay. Excessive container consolidation may, however, saturate the host CPU and in turn impair service time. The multiobjective approach proposed in this work therefore aims to improve application service time by minimizing both inter-server network traffic and CPU throttling on overloaded servers. To this end, the Simulated Annealing combinatorial optimization heuristic is used, and its performance is compared against the optimal solution obtained by Mathematical Programming. Additionally, the impact of the proposed system is validated on a Kubernetes cluster hosting three concurrent applications under varying load scenarios. In our experiments, the proposed rescheduling system consistently (i) improves the application service time (by up to 27.2%) and (ii) surpasses the improvement reached by the Kubernetes descheduler.
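The trade-off described in the abstract can be illustrated with a minimal Python sketch: a weighted sum of inter-server traffic and CPU overload, minimized by a basic Simulated Annealing loop. This is not the authors' model. The function names (`cost`, `anneal`), the normalization by total traffic and total CPU demand, the geometric cooling schedule, and all parameter values are assumptions made for illustration; the equal weights of 0.5 follow the note stating "each with a weight of 0.5", which is assumed here to refer to the two objectives.

```python
import math
import random


def cost(placement, traffic, cpu_demand, capacity, w_net=0.5, w_cpu=0.5):
    """Weighted sum of two normalized terms (illustrative normalization).

    placement[c] is the server index of container c.
    traffic maps a container pair (a, b) to its traffic volume.
    """
    total_traffic = sum(traffic.values()) or 1.0
    # Traffic counts only when the two containers sit on different servers.
    inter = sum(t for (a, b), t in traffic.items() if placement[a] != placement[b])

    load = [0.0] * len(capacity)
    for container, server in enumerate(placement):
        load[server] += cpu_demand[container]
    # CPU demand exceeding a server's capacity stands in for throttling.
    overload = sum(max(0.0, used - cap) for used, cap in zip(load, capacity))
    total_demand = sum(cpu_demand) or 1.0

    return w_net * inter / total_traffic + w_cpu * overload / total_demand


def anneal(placement, traffic, cpu_demand, capacity,
           t0=1.0, alpha=0.995, steps=5000, seed=0):
    """Basic Simulated Annealing: move one container at a time."""
    rng = random.Random(seed)
    current = list(placement)
    cur_cost = cost(current, traffic, cpu_demand, capacity)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        container = rng.randrange(len(current))
        previous = current[container]
        current[container] = rng.randrange(len(capacity))
        new_cost = cost(current, traffic, cpu_demand, capacity)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[container] = previous  # undo the rejected move
        temp *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost


# Hypothetical toy instance: 4 containers, 2 servers of capacity 2.
traffic = {(0, 1): 10, (2, 3): 10}
cpu_demand = [1, 1, 1, 1]
capacity = [2, 2]
initial = [0, 1, 0, 1]  # both communicating pairs are split across servers
best, best_cost = anneal(initial, traffic, cpu_demand, capacity)
```

In this toy instance, placing all four containers on one server removes the network term but incurs a CPU overload term, which is the tension the multiobjective formulation is meant to balance.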

Data Availability

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Notes

The VRP generalizes the Traveling Salesman Problem (TSP) [11]
More advanced alternatives are presented in [15]
inspired by [16]
as ‘load1’ reports the average load for the last minute.
https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/pkg/networkaware/README.md
The intuition is that highly interacting containers spend more time in communication when placed across different nodes, thereby leading to a degradation in the observed response times.
Sysadmins usually define a more conservative value (typically around 0.7) to provide some headroom to the system.
each with a weight of 0.5.
A more detailed justification for the selection of this specific metaheuristic is given in [10]. Note, however, that the model described in this work is (meta-)heuristic/algorithm agnostic; while SA provides satisfactory results, we do not claim it to be the most efficient or effective technique. In fact, comparing all possible techniques is considered out of scope for this work.
50 containers on 5 servers.
See Table 1 for a complete description of the test environment specifications.
For the ‘M’ scenario (see Table 5), CPLEX crashes after 4.3 h (out of memory on a server with 12 GB of RAM). At this stage, it still reports a ‘Gap’ value of 25.81% in the minimization of the ‘cntc’ function (the first step of the algorithm, required to compute the normalizing factor).
This suggests that the size of the search-space negatively influences the performance of the exploration phase with only little progress observed in the progressive transition to the exploitation phase.
This value has been empirically defined and can be adapted according to cluster specificity.
Most of the descriptive part of subsection 5.1 is directly imported from [48]
"requiredDuringSchedulingIgnoredDuringExecution"
"preferredDuringSchedulingIgnoredDuringExecution"
For scoping reasons, this work only considers hard constraints. The impact of soft constraints is considered as a candidate topic for future extensions of this work.
Under stable load, the control loop should eventually stop adapting the system it controls once it has converged to an optimum.
Pixie is an open source observability tool for K8s applications, contributed by New Relic, Inc. and a CNCF sandbox project since June 2021 [51]. Pixie has been selected for its ease of integration, though any other network monitoring tool able to report on inter-pod network traffic can be used instead.
By design any other possible optimization algorithm/metaheuristic can be used as the New Context Generator component launches its execution through an algorithm agnostic interface.
See Table 1 for the test environment specifications.
More details on the scope of each individual service can be found on the official website of the app [52] as well as in our previous article [10]
https://github.com/idlab-discover/obelisk.
Data isolation is internally ensured through the concept of scopes which represent logical data sets with configurable perimeter. A scope can be understood as a labelling mechanism aiming at isolating data between different contexts of use. Data access APIs for data ingestion, querying and streaming all require the scope to be mentioned.
While this might at first sound counter-intuitive, in most cases setting CPU limits does more harm than good. In fact, they are the number one cause of CPU throttling [54]. Tim Hockin, one of the K8s maintainers at Google, even suggests never setting CPU limits (https://x.com/thockin/status/1134193838841401345?s=20)
https://github.com/kubernetes-sigs/descheduler - 17/11/2023.
Currently, pods request resource requirements are considered for computing node resource utilization.(...) Implementing metrics-based descheduling is currently TODO for the project. 17/11/2023 - https://github.com/kubernetes-sigs/descheduler
3 (reschedulable) Pods that may be scheduled onto 6 distinct nodes (the scheduler may indeed reassign a pod to the node it was running on prior to the descheduling)
For instance, we could observe the reassignment of 1, 2 or 3 Pods, on 1, 2 or 3 Nodes, sometimes even back on the original node (Node4).
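The notes on ‘load1’ and on the conservative value of around 0.7 can be illustrated with a small Python sketch of an overload check. It assumes that the threshold is applied to the one-minute load average divided by the number of CPU cores; the notes do not state this, and the function name `is_overloaded` and its signature are hypothetical.

```python
def is_overloaded(load1, cores, threshold=0.7):
    """Return True when the per-core 1-minute load exceeds the threshold.

    load1: 1-minute load average of the host.
    cores: number of CPU cores on the host.
    threshold: conservative headroom value (0.7 per the note; 1.0 would
        mean no headroom under the per-core assumption made here).
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load1 / cores > threshold


# Hypothetical readings for a 4-core server.
busy = is_overloaded(3.0, 4)   # 0.75 per core
calm = is_overloaded(2.0, 4)   # 0.50 per core
```

On a Unix host, `os.getloadavg()[0]` and `os.cpu_count()` could supply the two inputs.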

References

Silva, V.G., Kirikova, M., Alksnis, G.: Containers for virtualization: an overview. Appl. Comput. Syst. 23(1), 21–27 (2018). https://doi.org/10.2478/acss-2018-0003
Docker, Inc.: Docker website (2023). https://www.docker.com
The Linux Foundation: LXC/LXD website (2023). https://linuxcontainers.org
Red Hat, Inc.: Podman website (2023). https://podman.io/
The Cloud Native Computing Foundation: Containerd website (2023). https://containerd.io
The Apache Software Foundation: Mesos website (2023). http://mesos.apache.org
Docker, Inc.: Docker Swarm website (2023). https://docs.docker.com/engine/swarm/
The Cloud Native Computing Foundation: Kubernetes website (2023). https://kubernetes.io
Kalmbach, P., Zerwas, J., Babarczi, P., Blenk, A., Kellerer, W., Schmid, S.: Empowering self-driving networks. In: Proceedings of the Afternoon Workshop on Self-Driving Networks. SelfDN 2018, pp. 8–14. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3229584.3229587
Bracke, V., Werrebrouck, G., Santos, J.P., Wauters, T., De Turck, F., Volckaert, B.: Online dynamic container rescheduling for improved application service time. J. Netw. Syst. Manag. 31(4) (2023) https://doi.org/10.1007/s10922-023-09766-9
Robinson, J.B.: On the Hamiltonian Game (A Traveling Salesman Problem). RAND Corporation, Santa Monica (1949)
Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Manage. Sci. 6, 80–91 (1959). https://doi.org/10.1287/mnsc.6.1.80
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489
Croes, G.A.: A method for solving traveling-salesman problems. Oper. Res. 6(6), 791–812 (1958). https://doi.org/10.1287/opre.6.6.791
Reinelt, G.: The Traveling Salesman: Computational Solutions for TSP Applications, 1st edn. Lecture Notes in Computer Science, vol. 840. Springer, Berlin, Heidelberg (1994) https://doi.org/10.1007/3-540-48661-5
Almeida, R.S.: TSP Essay (2020). https://github.com/rsalmei/tsp-essay. Accessed: 20 Jan 2023
Beloglazov, A., Buyya, R.: Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr. Comput.: Pract. Exper. 24(13), 1397–1420 (2012). https://doi.org/10.1002/cpe.1867
Mahdhi, T., Mezni, H.: A prediction-based VM consolidation approach in IaaS cloud data centers. J. Syst. Softw. 146, 263–285 (2018). https://doi.org/10.1016/j.jss.2018.09.083
Wang, J.V., Cheng, C.-T., Tse, C.K.: A thermal-aware VM consolidation mechanism with outage avoidance. Softw.: Pract. Exper. 49(5), 906–920 (2019). https://doi.org/10.1002/spe.2680
Zhao, D., Mohamed, M., Ludwig, H.: Locality-aware scheduling for containers in cloud computing. IEEE Trans. Cloud Comput. 8(2), 635–646 (2020). https://doi.org/10.1109/TCC.2018.2794344
Filip, I.-D., Pop, F., Serbanescu, C., Choi, C.: Microservices scheduling model over heterogeneous cloud-edge environments as support for IoT applications. IEEE Internet Things J. 5(4), 2672–2681 (2018). https://doi.org/10.1109/JIOT.2018.2792940
Nanda, S., Hacker, T.J.: RACC: Resource-aware container consolidation using a deep learning approach. In: Proceedings of the First Workshop on Machine Learning for Computing Systems. MLCS’18. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3217871.3217876
Wen, Z., Lin, T., Yang, R., Ji, S., Ranjan, R., Romanovsky, A., Lin, C., Xu, J.: GA-Par: dependable microservice orchestration framework for geo-distributed clouds. IEEE Trans. Parallel Distrib. Syst. 31(1), 129–143 (2020). https://doi.org/10.1109/TPDS.2019.2929389
Guerrero, C., Lera, I., Juiz, C.: Resource optimization of container orchestration: a case study in multi-cloud microservices-based applications. J. Supercomput. 74(7), 2956–2983 (2018). https://doi.org/10.1007/s11227-018-2345-2
Bittencourt, L.F., Goldman, A., Madeira, E.R.M., da Fonseca, N.L.S., Sakellariou, R.: Scheduling in distributed systems: a cloud computing perspective. Comput. Sci. Rev. 30, 31–54 (2018). https://doi.org/10.1016/j.cosrev.2018.08.002
Söylemez, M., Tekinerdogan, B., Tarhan, A.K.: Challenges and solution directions of microservice architectures: a systematic literature review. Appl. Sci. (2022). https://doi.org/10.3390/app12115507
Santos, J., Wang, C., Wauters, T., De Turck, F.: Diktyo: network-aware scheduling in container-based clouds. IEEE Trans. Netw. Serv. Manag., 1–1 (2023). https://doi.org/10.1109/TNSM.2023.3271415
Zhou, R., Li, Z., Wu, C.: An efficient online placement scheme for cloud container clusters. IEEE J. Sel. Areas Commun. 37(5), 1046–1058 (2019). https://doi.org/10.1109/JSAC.2019.2906745
Piraghaj, S.F., Dastjerdi, A.V., Calheiros, R.N., Buyya, R.: A framework and algorithm for energy efficient container consolidation in cloud data centers. In: 2015 IEEE International Conference on Data Science and Data Intensive Systems, pp. 368–375 (2015). https://doi.org/10.1109/DSDIS.2015.67
Rattihalli, G.: Exploring potential for resource request right-sizing via estimation and container migration in Apache Mesos. In: 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), pp. 59–64 (2018). https://doi.org/10.1109/UCC-Companion.2018.00035
Bulej, L., Bureš, T., Hnětynka, P., Khalyeyev, D.: Self-adaptive K8S cloud controller for time-sensitive applications. In: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 166–169 (2021). https://doi.org/10.1109/SEAA53835.2021.00029
Rodriguez, M., Buyya, R.: Container orchestration with cost-efficient autoscaling in cloud computing environments. In: Handbook of Research on Multimedia Cyber Security, pp. 190–213 (2020). https://doi.org/10.4018/978-1-7998-2701-6.ch010
Wojciechowski, L., Opasiak, K., Latusek, J., Wereski, M., Morales, V., Kim, T., Hong, M.: NetMARKS: network metrics-aware Kubernetes Scheduler powered by service mesh. In: IEEE INFOCOM 2021—IEEE Conference on Computer Communications, pp. 1–9 (2021). https://doi.org/10.1109/INFOCOM42981.2021.9488670
Marchese, A., Tomarchio, O.: Network-aware container placement in cloud-edge kubernetes clusters. In: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 859–865 (2022). https://doi.org/10.1109/CCGrid54584.2022.00102
Joseph, C.T., Chandrasekaran, K.: Nature-inspired resource management and dynamic rescheduling of microservices in Cloud datacenters. Concurr. Comput.: Pract. Exper. 33(17), 6290 (2021). https://doi.org/10.1002/cpe.6290
Podzimek, A., Chen, L.Y.: Transforming system load to throughput for consolidated applications. In: 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, pp. 288–292 (2013). https://doi.org/10.1109/MASCOTS.2013.37
Arora, J.S.: Chapter 18—Multi-objective optimum design concepts and methods. In: Arora, J.S. (ed.) Introduction to Optimum Design, 4th edn., pp. 771–794. Academic Press, Boston (2017). https://doi.org/10.1016/B978-0-12-800806-5.00018-4
Caramia, M., Dell’Olmo, P.: Multi-objective optimization. In: Multi-objective management in freight logistics: increasing capacity, service level, sustainability, and safety with optimization algorithms, pp. 21–51. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50812-8_2
Grodzevich, O., Romanko, O.: Normalization and other topics in multi-objective optimization. In: Proceedings of the fields-MITACS Industrial Problems Workshop (2006)
Emmerich, M., Deutz, A.: A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Natural Comput. 17 (2018) https://doi.org/10.1007/s11047-018-9685-y
Kirkpatrick, S., Gelatt, C., Vecchi, M.: Optimization by simulated annealing. Science 220, 671–680 (1983). https://doi.org/10.1126/science.220.4598.671
Cerny, V.: Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm. J. Optim. Theory Appl. 45, 41–51 (1985). https://doi.org/10.1007/BF00940812
Koulamas, C., Antony, S., Jaen, R.: A survey of simulated annealing applications to operations research problems. Omega 22(1), 41–56 (1994). https://doi.org/10.1016/0305-0483(94)90006-X
Connolly, D.T.: An improved annealing scheme for the QAP. Eur. J. Oper. Res. 46(1), 93–100 (1990). https://doi.org/10.1016/0377-2217(90)90301-Q
Fidanova, S.: Simulated annealing for grid scheduling problem. In: IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing (JVA’06), pp. 41–45 (2006). https://doi.org/10.1109/JVA.2006.44
Ellison Geltman, K.: The simulated annealing algorithm. http://katrinaeg.com/simulated-annealing.html (2014)
Flexera: 2023 state of the cloud report (2023). https://info.flexera.com/CM-REPORT-State-of-the-Cloud
Kubernetes SIGs.: Kubernetes concepts (2023). https://kubernetes.io/docs/concepts/. Accessed: 22 June 2023
Santos, J., Wauters, T., Volckaert, B., De Turck, F.: Towards network-aware resource provisioning in Kubernetes for fog computing applications. In: 2019 IEEE Conference on Network Softwarization (NetSoft), pp. 351–359 (2019). https://doi.org/10.1109/NETSOFT.2019.8806671
Kubernetes SIGs: Descheduler for Kubernetes (2023). https://github.com/kubernetes-sigs/descheduler. Accessed: 15 June 2023
pixielabs.ai: pixie overview (2023). https://docs.pixielabs.ai/about-pixie/what-is-pixie. Accessed: 6 Oct 2023
google.com: GoogleCloudPlatform microservices-demo: Online Boutique (2023). https://github.com/GoogleCloudPlatform/microservices-demo. Accessed: 10 Oct 2023
Bracke, V., Sebrechts, M., Moons, B., Hoebeke, J., De Turck, F., Volckaert, B.: Design and evaluation of a scalable Internet of Things backend for smart ports. Softw.: Pract. Exper. 51(7), 1557–1579 (2021). https://doi.org/10.1002/spe.2973
Yellin, N.: For the love of god, stop using CPU limits on Kubernetes (updated) (2022). https://home.robusta.dev/blog/stop-using-cpu-limits. Accessed 10 August 2023

Acknowledgements

José Santos is funded by the Research Foundation Flanders (FWO), grant number 1299323N.

Funding

José Santos is funded by the Research Foundation Flanders (FWO), grant number 1299323N.

Author information

Authors and Affiliations

IDLab, Department of Information Technology, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, B-9052, Belgium
Vincent Bracke, José Santos, Tim Wauters, Filip De Turck & Bruno Volckaert

Contributions

Vincent Bracke substantially contributed to the conception and design of the work, to the acquisition, analysis and interpretation of data and to the creation of new software used in the work. He drafted the work and substantively revised it. José Santos and Tim Wauters substantially contributed to the analysis and interpretation of data. They substantively revised the work. Filip De Turck and Bruno Volckaert substantially contributed to the conception of the work, as well as to the analysis and interpretation of data. They drafted the work and substantively revised it. All authors have approved the submitted version.

Corresponding author

Correspondence to Vincent Bracke.

Ethics declarations

Conflict of interest

Filip De Turck, a listed author, is a member of the editorial advisory board of this journal. The authors declare that they have no other conflict of interest.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bracke, V., Santos, J., Wauters, T. et al. A Multiobjective Metaheuristic-Based Container Consolidation Model for Cloud Application Performance Improvement. J Netw Syst Manage 32, 61 (2024). https://doi.org/10.1007/s10922-024-09835-7

Received: 14 December 2023
Revised: 23 May 2024
Accepted: 28 May 2024
Published: 18 June 2024
DOI: https://doi.org/10.1007/s10922-024-09835-7
