A Multiobjective Metaheuristic-Based Container Consolidation Model for Cloud Application Performance Improvement

Journal of Network and Systems Management

Abstract

This work describes an approach to enhance container orchestration platforms with an autonomous, dynamic rescheduling system that improves application service time by co-locating highly interdependent containers to reduce network delay. Unreasonable container consolidation may, however, saturate the host CPU, which in turn impairs service time. The multiobjective approach proposed in this work therefore aims to improve application service time by minimizing both inter-server network traffic and CPU throttling on overloaded servers. To this end, the Simulated Annealing combinatorial optimization heuristic is used, and its performance is compared with the optimal solution obtained by Mathematical Programming. Additionally, the impact of the proposed system is validated on a Kubernetes cluster hosting three concurrent applications under varying load scenarios. In our experiments, the proposed rescheduling system consistently (i) improves the application service time (by up to 27.2%) and (ii) surpasses the improvement reached by the Kubernetes descheduler.
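
The two-objective idea summarized above can be illustrated with a minimal Simulated Annealing sketch. It is not the authors' implementation: the function names (`cost`, `anneal`), the equal weights of 0.5, the normalization by total traffic and total CPU demand, the modelling of throttling as CPU demand in excess of server capacity, and the geometric cooling schedule are all assumptions made for illustration.

```python
import math
import random


def cost(placement, traffic, cpu, capacity, w_net=0.5, w_cpu=0.5):
    """Weighted sum of two normalized terms (all modelling choices are assumptions).

    placement[c] is the server index hosting container c.
    traffic maps a container pair (a, b) to its traffic volume.
    cpu[c] is the CPU demand of container c; capacity[s] is the CPU capacity of server s.
    """
    total_traffic = sum(traffic.values()) or 1.0
    # Traffic that crosses servers: the pair is placed on two different hosts.
    cross = sum(t for (a, b), t in traffic.items() if placement[a] != placement[b])

    load = [0.0] * len(capacity)
    for container, server in enumerate(placement):
        load[server] += cpu[container]
    # Overload is used here as a stand-in for CPU throttling on saturated servers.
    overload = sum(max(0.0, l - cap) for l, cap in zip(load, capacity))
    total_cpu = sum(cpu) or 1.0

    return w_net * cross / total_traffic + w_cpu * overload / total_cpu


def anneal(placement, traffic, cpu, capacity, t0=1.0, alpha=0.995, steps=5000, seed=0):
    """Simulated Annealing over placements; returns the best placement seen and its cost."""
    rng = random.Random(seed)
    current = list(placement)
    cur_cost = cost(current, traffic, cpu, capacity)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbour move: reassign one randomly chosen container to a random server.
        c = rng.randrange(len(current))
        old_server = current[c]
        current[c] = rng.randrange(len(capacity))
        new_cost = cost(current, traffic, cpu, capacity)
        delta = new_cost - cur_cost
        # Always accept improvements; accept degradations with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / max(temp, 1e-12)):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[c] = old_server  # reject the move and restore the previous state
        temp *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost


if __name__ == "__main__":
    # Toy instance: two pairs of chatty containers, two servers of capacity 2.
    traffic = {(0, 1): 10.0, (2, 3): 10.0}
    cpu = [1.0, 1.0, 1.0, 1.0]
    capacity = [2.0, 2.0]
    start = [0, 1, 0, 1]  # every chatty pair is split across servers
    print(cost(start, traffic, cpu, capacity))
    print(anneal(start, traffic, cpu, capacity))
```

In this toy instance, placing all four containers on one server removes all cross-server traffic but overloads that server, which is the trade-off the weighted sum is meant to capture.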

Data Availability

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. The VRP generalizes the Traveling Salesman Problem (TSP) [11]

  2. More advanced alternatives are presented in [15]

  3. inspired by [16]

  4. as ‘load1’ reports the average load for the last minute.

  5. https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/pkg/networkaware/README.md

  6. The intuition is that highly interacting containers spend more time in communication when placed across different nodes, thereby leading to a degradation in the observed response times.

  7. Sysadmins usually define a more conservative value (typically around 0.7) to provide some headroom to the system.

  8. each with a weight of 0.5.

  9. A more detailed justification for the selection of this specific metaheuristic is given in [10]. Note, however, that the model described in this work is (meta-)heuristic/algorithm agnostic; while SA provides satisfactory results, we do not claim it to be the most efficient or effective technique. Comparing all possible techniques is considered out of scope for this work.

  10. 50 containers on 5 servers.

  11. See Table 1 for a complete description of the test environment specifications.

  12. For the ‘M’ scenario (see Table 5), CPLEX crashes after 4.3 h (out of memory on a server with 12 GB of RAM). At that point, it still reports a ‘Gap’ value of 25.81% in the minimization of the ‘cntc’ function (the first step of the algorithm, required to compute the normalizing factor).

  13. This suggests that the size of the search-space negatively influences the performance of the exploration phase with only little progress observed in the progressive transition to the exploitation phase.

  14. This value has been empirically defined and can be adapted according to cluster specificity.

  15. Most of the descriptive part of subsection 5.1 is directly imported from [48]

  16. "requiredDuringSchedulingIgnoredDuringExecution"

  17. "preferredDuringSchedulingIgnoredDuringExecution"

  18. For scoping reasons, this work only considers hard constraints. The impact of soft constraints is considered as a candidate topic for future extensions of this work.

  19. Under stable load, the control loop should eventually stop adapting the system it controls, having converged to an optimum.

  20. Pixie is an open source observability tool for K8s applications, contributed by New Relic, Inc. and a CNCF sandbox project since June 2021 [51]. Pixie has been selected for its ease of integration, though any other network monitoring tool able to report on inter-pod network traffic can be used instead.

  21. By design any other possible optimization algorithm/metaheuristic can be used as the New Context Generator component launches its execution through an algorithm agnostic interface.

  22. See Table 1 for the test environment specifications.

  23. More details on the scope of each individual service can be found on the official website of the app [52] as well as in our previous article [10]

  24. https://github.com/idlab-discover/obelisk.

  25. Data isolation is internally ensured through the concept of scopes which represent logical data sets with configurable perimeter. A scope can be understood as a labelling mechanism aiming at isolating data between different contexts of use. Data access APIs for data ingestion, querying and streaming all require the scope to be mentioned.

  26. While this might at first sound counter-intuitive, in most cases setting CPU limits does more harm than good. In fact, they are the number one cause of CPU throttling [54]. Tim Hockin, one of the K8s maintainers at Google, even suggests never setting CPU limits (https://x.com/thockin/status/1134193838841401345?s=20).

  27. https://github.com/kubernetes-sigs/descheduler - 17/11/2023.

  28. Currently, pods request resource requirements are considered for computing node resource utilization.(...) Implementing metrics-based descheduling is currently TODO for the project. 17/11/2023 - https://github.com/kubernetes-sigs/descheduler

  29. 3 (reschedulable) Pods that may be scheduled onto 6 distinct nodes (the scheduler may indeed reassign a pod to the node it was running on prior to the descheduling)

  30. For instance, we could observe the reassignment of 1, 2 or 3 Pods, on 1, 2 or 3 Nodes, sometimes even back on the original node (Node4).

References

  1. Silva, V.G., Kirikova, M., Alksnis, G.: Containers for virtualization: an overview. Appl. Comput. Syst. 23(1), 21–27 (2018). https://doi.org/10.2478/acss-2018-0003

  2. Docker, Inc.: Docker website (2023). https://www.docker.com

  3. The Linux Foundation: LXC/LXD website (2023). https://linuxcontainers.org

  4. Red Hat, Inc.: Podman website (2023). https://podman.io/

  5. The Cloud Native Computing Foundation: Containerd website (2023). https://containerd.io

  6. The Apache Software Foundation: Mesos website (2023). http://mesos.apache.org

  7. Docker, Inc.: Docker Swarm website (2023). https://docs.docker.com/engine/swarm/

  8. The Cloud Native Computing Foundation: Kubernetes website (2023). https://kubernetes.io

  9. Kalmbach, P., Zerwas, J., Babarczi, P., Blenk, A., Kellerer, W., Schmid, S.: Empowering self-driving networks. In: Proceedings of the Afternoon Workshop on Self-Driving Networks. SelfDN 2018, pp. 8–14. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3229584.3229587

  10. Bracke, V., Werrebrouck, G., Santos, J.P., Wauters, T., De Turck, F., Volckaert, B.: Online dynamic container rescheduling for improved application service time. J. Netw. Syst. Manag. 31(4) (2023) https://doi.org/10.1007/s10922-023-09766-9

  11. Robinson, J.B.: On the Hamiltonian Game (A Traveling Salesman Problem). RAND Corporation, Santa Monica (1949)

  12. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Manage. Sci. 6, 80–91 (1959). https://doi.org/10.1287/mnsc.6.1.80

  13. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–137 (1982). https://doi.org/10.1109/TIT.1982.1056489

  14. Croes, G.A.: A method for solving traveling-salesman problems. Oper. Res. 6(6), 791–812 (1958). https://doi.org/10.1287/opre.6.6.791

  15. Reinelt, G.: The Traveling Salesman: Computational Solutions for TSP Applications, 1st edn. Lecture Notes in Computer Science, vol. 840. Springer, Berlin, Heidelberg (1994) https://doi.org/10.1007/3-540-48661-5

  16. Almeida, R.S.: TSP Essay (2020). https://github.com/rsalmei/tsp-essay. Accessed: 20 Jan 2023

  17. Beloglazov, A., Buyya, R.: Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr. Comput.: Pract. Exper. 24(13), 1397–1420 (2012). https://doi.org/10.1002/cpe.1867

  18. Mahdhi, T., Mezni, H.: A prediction-based VM consolidation approach in IaaS cloud data centers. J. Syst. Softw. 146, 263–285 (2018). https://doi.org/10.1016/j.jss.2018.09.083

  19. Wang, J.V., Cheng, C.-T., Tse, C.K.: A thermal-aware VM consolidation mechanism with outage avoidance. Software: Practice Exp. 49(5), 906–920 (2019). https://doi.org/10.1002/spe.2680

  20. Zhao, D., Mohamed, M., Ludwig, H.: Locality-aware scheduling for containers in cloud computing. IEEE Trans. Cloud Comput. 8(2), 635–646 (2020). https://doi.org/10.1109/TCC.2018.2794344

  21. Filip, I.-D., Pop, F., Serbanescu, C., Choi, C.: Microservices scheduling model over heterogeneous cloud-edge environments as support for IoT applications. IEEE Internet Things J. 5(4), 2672–2681 (2018). https://doi.org/10.1109/JIOT.2018.2792940

  22. Nanda, S., Hacker, T.J.: RACC: Resource-aware container consolidation using a deep learning approach. In: Proceedings of the First Workshop on Machine Learning for Computing Systems. MLCS’18. Association for Computing Machinery, New York (2018). https://doi.org/10.1145/3217871.3217876

  23. Wen, Z., Lin, T., Yang, R., Ji, S., Ranjan, R., Romanovsky, A., Lin, C., Xu, J.: GA-Par: dependable microservice orchestration framework for geo-distributed clouds. IEEE Trans. Parallel Distrib. Syst. 31(1), 129–143 (2020). https://doi.org/10.1109/TPDS.2019.2929389

  24. Guerrero, C., Lera, I., Juiz, C.: Resource optimization of container orchestration: a case study in multi-cloud microservices-based applications. J. Supercomput. 74(7), 2956–2983 (2018). https://doi.org/10.1007/s11227-018-2345-2

  25. Bittencourt, L.F., Goldman, A., Madeira, E.R.M., da Fonseca, N.L.S., Sakellariou, R.: Scheduling in distributed systems: a cloud computing perspective. Comput. Sci. Rev. 30, 31–54 (2018). https://doi.org/10.1016/j.cosrev.2018.08.002

  26. Söylemez, M., Tekinerdogan, B., Tarhan, A.K.: Challenges and solution directions of microservice architectures: a systematic literature review. Appl. Sci. (2022). https://doi.org/10.3390/app12115507

  27. Santos, J., Wang, C., Wauters, T., Turck, F.D.: Diktyo: network-aware scheduling in container-based clouds. IEEE Transactions on Network and Service Management, 1–1 (2023) https://doi.org/10.1109/TNSM.2023.3271415

  28. Zhou, R., Li, Z., Wu, C.: An efficient online placement scheme for cloud container clusters. IEEE J. Sel. Areas Commun. 37(5), 1046–1058 (2019). https://doi.org/10.1109/JSAC.2019.2906745

  29. Piraghaj, S.F., Dastjerdi, A.V., Calheiros, R.N., Buyya, R.: A framework and algorithm for energy efficient container consolidation in cloud data centers. In: 2015 IEEE International Conference on Data Science and Data Intensive Systems, pp. 368–375 (2015). https://doi.org/10.1109/DSDIS.2015.67

  30. Rattihalli, G.: Exploring potential for resource request right-sizing via estimation and container migration in Apache Mesos. In: 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), pp. 59–64 (2018). https://doi.org/10.1109/UCC-Companion.2018.00035

  31. Bulej, L., Bureš, T., Hnětynka, P., Khalyeyev, D.: Self-adaptive K8S cloud controller for time-sensitive applications. In: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 166–169 (2021). https://doi.org/10.1109/SEAA53835.2021.00029

  32. Rodriguez, M., Buyya, R.: Container orchestration with cost-efficient autoscaling in cloud computing environments. In: Handbook of Research on Multimedia Cyber Security, pp. 190–213 (2020). https://doi.org/10.4018/978-1-7998-2701-6.ch010

  33. Wojciechowski, L., Opasiak, K., Latusek, J., Wereski, M., Morales, V., Kim, T., Hong, M.: NetMARKS: network metrics-aware Kubernetes Scheduler powered by service mesh. In: IEEE INFOCOM 2021—IEEE Conference on Computer Communications, pp. 1–9 (2021). https://doi.org/10.1109/INFOCOM42981.2021.9488670

  34. Marchese, A., Tomarchio, O.: Network-aware container placement in cloud-edge kubernetes clusters. In: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 859–865 (2022). https://doi.org/10.1109/CCGrid54584.2022.00102

  35. Joseph, C.T., Chandrasekaran, K.: Nature-inspired resource management and dynamic rescheduling of microservices in Cloud datacenters. Concurrency Comput.: Practice Exp. 33(17), 6290 (2021). https://doi.org/10.1002/cpe.6290

  36. Podzimek, A., Chen, L.Y.: Transforming system load to throughput for consolidated applications. In: 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, pp. 288–292 (2013). https://doi.org/10.1109/MASCOTS.2013.37

  37. Arora, J.S.: Chapter 18—Multi-objective optimum design concepts and methods. In: Arora, J.S. (ed.) Introduction to Optimum Design, 4th edn., pp. 771–794. Academic Press, Boston (2017). https://doi.org/10.1016/B978-0-12-800806-5.00018-4

  38. Caramia, M., Dell’Olmo, P.: Multi-objective optimization. In: Multi-objective management in freight logistics: increasing capacity, service level, sustainability, and safety with optimization algorithms, pp. 21–51. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50812-8_2

  39. Grodzevich, O., Romanko, O.: Normalization and other topics in multi-objective optimization. In: Proceedings of the fields-MITACS Industrial Problems Workshop (2006)

  40. Emmerich, M., Deutz, A.: A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Natural Comput. 17 (2018) https://doi.org/10.1007/s11047-018-9685-y

  41. Kirkpatrick, S., Gelatt, C., Vecchi, M.: Optimization by simulated annealing. Science 220, 671–680 (1983). https://doi.org/10.1126/science.220.4598.671

  42. Cerny, V.: Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm. J. Optim. Theory Appl. 45, 41–51 (1985). https://doi.org/10.1007/BF00940812

  43. Koulamas, C., Antony, S., Jaen, R.: A survey of simulated annealing applications to operations research problems. Omega 22(1), 41–56 (1994). https://doi.org/10.1016/0305-0483(94)90006-X

  44. Connolly, D.T.: An improved annealing scheme for the QAP. Eur. J. Oper. Res. 46(1), 93–100 (1990). https://doi.org/10.1016/0377-2217(90)90301-Q

  45. Fidanova, S.: Simulated annealing for grid scheduling problem. In: IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing (JVA’06), pp. 41–45 (2006). https://doi.org/10.1109/JVA.2006.44

  46. Ellison Geltman, K.: The simulated annealing algorithm. http://katrinaeg.com/simulated-annealing.html (2014)

  47. Flexera: 2023 state of the cloud report (2023). https://info.flexera.com/CM-REPORT-State-of-the-Cloud

  48. Kubernetes SIGs.: Kubernetes concepts (2023). https://kubernetes.io/docs/concepts/. Accessed: 22 June 2023

  49. Santos, J., Wauters, T., Volckaert, B., De Turck, F.: Towards network-aware resource provisioning in Kubernetes for fog computing applications. In: 2019 IEEE Conference on Network Softwarization (NetSoft), pp. 351–359 (2019). https://doi.org/10.1109/NETSOFT.2019.8806671

  50. Kubernetes SIGs: Descheduler for Kubernetes (2023). https://github.com/kubernetes-sigs/descheduler. Accessed: 15 June 2023

  51. pixielabs.ai: pixie overview (2023). https://docs.pixielabs.ai/about-pixie/what-is-pixie. Accessed: 6 Oct 2023

  52. google.com: GoogleCloudPlatform microservices-demo: Online Boutique (2023). https://github.com/GoogleCloudPlatform/microservices-demo. Accessed: 10 Oct 2023

  53. Bracke, V., Sebrechts, M., Moons, B., Hoebeke, J., De Turck, F., Volckaert, B.: Design and evaluation of a scalable Internet of Things backend for smart ports. Software: Practice Exp. 51(7), 1557–1579 (2021). https://doi.org/10.1002/spe.2973

  54. Yellin, N.: For the love of god, stop using CPU limits on Kubernetes (updated) (2022). https://home.robusta.dev/blog/stop-using-cpu-limits. Accessed 10 August 2023

Acknowledgements

José Santos is funded by the Research Foundation Flanders (FWO), grant number 1299323N.

Funding

José Santos is funded by the Research Foundation Flanders (FWO), grant number 1299323N.

Author information

Contributions

Vincent Bracke substantially contributed to the conception and design of the work, to the acquisition, analysis and interpretation of data and to the creation of new software used in the work. He drafted the work and substantively revised it. José Santos and Tim Wauters substantially contributed to the analysis and interpretation of data. They substantively revised the work. Filip De Turck and Bruno Volckaert substantially contributed to the conception of the work, as well as to the analysis and interpretation of data. They drafted the work and substantively revised it. All authors have approved the submitted version.

Corresponding author

Correspondence to Vincent Bracke.

Ethics declarations

Conflict of interest

Filip De Turck, a listed author, is a member of the editorial advisory board of this journal. The authors declare that they have no other conflict of interest.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bracke, V., Santos, J., Wauters, T. et al. A Multiobjective Metaheuristic-Based Container Consolidation Model for Cloud Application Performance Improvement. J Netw Syst Manage 32, 61 (2024). https://doi.org/10.1007/s10922-024-09835-7

  • DOI: https://doi.org/10.1007/s10922-024-09835-7
