Abstract
In this work, we discuss our experience with using the Kubernetes orchestrator (K8s) to allocate resources efficiently in a heterogeneous and dynamic academic environment. In the commercial world, the “pay per use” model is a strong regulating factor for efficient resource usage. In the academic environment, resources are usually provided “for free” to the end-users, who therefore often lack a clear motivation to plan their use efficiently. In this paper, we show three major sources of inefficiency. The first is the users’ requirement for interactive computing environments, in which users need resources for their application as soon as possible. Users do not appreciate waiting for interactive environments, but constantly keeping some resources available for interactive tasks is inefficient. The second is observable in both interactive and batch workloads: users tend to overestimate the limits necessary for their computations, thus wasting resources. Finally, Kubernetes does not support fair-sharing functionality (dynamic user priorities), which hampers efforts to develop a fair scheme for Pod/job scheduling and/or eviction. We discuss various approaches to these problems, such as scavenger jobs, placeholder jobs, Kubernetes-specific resource allocation policies, separate clusters, priority classes, and a novel hybrid cloud approach. We also show that all these proposals open interesting scheduling-related questions that are hard to answer with existing Kubernetes tools and policies. Last but not least, we provide the scheduling community with a real workload trace from our installation that captures these phenomena.
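To make the second source of inefficiency concrete, the following minimal sketch compares the CPU a Pod requests with the CPU it actually consumes. It is an illustration only and not the measurement method used in the paper: the `PodUsage` record, the helper functions, and the sample figures are all hypothetical, and no Kubernetes API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodUsage:
    """Hypothetical per-Pod record; the field names are illustrative only."""
    name: str
    cpu_requested: float  # CPU cores requested by the user
    cpu_used: float       # average CPU cores actually consumed


def utilization(pod: PodUsage) -> float:
    """Fraction of the requested CPU that the Pod really used."""
    if pod.cpu_requested <= 0:
        raise ValueError("cpu_requested must be positive")
    return pod.cpu_used / pod.cpu_requested


def wasted_cpu(pods: list[PodUsage]) -> float:
    """Total CPU cores reserved but left idle across all Pods."""
    return sum(max(p.cpu_requested - p.cpu_used, 0.0) for p in pods)


# Made-up sample values, chosen only to contrast an overestimated
# interactive request with a well-sized batch request.
SAMPLE_PODS = [
    PodUsage("interactive-notebook", cpu_requested=4.0, cpu_used=0.5),
    PodUsage("batch-simulation", cpu_requested=16.0, cpu_used=14.0),
]

if __name__ == "__main__":
    for pod in SAMPLE_PODS:
        print(f"{pod.name}: {utilization(pod):.1%} of requested CPU used")
    print(f"idle but reserved: {wasted_cpu(SAMPLE_PODS)} cores")
```

In this made-up sample, the interactive Pod leaves most of its reservation idle, which is the kind of gap between requested and used resources that the abstract describes.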
Notes
- 3. Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
- 8. In our system, HPC workloads typically utilize more than 80% of requested CPU resources.
Acknowledgments
Access to the CERIT-SC computing and storage facilities provided by the CERIT-SC Center, under the program “Projects of Large Research, Development, and Innovations Infrastructures” (CERIT Scientific Cloud LM2015085), is greatly appreciated. We also acknowledge the support supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Spišaková, V., Klusáček, D., Hejtmánek, L. (2023). Using Kubernetes in Academic Environment: Problems and Approaches. In: Klusáček, D., Julita, C., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2022. Lecture Notes in Computer Science, vol 13592. Springer, Cham. https://doi.org/10.1007/978-3-031-22698-4_12
DOI: https://doi.org/10.1007/978-3-031-22698-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22697-7
Online ISBN: 978-3-031-22698-4
eBook Packages: Computer Science, Computer Science (R0)