Using Kubernetes in Academic Environment: Problems and Approaches

Spišaková, Viktória; Klusáček, Dalibor; Hejtmánek, Lukáš

doi:10.1007/978-3-031-22698-4_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13592)

Included in the following conference series:

Workshop on Job Scheduling Strategies for Parallel Processing

Abstract

In this work, we discuss our experience with using the Kubernetes orchestrator (K8s) to efficiently allocate resources in a heterogeneous and dynamic academic environment. In the commercial world, the “pay per use” model is a strong regulating factor for efficient resource usage. In the academic environment, resources are usually provided “for free” to the end-users, so they often lack a clear motivation to plan their use efficiently. In this paper, we show three major sources of inefficiency. The first is the users’ requirement for interactive computing environments, where users need resources for their application as soon as possible. Users do not appreciate waiting for interactive environments, but constantly keeping some resources available for interactive tasks is inefficient. The second phenomenon is observable in both interactive and batch workloads: users tend to overestimate the limits their computations need, thus wasting resources. Finally, Kubernetes does not support fair-sharing functionality (dynamic user priorities), which hampers efforts to develop a fair scheme for Pod/job scheduling and/or eviction. We discuss various approaches to deal with these problems, such as scavenger jobs, placeholder jobs, Kubernetes-specific resource allocation policies, separate clusters, priority classes, and a novel hybrid cloud approach. We also show that all these proposals open interesting scheduling-related questions that are hard to answer with existing Kubernetes tools and policies. Last but not least, we provide the scheduling community with a real workload trace from our installation that captures these phenomena.
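The placeholder-job and priority-class ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' configuration, which this page does not show. It assumes the commonly used pattern of low-priority "pause" Pods that hold capacity and are preempted as soon as an ordinary Pod (for example an interactive session with the default priority of 0) needs the space. The names `placeholder`, the priority value `-10`, the pause image tag, and the resource sizes are all hypothetical choices for illustration.

```python
import json


def placeholder_priority_class(name="placeholder", value=-10):
    """Build a PriorityClass manifest with a priority below the default (0).

    The name and value are illustrative assumptions, not taken from the paper.
    """
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": "Low priority for placeholder Pods that may be preempted.",
    }


def placeholder_deployment(replicas, cpu, memory, priority_class="placeholder"):
    """Build a Deployment of idle 'pause' Pods that only reserve resources.

    Each replica requests `cpu` and `memory` but does no work, so the
    scheduler can preempt it when a higher-priority Pod does not fit.
    """
    labels = {"app": "placeholder"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "placeholder"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": priority_class,
                    # Placeholders hold no state, so they can stop immediately.
                    "terminationGracePeriodSeconds": 0,
                    "containers": [
                        {
                            "name": "pause",
                            # Assumed image tag; any idle container would do.
                            "image": "registry.k8s.io/pause:3.9",
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Emit both manifests as JSON, which `kubectl apply -f` also accepts.
    manifests = [
        placeholder_priority_class(),
        placeholder_deployment(replicas=3, cpu="2", memory="4Gi"),
    ]
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

The trade-off named in the abstract stays visible here: the capacity held by the placeholder replicas is idle until an interactive Pod claims it, so the number and size of replicas directly set how many resources are kept out of productive use.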

Notes

1. https://kubernetes.io.
2. https://kubernetes.io/blog/2021/04/19/introducing-indexed-jobs/.
3. Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
4. https://slurm.schedmd.com/documentation.html.
5. https://www.openpbs.org.
6. https://slurm.schedmd.com/containers.html.
7. https://openondemand.org.
8. In our system, HPC workloads typically utilize more than 80% of requested CPU resources.
9. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/.
10. https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/.
11. https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/.
12. https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/.
13. https://github.com/kubernetes/kubernetes/pull/102884.
14. https://www.redhat.com/en/topics/cloud-computing/what-is-hybrid-cloud.
15. https://github.com/kubecost/cost-model.
16. https://aws.amazon.com.
17. https://aws.amazon.com/ec2/spot/.
18. https://github.com.

Acknowledgments

Access to the CERIT-SC computing and storage facilities provided by the CERIT-SC Center, under the program “Projects of Large Research, Development, and Innovations Infrastructures” (CERIT Scientific Cloud LM2015085), is greatly appreciated. We also acknowledge the support supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.

Author information

Authors and Affiliations

Institute of Computer Science, Masaryk University, Brno, Czech Republic
Viktória Spišaková & Lukáš Hejtmánek
CESNET, a.l.e., Prague, Czech Republic
Dalibor Klusáček

Corresponding author

Correspondence to Viktória Spišaková.

Editor information

Editors and Affiliations

CESNET, Prague, Czech Republic
Dalibor Klusáček
Polytechnic University of Catalonia, Barcelona, Spain
Corbalán Julita
Apple, Cupertino, CA, USA
Gonzalo P. Rodrigo

About this paper

Cite this paper

Spišaková, V., Klusáček, D., Hejtmánek, L. (2023). Using Kubernetes in Academic Environment: Problems and Approaches. In: Klusáček, D., Julita, C., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2022. Lecture Notes in Computer Science, vol 13592. Springer, Cham. https://doi.org/10.1007/978-3-031-22698-4_12

DOI: https://doi.org/10.1007/978-3-031-22698-4_12
Published: 12 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22697-7
Online ISBN: 978-3-031-22698-4
eBook Packages: Computer Science, Computer Science (R0)
