Using Kubernetes in Academic Environment: Problems and Approaches

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2022)

Abstract

In this work, we discuss our experience using the Kubernetes orchestrator (K8s) to allocate resources efficiently in a heterogeneous and dynamic academic environment. In the commercial world, the “pay per use” model is a strong regulating factor for efficient resource usage. In the academic environment, resources are usually provided “for free” to end users, so they often lack a clear motivation to plan their use efficiently. In this paper, we identify three major sources of inefficiency. The first is the users' demand for interactive computing environments, for which they need resources as soon as possible. Users do not like waiting for interactive environments, but constantly keeping resources reserved for interactive tasks is inefficient. The second phenomenon appears in both interactive and batch workloads: users tend to overestimate the resource limits their computations need, thus wasting resources. Finally, Kubernetes does not support fair-sharing functionality (dynamic user priorities), which hampers efforts to develop a fair scheme for Pod/job scheduling and/or eviction. We discuss various approaches to these problems, such as scavenger jobs, placeholder jobs, Kubernetes-specific resource allocation policies, separate clusters, priority classes, and a novel hybrid cloud approach. We also show that all these proposals raise interesting scheduling-related questions that are hard to answer with existing Kubernetes tools and policies. Last but not least, we provide the scheduling community with a real workload trace from our installation that captures these phenomena.
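The placeholder-job approach can be illustrated with a toy model: low-priority placeholder Pods hold capacity on a node, and when a higher-priority interactive Pod arrives, just enough lower-priority Pods are evicted to make room. The Python sketch below is a simplified simulation under stated assumptions. The `Pod` and `Node` classes, the priority numbers, and the lowest-priority-first victim order are illustrative choices; this is not kube-scheduler's actual preemption algorithm and not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    cpu: float     # requested CPU cores
    priority: int  # higher value = more important (hypothetical PriorityClass value)


@dataclass
class Node:
    capacity: float  # allocatable CPU cores
    pods: list = field(default_factory=list)

    def free(self) -> float:
        return self.capacity - sum(p.cpu for p in self.pods)


def schedule_with_preemption(node: Node, incoming: Pod) -> list:
    """Place `incoming` on `node`, evicting lower-priority Pods if needed.

    Returns the evicted Pods. Raises RuntimeError if the Pod does not fit
    even after evicting every Pod with a strictly lower priority.
    """
    if incoming.cpu <= node.free():
        node.pods.append(incoming)
        return []

    # Eviction candidates: strictly lower priority, lowest priority first.
    candidates = sorted(
        (p for p in node.pods if p.priority < incoming.priority),
        key=lambda p: p.priority,
    )
    evicted, freed = [], node.free()
    for victim in candidates:
        if freed >= incoming.cpu:
            break  # evict only as many Pods as necessary
        evicted.append(victim)
        freed += victim.cpu

    if freed < incoming.cpu:
        raise RuntimeError(f"{incoming.name} does not fit even after preemption")

    for victim in evicted:
        node.pods.remove(victim)
    node.pods.append(incoming)
    return evicted


if __name__ == "__main__":
    node = Node(capacity=16)
    node.pods.append(Pod("batch", 8, priority=100))
    # Four hypothetical placeholder Pods keep 8 cores warm for interactive use.
    node.pods.extend(Pod(f"placeholder-{i}", 2, priority=-10) for i in range(4))
    evicted = schedule_with_preemption(node, Pod("notebook", 3, priority=1000))
    print([p.name for p in evicted])  # ['placeholder-0', 'placeholder-1']
```

In a real cluster the priorities would come from PriorityClass objects, and kube-scheduler's preemption additionally chooses among candidate nodes and tries to respect PodDisruptionBudgets, which this sketch ignores.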
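Because Kubernetes PriorityClass values are static, a dynamic, usage-based user priority would have to be computed outside the scheduler. As a hedged illustration only, the sketch below computes a fair-share factor modeled on the classic formula of Slurm's multifactor priority plugin, F = 2^(-U/S), where U is a user's fraction of recent (half-life-decayed) usage and S is the user's fraction of allocated shares. The function names, the half-life, and the example data are hypothetical, and the sketch is not taken from the paper.

```python
def decayed_usage(samples, now, half_life):
    """Sum past usage, halving a sample's weight every `half_life` seconds.

    `samples` is an iterable of (timestamp_seconds, core_seconds) pairs.
    """
    return sum(usage * 0.5 ** ((now - t) / half_life) for t, usage in samples)


def fair_share_factors(usage_by_user, shares_by_user):
    """Return a fair-share factor in (0, 1] per user; higher = higher priority.

    U = user's fraction of total decayed usage, S = user's fraction of total
    shares, factor = 2 ** (-U / S), as in Slurm's classic fair-share formula.
    """
    total_usage = sum(usage_by_user.values()) or 1.0  # avoid division by zero
    total_shares = sum(shares_by_user.values())
    factors = {}
    for user, shares in shares_by_user.items():
        u = usage_by_user.get(user, 0.0) / total_usage
        s = shares / total_shares
        factors[user] = 2.0 ** (-u / s)
    return factors


if __name__ == "__main__":
    hour = 3600
    # Hypothetical usage history: (timestamp, core-seconds consumed).
    history = {
        "alice": [(0, 64 * hour), (20 * hour, 32 * hour)],
        "bob": [(22 * hour, 8 * hour)],
        "carol": [],
    }
    now = 24 * hour
    usage = {u: decayed_usage(s, now, half_life=24 * hour) for u, s in history.items()}
    factors = fair_share_factors(usage, {"alice": 1, "bob": 1, "carol": 1})
    # Users with the highest factor would be scheduled (or protected) first.
    for user, f in sorted(factors.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{user}: {f:.3f}")
```

Users who consumed little recently get a factor close to 1; how to translate such factors into Pod priorities or eviction decisions remains an open design question.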

Notes

  1. https://kubernetes.io.
  2. https://kubernetes.io/blog/2021/04/19/introducing-indexed-jobs/.
  3. Pods are the smallest deployable units of computing that you can create and manage in Kubernetes.
  4. https://slurm.schedmd.com/documentation.html.
  5. https://www.openpbs.org.
  6. https://slurm.schedmd.com/containers.html.
  7. https://openondemand.org.
  8. In our system, HPC workloads typically utilize more than 80% of requested CPU resources.
  9. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/.
  10. https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/.
  11. https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/.
  12. https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/.
  13. https://github.com/kubernetes/kubernetes/pull/102884.
  14. https://www.redhat.com/en/topics/cloud-computing/what-is-hybrid-cloud.
  15. https://github.com/kubecost/cost-model.
  16. https://aws.amazon.com.
  17. https://aws.amazon.com/ec2/spot/.
  18. https://github.com.
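Note 8 states that HPC workloads in the authors' system typically use more than 80% of their requested CPU. A comparable figure for Kubernetes Pods is the ratio of consumed CPU to the Pod's CPU request (see note 9). The sketch below assumes that per-Pod request and average usage values have already been collected, for example from a monitoring system; the `PodUsage` record, the 0.5 threshold, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    cpu_request: float   # cores requested (resources.requests.cpu)
    cpu_used_avg: float  # average cores actually consumed (hypothetical metric)


def request_efficiency(pods):
    """Return (utilization, idle_reserved_cores) across `pods`.

    utilization = total used / total requested; idle cores count only the
    part of each request that went unused.
    """
    requested = sum(p.cpu_request for p in pods)
    used = sum(p.cpu_used_avg for p in pods)
    idle = sum(max(0.0, p.cpu_request - p.cpu_used_avg) for p in pods)
    utilization = used / requested if requested else 0.0
    return utilization, idle


def flag_overestimates(pods, threshold=0.5):
    """Names of Pods that used less than `threshold` of their CPU request."""
    return [
        p.name
        for p in pods
        if p.cpu_request and p.cpu_used_avg / p.cpu_request < threshold
    ]


if __name__ == "__main__":
    pods = [
        PodUsage("hpc-sim", cpu_request=16, cpu_used_avg=14),
        PodUsage("notebook", cpu_request=8, cpu_used_avg=0.5),
        PodUsage("train", cpu_request=4, cpu_used_avg=3),
    ]
    util, idle = request_efficiency(pods)
    print(f"utilization {util:.1%}, idle reserved cores {idle}")
    # utilization 62.5%, idle reserved cores 10.5
    print(flag_overestimates(pods))  # ['notebook']
```

A single idle interactive Pod can dominate the waste in such a report, which matches the overestimation problem described in the abstract.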

Acknowledgments

Access to the CERIT-SC computing and storage facilities provided by the CERIT-SC Center, under the program “Projects of Large Research, Development, and Innovations Infrastructures” (CERIT Scientific Cloud LM2015085), is greatly appreciated. We also acknowledge the support supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.

Author information

Correspondence to Viktória Spišaková.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Spišaková, V., Klusáček, D., Hejtmánek, L. (2023). Using Kubernetes in Academic Environment: Problems and Approaches. In: Klusáček, D., Julita, C., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2022. Lecture Notes in Computer Science, vol 13592. Springer, Cham. https://doi.org/10.1007/978-3-031-22698-4_12

  • DOI: https://doi.org/10.1007/978-3-031-22698-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22697-7

  • Online ISBN: 978-3-031-22698-4

  • eBook Packages: Computer Science, Computer Science (R0)
