May 24 – 27, 2021
Virtual
America/Chicago timezone

Improving Kubernetes support for batch scheduling of high throughput and parallel jobs

May 27, 2021, 11:45 AM
20m
Virtual

Speaker

Abdullah Gharaibeh (Google Cloud)

Description

Kubernetes is an open source cluster orchestration system whose popularity stems in part from its role as a standard resource management interface across cloud providers and on-premises data centers. There is significant interest in managing HTCondor services and scheduling user jobs in Kubernetes clusters. Existing solutions often rely on running standard HTCondor daemons inside a container or on developing custom Kubernetes operators to bridge the two systems. Google originally designed Kubernetes and remains a major contributor to the project, which is now governed by the Cloud Native Computing Foundation. We will describe recent (1.21) and planned (1.22+) contributions that improve direct support for batch scheduling of high throughput and parallel jobs, as well as developments in our Google Kubernetes Engine product, which offers Kubernetes clusters with reduced management overhead.

Primary authors

Maciek Różacki (Google Cloud)
Abdullah Gharaibeh (Google Cloud)
