May 21 – 24, 2018
Fluno Center on the University of Wisconsin-Madison Campus
America/Chicago timezone

Migrate and run HTCondor job to SLURM cluster via container

May 23, 2018, 10:40 AM
20m
Howard Auditorium (Fluno Center on the University of Wisconsin-Madison Campus)

601 University Avenue, Madison, WI 53715-1035

Speaker

Jingyan Shi (Institute of High Energy Physics (IHEP) of the Chinese Academy of Sciences (CAS))

Summary

The computing center of the Institute of High Energy Physics maintains an HTC cluster managed by HTCondor and an HPC cluster managed by SLURM. The HTCondor cluster is much busier than the SLURM cluster, and many jobs wait in the HTCondor queue most of the time. We designed and developed a tool that migrates queued HTCondor jobs to free SLURM job slots and runs them there. First, it matches free SLURM job slots against queued HTCondor jobs. Second, it submits a dedicated SLURM job that runs "startd" in a container, so that the SLURM job slots are added to the HTCondor pool as resources. Third, HTCondor schedules jobs to the job slots running on the SLURM worker nodes. When new SLURM jobs arrive, these HTCondor jobs are removed and re-queued in the HTCondor cluster. Users are given a job submission option to choose whether a submitted job may be scheduled to the SLURM cluster.
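The matching and re-queue logic described above can be illustrated with a small in-memory sketch. This is not the tool presented in the talk: the class names, the `allow_slurm` flag, the slot names, and the choice to re-queue reclaimed jobs at the back of the queue are all assumptions made for illustration, and no real HTCondor or SLURM calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class CondorJob:
    job_id: int
    allow_slurm: bool  # hypothetical stand-in for the per-job submit option


@dataclass
class Migrator:
    queue: list  # idle HTCondor jobs, in submission order
    borrowed: dict = field(default_factory=dict)  # SLURM slot name -> job

    def match(self, free_slots):
        """Pair free SLURM slots with queued jobs that opted in."""
        eligible = [job for job in self.queue if job.allow_slurm]
        for slot, job in zip(free_slots, eligible):
            self.queue.remove(job)
            self.borrowed[slot] = job
        return dict(self.borrowed)

    def reclaim(self, slots):
        """SLURM wants these slots back: remove the jobs and re-queue them."""
        for slot in slots:
            job = self.borrowed.pop(slot, None)
            if job is not None:
                # Assumption: re-queued jobs go to the back of the queue.
                self.queue.append(job)


# Three queued jobs; job 2 did not opt in to running on SLURM.
m = Migrator(queue=[CondorJob(1, True), CondorJob(2, False), CondorJob(3, True)])
matched = m.match(["node01/0", "node01/1"])

# A new SLURM job arrives and needs slot node01/0.
m.reclaim(["node01/0"])
```

In this sketch, jobs 1 and 3 are first moved onto the two free slots while job 2 stays queued; reclaiming `node01/0` then returns job 1 to the queue. In a real deployment the matched slots would correspond to the "startd" instances started inside the containerized SLURM jobs.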

Primary author

Jingyan Shi (Institute of High Energy Physics (IHEP) of the Chinese Academy of Sciences (CAS))
