Jun 9 – 12, 2026
Fluno Center on the University of Wisconsin-Madison Campus
America/Chicago timezone

From Per-Job Data to an Aggregated Workload Insight: A Toolkit for Profiling HTCondor Workload

Jun 11, 2026, 10:10 AM
20m
Howard Auditorium (Fluno Center on the University of Wisconsin-Madison Campus)

601 University Avenue, Madison, WI 53715-1035

Speaker

Kashika Mahajan (UW-Center for High Throughput Computing)

Description

Researchers using HTCondor for high-throughput computing routinely submit groups of related jobs, known as Clusters, ranging from hundreds to tens of thousands of jobs each. Current tools report data one job at a time, which makes it difficult to diagnose Cluster-wide issues such as jobs stuck on hold, poor resource utilization, or unexpected failures. We present a Python toolkit, to be included as part of the HTCondor suite, that bridges this gap. Given a single Cluster ID, the toolkit evaluates job status distribution, runtime patterns, hold reasons, and resource utilization, then synthesizes these into a colour-coded health summary that tells researchers and facilitators whether their Cluster is running well and, if not, why. Each analysis transforms thousands of individual job records into a concise, actionable report.
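
To make the aggregation idea concrete, the following is a minimal illustrative sketch, not the toolkit's actual code. It assumes per-job records have already been fetched as plain dictionaries keyed by standard HTCondor job ClassAd attributes (JobStatus, HoldReason, RemoteWallClockTime, MemoryUsage, RequestMemory). The function name summarize_cluster, the thresholds, and the green/yellow/red rules are hypothetical choices made only for this example.

```python
from collections import Counter
from statistics import median

# Standard HTCondor JobStatus codes.
JOB_STATUS = {
    1: "Idle", 2: "Running", 3: "Removed", 4: "Completed",
    5: "Held", 6: "Transferring Output", 7: "Suspended",
}


def summarize_cluster(jobs, hold_red=0.25, hold_yellow=0.05, mem_low=0.25):
    """Collapse per-job records into one Cluster-level summary.

    The thresholds and colour rules are hypothetical, chosen for illustration.
    """
    total = len(jobs)
    if total == 0:
        raise ValueError("no job records to summarize")

    # Job status distribution across the whole Cluster.
    status = Counter(JOB_STATUS.get(j.get("JobStatus"), "Unknown") for j in jobs)

    # Hold reasons, counted only for jobs that are currently held.
    hold_reasons = Counter(
        j.get("HoldReason", "unknown") for j in jobs if j.get("JobStatus") == 5
    )

    # Runtime pattern: wall-clock seconds of completed jobs.
    runtimes = [
        j["RemoteWallClockTime"]
        for j in jobs
        if j.get("JobStatus") == 4 and "RemoteWallClockTime" in j
    ]

    # Resource utilization: memory used relative to memory requested (both MB).
    mem_ratios = [
        j["MemoryUsage"] / j["RequestMemory"]
        for j in jobs
        if j.get("MemoryUsage") is not None and j.get("RequestMemory")
    ]

    held_fraction = status["Held"] / total
    median_mem = median(mem_ratios) if mem_ratios else None

    # Hypothetical colour-coded health rule.
    if held_fraction >= hold_red:
        health = "red"
    elif held_fraction >= hold_yellow or (
        median_mem is not None and median_mem < mem_low
    ):
        health = "yellow"
    else:
        health = "green"

    return {
        "total_jobs": total,
        "status": dict(status),
        "held_fraction": held_fraction,
        "hold_reasons": dict(hold_reasons),
        "median_runtime_s": median(runtimes) if runtimes else None,
        "median_memory_ratio": median_mem,
        "health": health,
    }


# Made-up records for a 10-job Cluster: 8 completed, 1 held, 1 running.
jobs = [
    {"JobStatus": 4, "RemoteWallClockTime": 600 + 10 * i,
     "MemoryUsage": 512, "RequestMemory": 1024}
    for i in range(8)
]
jobs.append({"JobStatus": 5, "HoldReason": "memory limit exceeded (made-up text)",
             "MemoryUsage": 2048, "RequestMemory": 1024})
jobs.append({"JobStatus": 2, "RequestMemory": 1024})

summary = summarize_cluster(jobs)
print(summary["health"], summary["status"], summary["hold_reasons"])
```

In this sketch a single held job out of ten crosses the assumed 5% threshold, so the Cluster is flagged yellow and the hold reason is surfaced alongside the status counts. A real tool would obtain the records from the HTCondor schedd or job history rather than from a hand-built list.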
