Description
The Compact Muon Solenoid (CMS) experiment at CERN generates and processes vast volumes of data, requiring significant computing capacity. To meet these demands, CMS has adopted a federated throughput computing model built on a global HTCondor-based infrastructure, the CMS Submission Infrastructure. By seamlessly integrating heterogeneous resources from sites around the world, it operates a single unified, virtualized pool. This infrastructure currently provides access to over 500,000 CPU cores, enabling CMS to efficiently execute a wide variety of data processing and simulation workloads.
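The unified pool described above is assembled from ClassAds published by each execute slot. As a minimal sketch (not the actual CMS tooling), pool-wide capacity can be aggregated from such ads; the slot data and the `GLIDEIN_Site` attribute values here are mock examples, and a real query would go through the HTCondor collector (e.g. via the `htcondor` Python bindings):

```python
# Sketch: aggregating pool capacity from startd (slot) ClassAds.
# In a live pool these ads would come from the HTCondor collector;
# mock ads are used here to keep the example self-contained.
from collections import Counter

# Hypothetical slot ads: each execute slot reports its site and CPU count.
slot_ads = [
    {"GLIDEIN_Site": "CERN", "Cpus": 8},
    {"GLIDEIN_Site": "CERN", "Cpus": 8},
    {"GLIDEIN_Site": "FNAL", "Cpus": 16},
    {"GLIDEIN_Site": "DESY", "Cpus": 4},
]

def pool_summary(ads):
    """Return total CPU cores and a per-site breakdown."""
    per_site = Counter()
    for ad in ads:
        per_site[ad["GLIDEIN_Site"]] += ad["Cpus"]
    return sum(per_site.values()), dict(per_site)

total, by_site = pool_summary(slot_ads)
print(total)    # 36
print(by_site)  # {'CERN': 16, 'FNAL': 16, 'DESY': 4}
```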
This federation, however, comes with substantial operational challenges, notably the need for robust and scalable monitoring. To ensure reliability, performance, and rapid diagnosis of issues, we have developed a comprehensive monitoring ecosystem that spans job execution, resource availability, and system health across the entire pool. This talk will present the architecture of the CMS federated compute infrastructure, detail the role of HTCondor in enabling global workload distribution, and highlight recent developments in monitoring that are critical to operating such a large-scale system effectively.
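One basic building block of such monitoring is rolling up job states across the pool. The sketch below illustrates this with mock job ads; the integer-to-name mapping follows HTCondor's documented `JobStatus` codes, while the job data itself is invented, and a real pipeline would pull these ads from the schedds or the collector:

```python
# Sketch: a monitoring-style roll-up of job states.
# HTCondor encodes job state in the integer JobStatus attribute;
# the mapping below covers the common values.
from collections import Counter

JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}

# Hypothetical job ads, mocked for the example.
job_ads = [{"JobStatus": 2}, {"JobStatus": 2}, {"JobStatus": 1}, {"JobStatus": 5}]

def job_state_counts(ads):
    """Count jobs per human-readable state, as a dashboard metric would."""
    return dict(Counter(JOB_STATUS.get(ad["JobStatus"], "Unknown") for ad in ads))

print(job_state_counts(job_ads))  # {'Running': 2, 'Idle': 1, 'Held': 1}
```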