Jun 9 – 12, 2026
Fluno Center on the University of Wisconsin-Madison Campus
America/Chicago timezone

Analyzing and Optimizing Machine Timeout Duration Using AI (Remote Presentation)

Jun 12, 2026, 11:00 AM
20m
Howard Auditorium (Fluno Center on the University of Wisconsin-Madison Campus)

601 University Avenue, Madison, WI 53715-1035

Speaker

Khyathi Vagolu (UW-Madison)

Description

When network disruptions or worker node failures occur, HTCondor relies on a static lease timeout, traditionally 40 minutes, before abandoning a job. This static window creates a costly trade-off: waiting too long leaves machines idle for the full window after an unrecoverable failure, while cutting the window too short kills jobs that could have reconnected successfully. Can we use AI to solve this? This talk explores how to predict an optimal timeout duration dynamically by training a simple ML model!
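
To make the trade-off concrete, here is a minimal sketch that scores candidate static timeouts against a log of past disconnections. It is not the model from the talk: the function names, the cost model (idle minutes plus a flat penalty for discarded work), and the sample numbers are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def timeout_cost(timeout_min: float,
                 reconnect_min: Sequence[Optional[float]],
                 lost_work_min: float) -> float:
    """Cost, in machine-minutes, of one static timeout over past disconnections.

    reconnect_min holds, per disconnection, the minutes until the job
    reconnected, or None if it never did (unrecoverable failure).
    Assumed cost model (hypothetical): a job that reconnects within the
    timeout costs nothing; otherwise the machine is charged the full
    timeout as idle time, plus lost_work_min if the job would have
    reconnected later.
    """
    cost = 0.0
    for r in reconnect_min:
        if r is not None and r <= timeout_min:
            continue  # reconnected in time, nothing wasted
        cost += timeout_min  # waited the whole window for nothing
        if r is not None:
            cost += lost_work_min  # recoverable job was abandoned too early
    return cost


def best_timeout(candidates: Sequence[float],
                 reconnect_min: Sequence[Optional[float]],
                 lost_work_min: float) -> float:
    """Return the candidate timeout with the lowest total cost."""
    return min(candidates,
               key=lambda t: timeout_cost(t, reconnect_min, lost_work_min))


# Made-up history: four reconnections and two unrecoverable failures.
history = [2, 5, None, 30, None, 8]
candidates = [5, 10, 20, 40]
for t in candidates:
    print(t, timeout_cost(t, history, lost_work_min=20))
print("best:", best_timeout(candidates, history, lost_work_min=20))
```

Under these assumed numbers, a short timeout pays repeatedly for abandoned jobs and a long one pays for idle waiting on failures that never recover. A learned model would aim to replace the single static value with a prediction for each disconnection.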
