Jun 9 – 12, 2026
Fluno Center on the University of Wisconsin-Madison Campus
America/Chicago timezone

Enabling Large-Scale and Ultra-Large AlphaFold3 Predictions with Distributed High-Throughput Computing

Jun 12, 2026, 10:10 AM
20m
Howard Auditorium (Fluno Center on the University of Wisconsin-Madison Campus)

601 University Avenue, Madison, WI 53715-1035

Speaker

Daniel (Danny) Morales (UW-Center for High Throughput Computing)

Description

AlphaFold 3 (AF3) enables atomic-resolution prediction of biomolecular complexes, driving rapidly growing demand across the life sciences. However, its ~750 GB reference database has effectively confined production deployments to systems with shared parallel filesystems, creating a major barrier to scalability. Distributed high-throughput computing (dHTC) platforms offer vast, heterogeneous compute capacity but fundamentally lack the shared data infrastructure assumed by AF3. We present a data-aware deployment of AF3 for dHTC, implemented on the Center for High Throughput Computing (CHTC) and the Open Science Pool (OSPool). The workflow is decomposed into a CPU-bound data pipeline that executes on nodes with locally staged, scheduler-advertised databases, and a GPU-bound inference pipeline that opportunistically scales across distributed resources. Using CUDA Unified Virtual Memory (UVM), we extend inference beyond physical GPU limits, enabling predictions of ultra-large complexes that exceed device VRAM. By elevating dataset locality to a schedulable resource via HTCondor ClassAds, we eliminate prohibitive per-job data transfers and enable efficient, federated execution. Beyond scaling throughput, we demonstrate that dHTC can support previously infeasible workloads. Together, these results establish dHTC as a viable—and in some regimes superior—execution model for data-intensive structural biology workflows and provide a general blueprint for deploying large, data-intensive applications on distributed cyberinfrastructure.
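The ClassAd-based locality mechanism described in the abstract can be sketched with standard HTCondor configuration: an execute node advertises a custom attribute saying the database is staged locally, and jobs match only against machines advertising it. The attribute name `HasAF3RefDB` and the file names below are illustrative assumptions, not the speakers' actual configuration.

```
# --- Execute-node configuration (e.g. condor_config.local) ---
# On hosts where the ~750 GB AF3 reference database is staged on
# local disk, advertise a custom machine ClassAd attribute.
HasAF3RefDB = True
STARTD_ATTRS = $(STARTD_ATTRS) HasAF3RefDB

# --- Job submit file for the CPU-bound data pipeline ---
# Steer jobs only to machines that advertise the staged database,
# avoiding a per-job transfer of the reference data.
universe     = vanilla
executable   = run_af3_data_pipeline.sh
requirements = (TARGET.HasAF3RefDB == True)
request_cpus = 8
queue
```

Because the attribute is an ordinary ClassAd, the negotiator treats dataset locality like any other schedulable resource (CPUs, GPUs, memory) during matchmaking, which is the essence of the approach the abstract describes.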
