-
Dr Alexei Klimentov (Brookhaven National Lab), Shawn McKee (University of Michigan), 6/9/26, 9:00 AM
Welcome and quick preview of the next 2 days.
-
Dr Patrick McDaniel (UW-Madison), 6/9/26, 9:00 AM
-
Ivan Glushkov (BNL), 6/9/26, 9:05 AM
What should we be planning to do for infrastructure updates?
- ARM compute servers?
- GPU servers?
- Storage servers: specifically, what characteristics are needed?
- Networking: how do we get to 400 Gbps links by HL-LHC startup?
- Infrastructure: power, management, resiliency, capacity; what do we need?
- Services: what middleware do we need, and are changes needed in the services we deploy?
... -
Dr Miron Livny (UW-Center for High Throughput Computing), 6/9/26, 9:05 AM
-
Frederick Luehring (Indiana University), Kaushik De (University of Texas at Arlington), 6/9/26, 9:30 AM
We need to discuss facility priorities during the upcoming LS3.
- US ATLAS pledge resources
- More important: infrastructure upgrades
- Common tools for US sites (perhaps new AI tools)
We also need to explore our current cost environment and its implications for our facilities. With no additional "bump" of funding foreseen before HL-LHC, how can we optimize our resource...
-
Jason Simms (Swarthmore College), 6/9/26, 9:35 AM
-
6/9/26, 10:00 AM
Opportunity to finish any discussion items from this session.
-
Ewa Deelman (USC Information Sciences Institute), 6/9/26, 10:05 AM
-
Dr Viviana Cavaliere (BNL), 6/9/26, 11:00 AM
Present and discuss the results of the recent AF survey.
-
Ronda Swanson (University of Nebraska-Lincoln), 6/9/26, 11:00 AM
-
Saloni Bhogale (University of Wisconsin-Madison), 6/9/26, 11:10 AM
-
Dr Giordon Stark (UCSC), 6/9/26, 11:25 AM
-
Christina Koch (UW Madison), 6/9/26, 11:45 AM
-
Ilija Vukotic (University of Chicago), 6/9/26, 11:50 AM
Can we, as a facility, define the use cases and challenges that we need to meet with AI/LLMs?
What options are currently available in this space?
Can we create an initiative in this area for the facility to pursue in the next year? (This has scrubbing implications.)
-
Daniel (Danny) Morales (UW-Center for High Throughput Computing), 6/9/26, 12:10 PM
AlphaFold 3 (AF3) enables atomic-resolution prediction of biomolecular complexes, driving rapidly growing demand across the life sciences. However, its ~750 GB reference database has effectively confined production deployments to systems with shared parallel filesystems, creating a major barrier for scalability. Distributed high-throughput computing (dHTC) platforms offer vast, heterogeneous...
-
Louis Pelosi (BNL), 6/9/26, 12:15 PM
-
Brian Bockelman (Morgridge Institute for Research), 6/9/26, 1:30 PM
-
Dr Michael Coughlin (University of Minnesota), 6/9/26, 1:35 PM
With the detection of compact binary coalescences and their electromagnetic counterparts by gravitational-wave detectors, a new era of multi-messenger astronomy has begun. In this talk, I will describe how data science is enabling the gravitational-wave community to make very low-latency detection and parameter estimation possible within the alert system. I will then discuss how current ground...
-
Oszkar Tarjan (BNL), 6/9/26, 3:05 PM
What technologies do we use to interconnect our services and data?
Are there technologies or services our sites should deploy broadly?
What storage components are available to USATLAS, both those in use and the alternatives? What are their pros and cons?
-
Kendall Ackley (University of Warwick/Gravitational-wave Optical Transient Observer), 6/9/26, 3:05 PM
-
Shreya Anand (Stanford University), 6/9/26, 3:30 PM
-
Ofer Rind (BNL), 6/9/26, 3:30 PM
We need to discuss our plans for storage to support the needs of our USATLAS users.
-
Ved Shah (Northwestern University), 6/9/26, 3:50 PM
The Rubin Observatory is detecting an unprecedented number of transients each night, exceeding the classification capacity of all existing spectroscopic resources by several orders of magnitude. As a result, robust photometric classifications will be essential both for assembling complete samples of different transient subtypes and for identifying events that warrant spectroscopic follow-up....
-
Thomas Smith (BNL), 6/9/26, 3:55 PM
-
Ofer Rind (BNL), 6/9/26, 4:10 PM
We need to discuss how we operate USATLAS with a focus on current and possible monitoring components.
-
Gautham Narayan (University of Illinois Urbana-Champaign), 6/9/26, 4:15 PM
Real-time multi-messenger astrophysics is as much a computing problem as a telescope problem. The same event can trigger gravitational-wave, neutrino, radio, and optical alerts within seconds, but the science only happens if the infrastructure can carry, filter, and join those streams fast enough to act on them. I work on three of these systems: ANTARES, which ingests and classifies Rubin's...
-
6/9/26, 4:30 PM
Open discussion on session items.
We need to gather scrubbing-related items by WBS area.
How will we operate during LS3?
How will we meet ATLAS needs for HL-LHC, given our funding and cost environment?
-
Nathan Walker (Oregon State University), 6/9/26, 4:40 PM
I will talk about difficulties associated with free parameters in modeling GRB emission. Without a way to constrain these parameters, inferring too much information from models can be problematic. Recent advances in the physics that can launch GRB jets, however, have the potential to enable a deep understanding of GRBs and their progenitors. My talk will address how degeneracies arise in...
-
Shawn McKee (University of Michigan), 6/10/26, 9:00 AM
-
Colin Slater, 6/10/26, 9:00 AM
-
Tomas Cabrera (Carnegie Mellon University), 6/10/26, 9:25 AM
Gravitational wave follow-up requires iterative observation, analysis, and decision-making on timescales of hours in the most extreme cases. In this sense, it is a subfield where throughput is defined not only by the amount of pertinent data but also by the speed at which the information in the data can be synthesized into a human-digestible format. In this talk I will present the...
-
Eduardo Bach (University of Massachusetts - Amherst), 6/10/26, 9:30 AM
-
Aidan Rosberg (Indiana University), 6/10/26, 9:45 AM
Aidan will present on RP1 and Open Data. We can use this as a springboard to discuss needs for AFs during HL-LHC.
-
Alessandra Corsi (Johns Hopkins University), 6/10/26, 9:50 AM
In this talk, I will discuss current and future opportunities in the radio follow-up of gravitational-wave (GW) events using interferometric, PI-driven facilities such as the Karl G. Jansky Very Large Array (VLA) and the future next-generation VLA (ngVLA). I will highlight lessons learned from recent GW observing runs, and strategies for identifying and characterizing relativistic ejecta and...
-
6/10/26, 10:05 AM
We need to discuss which items we want to cover during the WLCG-HSF meeting in November. Which topics are important to USATLAS, and what are our goals for the meeting?
We also need to wrap up items from this session.
-
Mark Lacy, 6/10/26, 10:10 AM
The VLA Sky Survey images the whole sky at 2-4 GHz in the radio with angular resolution comparable to optical telescopes, providing a unique reference survey for the radio sky. Multi-epoch data allows variable sources and transients to be found and characterized, and spectral and polarimetric information is also recorded. Processing the large amount of data from this survey is a significant...
-
Tim Cartwright (University of Wisconsin–Madison, OSG), 6/10/26, 11:00 AM
-
Kaushik De (University of Texas at Arlington), Ofer Rind (BNL), 6/10/26, 11:00 AM
What are the activities we have been focused on recently for WBS 2.3 outside of the T1/T2s?
-
David Schultz (University of Wisconsin-Madison), 6/10/26, 11:00 AM
If the built-in HTCondor OAuth2 doesn't meet your needs, you can always choose to do your own thing. In this talk, we show how and why IceCube wrote a custom token storer and CredMon to handle OAuth2 token creation and refresh. While custom code allows exactly matching the IceCube workflow, the main defining feature is not having to ask command line users to do an additional web login. Along...
-
Dr Pierrette Dagg (MERIT), 6/10/26, 11:10 AM
-
William Leight (University of Massachusetts), Philippe Laurens (Michigan State University), Wenjing Wu (University of Michigan), 6/10/26, 11:25 AM
One or two slides each, 5 minutes.
Your thoughts on "What could we do better for WBS 2.3?"
Zachary Booth
Judith Stephens
Wenjing Wu
Andrey Zarochentsev
Philippe Laurens
Will Leight
-
Greg Daues (NCSA), 6/10/26, 11:25 AM
Rubin Observatory will begin conducting the Legacy Survey of Space and Time (LSST), a 10-year survey of the Southern sky. This presentation will briefly cover how the project successfully uses HTCondor-based workflows for several flavors of processing, including preparations for the project's Data Release Production at the United States Data Facility.
The project's middleware that interfaces...
-
Preston Smith (Purdue University), 6/10/26, 11:35 AM
As AI demand booms, and the market forces we all face as a side effect continue to impact campus buying power, it is more important than ever to understand how to capacity plan for your campus, and how that capacity planning fits into the larger picture of your state, region, and the nation. This presentation will share research in capacity planning for campuses, and explore how shared...
-
Seung-Jin Sul (Staff Software Engineer, U.S. Department of Energy (DOE) Joint Genome Institute (JGI), Lawrence Berkeley National Laboratory), 6/10/26, 11:50 AM
Authors
Seung-Jin Sul*, Mario Melara, Ramani Kothadia, Ludovico Bianchi, Joshua Boverhof, Nick Tyler, Daniela Cassol, Mike Sneddon, Setareh Sarrafan, Kjiersten Fagnan
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Abstract
Scientific workflows at the DOE Joint Genome Institute (JGI) increasingly need to run across multiple high-performance computing facilities, but...
-
Stephen Deems, 6/10/26, 12:00 PM
-
Shawn McKee (University of Michigan), 6/10/26, 12:05 PM
-
Marco Mascheroni (UCSD), 6/10/26, 12:10 PM
The CMS Submission Infrastructure (SI), based on HTCondor and GlideinWMS, operates a federated pool of resources, integrated from WLCG, HPC and cloud providers, supporting CMS offline computing needs. As CMS prepares for the High-Luminosity LHC (HL-LHC) phase, the infrastructure must continue evolving to support increasing resource scale and heterogeneity, while maintaining the stability,...
-
Justin Hiemstra (Morgridge Institute for Research), 6/10/26, 1:30 PM
-
6/10/26, 1:30 PM
We need to summarize what we feel are the key decisions from our 2-day meeting.
-
6/10/26, 1:50 PM
-
Brian Aydemir (Morgridge Institute for Research, UW–Madison), 6/10/26, 1:55 PM
-
6/10/26, 2:10 PM
We need to specify next steps.
Some of our work will provide guidance and input for scrubbing.
What follow-on meetings are needed? Can we handle action items and next steps with our regularly scheduled facilities meetings or do we need more?
-
Fabio Andrijauskas, 6/10/26, 2:15 PM
The Open Science Data Federation (OSDF) and the National Research Platform (NRP) represent a transformative shift toward a democratized, high-performance computational ecosystem designed to accelerate global scientific discovery. By integrating the OSDF’s robust data distribution capabilities with the NRP’s distributed, Kubernetes-based "Nautilus" infrastructure, researchers can seamlessly...
-
Brian Bockelman (Morgridge Institute for Research), 6/10/26, 3:00 PM
-
Nicholas Minor (Dave O'Connor's Laboratory/UW-Madison), 6/10/26, 3:25 PM
-
Ian Ross (U. Wisconsin), 6/10/26, 3:50 PM
-
Dr Ewa Deelman (USC Information Sciences Institute), 6/10/26, 4:10 PM
-
Steven Timm (Fermilab (DUNE)), 6/10/26, 4:35 PM
DUNE faces unique challenges for data management and movement due to its very large event record size, needs for immediate prompt processing, and requirements for heterogeneous computing across an internationally distributed compute and storage system. We describe DUNE's current use of high throughput and high performance computing, including the growing AI/ML component, and the challenges we...
-
Yuanyuan Zhang (NSF NOIRLab), 6/11/26, 9:00 AM
I will describe how OSG has enabled us to develop a simulation-based inference (SBI) method for cosmology analysis. For this application, we train an SBI model, based on a mixture density network (MDN), to derive posteriors for cosmological parameters from a data vector that describes observations of galaxy clusters in the universe. We use analytic models to generate mocks of the observational...
-
Mark Krenz (Indiana University), 6/11/26, 9:25 AM
This will be a high level overview of the security team's activities on OSG, what we're responsible for, and where we're going. It will provide summary metrics of what we've done, a short overview of the security exercise we've performed for US CMS and US ATLAS, and talk about us preparing for the coming wave of super AI that will discover all the zero days all at once.
-
Peter Pizzimenti, 6/11/26, 9:50 AM
OrangeGrid Cloud Workers is the infrastructure that lets Syracuse University's on-premises HTCondor cluster overflow into public cloud spot instances when campus capacity runs out. A researcher submits a job the same way they always have, with condor_submit my.sub from a campus submit host, and if the job opts in with +WantCloudBurst = "...", it can land on an ephemeral VM running in any public...
-
Lily Sheram (Georgia Institute of Technology), 6/11/26, 10:15 AM
The Trinity Demonstrator is a one-square-meter imaging atmospheric Cherenkov telescope intended to validate the proposed technique for the Trinity Neutrino Observatory and search for high-energy neutrinos. Weakly interacting and chargeless, these elusive particles deliver data directly from their extragalactic sources to Earth. Trinity telescopes use Earth-Skimming, a detection method where a...
-
Kashika Mahajan (UW-Center for High Throughput Computing), 6/11/26, 11:00 AM
Researchers using HTCondor for high-throughput computing routinely submit groups of related jobs, known as Clusters, ranging from hundreds to tens of thousands of jobs each. Current tools report per-job data, making it difficult to diagnose Cluster-wide issues such as jobs stuck on hold, poor resource utilization, or unexpected failures. We present a Python toolkit, to be included as a part of...
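As a rough illustration of the Cluster-wide view described above, per-job records can be rolled up by ClusterId so that, for example, many jobs held for the same reason show up as one line. This is a minimal sketch, not the toolkit presented in the talk. It assumes the input is a list of dicts carrying the standard job ClassAd attributes ClusterId, ProcId, JobStatus and HoldReason (for instance, records parsed from condor_q -json output); the function name and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

# Standard HTCondor JobStatus codes from the job ClassAd.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def summarize_clusters(jobs):
    """Roll per-job ClassAd records up into one summary per ClusterId.

    `jobs` is an iterable of dicts with at least ClusterId and JobStatus
    (assumed input format; e.g. parsed from `condor_q -json`).
    """
    summary = defaultdict(lambda: {"status": Counter(), "hold_reasons": Counter()})
    for job in jobs:
        entry = summary[job["ClusterId"]]
        status = JOB_STATUS.get(job["JobStatus"], "Unknown")
        entry["status"][status] += 1
        # Group held jobs by reason so a single widespread cause stands out.
        if status == "Held":
            entry["hold_reasons"][job.get("HoldReason", "<no reason>")] += 1
    return dict(summary)


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    jobs = [
        {"ClusterId": 101, "ProcId": 0, "JobStatus": 2},
        {"ClusterId": 101, "ProcId": 1, "JobStatus": 5, "HoldReason": "Disk quota exceeded"},
        {"ClusterId": 101, "ProcId": 2, "JobStatus": 5, "HoldReason": "Disk quota exceeded"},
        {"ClusterId": 102, "ProcId": 0, "JobStatus": 4},
    ]
    for cluster, info in summarize_clusters(jobs).items():
        print(cluster, dict(info["status"]), dict(info["hold_reasons"]))
```

Keying the summary on ClusterId rather than on individual jobs is what turns thousands of per-job lines into one row per submission.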
-
Cole Bollig (UW-Center for High Throughput Computing), 6/11/26, 11:25 AM
Connecting High Throughput batch resources to Open OnDemand for access via a web browser anywhere/anytime.
-
Todd Miller (CHTC), 6/11/26, 11:50 AM
-
Matyas Selmeci (UW-Madison CHTC), 6/11/26, 12:15 PM
-
Marco Mambelli (Fermilab), Namratha Urs (Fermi National Accelerator Laboratory), 6/11/26, 1:30 PM
GlideinWMS is a workload management and resource provisioning system widely used for distributed scientific computing. Recent development efforts have focused on simplifying token-based authentication, which is finally starting to be adopted in production, and on improving support for High Performance Computing (HPC) resources. This presentation will highlight recent progress in these areas and...
-
Benjamin FitzGerald (University of Wisconsin-Madison), 6/11/26, 1:55 PM
I use CHTC and OSPool to run a rainfall frequency analysis pipeline across 200+ watersheds as part of FEMA's National Flood Insurance Program work. DAGMan was essential for automating job submission and output processing at this scale, but managing pipeline failures proved challenging -- specifically, ensuring that every daily precipitation file was properly analyzed before downstream steps...
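One way to enforce the condition described above, that every daily precipitation file is analyzed before downstream steps run, is a completeness check attached as a DAGMan POST script to the analysis node. DAGMan treats a nonzero POST-script exit as a node failure, so the node's children do not start. This is a minimal sketch, not the pipeline from the talk: the file-naming pattern, script name and arguments are hypothetical, and empty output files are treated as missing.

```python
import sys
from datetime import date, timedelta
from pathlib import Path

# Hypothetical naming scheme: one output file per day of precipitation data.
PATTERN = "precip_{:%Y%m%d}.out"


def expected_days(start, end):
    """Yield every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def missing_days(present, start, end, pattern=PATTERN):
    """Return the days in [start, end] whose expected output name is not in `present`."""
    return [d for d in expected_days(start, end) if pattern.format(d) not in present]


def nonempty_files(directory):
    """Names of the non-empty regular files in `directory` (empty files count as missing)."""
    return {p.name for p in Path(directory).iterdir() if p.is_file() and p.stat().st_size > 0}


def main(argv):
    out_dir, first, last = argv
    gaps = missing_days(nonempty_files(out_dir), date.fromisoformat(first), date.fromisoformat(last))
    for day in gaps:
        print(f"missing output for {day.isoformat()}", file=sys.stderr)
    # A nonzero POST-script exit marks the DAG node as failed, so downstream
    # nodes do not start (and RETRY, if set for the node, can rerun it).
    return 1 if gaps else 0


# Act only when given OUTPUT_DIR START END, e.g. when invoked as a POST script.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv[1:]))
```

In the DAG file, such a script would be attached with DAGMan's SCRIPT POST syntax, for example SCRIPT POST analyze check_outputs.py outputs 2020-01-01 2020-12-31, where the node name, script name and dates are again placeholders.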
-
Douglas Thain (University of Notre Dame), 6/11/26, 2:15 PM
A number of modern programming frameworks encourage end users to write concurrent functional programs that are expanded into task graphs, and then executed using local parallelism. While providing an elegant user experience, these systems struggle when presented with large programs that generate million-node graphs and must run on heterogeneous systems. We demonstrate a new framework,...
-
Todd Tannenbaum (University of Wisconsin), 6/11/26, 3:30 PM
Discussion of what's new and what's coming up in HTCondor.
-
Greg Thain (Center for High Throughput Computing), Todd Tannenbaum (University of Wisconsin), 6/11/26, 4:15 PM
-
Jingyan Shi (Institute of High Energy Physics, Chinese Academy of Sciences), 6/12/26, 9:00 AM
The INK has been deployed at IHEP as a unified scientist workbench, providing HTCondor-driven job submission, interactive analysis environments, and cross-cluster resource access under a single entry point. Since its release, user feedback has been quite positive.
This talk presents the main recent progress and optimizations: (1) per-experiment customization to meet diverse collaboration needs;...
-
Ken Judd (Stanford University), 6/12/26, 9:20 AM
HTCondor was used to develop DPSOL, a framework for solving dynamic programming problems. It was the foundation for DSICE which merged dynamic and stochastic factors in economics and the climate, and analyzed the social cost of carbon, the cost of a two-degree target and the value of carbon capture and sequestration.
-
Hannah Wayment-Steele (UW-Madison), 6/12/26, 9:45 AM
-
Ian Ross (U. Wisconsin), 6/12/26, 10:10 AM
-
Khyathi Vagolu (UW-Madison), 6/12/26, 11:00 AM
When network disruptions or worker node failures occur, HTCondor relies on a static lease timeout, traditionally 40 minutes, before abandoning a job. This static window creates a costly trade-off: waiting too long causes massive machine idle time on unrecoverable failures, while cutting it too short prematurely kills jobs that could have successfully reconnected. Can we use AI to solve this?...
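The trade-off above can be made concrete with a toy cost model. This is a hypothetical sketch, not HTCondor's lease logic and not the AI method from the talk: each past disconnect either reconnects after some number of minutes or never does; a disconnect that never recovers wastes the full lease as idle time, and a recoverable job is abandoned if it needs longer than the lease to reconnect. The Disconnect and lease_cost names and the event history are invented for illustration; only the 40-minute default comes from the abstract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Disconnect:
    # Minutes until the job reconnected, or None if it never would have.
    reconnect_after: Optional[float]


def lease_cost(events: Sequence[Disconnect], lease_min: float):
    """Score one static lease under a simplified, hypothetical cost model.

    idle_minutes: time spent waiting on disconnects that never recover
                  (the full lease is lost each time).
    killed_jobs:  recoverable jobs abandoned because they needed longer
                  than the lease to reconnect.
    """
    idle = sum(lease_min for e in events if e.reconnect_after is None)
    killed = sum(1 for e in events
                 if e.reconnect_after is not None and e.reconnect_after > lease_min)
    return idle, killed


if __name__ == "__main__":
    # Made-up disconnect history, for illustration only.
    history = [Disconnect(None), Disconnect(None), Disconnect(3.0),
               Disconnect(12.0), Disconnect(35.0)]
    for lease in (10, 20, 40):
        idle, killed = lease_cost(history, lease)
        print(f"lease={lease:>2} min  idle={idle:>4.0f} min  killed={killed}")
```

With this invented history, raising the lease from 10 to 40 minutes removes every premature kill but quadruples the idle time spent on the two unrecoverable failures, which is the tension between waiting too long and cutting too short that the abstract describes.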
-
Ilija Vukotic (University of Chicago), 6/12/26, 11:25 AM
-
Ron Tapia (Penn State University), 6/12/26, 11:50 AM
Discussion of metrics that Condor can provide about the performance of external services. Condor has a unique view of the performance of the services that it uses on behalf of jobs. Examples of external services include file transfer plugins and credmons. Individual failures are not very interesting to cluster administrators, but widespread failures affecting many jobs are. What sort of...
-
Frederick Luehring (Indiana University)
We need to explore our current cost environment and its implications for our facilities. With no additional "bump" of funding foreseen before HL-LHC, how can we optimize our resource expenditure to best meet (US)ATLAS needs between now and HL-LHC startup?