May 21 – 24, 2018
Fluno Center on the University of Wisconsin-Madison Campus
America/Chicago timezone

Managing Caffe Machine Learning Jobs with HTCondor

May 23, 2018, 2:05 PM
Howard Auditorium (Fluno Center on the University of Wisconsin-Madison Campus)

601 University Avenue, Madison, WI 53715-1035


Michael Pelletier (Raytheon)

Summary

Caffe, a popular machine learning framework, has a number of characteristics, such as GPU acceleration and checkpoint/resume capability, which can make managing multiple users and runs on shared resources challenging for end users and frustrating for IT staff seeking the highest possible utilization of extremely costly hardware.

With HTCondor, these challenges can be overcome through a variety of powerful submit description features and capabilities tailored to Caffe's design, simplifying the user experience while also maximizing utilization.

Primary author

Michael Pelletier (Raytheon)

Presentation materials