Jul 10 – 14, 2023
Fluno Center on the University of Wisconsin-Madison Campus
America/Chicago timezone

Developing Machine Learning for Metadata Standardization with HTCondor and Snakemake

Not scheduled
20m
Howard Auditorium (Fluno Center on the University of Wisconsin-Madison Campus)

601 University Avenue, Madison, WI 53715-1035

Speaker

Yuriy Sverchkov (University of Wisconsin - Madison)

Availability of the Speaker

Cannot present July 10

Summary (2-4 sentences)

The problem: Large biological sample repositories such as the Sequence Read Archive (SRA) are often hard to query, filter, and analyze because their metadata is not standardized and consists of free-text key-value pairs.

The approach: We developed machine learning models for mapping the free-text metadata to standardized ontology terms annotated with their relationships to the samples. To do this, we built a computational pipeline for training and evaluating our models.
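As a rough illustration of the input side of such a mapping, the sketch below turns free-text key-value metadata into a bag of token features that a model could consume. It is a minimal sketch under stated assumptions, not the authors' feature set: the function names, the `key:`/`val:` feature prefixes, and the sample record are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def featurize(metadata):
    """Count tokens from keys and values separately (hypothetical feature scheme)."""
    features = Counter()
    for key, value in metadata.items():
        for tok in tokenize(key):
            features["key:" + tok] += 1
        for tok in tokenize(value):
            features["val:" + tok] += 1
    return features


# Hypothetical sample record; keys and values are invented for illustration.
sample = {
    "Tissue type": "Liver",
    "cell_line": "HepG2",
    "source": "adult liver biopsy",
}
features = featurize(sample)
print(sorted(features.items()))
```

Keeping key tokens and value tokens in separate feature namespaces lets a downstream model distinguish a term that names a field from the same term appearing as a field's content.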

The technology: Our pipeline is a YAML-configurable Snakemake workflow for dispatching HTCondor jobs that preprocess data, extract features, train ML models, evaluate them, and generate performance reports.
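The stage ordering described above can be sketched in plain Python. This is a stand-in for illustration only: it uses neither Snakemake nor any HTCondor API, and the configuration dictionary (standing in for a YAML file), the stage names, the script names, and the resource request are assumptions. It orders the stages by dependency and renders a basic HTCondor submit description for each.

```python
from graphlib import TopologicalSorter

# Hypothetical configuration, standing in for what a YAML file might hold.
# Each stage maps to the list of stages it depends on.
CONFIG = {
    "request_memory": "4GB",
    "stages": {
        "preprocess": [],
        "extract_features": ["preprocess"],
        "train": ["extract_features"],
        "evaluate": ["train"],
        "report": ["evaluate"],
    },
}


def stage_order(stages):
    """Return stage names so that every stage follows its dependencies."""
    return list(TopologicalSorter(stages).static_order())


def submit_description(stage, config):
    """Render an HTCondor submit description for one stage.

    The executable name (<stage>.sh) and the logs/ directory are hypothetical.
    """
    lines = [
        f"executable = {stage}.sh",
        f"request_memory = {config['request_memory']}",
        "request_cpus = 1",
        f"output = logs/{stage}.out",
        f"error = logs/{stage}.err",
        f"log = logs/{stage}.log",
        "queue",
    ]
    return "\n".join(lines)


order = stage_order(CONFIG["stages"])
for stage in order:
    print(submit_description(stage, CONFIG))
    print()
```

A workflow manager such as Snakemake infers this ordering from the input and output files each rule declares, rather than from an explicit dependency table like the one assumed here.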

Primary authors

Prof. Colin Dewey (University of Wisconsin - Madison)
Prof. Mark Craven (University of Wisconsin - Madison)
Yuriy Sverchkov (University of Wisconsin - Madison)
