Name: Statistical Inference for Cataloging the Visible Universe
Start: 2019-01-15T20:30:00 UTC
End: 2019-01-15T21:30:00 UTC

Presented By

Jeffrey Regier, University of California, Berkeley

Details

Start Date: Tue, Jan 15, 2019, 3:30 PM

End Date: Tue, Jan 15, 2019, 4:30 PM

Location

Thomas Bldg

Jeffrey Regier is a postdoctoral researcher in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He completed his Ph.D. in statistics at UC Berkeley in 2016. Previously, he received an MS in computer science from Columbia University (2005) and a BA in computer science from Swarthmore College (2003). His research focuses on statistical inference for large-scale scientific applications, particularly in astronomy and genomics. He has received the Hyperion HPC Innovation Excellence Award (2017) and the Google Ph.D. Fellowship in Machine Learning (2013).

Abstract: A key task in astronomy is to locate astronomical objects in images and to characterize them according to physical parameters such as brightness, color, and morphology. This task, known as cataloging, is challenging for several reasons: many astronomical objects are much dimmer than the sky background, labeled data is generally unavailable, overlapping astronomical objects must be resolved collectively, and the datasets are enormous: terabytes now, petabytes soon. Existing approaches to cataloging are largely based on algorithmic software pipelines that lack an explicit inferential basis. In this talk, I present a new approach to cataloging based on inference in a fully specified probabilistic model. I consider two inference procedures: one based on variational inference (VI) and another based on Markov chain Monte Carlo (MCMC). A distributed implementation of VI, written in Julia and run on a supercomputer, achieves petascale performance, a first for any high-productivity programming language. This run is the largest-scale application of Bayesian inference reported to date. In an extension that draws on ideas from variational autoencoders and deep learning, I avoid many of the traditional disadvantages of VI relative to MCMC and improve model fit.