A Leader in Astrostatistics
In the 1980s, Jogesh Babu in the Department of Statistics and Eric Feigelson in the Department of Astronomy began collaborating to apply cutting-edge statistical methods to important questions in astronomy. While there had been a few other collaborations between statisticians and astronomers, notably Neyman and Scott in the 1950s, the Babu-Feigelson collaboration is among the earliest and most sustained since the revolution in statistics brought about by computer-intensive methods like the bootstrap.
The Babu-Feigelson collaboration led to a 1996 cross-disciplinary monograph that gave rise to the name Astrostatistics. It also led to the establishment of the cross-disciplinary Center for Astrostatistics (CASt) in 2003.
Over the years, the number of astrostatistics faculty has grown. We now have several faculty in both departments who are actively engaged with this fast-growing interdisciplinary research area: Hyungsuk Tak (Statistics), Eric Ford (Astronomy), Joel Leja (Astronomy), Derek Fox (Astronomy), David Hunter (Statistics), Donghui Jeong (Astronomy), and Rebekah Dawson (Astronomy). The very popular Penn State Astrostatistics Summer School has, over close to two decades, educated hundreds of astronomers in statistical methods. Penn State has also hosted numerous summer astrostatistics workshops and conferences, bringing together leading researchers from around the world.
Faculty and Student Research Collaborations
Identifying Galaxies with Unique Data Analysis
Assistant Professor Hyungsuk Tak's collaboration with graduate student Sarah Shy focuses on developing a new data-analytic tool that quantifies classification uncertainties for statistical and machine learning methods, such as random forests and support vector machines, in the unique context of astronomical data. They have successfully applied their method to identify a specific type of galaxy (high-redshift quasars) in a large-scale data set containing millions of astronomical objects. The challenge is that the targeted objects are presumed to be dimmer, and therefore measured with more uncertainty, than other objects.
Their goal is to scale up the method so that it can be applied in the near future to even larger rectangular data sets with billions of rows (astronomical objects) and hundreds of columns (properties). The need is pressing because the Rubin Observatory Legacy Survey of Space and Time (known as LSST) will start monitoring the entire sky in a few years, producing terabytes of data per day.
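To give a feel for why dimmer objects are harder to classify, here is a minimal Python sketch of one generic way to turn measurement error into a classification uncertainty. It is not the method developed by Tak and Shy. The threshold rule `classify`, the cut value of 1.0, the Gaussian error model, and all numbers are assumptions chosen for illustration; in practice a trained classifier such as a random forest would stand in for the toy rule.

```python
import random


def classify(color):
    """Toy stand-in for a trained classifier (hypothetical rule).

    Returns 1 (target class) if the measured color index exceeds an
    assumed cut of 1.0, otherwise 0.
    """
    return 1 if color > 1.0 else 0


def class_probability(value, sigma, n_draws=2000, seed=0):
    """Estimate how often an object is assigned to the target class
    when its measurement is redrawn from an assumed Gaussian error
    distribution with mean `value` and standard deviation `sigma`.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = sum(classify(rng.gauss(value, sigma)) for _ in range(n_draws))
    return hits / n_draws


# Same measured value, different measurement errors (illustrative numbers).
bright = class_probability(1.3, sigma=0.05)  # well-measured object
faint = class_probability(1.3, sigma=0.5)    # dim, noisy object

print(f"bright object: P(target) ~ {bright:.2f}")
print(f"faint object:  P(target) ~ {faint:.2f}")
```

Both objects have the same measured value, but the faint object's larger error lets many redraws fall on the other side of the cut, so its class assignment is much less certain. Each object is processed independently of the others, which is what makes this kind of calculation possible to spread across very large catalogs.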