A Leader in Astrostatistics

In the 1980s, Jogesh Babu of the Department of Statistics and Eric Feigelson of the Department of Astronomy began collaborating to bring cutting-edge statistical methods to bear on important questions in astronomy. While there had been a few earlier collaborations between statisticians and astronomers, notably that of Neyman and Scott in the 1950s, the Babu-Feigelson collaboration is among the earliest and most sustained since computer-intensive methods such as the bootstrap revolutionized statistics.

The Babu-Feigelson collaboration led to a 1996 cross-disciplinary monograph that gave rise to the name Astrostatistics, and then to the establishment of the cross-disciplinary Center for Astrostatistics (CASt) in 2003. Collaborations have continued to grow with the addition of Hyungsuk Tak (Statistics), Eric Ford (Astronomy), Joel Leja (Astronomy), Ashley Villar (Astronomy), Ian Czekala (Astronomy), Derek Fox (Astronomy), David Hunter (Statistics), Donghui Jeong (Astronomy), and Rebekah Dawson (Astronomy). As a result of this history, this continuing growth, and the remarkably successful summer astrostatistics workshops and conferences, Penn State's astrostatistics group is world-renowned.


Faculty and Student Research Collaborations

Identifying Galaxies with Unique Data Analysis

Assistant Professor Hyungsuk Tak and graduate student Sarah Shy are developing a new data-analytic tool to quantify classification uncertainties with statistical and machine learning methods, such as random forests and support vector machines, in the unique context of astronomical data. They have successfully applied their method to identify a specific type of galaxy (high-redshift quasars) in a large-scale data set containing millions of astronomical objects. The challenge of this problem lies in the fact that the targeted objects are presumed to be dimmer (and thus more uncertain) than other objects.

Their goal is to scale up the method so that it can be applied in the near future to even larger rectangular data sets, with billions of rows (astronomical objects) and hundreds of columns (properties). Data sets of this size are on the horizon because the Rubin Observatory Legacy Survey of Space and Time (LSST) will start monitoring the entire sky in a few years, producing terabytes of data per day.