Left to right: Eric Ford, Eric Feigelson, Hyungsuk Tak, Jogesh Babu. Credit: Nate Follmer.

Astronomy is better with better statistics

By adapting, developing, and improving analytical methods and predictive models, statistics is helping astronomers to harness the full potential of their data
9 April 2020

Astronomy is one of those rare scientific fields that seems to capture everyone's interest. It's innately human to gaze up at the heavens at night in wonder and awe, and we've sought since ancient times to understand the universe and our place in it. From the days of Nicolaus Copernicus, Johannes Kepler, and Galileo Galilei, the revolutions of modern astronomy have rocked the very foundations of science and society. In the 21st century, the advent of large astronomical surveys like the Sloan Digital Sky Surveys, the launch of space-based observatories like NASA's Kepler and the European Space Agency's Gaia, and the dawning of multimessenger and gravitational-wave astronomy have enabled some of the most remarkable discoveries of our lifetime. In this age of big data, astronomy has found an unlikely hero—statistics.

"We've entered into an era when the data have become too big to look at," says Penn State astronomer Eric Feigelson. "We can't just plot a few points on a graph and begin to understand the phenomenon; there are too many variables, too many points, too much complexity. In order to extract scientific knowledge from astronomical data and test our astrophysical theories, we need to learn to do better analysis—so the need for statistics has greatly increased."

It was out of his own need for statistical insight that Feigelson, primarily an X-ray astronomer, first reached out to Penn State statistician Jogesh Babu more than 30 years ago. That contact sparked a lasting friendship and a long-standing collaboration that eventually led them to found Penn State's groundbreaking Center for Astrostatistics in 2003. Together, they have organized a highly successful series of international astrostatistics conferences. They have also authored a number of books, including one that received the Association of American Publishers' Award for Professional and Scholarly Excellence (PROSE) in cosmology and astronomy in 2012. In addition, they launched a succession of summer school programs that have taught advanced statistical methods for astronomy to thousands of graduate students from around the world. As a result, Penn State is widely recognized today as both a founder of and a leader in astrostatistics. At the outset, though, success didn't come so easily.

"In the beginning, it was a bit of a struggle," Babu recalls. "Even though we both speak English, we didn't understand each other, because what astronomers call certain statistical terms is different than what statisticians use those terms for. But by listening to each other patiently for some time, a couple of years, we came to understand each other well."

One could say that Babu and Feigelson's story mirrors that of statistics and astronomy. After years of talking and listening, the two fields have begun to understand each other well enough to produce tangible research results. Now, at the leading edge of astronomy, statistics is key to the search for potentially habitable worlds beyond our solar system and to our understanding of how the universe is evolving.

Astrostatistics in action

Gravitational waves are ripples in space-time (represented by the green grid), produced by accelerating bodies such as interacting supermassive black holes. These waves affect the time it takes for radio signals from pulsars (represented by the gray spheres) to arrive at Earth. Credit: David Champion/NASA/JPL.

Penn State astronomer Eric Ford studies the formation and evolution of planetary systems. He uses data from the Kepler and Gaia missions, as well as various ground-based surveys, to build complex models that combine advanced statistical and computational algorithms. These models could help inform the design and planning of future exoplanet-hunting missions that will search for potentially habitable planets orbiting sunlike stars.

"We're trying to apply fundamental physics to predict what's going on in the universe," Ford says. "Statistics allows us to go from qualitative inspection to quantifying 'How good are our predictions?' Are they acceptable? Is this model sufficiently accurate and precise to accomplish our science goals?"

Feigelson, too, is applying statistics to the study of exoplanets, focusing specifically on observations from the Kepler mission. With its highly sensitive photometer, Kepler could detect tiny dips in stars' brightness as orbiting planets transited them, passing in front of the stars along the telescope's line of sight. But Kepler's sensitivity also brought an unexpected problem to light: most stars exhibit intrinsic variations in brightness that can obscure the signs of planetary transits. To address this challenge, Feigelson adapted, of all things, a modeling method more commonly used by economists to predict the stock market.

"Removing the complicated variations of these stars ends up being a statistical problem," he says. "Astronomers were developing methods to remove them in various ways, but they missed the most common approach used by statisticians and econometricians for this kind of problem since the 1970s, a form of regression known as autoregressive modeling. So I wondered if this would work on stars. My graduate student Gabriel Caceres tried it out and found it was very successful. We basically rewrote part of the NASA pipeline using methods that a time-series statistician would find totally normal but were largely unused in astronomy. With this procedure, we ended up uncovering several dozen new candidate planets orbiting Kepler stars."

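The core idea of autoregressive modeling is simple: each brightness measurement is predicted from the few measurements just before it. The Python sketch below illustrates that general idea under stated assumptions. It is not the procedure Feigelson and Caceres built into the Kepler pipeline. It simulates a hypothetical light curve whose "stellar" variability follows an AR(2) process. It then estimates the coefficients by ordinary least squares, regressing each flux value on the previous p values plus an intercept. Finally, it subtracts the one-step-ahead predictions. If the model captures the star's correlated variability, the residuals should be close to white noise, and the search for transits would then be carried out on those residuals. The function names `lag_matrix`, `fit_ar`, and `ar_residuals`, the order p = 2, and all simulation parameters are illustrative assumptions.

```python
import numpy as np


def lag_matrix(flux: np.ndarray, p: int) -> np.ndarray:
    """Design matrix whose rows are [flux[t-1], ..., flux[t-p], 1] for t = p, ..., n-1."""
    n = len(flux)
    lags = [flux[p - k : n - k] for k in range(1, p + 1)]
    return np.column_stack(lags + [np.ones(n - p)])


def fit_ar(flux: np.ndarray, p: int) -> np.ndarray:
    """Ordinary least-squares estimate of AR(p) coefficients, intercept last."""
    coeffs, *_ = np.linalg.lstsq(lag_matrix(flux, p), flux[p:], rcond=None)
    return coeffs


def ar_residuals(flux: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Observed flux minus the model's one-step-ahead prediction."""
    p = len(coeffs) - 1
    return flux[p:] - lag_matrix(flux, p) @ coeffs


# Hypothetical light curve: correlated "stellar" variability generated by an
# AR(2) process driven by white noise. No planet is injected here.
rng = np.random.default_rng(0)
n, sigma = 2000, 1e-4
true_phi = (1.2, -0.4)
innovations = rng.normal(0.0, sigma, n)
flux = np.zeros(n)
for t in range(2, n):
    flux[t] = true_phi[0] * flux[t - 1] + true_phi[1] * flux[t - 2] + innovations[t]

coeffs = fit_ar(flux, p=2)
residuals = ar_residuals(flux, coeffs)
print("estimated AR coefficients:", coeffs[:2])
print(f"raw scatter: {flux.std():.2e}  residual scatter: {residuals.std():.2e}")
```

Fixing p = 2 keeps the sketch short, because the simulated data are known to be AR(2). With real light curves, the model order is usually chosen with a criterion such as AIC, and libraries such as statsmodels provide ready-made autoregressive and ARIMA models. Searching the residuals for periodic transit dips is a separate step that is not shown here.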
When a planet passes directly between a star and its observer, it dims the star's light by a measurable amount. This image shows a single planet (orbiting from left to right) and the corresponding light curve. Credit: NASA's Jet Propulsion Laboratory.
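
To put "a measurable amount" in perspective: to first order, and ignoring limb darkening, the fractional dip in brightness during a transit equals the ratio of the planet's disk area to the star's. The worked numbers below use standard approximate radii and are included only for scale.

```latex
\delta \approx \left(\frac{R_p}{R_\star}\right)^2,
\qquad
\delta_{\text{Earth--Sun}} \approx \left(\frac{6\,371\ \text{km}}{696\,000\ \text{km}}\right)^2 \approx 8.4\times10^{-5}\ (\approx 84\ \text{ppm}),
\qquad
\delta_{\text{Jupiter--Sun}} \approx \left(\frac{69\,911\ \text{km}}{696\,000\ \text{km}}\right)^2 \approx 0.01
```

An Earth-size planet around a sunlike star therefore dims it by less than one part in ten thousand. A signal that shallow is easily swamped by the star's own brightness variations.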
This artist's concept shows NASA's Kepler Space Telescope observing four planets that are orbiting a single dwarf star. Credit: NASA/JPL-Caltech.

An illustration of the Hubble constant.

Penn State's newest astrostatistician, Hyungsuk Tak, has developed a novel statistical model to refine the Hubble constant, scientists' estimate of the universe's rate of expansion. It is one of the most important parameters in cosmology, the study of the origin and evolution of the universe. Differing estimates of the Hubble constant, and differing methods of calculating it, are a source of ongoing debate within the astronomical research community. Tak hopes to help resolve some of that debate with his own independent methods.

"No one knows the exact answer," he says. "There are so many Hubble constant estimates, and there are tensions between some important physical properties of the data. In analyzing the data, what statisticians are interested in is whether we can also develop a new method, practically motivated by the science, to confirm whether these estimates are consistent with the observed data. Probably I'm not the person who finally solves this; but if more and more people contribute, then there may be consensus in the future, and I hope to contribute in that direction."

Babu, meanwhile, is applying his statistical expertise to gravitational-wave astronomy. He is part of a collaboration whose methods are very different from those of the LIGO (Laser Interferometer Gravitational-Wave Observatory) Scientific Collaboration, which made the first direct observation of gravitational waves in 2015. The North American Nanohertz Observatory for Gravitational Waves, known as NANOGrav, is a National Science Foundation Physics Frontier Center. It aims to use radio telescopes to detect low-frequency (nanohertz) gravitational waves by measuring their effects on the timing of radio pulses from rotating neutron stars known as pulsars.

"Rapidly rotating pulsars keep precise time periods," Babu explains, "and if a gravitational wave passes between us and the pulsar, there is a delay in the signal coming to us. We are developing statistical methods that will help us use that information to detect nanohertz gravitational waves. It's a complementary effort to LIGO, covering an entirely different region of the gravitational-wave spectrum."

What lies ahead

All of this knowledge is crucial to better understanding our universe. Its direct scope may be largely confined to astronomy and closely related fields, but the impact of the underlying research ripples outward across the whole of science.

"Questions about how our solar system formed, how Earth fits in, whether life in the universe is common or rare are intrinsically interesting in themselves," Ford says. "But to me, the process of how we're learning those things, the techniques we're developing, and the students we're training, those are equally important and potentially even a bigger legacy in terms of our impact on society. This knowledge can be applied to a wide range of issues beyond astronomy."

In truth, we all benefit from better statistics. Brought to bear on all manner of data, its methods and mindsets are helping science move society toward a better, more sustainable future. Its insights are furthering far-reaching initiatives in personalized medicine and public health, green energy and global economics, meteorology and climate change mitigation, and the list goes on. Data are everywhere, and rapidly evolving technology is enabling us to collect them at an ever-increasing rate. The grand challenge now is to extract the greatest meaning from those data. If it succeeds, statistics may yet be rightly counted among the heroes of 21st-century science.

Jogesh Babu, Distinguished Professor of Statistics and of Astronomy and Astrostatistics, is the director of the Center for Astrostatistics.

Eric Feigelson, Distinguished Senior Scholar and professor of astronomy and astrostatistics and of statistics, is an associate director of the Center for Astrostatistics.

Eric Ford, professor of astronomy and astrophysics, is a co-hire with the Institute for Computational and Data Sciences, the director of Penn State's Center for Exoplanets and Habitable Worlds, and an associate director of the Center for Astrostatistics.

Hyungsuk Tak, assistant professor of statistics and of astronomy and astrophysics, is a co-hire with the Institute for Computational and Data Sciences and a member of the Center for Astrostatistics.