Shifting gears on the fly | Eberly College of Science

Penn State statistician builds better models for disease genetics by incorporating multiple types of data

Joel Ranck

9 April 2020

There is a child’s toy made of gears of different diameters and teeth sizes. Two gears can move in unison until a third, larger gear is added that increases power but appears to move at a slower pace. As you add different-sized gears, the machine changes speed, torque, and direction. This is how Daisy Philtron, assistant research professor of statistics, describes how her team uses multiple sources of data to pinpoint the genes that influence diseases like Parkinson’s.

Philtron and her team are a single cog in a collaborative effort to tackle diseases like Parkinson’s using data models. They combine genome-wide association studies (GWASs), which look for associations between genetic variants and disease traits in large samples of individuals, with RNA-sequencing data collected from patients with Parkinson’s disease, which can tell them how specific genes are expressed. The study is part of a five-year collaborative grant between Penn State and the Gladstone Institutes, funded by the National Science Foundation (NSF).

Philtron explains that when RNA-sequencing data and GWAS data are analyzed jointly, researchers begin to get a clearer picture of how certain genetic markers are associated with disease occurrence or progression. The model the team is building is flexible, allowing for the addition of many different data sources, like the gears in the child’s toy. Each added data source influences the larger model to help the researchers implicate important genes or signals that relate to the target they seek to better understand—in this case, the genes influencing Parkinson’s.

The classic way to analyze multiple datasets is to look at each dataset separately and then merge them all at the end. Philtron and her team analyze data on the fly, allowing each dataset to influence the overall model and produce one combined dataset at the end of the project that tells a more complete story.

Throughout the project, additional data sources are added to the model from the collaborative laboratory at the Gladstone Institutes, where researchers perform follow-up experiments to alter target genes in cell models of Parkinson’s and then observe them to see how long the cells live or how they change over time. The resulting data are sent back to Philtron and her team to further inform their analysis.

“If we look at the genes found in Parkinson’s patients and overlay them on the GWAS data, we find that there are markers that are silent in the GWAS data that appear in the RNA-sequencing data,” said Philtron. “Additional datasets open further avenues. So the model is built to be flexible to kind of plug in a lot of different data sources. For a lot of diseases, there’s no one gene that’s causing the disease. There might be a gene mutation that’s causing 10 percent of the disease cases, but there are others who have this same mutation but do not have the disease, and we still don’t know why. We are interested in what genes are protecting those people from disease, too.”

As new data are added, the team hopes to get a better understanding of the biological basis of Parkinson’s disease so that other researchers can begin to develop therapies to target it. Beyond the benefit to research on Parkinson’s, Philtron’s team expects that the model can be replicated to study other heritable diseases, like breast cancer and ALS.

“Statistics itself is not going to cure the disease,” said Philtron. “But it can help narrow down the options for people in other fields who are studying drug therapies, gene therapies, or physical therapies to help them figure out what’s working or what has a lot of potential to work.”

Like a cog in a larger machine, statistics is playing a vital role in fighting Parkinson’s and other diseases.