
Improving heart health by diversifying genetic data

Penn State statistician helps gather genetic data from underrepresented groups and improve statistical methods to make the most of the complex data.
1 February 2023

On the first Friday of February, many Americans wear red to kick off American Heart Month and bring attention to the prevalence of heart disease—the leading cause of death in the United States. Many factors can contribute to a person’s risk of developing heart disease, including high blood pressure, high cholesterol, diabetes, smoking, and being overweight. But a less visible factor also impacts risk: a person’s genes. 

The genetics of heart disease are very complex. For example, there isn’t a single “heart attack gene.” Instead, many genes interact with each other and with environmental factors to contribute to a person’s overall risk of heart disease. But which genes are involved in these “polygenic” traits and their relative contributions can vary among individuals and from population to population. Penn State statistician Xiang Zhu works with international collaborations to study these genes and their role in heart disease in as diverse a sample as possible to improve our understanding of risk and ultimately to improve heart health for everyone.

“If we know a disease is highly genetic, we can make predictions about an individual’s risk of contracting that disease based on genetics in the form of polygenic risk scores,” said Zhu, assistant professor of statistics, member of the Huck Institutes of the Life Sciences, and affiliate of the Institute for Computational and Data Sciences. “If our predictions are good enough, then doctors can use those predictions to make personalized recommendations to their patients. We cannot change a person’s genetics, but we can change other known risk factors like diet, exercise, and smoking.”

Ongoing research into the genetics of heart disease has greatly improved researchers’ ability to construct polygenic risk scores based on which versions of genes and other genetic elements are present in a person’s genome. However, these risk scores are calculated from data collected primarily from people of European ancestry.
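
As a rough illustration only, and not the method used in the studies described here, a polygenic risk score is commonly computed as a weighted sum: for each genetic variant, the number of risk alleles a person carries (0, 1, or 2) is multiplied by an estimated effect size, and the products are added up. The function name, variants, and effect sizes in this Python sketch are hypothetical. Because the effect sizes are estimated from study data, weights learned mostly from one population may not transfer well to another.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant)."""
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(count * weight for count, weight in zip(allele_counts, effect_sizes))


# Hypothetical example: three variants with made-up effect sizes.
effect_sizes = [0.5, -0.25, 1.0]
person = [2, 1, 0]  # copies of the risk allele at each variant

score = polygenic_risk_score(person, effect_sizes)
print(score)  # 2*0.5 + 1*(-0.25) + 0*1.0 = 0.75
```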

“Good knowledge about the genetics of heart disease has really helped us improve prevention and treatment of the disease,” said Zhu. “But the genetic architecture of the same disease can be different in different populations, and right now our knowledge is largely limited to one population. That means we have to try to apply the data we have to other populations, which is not ideal and can exacerbate health disparities. One of our first goals was to create datasets that cover multiple populations so we can improve our methodology for creating risk scores for everyone.”

Working with international teams of researchers, Zhu has played important roles in two of the largest, most genetically diverse studies related to the genetics of two factors that contribute to heart disease: coronary artery disease (the most common form of heart disease, which can lead to heart attack) and cholesterol levels (a measurable risk factor for heart disease).

In the first study, the research team used genetic data about coronary artery disease from the Million Veteran Program, which includes a healthcare system that serves a diverse population, as well as data from recently published studies. This resulted in information from nearly a quarter of a million people with coronary artery disease, including the largest samples to date of Black and Hispanic people, which allowed the researchers to characterize the disease in these populations for the first time. The researchers also created polygenic risk scores based on their data, which performed as well as or better than previous scores based primarily on data from populations of European ancestry.

“We are very grateful to the participation of U.S. veterans in the Million Veteran Program,” said Zhu. “Without their participation, we would not have been able to do this work.”

This data will also be incorporated into the CARDIoGRAMplusC4D consortium, where it can be combined with other data to maximize the power for discovery.


In the next set of studies, Zhu and colleagues worked alongside the Global Lipids Genetics Consortium to explore the genetics of cholesterol. They combined data from 201 previous studies involving 1.65 million people from 35 countries to produce the largest genetic study of cholesterol levels to date. Levels of cholesterol are highly predictive of heart disease, particularly levels of low-density lipoprotein (LDL) cholesterol, which is highly correlated with risk of heart attack.

“Just like with heart disease, these kinds of blood lipids are heritable, or highly genetic,” said Zhu. “Although heart attacks can’t be monitored—they are an event that happens—we can measure LDL to monitor a person’s health. It’s also treatable, and we can use drugs to lower LDL to minimize the risk of heart attack.”

The researchers found that polygenic risk scores based on diverse data are more predictive of whether a person will have elevated LDL than scores based only on European genomic data. The researchers then explored functional elements in the genome that contribute to how and when cholesterol-related genes are expressed, investigated how genes and functional elements interact to impact cholesterol, and identified potential targets for future drugs to lower LDL. They also explored how certain genes or sets of genes might influence multiple traits and diseases at the same time, such as LDL levels alongside Alzheimer’s disease or obesity.

As a lead statistician on these studies, Zhu develops pipelines that apply existing statistical techniques to reveal these insights. Because the data come from many sources, considerable effort is required to clean and standardize them. And because genetic data are so vast, with millions or even billions of data points, they fall into the category of “big data,” so even existing techniques can be challenging to apply and must often be adapted.

“There is still a lot of analysis we can do with the current methods on these datasets, but it is important to note that our current methods typically make assumptions that our data come from a single population, which work against researchers’ ongoing efforts of diversifying human genetic data,” said Zhu. “There is a critical need to develop more powerful methods to fully unleash the potential of the diverse datasets that we have created.”

Zhu recently received a seed grant from the Penn State Institute for Computational and Data Sciences to develop new statistical and computational methods to better leverage diverse datasets for cardiovascular disease. He also received a seed grant from the Penn State Consortium on Substance Use and Addiction to develop expanded methodology on diverse populations around the genetics of tobacco and alcohol use—two known and modifiable risk factors of heart disease.

“A lot of information in diverse datasets cannot be captured by existing methods, but developing new methods is not really the end goal,” he said. “We want to use those new methods to improve our understanding of the genetics of heart-related diseases and outcomes and ultimately to improve our health care.”

As genetic risk scores and the potential for personalized health recommendations continue to improve, National Wear Red Day and American Heart Month are reminders that most middle-aged and young adults have at least one risk factor for heart disease. And, Zhu said, while we can’t change our genes, studies like these can help us better understand heart disease risk, guide recommendations for what we can change, and hopefully lead to happier, healthier hearts for everyone.

Media Contacts
Xiang Zhu
Assistant Professor of Statistics
Gail McCormick
Science Writer