New Characterization of the Human Genome's Ability to Mutate Catalyzes Biomedical Research
This image charts the rates of four types of mutations (insertions, deletions, substitutions, and microsatellite repeat-number changes) across segments of human chromosomes. As part of their research, scientists led by Penn State University Professors Kateryna Makova and Francesca Chiaromonte compared human DNA with DNA from other primates, including orangutans (illustrated in the background), and then processed the human DNA sequence using a statistical segmentation technique. Credit: K. Makova and F. Chiaromonte, Penn State University
As biomedical researchers continue to make progress toward the realization of personalized genomic medicine, their focus is increasingly tuned to highly mutable regions of the human genome that contribute significantly to genetic variation as well as many inherited disorders.
Accurately characterizing mutability has -- to date -- posed a serious challenge, but a team of Penn State University researchers has recently taken a major step forward.
The results of an interdisciplinary study -- to be published this week in the journal Proceedings of the National Academy of Sciences -- provide a comprehensive geographic characterization of mutability in the human genome. The study is led by Professor of Biology Kateryna Makova and Professor of Statistics and Public Health Sciences Francesca Chiaromonte, who also are affiliates of the Huck Institutes of the Life Sciences at Penn State.
"In this project we combined genome-wide data on human-orangutan DNA differences, genetic variability within Homo sapiens, several features of the human genomic landscape, and detailed functional annotations of the human genome," said Makova, who is the director of the Penn State Center for Medical Genomics. "Such rich information allowed us to discern regions of the genome with particular mutational regimes. For example, we found some regions where rates of different mutation types are all elevated (hot regions), and others where the rates are all reduced (cold regions). The location of these regions in the genome is not random and can be associated with intragenomic differences in GC content, recombination rates, methylation, etc. Intriguingly, we found that protein-coding genes preferentially inhabit mutationally hot regions, likely because mutations of these genes can confer an adaptive advantage."
Estimating the rates of four common mutation types -- nucleotide substitutions, small (≤30 bp) insertions and deletions, and mononucleotide microsatellite repeat number alterations -- across the human genome, the researchers analyzed and mapped the incidence of those mutations onto corresponding chromosomal segments, yielding a genome-wide profile of mutagenetic mechanisms and potential.
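To make the idea of per-segment mutation rates concrete, here is a minimal sketch that bins mutation events into fixed-size windows and divides each count by the number of aligned bases in that window. The window scheme, the function name `window_rates`, the type labels, and the choice to normalize all four types by aligned bases are illustrative assumptions, not the study's published procedure (microsatellite changes, for instance, might reasonably be normalized per microsatellite locus instead).

```python
from collections import Counter

# Hypothetical labels for the four mutation types described in the article.
MUTATION_TYPES = ("substitution", "insertion", "deletion", "microsatellite")

def window_rates(events, aligned_bases, window_size):
    """Per-window rate of each mutation type (illustrative sketch).

    events: iterable of (position, mutation_type) along one chromosome.
    aligned_bases: number of aligned, analyzable bases in each window.
    Returns one tuple of rates, in MUTATION_TYPES order, per window.
    """
    counts = [Counter() for _ in aligned_bases]
    for position, mutation_type in events:
        if mutation_type not in MUTATION_TYPES:
            raise ValueError(f"unknown mutation type: {mutation_type}")
        index = position // window_size
        if 0 <= index < len(counts):
            counts[index][mutation_type] += 1
    # Assumption: every type is normalized by aligned bases; empty windows get NaN.
    return [
        tuple(c[t] / n if n > 0 else float("nan") for t in MUTATION_TYPES)
        for c, n in zip(counts, aligned_bases)
    ]

if __name__ == "__main__":
    events = [(10, "substitution"), (15, "insertion"),
              (120, "substitution"), (130, "deletion")]
    print(window_rates(events, aligned_bases=[100, 50], window_size=100))
    # prints [(0.01, 0.01, 0.0, 0.0), (0.02, 0.0, 0.02, 0.0)]
```

Each window then carries a four-number profile, which is the kind of multivariate signal a segmentation model can work with.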
"Hidden Markov Models, which have a long history of applications in genomics, were instrumental in unveiling the biological implications of our rich data," said Chiaromonte. "Using these models, we were able to quantitatively characterize the different mutational regimes ('hidden states' in statistical jargon) and to partition the genome into contiguous segments governed by each such regime. Importantly, with this approach we are demarcating switches in mutational regimes along the genome -- the boundaries between segments -- based on the data. Moreover, since we utilize four mutation rates simultaneously, our results account for and exploit interdependencies among different types of change that affect the genome. We also employed simulations to assess associations between mutational regimes, genomic landscape features, and the spatial organization of functional elements."
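The segmentation Chiaromonte describes can be illustrated with a toy Hidden Markov Model: each window's four mutation rates are the observations, the hidden states stand for mutational regimes, and Viterbi decoding assigns every window its most likely regime so that boundaries fall where the regime switches. This is a minimal sketch under loudly stated assumptions: only two states (state 0 "cold", state 1 "hot"), diagonal Gaussian emissions on standardized rates, and hand-picked parameters that are fixed rather than estimated. The study's actual number of states, emission model, and fitting procedure are not described here, and `viterbi` and `segments` are hypothetical names.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(observations, means, sds, log_start, log_trans):
    """Most likely state path for an HMM with diagonal-Gaussian emissions.

    observations: non-empty list of per-window vectors (e.g. four mutation rates).
    means, sds: per-state vectors of emission means and standard deviations.
    log_start[s], log_trans[p][s]: log start and transition probabilities.
    """
    n_states = len(means)

    def emit(s, obs):
        # Dimensions are treated as independent given the state (diagonal model).
        return sum(gaussian_logpdf(x, m, sd) for x, m, sd in zip(obs, means[s], sds[s]))

    # score[s] = best log-probability of any path ending in state s so far.
    score = [log_start[s] + emit(s, observations[0]) for s in range(n_states)]
    backptrs = []
    for obs in observations[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + emit(s, obs))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def segments(path):
    """Collapse a per-window state path into (start, end_exclusive, state) runs."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i, path[start]))
            start = i
    return runs

if __name__ == "__main__":
    # Toy standardized rates: three "cold" windows followed by three "hot" ones.
    obs = [(-1.2, -0.8, -1.0, -0.9)] * 3 + [(1.1, 0.9, 1.3, 0.8)] * 3
    means = [(-1.0,) * 4, (1.0,) * 4]
    sds = [(1.0,) * 4, (1.0,) * 4]
    log_start = [math.log(0.5), math.log(0.5)]
    # A high self-transition probability discourages spurious regime switches.
    log_trans = [[math.log(0.99), math.log(0.01)],
                 [math.log(0.01), math.log(0.99)]]
    path = viterbi(obs, means, sds, log_start, log_trans)
    print(path)            # prints [0, 0, 0, 1, 1, 1]
    print(segments(path))  # prints [(0, 3, 0), (3, 6, 1)]
```

In a real analysis the parameters would be estimated from the data (for example with the Baum-Welch/EM algorithm), and more than two regimes could be allowed. Modeling the four rates jointly is what lets a single state capture windows where all rates are elevated or all are reduced at once.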
The paper not only represents a significant contribution to scientists' understanding of the intricacies of human mutagenesis, but also provides a foundation for biomedical analyses -- such as screening genomes for cancer- and other disease-related variants -- which may assist in the validation of disease-causing sites across the genome and catalyze development of targeted, site-specific therapeutic strategies.
"Our results have far-reaching implications for several areas of biomedical sciences," said Makova. "First, knowledge about mutationally hot and cold regions can aid in screening disease variants, since hot regions are expected to give more false positives. Second, previous studies demonstrated that mutation rates are usually overestimated when pedigree data are used; we show that such overestimation occurs because of mutations located in hot regions. Third, information about mutationally hot and cold regions can improve predictions of functional noncoding elements in the genome, which are expected to be less conserved in mutationally hot regions. Ultimately, we and other researchers can utilize the results of our analysis (which are publicly available) to address these pressing questions in medical, evolutionary, and functional genomics."
Other key contributors to the study are Penn State doctoral students Prabhani Kuruppumullage Don, currently a candidate in the Statistics program, and Guruprasad Ananda, a graduate of the Huck Institutes' Bioinformatics and Genomics program who recently has accepted a position with Jackson Laboratory in Bar Harbor, Maine.
This research was supported by the National Science Foundation (NSF), National Institutes of Health (NIH), a Marie Curie Fellowship from the European Commission, the Penn State Clinical and Translational Science Institute (CTSI), Pennsylvania Tobacco Settlement Funds, and the Huck Institutes of the Life Sciences at Penn State.
[ Seth Palmer ]