Certain DNA sequences can form structures other than the canonical double helix. These alternative DNA conformations, referred to as non-B DNA, have been implicated as regulators of cellular processes and of genome evolution, but the underlying sequences tend to be repetitive, which until recently made them difficult to read and assemble reliably. Now, a team of researchers, led by Penn State biologists, has comprehensively predicted the location of non-B DNA structures in great apes. It’s the first step in understanding the functions and evolution of such structures, which are known to contribute to genetic diseases and cancer, the team said.
The work depends on newly available telomere-to-telomere (T2T), or end-to-end, genomes of humans and other great apes. These assemblies overcame the sequencing and assembly difficulties associated with repetitive DNA and filled in the remaining gaps in the genomes. A paper describing the study, which shows that non-B DNA is enriched in the newly sequenced segments of the genomes and suggests potential new functions, appeared today (April 24) in the journal Nucleic Acids Research.
“When the human genome was first published in 2001, it actually wasn’t complete,” said Kateryna Makova, Verne M. Willaman Chair of Life Sciences, professor of biology at Penn State and the leader of the research team. “About 8% of the genome, largely repetitive DNA, was left undetermined because the available technology and computational algorithms were unable to reconstruct these regions. In 2022 and 2023, a massive effort by the Telomere-to-Telomere consortium filled in these gaps for the human genome, and this year, we did the same for all the great apes.”
For most genomes that have been sequenced, researchers used short-read DNA sequencing technologies. These techniques work by first breaking genomes into millions of tiny segments, which can be sequenced and then must be painstakingly reassembled like the world’s most complicated jigsaw puzzle.

“Much of the genome is made up of repetitive DNA, which could take the form of hundreds or even thousands of copies of the same short sequence back-to-back along a chromosome,” said Linnéa Smeds, a postdoctoral researcher in biology at Penn State and the first author of the paper. “This is a problem for assembling genomes from short reads, because there are so many puzzle pieces that look the same. The T2T genomes overcome this using new long-read sequencing technologies, allowing us to sequence the genomes in fewer, longer segments. This way we can explore these regions for interesting functional elements, like non-B DNA, for the first time.”
Non-B DNA can take many forms, including bent DNA, hairpins, G-quadruplexes (G4s) and Z-DNA, each of which forms at certain sequence motifs that tend to be repetitive. These structures have recently been implicated in several cellular processes, such as DNA replication initiation during cell division, gene expression regulation, and the function of telomeres (the caps at the ends of chromosomes) and centromeres, chromosomal structures that play a crucial role during cell division. The research team searched the T2T genomes for these sequence motifs to identify all potential non-B-forming regions in the genomes of human, chimpanzee, bonobo, gorilla, two orangutan species and siamang, a lesser ape used as an outgroup.
“We now have a complete picture of the motifs that are prone to non-B DNA formation for these genomes,” Smeds said.
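To give a sense of what searching a genome for a non-B motif can look like, here is a minimal sketch in Python for one motif class, G-quadruplexes. It assumes a commonly cited simplified pattern (four runs of at least three guanines separated by loops of one to seven bases) and is not the software or the motif definition used in the study; the function name `find_g4_motifs` is hypothetical.

```python
import re

# Assumed, simplified G4 motif: four runs of 3+ G separated by loops of 1-7 bases.
# The same motif on the opposite strand appears as runs of C on the forward strand.
G4_FORWARD = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
G4_REVERSE = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")


def find_g4_motifs(sequence):
    """Return sorted (start, end, strand) tuples for non-overlapping motif matches.

    Coordinates are 0-based and end-exclusive, relative to the input string.
    """
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("+", G4_FORWARD), ("-", G4_REVERSE)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Four copies of the human telomeric repeat TTAGGG contain one such motif.
    for start, end, strand in find_g4_motifs("TTAGGGTTAGGGTTAGGGTTAGGG"):
        print(start, end, strand)
```

A match from a pattern like this only marks a sequence with the potential to form a structure; whether the structure actually forms in a cell is a separate, experimental question.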
The research team found that newly deciphered sequences in the genomes are enriched for non-B motifs and that the patterns of non-B DNA distribution were largely similar across the ape species. The gorilla genome, known to have a higher percentage of repetitive DNA, also contained a higher number of potential non-B DNA motifs.
Non-B DNA also tends to have higher mutation rates and can be unstable. This instability could lead to DNA breakpoints and allow for chromosomal rearrangements, which the researchers suggested may be important in genome evolution and in certain genetic disorders.
“Recently, a type of repetitive DNA, known as satellite DNA, was shown to be the breakpoint of a translocation of chromosome 21 that is associated with one type of Down syndrome,” Smeds said. “We found motifs for Z-DNA, a type of non-B DNA, to be 97 times more frequent in this region than the rest of the genome, which could indicate a role of non-B DNA in these types of chromosomal rearrangements, but additional research would be required to validate this relationship.”
So far, the researchers have experimentally confirmed that non-B DNA structures actually form at a small number of the predicted motifs, and they emphasized that the vast majority will require additional confirmation.
“The formation of non-B DNA structures at a given motif is almost certainly going to be context-dependent,” Makova said. “It could depend on cell type, developmental stage and genomic context, including DNA modifications like methylation. There has been a recent shift in how we think about the function of the genome to go beyond sequence to include structure. We hope our study will serve as a springboard for additional studies of the function of these novel structural characteristics in the genome.”
In addition to Makova and Smeds, the research team includes Kaivan Kamali, computer scientist at Penn State at the time of the study; Francesca Chiaromonte, Dorothy Foehr Huck and J. Lloyd Huck Chair in Statistics for the Life Sciences and professor of statistics at Penn State; and Iva Kejnovská and Eduard Kejnovský at the Institute of Biophysics of the Czech Academy of Sciences. The U.S. National Institute of General Medical Sciences and the Grantová Agentura České Republiky funded the research.