Most Genome Data Comes from White Folks. Scientists Are Trying to Fix That.

With a new genome sequence, more diverse data sets, and population-specific projects, scientists are making progress in representing humanity’s real DNA diversity.

Precision medicine, which aims to tailor medicine to each individual for better outcomes, is fueled by genomic data. Unfortunately, there’s a problem with the fuel. For the past couple of decades, genomic databases have been filled with information gathered from people of European descent — and precious little from other ancestries. That means white people have gotten the earliest benefits of precision medicine, while everyone else has to wait for genomic databases to catch up.

This situation was predictable given the early days of genomics, when some of the largest population studies were launched in European countries with little ethnic diversity. Since then, scientists have been working hard on programs designed to capture the full range of human genetic diversity, including the All of Us Research Program organized by the National Institutes of Health. Recently, several teams have reported progress in improving the diversity of genomic data.

A Whole New Genome

You probably know that the first human genome was sequenced during the Human Genome Project and declared complete in 2003. But unless you work in genomics, you probably didn’t know that the human genome was never really finished. If sequencing a genome is like reading a book, the book of the human genome is very old and badly neglected, and some sections are much harder to read than others. When the first human genome sequence was declared “complete,” nearly 10% of it actually remained unread.

Now, all these years later, scientists have deployed newer technologies to read through the entire human genome, including those previously intractable regions. The final product, a “gap-free” sequence, is a critical new resource that will make it easier for other researchers to sequence population-specific genomes and to rapidly expand our understanding of genomic diversity.

“Truly finishing the human genome sequence was like putting on a new pair of glasses. Now that we can clearly see everything, we are one step closer to understanding what it all means,” said Adam Phillippy, an NIH scientist who helped lead the project, in a statement. He added, “In the future, when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their healthcare.”

Screening in the City

For researchers looking to expand genomic data diversity, nothing beats a big city. In New York, a program called BioMe, based at Mount Sinai’s medical school, has been gathering more diverse data and helping to answer an important question: whether underrepresented communities are willing to participate in genomic research studies.

An update on this work was presented by Mount Sinai’s Noura Abul-Husn at the recent annual meeting of the American College of Medical Genetics and Genomics. In a BioMe-powered study that’s been running for the past two years, scientists have used genome screening to detect potentially dangerous genetic variants and then monitor patients for associated diseases. The idea is simple: rather than waiting for people to get sick and wind up in the emergency room where doctors have to figure out what’s wrong, why not use a genome-first approach to determine their biggest health risks and try to help them avoid getting sick in the first place?

The BioMe effort, which is broader than this particular screening program, reflects the diversity of New York. In a typical genomic collection, some 70% of samples come from people of European descent; in BioMe, just 27% do, Abul-Husn said at the conference.

The genomic screening program tests for genetic variants linked to five conditions, delivering results back to patients who have opted in to learn about their susceptibilities. Abul-Husn noted that across all ethnic groups, at least 90% of participants wanted to know their results, challenging assumptions that some groups — especially those historically mistreated by the medical community — would be reluctant to learn about their DNA.

The study is still underway, but the team is already learning from it. For example, one of the variants they test for is linked to a heart condition and is more common in non-European populations. Among the patients found to carry this variant, Abul-Husn said, not a single one had previously been diagnosed with the disease, not even those who had been treated for heart problems. A genome-first approach, then, shows strong potential to help overcome existing disparities in healthcare.

Battling Cancer in Indigenous Populations

In the U.S., Native Americans suffer worse cancer outcomes than any other ethnic group, in part because of poorer access to healthcare and lower rates of cancer screening. By working closely with tribal leaders and performing population-specific genome analysis, scientists hope to better understand cancer in these communities.

Cheryl Willman, who now runs the Mayo Clinic Comprehensive Cancer Center and previously led the University of New Mexico Comprehensive Cancer Center, reported on some of these efforts at the recent annual conference of the American Association for Cancer Research. For some types of cancer, she said, “cancer screening rates in some tribal communities are as low as 4%.” That means cancers aren’t caught early on; when they are eventually found, they are more advanced and less likely to respond well to treatment.

Making matters worse, these groups are rarely included in large-scale cancer research projects. In one major cancer genome project, Willman said, less than 0.5% of samples came from American Indians, Alaska Natives, and Native Hawaiians and Pacific Islanders combined. To address this gap, she is part of a patient engagement project, working with Native Americans in the Southwest, in which scientists will perform genome sequencing on cancer samples collected from tribal communities. The goal is to enroll as many as 1,000 cancer patients and survivors, collecting several types of samples from each participant for a more comprehensive view of cancer and health in these communities. “It will be an extensive data set on each patient,” Willman said. Results will be returned to each participant.

To make this project successful, scientists are working closely with tribal leaders. Tribal representatives sit on the project’s advisory council, review its logistical details, and will have a say in how research results are reported to the public. Scientists must obtain consent not just from each participant but also from tribal leaders before proceeding with sample analysis. Willman said the team also hopes to generate a representative genome sequence for this population, which would improve its members’ ability to benefit from precision medicine.

These are just a few examples of the many projects researchers have launched to address the lack of diversity in genomic databases. The good news is that, with prices for genomic technologies falling rapidly, incorporating large volumes of data from diverse populations now costs just a fraction of what it did when these databases were first set up.
