Genomic Databases Biased Toward European Ancestry

October 11, 2016

A national group of researchers has found that the ClinVar and Human Gene Mutation databases, two of the most widely used clinical genomic databases, reflect a measurable bias toward genetic data derived from people of European ancestry over those of African ancestry, a situation that makes it harder to use genomic information to provide more focused healthcare to minority populations.

Detailed findings appeared October 11 in the journal Nature Communications, in an article entitled “Challenges and Disparities in the Application of Personalized Genomic Medicine to Populations with African Ancestry.” The article described how 642 whole-genome sequences from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) project were used to evaluate typical variant filters and databases.

Essentially, the researchers, led by Timothy O'Connor, Ph.D., assistant professor at the University of Maryland School of Medicine, created the largest high-quality non-European genome dataset ever assembled. It was representative of U.S., African, and Afro-Caribbean populations. When this dataset was compared with current clinical genomic databases, the researchers found a clear preference in those databases for European genetic variants over non-European variants.

“The ability to accurately report whether a genetic variant is responsible for a given disease or phenotypic trait depends in part on the confidence in labelling a variant as pathogenic,” wrote the authors of the Nature Communications article. “Such determination can often be more difficult in persons of predominantly non-European ancestry, as there is less known about the pathogenicity of variants that are absent from or less frequent in European populations.”

In their analyses, the researchers distinguished between variant classes. For example, they evaluated pathogenic annotated variants (PAVs), those identified as disease-causing in leading online databases, as well as non-annotated variants (NAVs), those not annotated as disease-causing. The researchers also went a little deeper to consider other members of the proverbial genomic haystack: so-called deleterious variants and prioritized variants.

“While we cannot be sure which of these variants are truly disease-causing (actual ‘needles’ rather than haystack members) without additional functional or association-based evidence, we believe that discrepancies between true pathogenicity and annotated pathogenicity are a major source of the biases we report,” the authors explained. “A likely contributor to this incongruity is that databases are missing population-specific pathogenicity information, and with regard to the results we report here, African-specific pathogenicity data.”

“True causal variants for predominantly non-European patients are likely to fall into the NAV categories. Since NAVs have the highest degree of positive correlation with African ancestry (that is, bias), causal variants falling into this group are more difficult to distinguish, as they exist amongst a larger number of high-priority background variants (that is, larger haystack). This problem is compounded in individuals of substantial African ancestry, as their larger amount of overall genetic variation results in an even greater number of deleterious NAVs requiring adjudication.”

According to the authors, any biases or population specificities that inflate the number of prioritized variants for African-ancestry patients (that is, make the haystack bigger) would increase the effort (that is, time and money) needed to identify a causative variant (that is, find the needle) in those patients.

"By better understanding the important role of African ancestry in clinical genetics, we can begin to actually identify a disease that has been forgotten or is not part of an individual's self-identification," said Dr. O'Connor. "For example, if an African-American patient walks in the door, he might have 20% European ancestry, while another might have 20% African ancestry. That difference will dramatically change how many variants are found in their genome, and what disease risks they might encounter. That's why we need to expand these databases to include a broader range of ancestries, in order to produce more accurate medical genetic diagnoses."

Dr. O'Connor also pointed out that this shortfall in genomic data comes at a financial cost. "If you translate the review time it takes for each one of these variants to be sequenced in terms of cost in a clinical setting, you're looking at a difference of about $1,000 more to analyze an African American's genome than a European American's genome—and you still receive less accurate results," he noted.

