A group of researchers from the University of Oxford, Harvard, and the Broad Institute have built an artificial intelligence (AI) tool that can predict which genetic variants might go on to cause disease and which are insignificant.
EVE (Evolutionary model of Variant Effect) uses a type of unsupervised machine learning combined with evolutionary data to predict more accurately which genetic variants are likely to cause disease. The program learns adaptively as more information is entered, making it more useful the more extensively it is used.
Researchers have tried to use AI to help identify variants in the past, but with minimal success. “State-of-the-art methods have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable,” write the authors in the Nature paper describing the research.
To create a better system that does not rely on these labels, Yarin Gal, an associate professor at the University of Oxford, and Debora Marks, an associate professor at Harvard Medical School, co-led a project to develop EVE. The new system scans how genes have developed across more than 100,000 different species and looks for “constraints on the protein sequences that maintain fitness.”
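To make the idea of a sequence "constraint" concrete, here is a minimal sketch that scores how variable each position is across a handful of aligned protein sequences. It is a much simpler stand-in, not EVE's model: the function names and the four-sequence toy alignment are invented for illustration, and per-position entropy is only one basic way to spot positions that evolution appears to hold fixed.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues seen at one alignment position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conservation_profile(alignment):
    """Return one entropy value per position; low entropy suggests a conserved site."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


# Hypothetical toy alignment: the same short protein region in four species.
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKAL"]
toy_profile = conservation_profile(toy_alignment)
```

In this toy case the first and last positions never change, so their entropy is zero, while the third position varies the most. A variant at a position that never changes across species is, loosely speaking, more suspicious than one at a position that varies freely.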
To evaluate the system, Gal, Marks, and colleagues used EVE to predict pathogenic variants in 3,219 disease-associated genes recorded in the ClinVar database. Across the labelled variants in the genes assessed, EVE predicted clinical significance with an average area under the curve (AUC) of 0.91.
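For readers unfamiliar with the metric, the sketch below shows one standard way to compute an AUC: it is the chance that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one, with ties counted as half. The function name, labels and scores are invented for illustration and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of pathogenic (1) and benign (0) scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pathogenic variant ranked above benign one
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pathogenic, 0 = benign.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

On this scale, 0.5 corresponds to random guessing and 1.0 to a perfect ranking. In the toy data, one benign variant outscores one pathogenic variant, so the result falls just short of 1.0.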
“EVE outperforms all supervised and unsupervised methods at predicting known clinical labels,” write the authors. “This is despite a large fraction of these labels being used in training the top-performing methods, as well as, in some cases, being used extensively in defining labels.”
The team also compared the model's predictions with experimental measurements of 40,000 variants in 10 proteins, and it was highly accurate at predicting function there too. Its predictions also agreed with highly detailed experimental results on the function of variants in five genes with known disease links: BRCA1, TP53, PTEN, MSH2 and SCN5A.
Because different amounts of information are known about different variants, EVE has a built-in ‘certainty/uncertainty’ predictor that lets users estimate how likely a prediction is to be accurate. When the research team set aside the 25% of predictions that were least certain, the accuracy of the remaining predictions rose to around 90%.
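The filtering step described above can be sketched as follows. This is an illustration under stated assumptions, not the team's code: the article does not say how EVE computes its uncertainty, so the sketch simply takes one uncertainty value per prediction as given, and the function name and toy data are hypothetical.

```python
def accuracy_after_dropping_uncertain(predictions, labels, uncertainties,
                                      drop_fraction=0.25):
    """Accuracy on the predictions left after removing the least certain ones."""
    if not (len(predictions) == len(labels) == len(uncertainties)):
        raise ValueError("inputs must have the same length")
    n_drop = int(len(predictions) * drop_fraction)
    # Indices sorted from most certain to least certain.
    order = sorted(range(len(predictions)), key=lambda i: uncertainties[i])
    kept = order[:len(order) - n_drop]
    if not kept:
        raise ValueError("no predictions left after filtering")
    correct = sum(1 for i in kept if predictions[i] == labels[i])
    return correct / len(kept)


# Hypothetical toy data: 1 = pathogenic, 0 = benign.
toy_predictions   = [1, 1, 0, 0, 1, 0, 1, 0]
toy_labels        = [1, 1, 0, 0, 1, 0, 0, 1]
toy_uncertainties = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.9, 0.8]

full_accuracy = accuracy_after_dropping_uncertain(
    toy_predictions, toy_labels, toy_uncertainties, drop_fraction=0.0)
filtered_accuracy = accuracy_after_dropping_uncertain(
    toy_predictions, toy_labels, toy_uncertainties, drop_fraction=0.25)
```

In the toy data the two wrong calls are also the two least certain ones, so removing them raises the accuracy. The approach only helps when the uncertainty estimate actually tracks the errors, which is what the reported gain suggests for EVE.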
The researchers also combined EVE with other tools such as gnomAD and reclassified 256,000 variants that were previously ‘of unknown significance’ as pathogenic or harmless.
“Our results turned out to be far better than we expected,” Marks said. “It seems that by simply training a model to fit the distribution of sequences across evolution we extract information which enables us to make unexpectedly precise predictions about disease risk arising from a given genetic variant.”
The research team emphasizes that EVE is not intended as a diagnostic tool, but hopes it will give clinicians and other researchers more information about newly discovered or unknown variants.
“We believe our approach can be used as an added tool in current clinical assessments and offers a powerful new way to reduce uncertainty and clarify decision-making, particularly in the clinical setting,” said Marks.
“We’re not providing clinicians merely with a number but also giving them the degree of uncertainty that comes with it,” Gal said. “This is something that the expert can take and use in the decision-making process. The tool can say, ‘I think that variant belongs to that pile, but I’ve never seen any variants like that before so take that with a grain of salt.’ Or the tool can also say, ‘I think that that other variant belongs to this pile, and I’ve seen very similar variants to that in the past, and I saw them belonging to this pile and therefore I’m going to assign it to this pile with high confidence.’ Building trust between the tool and the expert is an important aspect of this work.”