Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency
Genet Med. 2016 Jun;18(6):608-17.
doi: 10.1038/gim.2015.137. Epub 2015 Nov 12.

William P Bone et al. Genet Med. 2016 Jun.

Abstract

Purpose: Medical diagnosis and molecular or biochemical confirmation typically rely on the knowledge of the clinician. Although this is very difficult in extremely rare diseases, we hypothesized that the recording of patient phenotypes in Human Phenotype Ontology (HPO) terms and computationally ranking putative disease-associated sequence variants improves diagnosis, particularly for patients with atypical clinical profiles.

Methods: Using simulated exomes and the National Institutes of Health Undiagnosed Diseases Program (UDP) patient cohort and associated exome sequence, we tested our hypothesis using Exomiser. Exomiser ranks candidate variants based on patient phenotype similarity to (i) known disease-gene phenotypes, (ii) model organism phenotypes of candidate orthologs, and (iii) phenotypes of protein-protein association neighbors.
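The phenotype-driven ranking described in the Methods can be illustrated with a minimal sketch. It is not Exomiser's implementation: plain Jaccard overlap stands in for the semantic similarity measure, the term labels (`T1`, `T2`, ...) and gene names are hypothetical, and model organism phenotypes are assumed to have already been mapped onto the same term vocabulary as the patient's HPO terms.

```python
def jaccard(a, b):
    """Overlap between two term sets (a stand-in for semantic similarity)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def phenotype_score(patient_terms, gene_evidence):
    """Best match between the patient and any phenotype profile for a gene.

    gene_evidence maps an evidence source (e.g. "human_disease", "mouse",
    "zebrafish") to a list of term sets; these source names are assumptions.
    """
    return max(
        (jaccard(patient_terms, profile)
         for profiles in gene_evidence.values()
         for profile in profiles),
        default=0.0,
    )


def rank_genes(patient_terms, evidence_by_gene):
    """Return (gene, score) pairs sorted from best to worst phenotype match."""
    scored = [(gene, phenotype_score(patient_terms, evidence))
              for gene, evidence in evidence_by_gene.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, for illustration only.
patient = {"T1", "T2", "T3"}
evidence = {
    "GENE_A": {"human_disease": [{"T1", "T2", "T3", "T4"}]},
    "GENE_B": {"mouse": [{"T1", "T9"}]},
    "GENE_C": {},
}
ranking = rank_genes(patient, evidence)
```

Taking the maximum over evidence sources means a gene with no known human disease can still rank well on the strength of a mouse or zebrafish phenotype, which is the situation the benchmarking without disease-gene associations is meant to simulate.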

Results: Benchmarking showed Exomiser ranked the causal variant as the top hit in 97% of known disease-gene associations and ranked the correct seeded variant in up to 87% when detectable disease-gene associations were unavailable. Using UDP data, Exomiser ranked the causative variant(s) within the top 10 variants for 11 previously diagnosed variants and achieved a diagnosis for 4 of 23 cases undiagnosed by clinical evaluation.

Conclusion: Structured phenotyping of patients and computational analysis are effective adjuncts for diagnosing patients with genetic disorders. Genet Med 18(6), 608-617.

Figures

Figure 1
Semantic phenotype matching. For each variant that passed the frequency and Mendelian inheritance filters, the patient's Human Phenotype Ontology terms were compared to all human diseases associated with the gene containing the variant in OMIM or Orphanet (a), as well as to any phenotypes associated with orthologs of the gene in mice or zebrafish (b). If there was no phenotypic match between the patient and any phenotypes associated with the gene, then the patient's phenotype was compared with phenotypes associated with nearby genes in the protein–protein association network (c). When calculating the phenotypic score, the network considered the similarity to the patient phenotype and proximity of the matching gene to the gene in which the patient had a mutation.
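The network fallback in panel (c) can be sketched as follows. This is an illustration under stated assumptions, not the published method: proximity is modeled as hop distance found by breadth-first search, and a neighbor's score is discounted by a hypothetical `decay` factor per hop. The gene names, the `max_hops` limit, and the decay value are all invented for the example.

```python
from collections import deque


def hop_distances(network, start, max_hops):
    """Breadth-first search returning {gene: hops from start} up to max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        if dist[gene] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in network.get(gene, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def network_score(gene, direct_scores, network, max_hops=2, decay=0.5):
    """Use the gene's own phenotype score if it has one; otherwise borrow the
    best neighbor score, discounted by decay ** hops (assumed weighting)."""
    direct = direct_scores.get(gene, 0.0)
    if direct > 0:
        return direct
    dist = hop_distances(network, gene, max_hops)
    return max(
        (direct_scores.get(other, 0.0) * decay ** hops
         for other, hops in dist.items() if hops > 0),
        default=0.0,
    )


# Hypothetical association network: X - Y - Z
network = {"X": ["Y"], "Y": ["X", "Z"], "Z": ["Y"]}
direct_scores = {"Y": 0.8, "Z": 1.0}  # X has no direct phenotype match
score_x = network_score("X", direct_scores, network)
```

In this toy network the candidate gene X has no phenotype evidence of its own, so its score comes from Y, one hop away, rather than from the better-matching but more distant Z.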
Figure 2
Benchmarking of Exomiser prioritization. (a) The contribution of human, mouse, and zebrafish phenotypes along with protein–protein association data to novel association discovery is shown for each alone and the various combinations. HGMD mutations were added to unaffected 1000 Genomes Project exomes and run through Exomiser under conditions where the known disease–gene association was removed from the database for each run to simulate novel discovery. Bars show percentage of exomes in which the true variant was prioritized as the top hit. Results shown are after filtering to remove common (>1% minor allele frequency by Exome Sequencing Project data), synonymous, and noncoding variants. (b) Performance on previously diagnosed Undiagnosed Diseases Program disease variants. Shown are rankings of 11 previously diagnosed variants from nine solved families when analyzed under different conditions as indicated in the table below the chart: prioritization was based on the variant score alone (allele frequency and pathogenicity) and/or in combination with the phenotype score, and filtering was run with and without inclusion of pedigree-defined Mendelian filtering and inclusion of the disease–gene (D-G) association. Bars show how many of the 11 previously diagnosed variants were on the list of the top 1, 5, or 10 candidate variants. The Exomiser scores are reflected in the last two columns, which incorporate variant and phenotype. The best performance is observed with inclusion of a known Mendelian inheritance model; all 11 variants were in the top 5 or 10, with or without prior knowledge of disease–gene associations, respectively.
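The filtering and top-k tallies described in this caption can be sketched in a few lines. The field names (`maf`, `effect`, `variant_score`, `phenotype_score`) are hypothetical, and the simple average used to combine the two scores is an assumption made for illustration; the caption states only that the Exomiser score incorporates variant and phenotype information.

```python
def passes_filters(variant, max_maf=0.01):
    """Drop common (>1% minor allele frequency), synonymous, and noncoding variants."""
    return (variant["maf"] <= max_maf
            and variant["effect"] not in {"synonymous", "noncoding"})


def combined_rank(variants):
    """Rank surviving variants by the mean of variant and phenotype scores
    (assumed combination rule)."""
    kept = [v for v in variants if passes_filters(v)]
    return sorted(
        kept,
        key=lambda v: (v["variant_score"] + v["phenotype_score"]) / 2,
        reverse=True,
    )


def top_k_counts(ranks, ks=(1, 5, 10)):
    """Count how many known variants were ranked within each top-k cutoff.

    ranks holds the 1-based rank of each known variant, or None if it was
    filtered out and never ranked.
    """
    return {k: sum(1 for r in ranks if r is not None and r <= k) for k in ks}


# Hypothetical variants, for illustration only.
variants = [
    {"id": "v1", "maf": 0.0001, "effect": "missense",
     "variant_score": 0.9, "phenotype_score": 0.8},
    {"id": "v2", "maf": 0.05, "effect": "missense",
     "variant_score": 0.9, "phenotype_score": 0.9},
    {"id": "v3", "maf": 0.0, "effect": "synonymous",
     "variant_score": 0.2, "phenotype_score": 0.9},
    {"id": "v4", "maf": 0.001, "effect": "nonsense",
     "variant_score": 1.0, "phenotype_score": 0.2},
]
ranked_ids = [v["id"] for v in combined_rank(variants)]
counts = top_k_counts([1, 3, 7, 12, None])
```

Counts are cumulative, so a variant ranked first also contributes to the top 5 and top 10 totals, matching how the bars in panel (b) are described.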
