AIM Output Details#
Output Files#
Following the execution of AIM, you will receive four output files, which are saved to the designated output directory.
| File Name | Explanation |
|---|---|
| *_default_predictions.csv | Prediction results from AIM Default mode. |
| *_recessive_predictions.csv | Prediction results from AIM Recessive mode. Each row is a pair of variants. |
| *_nd_predictions.csv | Prediction results from AIM Novel Disease Gene mode. |
| *_nd_recessive_predictions.csv | Prediction results from AIM Novel Disease Gene + Recessive mode. |
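As a minimal sketch of how the four files might be told apart in a script, the snippet below matches file names against the suffixes in the table. The helper names `classify_output` and `find_outputs` are hypothetical and not part of AIM; only the file-name suffixes come from the table above.

```python
from pathlib import Path

# Suffixes taken from the table above, mapped to the AIM mode that produces them.
OUTPUT_SUFFIXES = {
    "_default_predictions.csv": "Default",
    "_recessive_predictions.csv": "Recessive",
    "_nd_predictions.csv": "Novel Disease Gene",
    "_nd_recessive_predictions.csv": "Novel Disease Gene + Recessive",
}


def classify_output(path):
    """Return the AIM mode for a file name, or None if it is not an AIM output."""
    name = Path(path).name
    # Check longer suffixes first so "_nd_recessive_..." is not
    # mistaken for "_recessive_...".
    for suffix in sorted(OUTPUT_SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            return OUTPUT_SUFFIXES[suffix]
    return None


def find_outputs(output_dir):
    """Map each AIM mode to the matching file found in output_dir (hypothetical helper)."""
    found = {}
    for path in sorted(Path(output_dir).glob("*_predictions.csv")):
        mode = classify_output(path)
        if mode is not None:
            found[mode] = path
    return found
```

Checking the longest suffix first matters because a name ending in `_nd_recessive_predictions.csv` also ends in `_recessive_predictions.csv`.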
Output Explanation#
Each of the output files listed above contains:
Variant ID
Annotation information (features)
Prediction results: score, rank, and confidence
Prediction Results#
predict: Prediction score from AIM.
ranking: Maximum ranking of all variants by prediction score.
confidence: Confidence score, computed as the z-score of the AIM score within our internal cohort.
confidence level: High / Medium / Low. The confidence score cutoffs were set based on precision and recall.
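The sketch below shows one way these columns might be used to shortlist variants, using only the standard library. It assumes the CSV header uses the column names exactly as listed above and that a smaller `ranking` value means a better rank. The `variant_id` column name and all sample values are made up for illustration.

```python
import csv
import io

# Hypothetical excerpt of a *_default_predictions.csv file.
# "variant_id" and every value below are invented for this example.
SAMPLE = """variant_id,predict,ranking,confidence,confidence level
var_a,0.91,1,2.4,High
var_b,0.40,3,0.2,Low
var_c,0.75,2,1.6,High
"""


def top_candidates(csv_text, levels=("High",)):
    """Keep rows whose confidence level is in `levels`, ordered by ranking."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [row for row in reader if row["confidence level"] in levels]
    # Assumption: a smaller ranking value is a better rank.
    return sorted(kept, key=lambda row: float(row["ranking"]))


for row in top_candidates(SAMPLE):
    print(row["variant_id"], row["ranking"], row["confidence level"])
```

Passing `levels=("High", "Medium")` widens the shortlist without changing the ordering logic.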
Variant Annotation / AIM Features#
AIM draws information from various databases and performs feature engineering to make the final prediction. Both raw and engineered features are provided as part of the output.
In making its decision, AIM takes the following types of information into consideration.
Conservation
GERPpp_RS: GERP++ RS score. The larger the score, the more conserved the site. Scores range from -12.3 to 6.17.
LRT_Omega: Estimated nonsynonymous-to-synonymous-rate ratio (Omega, reported by LRT).
LRT_score: The original LRT two-sided p-value (LRTori), ranges from 0 to 1.
phyloP100way_vertebrate: phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site. Scores range from -20.0 to 10.003 in dbNSFP.
Constraint
hom: Number of homozygotes for the variant in gnomAD.
decipherVarFound: 0/1. Whether the variant is found in a deletion in the DECIPHER control database.
dgvVarFound: 0/1. Whether the variant is found in a deletion in the DGV database.
conservationScoreDGV: 1/3 (Low/High). If the DGV subtype is Loss or Deletion, the score is 1; otherwise 3.
gnomadGeneOELof: Observed/expected ratio of loss-of-function variants in the gnomAD database.
gnomadGeneOELofUpper: The upper bound of the confidence interval for OE LoF.
conservationScoreOELof: 1/2 (Low/High). If gnomadGeneOELofUpper < 0.35, the score is 1; otherwise 2.
gnomadGenePLI: pLI score, which stands for the “probability of being loss-of-function intolerant” in gnomAD.
gnomadGeneZscore: The gene z-score in gnomAD relates to missense variants and reflects how many standard deviations the observed count of missense variants is from the expected count. This metric can help identify genes that are under selective pressure and may be related to disease.
conservationScoreGnomad: 1/2 (Low/High). If both gnomadAF and gnomadAFg are less than 0.01, the score is High (2); otherwise Low (1).
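The three engineered scores in this group are simple threshold rules. The sketch below restates them as functions under the thresholds given above. The function names are hypothetical, and how AIM handles missing values or differently cased DGV subtypes is not documented here, so neither case is covered.

```python
def conservation_score_dgv(dgv_subtype):
    """1 (Low) if the DGV subtype is Loss or Deletion, otherwise 3 (High)."""
    # Assumption: subtypes are spelled exactly "Loss" or "Deletion".
    return 1 if dgv_subtype in ("Loss", "Deletion") else 3


def conservation_score_oelof(oe_lof_upper):
    """1 if gnomadGeneOELofUpper < 0.35, otherwise 2."""
    return 1 if oe_lof_upper < 0.35 else 2


def conservation_score_gnomad(gnomad_af, gnomad_afg):
    """2 (High) if both exome and genome allele frequencies are below 0.01, otherwise 1 (Low)."""
    return 2 if (gnomad_af < 0.01 and gnomad_afg < 0.01) else 1


# Example values are invented for illustration.
print(conservation_score_dgv("Deletion"))
print(conservation_score_oelof(0.2))
print(conservation_score_gnomad(0.001, 0.05))
```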
Disease Database
CLASS: Variant class from HGMD.
DM: disease-causing mutation; DM?: likely disease-causing, but with questionable pathogenicity.
clinVarGeneFound: 0/1, whether or not the variant gene is found in ClinVar.
clinVarVarFound: 0/1, whether or not the variant itself is found in ClinVar.
curationScoreClinVar: 1/2/3 (Low/Medium/High), curated using the ClinVar significance description.
isB/LB: 0/1. Whether the ClinVar significance description contains benign and no conflicting interpretation.
isP/LP: Float ranging from 0 to 1. Among all descriptions of this variant in ClinVar, the proportion that are pathogenic.
clinvarNumB: Proportion of benign variants in the variant gene.
clinvarNumLB: Proportion of likely benign + benign variants in the variant gene.
clinvarNumLP: Proportion of likely pathogenic + pathogenic variants in the variant gene.
clinvarNumP: Proportion of pathogenic variants in the variant gene.
hgmdGeneFound: 0/1, whether or not the variant gene is found in HGMD.
hgmdVarFound: 0/1, whether or not the variant itself is found in HGMD.
curationScoreHGMD: 1/2/3 (Low/Medium/High), curated with hgmdGeneFound and hgmdVarFound.
omimGeneFound: 0/1, whether or not variant gene is found in OMIM.
omimVarFound: 0/1, whether or not variant itself is found in OMIM.
curationScoreOMIM: 1/2/3 (Low/Medium/High), curated with omimGeneFound and omimVarFound.
dominant: 0/1. Whether the variant gene is annotated as dominant in OMIM.
recessive: 0/1. Whether the variant gene is annotated as recessive in OMIM.
hgmd_rs: HGMD rank score, interpreted as relative probabilities of pathogenicity.
c_ClinVar_*: Expansions of variant annotation from ClinVar. One-hot encoded.
c_CLNREVSTAT: The ClinVar review status for the same protein change in ClinVar.
c_HGMD_Exp_*: Expansions of variant annotation from HGMD. One-hot encoded.
c_isBLB: The original variant is annotated as Benign in ClinVar.
c_isPLP: The original variant is annotated as Pathogenic or Likely pathogenic in ClinVar.
c_RANKSCORE: The HGMD RANKSCORE adapted from the original HGMD database.
nc_ClinVar_Exp: Non-coding variant expansion (2 bp upstream or downstream of the original variant's position).
nc_CLNREVSTAT: Non-coding variant expansion (2 bp upstream or downstream of the original variant's position).
nc_HGMD_Exp: Non-coding variant expansion (2 bp upstream or downstream of the original variant's position).
nc_isBLB: The original variant is annotated as Benign in ClinVar.
nc_isPLP: The original variant is annotated as Pathogenic or Likely pathogenic in ClinVar.
nc_RANKSCORE
Variant Impact
cons_*: Variant consequence type is one-hot encoded. Complete list:
‘transcript_ablation’, ‘splice_acceptor_variant’, ‘splice_donor_variant’, ‘stop_gained’, ‘frameshift_variant’, ‘stop_lost’, ‘start_lost’, ‘transcript_amplification’, ‘inframe_insertion’, ‘inframe_deletion’, ‘missense_variant’, ‘protein_altering_variant’, ‘splice_region_variant’, ‘splice_donor_5th_base_variant’, ‘splice_donor_region_variant’
IMPACT: Integer 0-4 (None, Modifier, Low, Moderate, High). Subjective impact classification of consequence type.
IMPACT.from.Tier
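To illustrate the encoding described for `cons_*` and `IMPACT`, here is a small sketch. It assumes each one-hot column is named `cons_` followed by the consequence term, and that the IMPACT integers follow the order listed above. The helper name and the upper-case dictionary keys are illustrative choices, not AIM's code.

```python
# Consequence terms from the complete list above.
CONSEQUENCES = [
    "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
    "transcript_amplification", "inframe_insertion", "inframe_deletion",
    "missense_variant", "protein_altering_variant", "splice_region_variant",
    "splice_donor_5th_base_variant", "splice_donor_region_variant",
]

# Integer codes in the order given for IMPACT (0-4).
IMPACT_CODES = {"NONE": 0, "MODIFIER": 1, "LOW": 2, "MODERATE": 3, "HIGH": 4}


def one_hot_consequences(terms):
    """Return a cons_* dict with 1 for each consequence present in `terms`."""
    present = set(terms)
    # Assumption: column names are "cons_" + consequence term.
    return {f"cons_{name}": int(name in present) for name in CONSEQUENCES}


encoded = one_hot_consequences(["missense_variant", "splice_region_variant"])
print(encoded["cons_missense_variant"], encoded["cons_stop_gained"])
print(IMPACT_CODES["MODERATE"])
```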
In Silico Prediction
CADD_phred: CADD Phred score.
DANN_score: DANN score.
fathmm_MKL_coding_score: fathmm-MKL coding score from dbNSFP.
FATHMM_score: FATHMM score from dbNSFP, minimum value selected.
M_CAP_score: M-CAP score.
MutationAssessor_score: MutationAssessor score, maximum value selected.
Polyphen2_HDIV_score: Polyphen2 HDIV score, maximum value selected.
Polyphen2_HVAR_score: Polyphen2 HVAR score, maximum value selected.
REVEL_score: REVEL score, maximum value selected.
SIFT_score: SIFT score, minimum value selected.
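Several scores above are described as "minimum value selected" or "maximum value selected". A plausible reading is that a variant can carry one value per transcript and a single value is kept. The sketch below assumes the raw field is a `;`-separated string with `.` marking a missing value, as in dbNSFP-style annotations. That input format and the helper name `select_score` are assumptions, not a description of AIM's internals.

```python
def select_score(raw, pick):
    """Reduce a multi-valued score string to one number using `pick` (min or max).

    Returns None when no numeric value is present.
    """
    # Assumption: values are separated by ";" and "." marks a missing value.
    values = [float(v) for v in raw.split(";") if v not in ("", ".")]
    return pick(values) if values else None


# SIFT and FATHMM keep the minimum; the other listed scores keep the maximum.
print(select_score("0.12;.;0.03", min))
print(select_score("0.81;0.95", max))
print(select_score(".", max))
```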
Inferred Inheritance
No.Var.H: Gene level. Number of High IMPACT variants in the patient for the candidate gene.
No.Var.HM: Gene level. Number of High or Moderate IMPACT variants in the patient for the candidate gene.
No.Var.L: Gene level. Number of Low IMPACT variants in the patient for the candidate gene.
No.Var.M: Gene level. Number of Moderate IMPACT variants in the patient for the candidate gene.
TierAD: 1~4, Dominant Inheritance Score. The lower the score, the more pathogenic.
TierAR: 1~4, Recessive Inheritance Score. The lower the score, the more pathogenic.
TierAR.adj: 1~4, Adjusted Recessive Inheritance Score. For a candidate gene, if a rare intronic variant is observed together with a high IMPACT variant, adjusted
AD.matched: 0/1, TierAD <= 2 and dominant == 1.
AR.matched: 0/1, TierAR <= 2 and recessive == 1.
zyg: Variant zygosity, 1: heterozygous, 2: homozygous.
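The two matched flags combine an inheritance tier with the OMIM inheritance annotation. A minimal sketch of those conditions follows. The function names are hypothetical, and the inputs are assumed to be plain integers.

```python
def ad_matched(tier_ad, dominant):
    """1 when TierAD <= 2 and the gene is annotated as dominant in OMIM, else 0."""
    return int(tier_ad <= 2 and dominant == 1)


def ar_matched(tier_ar, recessive):
    """1 when TierAR <= 2 and the gene is annotated as recessive in OMIM, else 0."""
    return int(tier_ar <= 2 and recessive == 1)


# Example values are invented for illustration.
print(ad_matched(tier_ad=1, dominant=1))
print(ar_matched(tier_ar=3, recessive=1))
```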
Minor Allele Frequency
ESP6500_AA_AF: ESP6500 African American Allele Frequency
ESP6500_EA_AF: ESP6500 European American Allele Frequency
gnomadAF: gnomAD exome Allele Frequency
gnomadAFg: gnomAD genome Allele Frequency
Phenotype Matching
clinVarSymMatchFlag: 0/1, whether the OMIM variant phenotype matches the condition in ClinVar.
hgmdSymptomSimScore: Similarity score between patient phenotype and variant phenotype in HGMD.
hgmdSymMatchFlag: 0/1, whether hgmdSymptomSimScore >= 0.2.
omimSymptomSimScore: Similarity score between patient phenotype and variant phenotype in OMIM.
omimSymMatchFlag: 0/1, whether omimSymptomSimScore >= 0.2.
phrank: Phrank measures the similarity between the patient's phenotype set and the phenotypes linked to a candidate gene.
diffuse_Phrank_STRING: A phenotype score derived through network diffusion over the STRING network, using the Phrank score as the initial seed score.
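The HGMD and OMIM match flags are thresholded versions of their similarity scores. The sketch below applies the 0.2 cutoff stated above. The function and constant names are hypothetical.

```python
# Cutoff taken from the hgmdSymMatchFlag / omimSymMatchFlag definitions above.
SIM_THRESHOLD = 0.2


def sym_match_flag(sim_score):
    """Return 1 when the similarity score is at least the cutoff, else 0."""
    return int(sim_score >= SIM_THRESHOLD)


# Example scores are invented for illustration.
print(sym_match_flag(0.35))
print(sym_match_flag(0.05))
```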
Others
simple_repeat: 0/1, whether the variant is in a simple repeat region.
spliceAImax: Maximum of the SpliceAI scores among DS_AG, DS_AL, DS_DG, and DS_DL.
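As a small illustration of `spliceAImax`, the sketch below takes the four SpliceAI delta scores and returns the largest. Skipping `None` values is an assumption made for this example. How AIM treats missing SpliceAI scores is not stated here.

```python
def splice_ai_max(ds_ag, ds_al, ds_dg, ds_dl):
    """Return the largest of the four SpliceAI delta scores, or None if all are missing."""
    # Assumption: a missing score is passed as None and ignored.
    scores = [s for s in (ds_ag, ds_al, ds_dg, ds_dl) if s is not None]
    return max(scores) if scores else None


# Example values are invented for illustration.
print(splice_ai_max(0.01, 0.20, 0.00, 0.85))
```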