With genome and exome sequencing becoming routine in clinical practice, geneticists are increasingly faced with the challenge of interpreting missense variants, changes in a single DNA base that alter one amino acid in a protein. Many of these are classified as variants of uncertain significance (VUS), leaving clinicians unsure whether to consider them harmful or not.
In such situations, in silico prediction tools like FATHMM (Functional Analysis through Hidden Markov Models) are used to assess the likely functional effect of a variant. FATHMM is a widely used tool for estimating whether a missense variant is likely to be damaging or tolerated, based on evolutionary conservation and protein domain data.
Why Do We Need Tools Like FATHMM?
Not all missense mutations are harmful. Some may severely affect protein function, while others may have no impact at all. Experimental studies can answer this question, but they are slow, expensive, and not feasible for the thousands of variants identified in next-generation sequencing.
This is where computational tools come in. FATHMM helps prioritize variants for further study or clinical decision-making by using biological principles like sequence conservation and protein domain information to predict the impact of an amino acid change.
How FATHMM Works
Hidden Markov Model (HMM)–Based Approach
FATHMM’s original version is built on Hidden Markov Models (HMMs), statistical models that in this setting capture patterns of evolutionary conservation in protein sequences aligned across species.
In simple terms, FATHMM compares the position of the variant in a protein sequence with known patterns of conservation across different species. It assesses whether the amino acid at that position has remained unchanged through evolution, which usually means the site is functionally important. If a variant alters such a position, it’s more likely to be deleterious.
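The conservation idea can be illustrated with a toy calculation. The sketch below is not FATHMM's actual scoring formula; it assumes a made-up alignment column for each position, estimates amino acid probabilities with a pseudocount, and reports the log ratio of the mutant to the wild-type probability. The function names and example columns are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_probabilities(column, pseudocount=1.0):
    """Estimate amino acid probabilities for one alignment column.

    The pseudocount keeps residues that were never observed from
    receiving a probability of exactly zero.
    """
    counts = Counter(residue for residue in column if residue in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def substitution_log_odds(column, wild_type, mutant):
    """Toy score: natural log of P(mutant) / P(wild type) in this column."""
    probs = column_probabilities(column)
    return math.log(probs[mutant] / probs[wild_type])

# Hypothetical columns: the same protein position in ten species.
conserved_column = "MMMMMMMMMM"  # methionine in every species
variable_column = "VIVLAVIVAL"   # several residues are tolerated

conserved_score = substitution_log_odds(conserved_column, "M", "I")
variable_score = substitution_log_odds(variable_column, "V", "A")

print(f"Met->Ile at the conserved position: {conserved_score:.2f}")
print(f"Val->Ala at the variable position:  {variable_score:.2f}")
```

The substitution at the fully conserved position receives the more negative score, which mirrors the intuition that changes at conserved sites are more likely to be deleterious.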
The tool was trained using known pathogenic and benign variants from databases like HGMD (Human Gene Mutation Database) and UniProt, and it uses this training to make predictions about new variants.
What Is a Hidden Markov Model?
A Markov model is a mathematical framework where the next event (or “state”) depends only on the current state—not on how you got there. This is known as the Markov property.
Example:
Imagine you’re modeling weather:
- If today is sunny, there’s a 70% chance tomorrow is sunny, and a 30% chance it rains.
- If today is rainy, there’s a 60% chance tomorrow is rainy, and 40% chance it’s sunny.
The model only uses today’s weather to predict tomorrow—it doesn’t need to know the weather two days ago. That’s a Markov chain.
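The weather example can be written as a small Markov chain. This is a minimal sketch using the probabilities given above; the function names are illustrative.

```python
# Transition probabilities from the weather example:
# TRANSITIONS[today][tomorrow]
TRANSITIONS = {
    "sunny": {"sunny": 0.7, "rainy": 0.3},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(distribution):
    """Advance a probability distribution over weather states by one day."""
    next_distribution = {state: 0.0 for state in TRANSITIONS}
    for today, p_today in distribution.items():
        for tomorrow, p_move in TRANSITIONS[today].items():
            next_distribution[tomorrow] += p_today * p_move
    return next_distribution

def forecast(start_state, days):
    """Distribution over states `days` days after a known starting state."""
    distribution = {s: 1.0 if s == start_state else 0.0 for s in TRANSITIONS}
    for _ in range(days):
        distribution = step(distribution)
    return distribution

two_days = forecast("sunny", 2)
print(two_days)
```

Starting from a sunny day, the chance of sun two days later is 0.7 × 0.7 + 0.3 × 0.4 = 0.61. Each step uses only the current distribution, which is the Markov property in action.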
Now, a Hidden Markov Model means:
- You don’t directly observe the true state (e.g., is a position in a protein “functionally important”?).
- You only observe clues—in this case, the amino acid seen at that position across species.
FATHMM uses these observed patterns to infer whether a variant is in a functionally important (hidden) region.
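To make the "hidden state" idea concrete, here is a minimal two-state HMM decoded with the Viterbi algorithm. It is a teaching sketch, not FATHMM's model: profile HMMs used in sequence analysis have a more elaborate state structure, and every probability, state name, and observation label below is an invented illustration value.

```python
STATES = ("constrained", "tolerant")

# All probabilities are made-up illustration values.
START = {"constrained": 0.5, "tolerant": 0.5}
TRANSITION = {
    "constrained": {"constrained": 0.8, "tolerant": 0.2},
    "tolerant": {"constrained": 0.2, "tolerant": 0.8},
}
# What we observe at each position: is the alignment column conserved?
EMISSION = {
    "constrained": {"conserved": 0.9, "variable": 0.1},
    "tolerant": {"conserved": 0.4, "variable": 0.6},
}

def viterbi(observations):
    """Return the most probable hidden-state path for a non-empty sequence."""
    # best[s] = probability of the best path so far that ends in state s
    best = {s: START[s] * EMISSION[s][observations[0]] for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] * TRANSITION[p][s])
            new_best[s] = best[prev] * TRANSITION[prev][s] * EMISSION[s][obs]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

columns = ["conserved", "conserved", "conserved", "variable", "variable"]
print(list(zip(columns, viterbi(columns))))
```

The hidden labels ("constrained" or "tolerant") are never observed directly; they are inferred from the sequence of conserved and variable columns. For long sequences a real implementation would work with log probabilities to avoid numerical underflow.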
A Note on the Name “Markov”
The concept is named after Andrey Markov, a Russian mathematician (1856–1922) who formalized this idea of “memoryless” processes. His work laid the foundation for modern probabilistic models, including the Hidden Markov Models used in biology and genetics.
How HMM Differs from Deep Learning
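Newer tools like PrimateAI and SpliceAI use deep learning, which differs from the explicitly structured, probabilistic approach of Hidden Markov Models. CADD, often mentioned alongside them, is a machine learning model that integrates many annotations, but it is not a deep neural network.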
FATHMM’s HMM is a statistical, interpretable model based on evolutionary conservation and protein domains. It evaluates each variant using known biological rules and alignment data, making it transparent and easy to understand.
In contrast, deep learning models are data-driven. They learn patterns from massive datasets without being told specific rules. These models combine multiple inputs—sequence, structure, conservation, population data, regulatory elements—and often make highly accurate predictions. However, they are considered “black box” models, meaning the reasoning behind their predictions is often not clear.
While HMMs are ideal for conserved protein domains, deep learning tools are better suited for interpreting non-coding regions and detecting complex, subtle patterns.
FATHMM is a classic example of an HMM-based tool, whereas tools like PrimateAI and SpliceAI represent deep learning–based approaches. They are often used together in clinical variant interpretation.
Key Features of FATHMM
1. HMM-Based Scoring
- Relies on protein sequence alignments across species.
- Focuses on functional conservation within known protein domains.
2. Versions of FATHMM
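- FATHMM (2013) – Original version, scoring amino acid substitutions from HMM-based conservation and protein domain data.
- FATHMM-MKL (2015) – Extends prediction to non-coding as well as coding variants by integrating functional annotations through multiple kernel learning.
- FATHMM-XF (2017) – Refines genome-wide prediction of point mutations, coding and non-coding, using an extended feature set.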
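The "multiple kernel learning" in FATHMM-MKL refers to a general technique: a similarity function (kernel) is computed for each group of features, and the kernels are combined as a weighted sum whose weights are learned during training. The sketch below shows only the combination step, under stated assumptions: the feature groups, values, and weights are hypothetical, and the weights are fixed by hand rather than learned.

```python
def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def combined_kernel(variant_a, variant_b, weights):
    """Weighted sum of per-group kernels: K = sum over groups of w_g * K_g."""
    return sum(
        weights[group] * linear_kernel(variant_a[group], variant_b[group])
        for group in weights
    )

# Hypothetical feature groups and values for two variants.
variant_1 = {"conservation": [0.9, 0.8], "chromatin": [1.0, 0.0, 1.0]}
variant_2 = {"conservation": [0.7, 0.9], "chromatin": [0.0, 0.0, 1.0]}

# In multiple kernel learning these weights are learned from training data;
# here they are set by hand purely for illustration.
weights = {"conservation": 0.75, "chromatin": 0.25}

similarity = combined_kernel(variant_1, variant_2, weights)
print(f"Combined similarity: {similarity:.4f}")
```

A kernel-based classifier would then use this combined similarity in place of a single kernel, so that each data source contributes according to its learned weight.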
3. Scoring System
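FATHMM assigns a numerical score to each variant. For the original tool, more negative scores indicate a more likely damaging effect, and the default cutoff for its weighted inherited-disease predictions is –1.5 (FATHMM-MKL and FATHMM-XF instead report values from 0 to 1, with values above 0.5 predicted deleterious). A cautious three-band reading of the original score is: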
- Score ≤ –1.5: Predicted to be pathogenic
- Score ≥ +0.5: Predicted to be benign
- Score between –1.5 and +0.5: Considered of uncertain significance
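The three bands above can be written as a small helper. This is a sketch of the cutoffs as listed in this article; the function and constant names are illustrative and are not part of any FATHMM software.

```python
PATHOGENIC_MAX = -1.5  # scores at or below this are called pathogenic
BENIGN_MIN = 0.5       # scores at or above this are called benign

def classify_score(score):
    """Map a FATHMM-style score onto the three bands used in this article."""
    if score <= PATHOGENIC_MAX:
        return "predicted pathogenic"
    if score >= BENIGN_MIN:
        return "predicted benign"
    return "uncertain"

for example_score in (-2.1, 1.0, -0.8):
    print(example_score, "->", classify_score(example_score))
```

Keeping the cutoffs as named constants makes it easy to adjust them if a laboratory adopts different thresholds.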
4. Tissue-Specific Predictions
FATHMM-XF offers tissue-specific predictions, useful in diseases where gene expression varies across tissues (e.g., cancer, brain disorders).
Applications in Clinical Genetics
FATHMM is used in both research and clinical settings to:
- Interpret missense variants in exome sequencing.
- Filter variants of uncertain significance.
- Prioritize mutations for functional validation or reporting.
Limitations
FATHMM, while powerful, has some limitations:
- Best suited for missense variants in conserved coding regions.
- Does not handle indels, splice variants, or structural changes well.
- Its accuracy depends on:
  - The availability of evolutionary data for the region.
  - The presence of the variant’s context in training datasets.
Like all computational tools, FATHMM should be used in conjunction with clinical and functional data, rather than in isolation.
Example Interpretation
| Variant | FATHMM Score | Interpretation |
|---|---|---|
| c.375G>A (p.Met125Ile) | –2.1 | Predicted pathogenic |
| c.599T>C (p.Val200Ala) | +1.0 | Predicted benign |
| c.1024C>T (p.Pro342Ser) | –0.8 | Uncertain prediction |
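Protein-level predictors generally take the amino acid substitution rather than the cDNA change, often in one-letter form such as M125I. The sketch below converts the three-letter HGVS protein notation used in the table into that form. It assumes simple missense notation only, and the exact input format expected by a given tool should be checked against its documentation; the function name is hypothetical.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple missense notation such as "p.Met125Ile".
PROTEIN_CHANGE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")

def to_one_letter(hgvs_protein):
    """Convert 'p.Met125Ile' to 'M125I'; raise ValueError on other input."""
    match = PROTEIN_CHANGE.match(hgvs_protein)
    if match is None:
        raise ValueError(f"Not a simple missense change: {hgvs_protein!r}")
    ref, position, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"Unknown amino acid code in {hgvs_protein!r}")
    return f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"

for change in ("p.Met125Ile", "p.Val200Ala", "p.Pro342Ser"):
    print(change, "->", to_one_letter(change))
```

Stop-gain, frameshift, and other non-missense notations are deliberately rejected, which matches the scope of a missense predictor.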
Key Publications & Development Timeline
Original FATHMM (2013)
- Authors: Hashem A. Shihab, Julian Gough, David N. Cooper, et al.
- Paper: Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models
- Journal: Human Mutation
- DOI: 10.1002/humu.22225
FATHMM-MKL (2015)
- Added Multiple Kernel Learning with regulatory and functional annotations
- Authors: Hashem A. Shihab, Mark F. Rogers, Julian Gough, et al.
- Paper: An integrative approach to predicting the functional effects of non-coding and coding sequence variation
- Journal: Bioinformatics
- DOI: 10.1093/bioinformatics/btv009
FATHMM-XF (2017)
- Extended the genome-wide model, covering coding and non-coding point mutations, with broader feature sets
- Authors: Mark F. Rogers, Hashem A. Shihab, et al.
- Paper: FATHMM-XF: accurate prediction of pathogenic point mutations via extended features
- Journal: Bioinformatics
- DOI: 10.1093/bioinformatics/btx536
Key Contributors
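- Hashem A. Shihab – First author of the original FATHMM and FATHMM-MKL papers.
- Mark F. Rogers – First author of the FATHMM-XF paper and co-author of FATHMM-MKL.
- Julian Gough – Co-author of the original FATHMM and FATHMM-MKL papers.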
- Shamil Sunyaev – Senior researcher in computational variant prediction, known for PolyPhen rather than for FATHMM itself.
Conclusion
FATHMM is a trusted, interpretable in silico tool that uses evolutionary conservation through Hidden Markov Models to assess the pathogenicity of missense variants. While deep learning tools have broadened the horizon of variant prediction, FATHMM remains a valuable component in variant interpretation pipelines, particularly for variants in conserved coding regions. Used appropriately, it helps clinicians and researchers navigate the uncertain territory of genetic variants with greater confidence.