ESM-2 Zero-Shot Mutation Fitness Prediction with ProteinGym Benchmark Validation
TL;DR: Zero-shot prediction of mutation effects on protein fitness using ESM-2 masked marginal scoring, with no training data required. Predictions can be validated automatically against ProteinGym's 217+ DMS assays.
This repository implements a fully automated zero-shot mutation fitness prediction pipeline using ESM-2 protein language models. It generates all single-point mutants for a given protein, scores each using masked marginal log-likelihood ratio (LLR), and optionally validates predictions against the ProteinGym DMS benchmark.
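The mutant-generation step described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `single_point_mutants` and the `M1A`-style label format (wild-type residue, 1-based position, mutant residue) are assumptions made for this example.

```python
from typing import Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(sequence: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, mutant_sequence) for every single substitution.

    The label format (e.g. "M1A") is an assumption for illustration:
    wild-type residue, 1-based position, mutant residue.
    """
    for i, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the synonymous "mutation"
            yield f"{wt}{i + 1}{aa}", sequence[:i] + aa + sequence[i + 1:]


mutants = list(single_point_mutants("MKT"))
print(len(mutants))  # 3 positions x 19 substitutions = 57
print(mutants[0])    # ('M1A', 'AKT')
```

A protein of length L made of standard residues yields 19 × L single-point mutants, so the candidate list grows linearly with sequence length.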
Masked Marginal Scoring (Meier et al., 2021, NeurIPS):
score(X_i → Y_i) = log p(Y_i | x_{-i}) − log p(X_i | x_{-i})
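This is reported as the best-performing zero-shot strategy for ESM models, outperforming the wild-type marginal and pseudo-perplexity (PPPL) approaches. The score measures how much more (or less) likely the mutant amino acid Y_i is than the wild-type residue X_i at position i, conditioned on all other positions x_{-i}. A negative score means the model prefers the wild type; a positive score means it prefers the mutant.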
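As a worked illustration of the formula, the sketch below computes the score for one position from a dictionary of per-residue logits. It assumes the logits have already been obtained from a forward pass with position i masked; the helper names and the three-letter toy alphabet are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
from typing import Dict


def log_softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw logits into log-probabilities (numerically stable)."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {aa: v - log_z for aa, v in logits.items()}


def masked_marginal_score(masked_logits: Dict[str, float], wt: str, mt: str) -> float:
    """log p(mt | x_{-i}) - log p(wt | x_{-i}) for a single masked position."""
    log_p = log_softmax(masked_logits)
    return log_p[mt] - log_p[wt]


# Toy logits for one masked position (hypothetical values, not model output).
toy_logits = {"A": 2.0, "G": 1.0, "W": -1.0}
score = masked_marginal_score(toy_logits, wt="A", mt="W")
print(round(score, 3))  # -3.0
```

Because both terms share the same softmax normalizer, the score reduces to the difference of the two logits. This also means one masked forward pass per position is enough to score all 19 substitutions at that position.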
pip install -r requirements.txt
python run.py # Demo: GFP, 35M model (~5 min)
python run.py --uniprot P42212 --model 650M # GFP with 650M + ProteinGym validation
python run.py --sequence MKTIIALSYIFCLVFA... # Custom protein
ESM-Scan (Totaro et al., 2024, Protein Science) found that masked marginal scoring achieves the highest Spearman correlation (~0.48–0.56) among zero-shot ESM strategies, comparable to or better than Rosetta ΔΔG.
Reference: [Wiley Online Library — ESM-Scan](https://onlinelibrary.wiley.com/doi/full/10.1002/pro.5221)
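Spearman correlation is the rank-based metric quoted above. The sketch below shows how it could be computed between predicted scores and measured DMS fitness values using only the standard library. The function names and toy numbers are illustrative assumptions; in practice `scipy.stats.spearmanr` does the same job.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the ranks (assumes neither input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Toy data (hypothetical): predicted LLR scores vs. measured fitness.
predicted = [-3.0, -0.5, -1.2, 0.4]
measured = [0.1, 0.9, 0.5, 0.7]
rho = spearman(predicted, measured)
print(round(rho, 3))  # 0.8
```

Only the ordering of the scores matters for this metric, which is why raw log-likelihood ratios can be compared directly with experimental fitness values on a different scale.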
GFP (UniProt P42212) with the Sarkisyan et al. 2016 DMS dataset is the most complete single-protein assay in ProteinGym. Using GFP as the demo ensures the highest possible automated validation success rate and reproducibility.
Reference: [AWS Open Data Registry — ProteinGym](https://registry.opendata.aws/proteingym/)
| File | Purpose |
|------|---------|
| mutation_scores.csv | Full ranked mutant list for agent logging |
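A ranked output file of this kind could be produced roughly as follows. This is a sketch under stated assumptions: the column names `mutant` and `llr_score`, the descending sort order, and the helper `write_ranked_scores` are hypothetical, and the actual layout of `mutation_scores.csv` may differ.

```python
import csv
import io
from typing import Dict, TextIO


def write_ranked_scores(scores: Dict[str, float], handle: TextIO) -> None:
    """Write mutants sorted from highest to lowest score as CSV rows."""
    writer = csv.writer(handle)
    writer.writerow(["mutant", "llr_score"])  # hypothetical column names
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for mutant, score in ranked:
        writer.writerow([mutant, f"{score:.4f}"])


# Toy scores (hypothetical values); an in-memory buffer stands in for a file.
buffer = io.StringIO()
write_ranked_scores({"M1A": -2.5, "K2R": 0.3, "T3S": -0.1}, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be passed a handle opened with `open("mutation_scores.csv", "w", newline="")`.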