# ESM-2-style protein language model (9.6M params) trained on Swiss-Prot on a Cambricon MLU370
The architecture mirrors [ESM-2](https://facebookresearch.github.io/esm/). Training setup:
| Parameter | Value |
| --- | --- |
| Data | Swiss-Prot (456,404 train / 22,821 val sequences) |
| Device | 1× Cambricon MLU370 |
| Batch | 32 sequences × 512 tokens |
| Speed | ~30K tokens/s |
| Epochs | 5 (~2 h/epoch, ~10 h total) |
| Optimizer | AdamW (lr=1e-4, warmup=1000 steps, cosine decay) |
| Final val loss | 0.4170 |
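
The throughput and epoch-time rows are mutually consistent: assuming full 512-token contexts, one epoch is 456,404 × 512 ≈ 234M tokens, and 234M / 30K tokens/s ≈ 2.2 h. The optimizer row also fully specifies the learning-rate schedule, so here is a minimal PyTorch sketch of it; the lr, warmup length, and cosine decay come from the table, while the linear warmup shape, the function name, and the `total_steps` derivation are assumptions (stock `torch` is shown, assuming the Cambricon backend exposes the standard optimizer API):

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR


def build_optimizer(model: torch.nn.Module,
                    total_steps: int,
                    lr: float = 1e-4,
                    warmup_steps: int = 1000):
    """AdamW with warmup then cosine decay, matching the table above.

    total_steps should be steps_per_epoch * num_epochs; with batch 32
    over 456,404 sequences for 5 epochs that is roughly 71,000 updates.
    """
    optimizer = AdamW(model.parameters(), lr=lr)

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            # Linear ramp from 0 up to the peak lr (warmup shape assumed).
            return step / max(1, warmup_steps)
        # Cosine decay from the peak lr toward 0 over the remaining steps.
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

Call `scheduler.step()` once after every `optimizer.step()` so the warmup counter advances per update, not per epoch.
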
Per-epoch validation loss:

| Epoch | Val loss | Notes |
| --- | --- | --- |
| 1 | 0.4195 | checkpoint ~38 MB (EMA) |
| 2 | 0.4235 | — |
| 3 | 0.4182 | best so far |
| 4 | 0.4185 | — |
| 5 | 0.4179 | final epoch |
| Final | 0.4170 | `checkpoint_final_best.pt` |
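
The epoch-1 note mentions EMA weights, and the ~38 MB checkpoint size is consistent with 9.6M float32 parameters (9.6e6 × 4 bytes ≈ 38 MB). A minimal sketch of the usual shadow-weight scheme follows; the class name and the 0.999 decay are assumptions, not values from this run:

```python
import copy

import torch


class EMAWeights:
    """Shadow copy of model weights kept as an exponential moving average.

    Generic sketch of the EMA technique referenced in the epoch-1 note;
    the decay value is assumed, and whether the final checkpoint holds
    EMA or live weights is not stated in the table.
    """

    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        # Frozen deep copy holds the averaged weights.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module) -> None:
        # shadow <- decay * shadow + (1 - decay) * live weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

    def save(self, path: str) -> None:
        # ~38 MB for 9.6M float32 params matches the checkpoint size above.
        torch.save(self.shadow.state_dict(), path)
```

`update()` is called once per optimizer step; saving the shadow `state_dict` after each epoch would produce per-epoch EMA checkpoints like the ones listed.
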