Executable Agent Skill

ESM2-small

ESM-2-style protein language model (9.6M parameters) trained on Swiss-Prot using a Cambricon MLU370.

A 9.6M-parameter protein language model trained on Swiss-Prot using a Cambricon MLU370 accelerator.

The architecture mirrors [ESM-2](https://facebookresearch.github.io/esm/).
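
ESM-2 is trained with masked language modeling, and the ESM papers describe a BERT-style recipe: 15% of residues are selected, and of those, 80% are replaced by a mask token, 10% by a random residue, and 10% left unchanged. Assuming ESM2-small follows the same recipe (this README does not say), the masking step could be sketched as below. The token ids (`MASK_ID`, `RESIDUE_IDS`) and the helper `mask_tokens` are hypothetical and do not describe the model's actual vocabulary.

```python
import random

# Hypothetical vocabulary ids for illustration only; the real ESM2-small
# tokenizer and its special-token ids are not documented in this README.
MASK_ID = 32
RESIDUE_IDS = list(range(4, 24))  # assumed ids of the 20 standard amino acids
RESIDUE_SET = set(RESIDUE_IDS)
IGNORE = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking: returns (inputs, labels) for one MLM step."""
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        # Only residue tokens are candidates; special tokens are never masked.
        if tok not in RESIDUE_SET or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the loss is computed only at selected positions
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(RESIDUE_IDS)  # 10%: random residue
        # remaining 10%: keep the original residue
    return inputs, labels
```

Passing a seeded generator, e.g. `mask_tokens(seq, rng=random.Random(0))`, makes the mask reproducible for a given token list `seq`.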

Training

| Parameter | Value |
|---|---|
| Data | Swiss-Prot (456,404 train / 22,821 val) |
| Device | MLU370 (Cambricon), 1 card |
| Batch size | 32 × 512 tokens |
| Throughput | ~30K tokens/s |
| Epochs | 5 (~2 h/epoch, ~10 h total) |
| Optimizer | AdamW (lr = 1e-4, 1,000 warmup steps, cosine decay) |
| Final val loss | 0.4170 |
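
The table gives enough to reconstruct the step budget and the learning-rate schedule. The sketch below assumes the last partial batch is kept (hence `ceil`), a linear warmup, and cosine decay to zero at the final step; the README states only "warmup=1000 steps, cosine decay", so the warmup shape and minimum LR are assumptions, and `lr_at` is a hypothetical helper name.

```python
import math

# Figures taken from the training table.
TRAIN_SEQS = 456_404
BATCH_SIZE = 32
SEQ_LEN = 512
EPOCHS = 5
PEAK_LR = 1e-4
WARMUP_STEPS = 1_000
TOKENS_PER_SEC = 30_000

steps_per_epoch = math.ceil(TRAIN_SEQS / BATCH_SIZE)  # 14,263 (last partial batch kept)
total_steps = steps_per_epoch * EPOCHS                # 71,315
tokens_per_step = BATCH_SIZE * SEQ_LEN                # 16,384

# Assumes padded positions count toward the reported throughput.
epoch_hours = steps_per_epoch * tokens_per_step / TOKENS_PER_SEC / 3600


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=total_steps, min_lr=0.0):
    """Linear warmup to `peak`, then cosine decay to `min_lr` (assumed 0)."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return min_lr + 0.5 * (peak - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At ~30K tokens/s, one epoch of 14,263 full 512-token batches takes about 2.16 h, consistent with the ~2 h/epoch figure above.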

Training Progress

| Epoch | Val loss | Notes |
|---|---|---|
| 1 | 0.4195 | checkpoint ~38 MB (EMA) |
| 2 | 0.4235 | |
| 3 | 0.4182 | best so far |
| 4 | 0.4185 | |
| 5 | 0.4179 | final epoch |
| Final | 0.4170 | `checkpoint_final_best.pt` |
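
Two figures in this table can be cross-checked. A ~38 MB checkpoint matches 9.6M parameters stored as 32-bit floats (9.6M × 4 bytes ≈ 38.4 MB), assuming the file holds a single copy of the weights, such as the EMA weights, and no AdamW optimizer state (whose two moment buffers would roughly triple the size). The "best so far" note corresponds to tracking the minimum val loss across epochs. The helper names below are hypothetical.

```python
# Per-epoch validation losses reported in the table.
VAL_LOSS = {1: 0.4195, 2: 0.4235, 3: 0.4182, 4: 0.4185, 5: 0.4179}


def best_so_far(history):
    """Return the epochs at which a new lowest val loss was reached."""
    best, improved = float("inf"), []
    for epoch in sorted(history):
        if history[epoch] < best:
            best = history[epoch]
            improved.append(epoch)
    return improved


def checkpoint_mb(num_params, bytes_per_param=4):
    """Approximate weights-only checkpoint size in MB (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6
```

Under this rule, epochs 1, 3 and 5 each set a new best, matching the "best so far" note at epoch 3.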

Quick Start