AbDev — Antibody Developability Assessment

01 — Three-Layer Pipeline

How AbDev Works

AbDev evaluates antibody sequences through three complementary assessment layers, each covering a distinct developability dimension.

Layer 01

Chemical Liability Scanning

Regex-based motif scanning across six liability categories, stratified by CDR vs. framework region. Detects deamidation, oxidation, isomerization, unpaired cysteines, RGD motifs, and N-glycosylation sites.

Sequence-based · O(N) complexity
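As a rough illustration of regex-based scanning, the sketch below reports where a few of these motifs occur in a sequence. The patterns and the `scan_liabilities` helper are assumptions made for illustration, not AbDev's actual implementation, and positions are plain 1-based sequence indices rather than IMGT numbers.

```python
import re

# Illustrative patterns only (assumed); AbDev's real regexes may differ.
LIABILITY_PATTERNS = {
    "deamidation": r"N[GST]",
    "isomerization": r"D[GS]",
    "oxidation": r"[MWH]",
    "rgd_motif": r"RGD",
    "n_glycosylation": r"N[^P][ST]",
}


def scan_liabilities(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return {category: [(1-based start position, matched motif), ...]}."""
    seq = sequence.upper()
    hits = {}
    for category, pattern in LIABILITY_PATTERNS.items():
        # Wrapping the pattern in a lookahead reports overlapping motifs too.
        regex = re.compile(f"(?=({pattern}))")
        hits[category] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits
```

Each pattern is matched in a single left-to-right pass, so the cost grows linearly with sequence length.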

Layer 02

TAP Physicochemical Profiling

Five sequence-level metrics modeled on the Therapeutic Antibody Profiler (TAP; Raybould et al., PNAS 2019): CDR length, Kyte-Doolittle hydrophobicity, charge asymmetry, isoelectric point, and proline density. Each is benchmarked against 242 clinical-stage antibodies.

Reference-based · z-score classification
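To make the z-score classification concrete, here is a minimal sketch that computes a Kyte-Doolittle GRAVY value and compares it with a reference distribution. The hydropathy values are the standard Kyte-Doolittle scale. The function names are hypothetical, the reference mean and standard deviation must be supplied by the caller, and the 1.5 cutoff is the Amber threshold listed for hydrophobicity in the skill file.

```python
# Standard Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def z_score(value: float, ref_mean: float, ref_std: float) -> float:
    """Distance of a metric from the reference mean, in standard deviations."""
    return (value - ref_mean) / ref_std


def classify_z(z: float, amber_cutoff: float = 1.5) -> str:
    """Flag AMBER when the z-score exceeds the cutoff, otherwise GREEN."""
    return "AMBER" if z > amber_cutoff else "GREEN"
```

A caller would pass the mean and standard deviation of the reference antibody set for `ref_mean` and `ref_std`; those values are not reproduced here.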

Layer 03

Thera-SAbDab Benchmarking

k-mer ($k=3$) sequence similarity search against the Oxford OPIG Thera-SAbDab database. Identifies the nearest approved or clinical-stage antibody and flags IP risk for highly similar sequences.

k-mer matching · ~30s per query
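One simple way to score k-mer similarity is the Jaccard index over the sets of 3-mers in two sequences. The sketch below takes that approach; the choice of Jaccard, the function names, and the in-memory `references` dictionary are all assumptions, since the exact scoring AbDev uses is not stated here.

```python
def kmers(sequence: str, k: int = 3) -> set[str]:
    """Set of all overlapping k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two k-mer sets (assumed metric), from 0.0 to 1.0."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def nearest(query: str, references: dict[str, str], k: int = 3) -> tuple[str, float]:
    """Return (name, similarity) of the most similar reference sequence."""
    scored = ((name, kmer_similarity(query, seq, k)) for name, seq in references.items())
    return max(scored, key=lambda item: item[1])
```

Because the score depends only on shared short substrings, it can be computed without an alignment, which is what keeps this kind of search fast.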

Step 01

Input Sequence

Provide either a single VHH nanobody sequence or a colon-separated VH:VL pair for IgG. AbDev auto-detects chain type and assigns IMGT numbering.

from abdev_pipeline import run_abdev

result = run_abdev(
    sequence_input="EVQLVESGGGLVQPGGSL...",
    name="my_antibody",
    out_dir="abdev_results",
)
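The `sequence_input` string carries either one chain or a colon-separated VH:VL pair. As a sketch of how such input could be split and validated before analysis, the hypothetical helper below checks the residue alphabet and the number of chains. It does not attempt chain-type detection or IMGT numbering, and it is not AbDev's own parser.

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_sequence_input(sequence_input: str) -> dict[str, str]:
    """Split 'SEQ' or 'VH:VL' into labeled chains and validate residues."""
    parts = [part.strip().upper() for part in sequence_input.split(":")]
    if len(parts) == 1:
        chains = {"single": parts[0]}
    elif len(parts) == 2:
        chains = {"VH": parts[0], "VL": parts[1]}
    else:
        raise ValueError("expected one sequence or a single VH:VL pair")

    for label, seq in chains.items():
        if not seq:
            raise ValueError(f"{label} chain is empty")
        invalid = sorted(set(seq) - VALID_AA)
        if invalid:
            raise ValueError(f"{label} chain has invalid residues: {invalid}")
    return chains
```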

Step 02

Three-Layer Assessment

Layer 1 scans 6 liability categories. Layer 2 computes 5 TAP metrics with z-scores. Layer 3 queries Thera-SAbDab. All layers run in parallel and converge in ~60s total.

# CLI usage
python abdev_pipeline.py --seq "EVQLV..." --name my_ab
# Batch mode
python abdev_pipeline.py --fasta antibodies.fasta
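For batch work driven from Python rather than the CLI, a FASTA file first has to be turned into name and sequence pairs. The reader below is a minimal sketch with a hypothetical `read_fasta` helper; how AbDev itself encodes VH:VL pairs in FASTA records is not specified here, so this handles one sequence per record only.

```python
from typing import Iterable


def read_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Parse FASTA text lines into {record name: sequence}."""
    records: dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            header = line[1:].split()
            name = header[0] if header else f"record_{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[name] += line.upper()
    return records
```

Each pair could then be passed to `run_abdev` using the keyword arguments shown in Step 01, for example inside a loop over `read_fasta(open("antibodies.fasta"))`.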

Step 03

Traffic-Light Scorecard

Composite developability score on a 0–1 scale with GREEN / AMBER / RED classification. Each flag includes specific residue positions and chemical rationale for targeted optimization.

result = {
  "developability_score": 0.585,  # Amber
  "warnings": [
    "Low germline identity may increase immunogenicity"
  ]
}
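The mapping from score to traffic-light color can be sketched as a simple threshold function. The Amber band of 0.4–0.6 is the one reported in the Trastuzumab case study; treating scores below it as RED and above it as GREEN is an assumption, as is the `traffic_light` name.

```python
def traffic_light(score: float, amber_low: float = 0.4, amber_high: float = 0.6) -> str:
    """Map a 0-1 developability score to RED / AMBER / GREEN.

    Assumes higher scores are better; only the Amber band comes from the source.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < amber_low:
        return "RED"
    if score <= amber_high:
        return "AMBER"
    return "GREEN"
```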

02 — Benchmark Results

Trastuzumab Case Study

FDA-approved HER2-targeting monoclonal antibody (GenBank: AY124691). AbDev completed the full assessment in 60 seconds. Overall score: 0.585 (Amber).

Overall Score: 0.585, Amber (0.4–0.6)

Aggregation Risk: LOW (VH: 0.025 · VL: 0.009)

Immunogenicity Risk: HIGH (low germline identity)

Expression Level: HIGH (stability score: 0.9)

Trastuzumab — Full Scorecard

Layer 1 — Chemical Liability

Aggregation propensity (VH) 0.025 GREEN

Aggregation propensity (VL) 0.009 GREEN

Unpaired cysteines 0 GREEN

N-Glyc sites VH:1 · VL:0 AMBER

Layer 2 — TAP Profiling

GRAVY hydrophobicity (VH) -0.444 GREEN

GRAVY hydrophobicity (VL) -0.349 GREEN

Net charge (VH) +2.5 GREEN

Isoelectric point (VH) 7.41 GREEN

Proline content (VL) 7 residues AMBER

Layer 3 — SAbDab Benchmark

Nearest approved antibody Trastuzumab GREEN

k-mer similarity 100% GREEN

Overall Developability Score

0.585 AMBER

03 — What AbDev Checks

Liability Categories

Layer 1 scans six chemical liability categories, stratified by CDR vs. framework region. Each flag includes the exact IMGT residue position.

Deamidation Risk

Asn-Gly (NG), Asn-Ser (NS), and Asn-Thr (NT) motifs
Asn residues in CDR regions (highest risk)
Products: isoAsp or succinimide → altered binding

Oxidation Risk

Met, Trp, His residues in CDRs
CDR Met/Trp oxidation alters antigen binding
Oxidized Abs can trigger immunogenicity

Isomerization Risk

Asp-Gly (DG) and Asp-Ser (DS) motifs
Common in CDR-L3 and CDR-H3
Can create neo-epitopes

Unpaired Cysteines

Free thiols not in disulfide bonds
Cause misfolding or half-antibody formation
Detected via odd cysteine count per chain

RGD Motifs

Arg-Gly-Asp sequences
Promote non-specific cell adhesion
Safety risk in therapeutic applications

N-Glycosylation Sites

Asn-X-Ser/Thr (X ≠ Pro)
CDR N-glyc sites affect antigen binding
Framework sites less critical
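Two of the checks above lend themselves to a short sketch: sorting flagged positions into CDR versus framework, and the odd-cysteine-count test. The CDR spans below are the standard IMGT definitions (CDR1 27–38, CDR2 56–65, CDR3 105–117), so positions must already be IMGT-numbered; the numbering step itself is not shown, and the helper names are hypothetical.

```python
# Standard IMGT CDR boundaries for a V-domain (inclusive).
IMGT_CDR_SPANS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}


def region_of(imgt_position: int, cdr_spans: dict = IMGT_CDR_SPANS) -> str:
    """Return the CDR name containing the position, or 'framework'."""
    for name, (start, end) in cdr_spans.items():
        if start <= imgt_position <= end:
            return name
    return "framework"


def stratify_hits(imgt_positions: list[int], cdr_spans: dict = IMGT_CDR_SPANS) -> dict[str, list[int]]:
    """Split flagged positions into CDR and framework groups."""
    groups = {"cdr": [], "framework": []}
    for pos in imgt_positions:
        key = "framework" if region_of(pos, cdr_spans) == "framework" else "cdr"
        groups[key].append(pos)
    return groups


def has_unpaired_cysteine(sequence: str) -> bool:
    """An odd cysteine count means at least one Cys cannot be in a disulfide."""
    return sequence.upper().count("C") % 2 == 1
```

The parity test is a heuristic: an even count does not prove that every cysteine is correctly paired.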

Sequence-only limitation

Layer 1 liability scanning is purely sequence-based and does not account for 3D structural context. The same NXT motif on an exposed CDR loop surface carries different risk than one buried in the protein interior. For surface-exposure-aware analysis, consider combining with ESMFold/ColabFold structure prediction.

04 — Important Notes

Known Limitations

Sequence-only analysis in Layer 1

Liability scanning is regex-based and does not account for 3D structural context. An NXT motif on an exposed CDR loop carries a different risk from the same motif buried in the protein interior.

TAP reference from 2019 clinical antibodies

TAP thresholds are derived from 242 clinical-stage antibodies as of 2019. For bispecific antibodies, nanobody-Fc fusions, or ADC formats, some reference ranges may not apply.

k-mer ≠ structural similarity

Two antibodies with high k-mer sequence similarity may have entirely different CDR loop conformations and thus different antigen-binding characteristics. Layer 3 is a fast screen, not a structural comparison.

First run downloads Thera-SAbDab (~50MB)

Layer 3 requires downloading the Thera-SAbDab database on first run. Use --skip-benchmark to run offline for Layers 1 and 2 only.

05 — The Skill

Full SKILL.md Content

The complete executable skill file used by AI agents. Reproduces the full AbDev pipeline from sequence input to Traffic-Light scorecard.

---
name: abdev
description: "In-silico developability assessment for therapeutic antibodies and nanobodies. Three-layer pipeline: chemical liability scan, TAP profiling, Thera-SAbDab benchmarking. No GPU, no API keys. Works on CPU in ~60s."
version: 1.0.0
author: Junior Yu, Max Zhang
license: MIT
dependencies: [requests, pandas, numpy, matplotlib, abnumber]
metadata:
  hermes:
    tags: [antibody, developability, drug discovery, nanobody, protein engineering, TAP, PNAS 2019]
    repo: https://github.com/junior1p/AbDev
---

# AbDev: Antibody Developability Assessment

In-silico developability profiling for therapeutic antibodies and nanobodies.

## When to Use This Skill

- Screening antibody or nanobody sequences before wet-lab validation
- Identifying chemical liabilities (deamidation, oxidation, isomerization) early
- Benchmarking new antibody designs against approved clinical therapeutics
- AI agent-driven protein engineering workflows

## Quick Start

```python
from abdev_pipeline import run_abdev

# Single variable domain, as for a VHH nanobody (this example sequence is the trastuzumab VH)
result = run_abdev(
    sequence_input="EVQLVESGGGLVQPGGSLRLSCAASGFNIKDTYIHWVRQAPGKGLEWVARIYPTNGYTRYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCSRWGGDGFYAMDYWGQGTLVTVSS",
    name="my_nanobody",
    out_dir="abdev_results",
)

# Full IgG (VH:VL pair)
result = run_abdev(
    sequence_input="EVQLVESGGGLVQPGGSLRLSCAAS...:DIQMTQSPSSLSASVGDRVTIT...",
    name="my_igG",
    out_dir="abdev_results",
)
```

## CLI

```bash
# Single sequence
python abdev_pipeline.py --seq "EVQLV..." --name my_vhh

# VH:VL pair
python abdev_pipeline.py --vhvl "VH_sequence:VL_sequence" --name my_igG

# Batch from FASTA
python abdev_pipeline.py --fasta antibodies.fasta

# Skip Thera-SAbDab download (offline mode)
python abdev_pipeline.py --seq "..." --skip-benchmark
```

## Installation

```bash
pip install abdev
# or
git clone https://github.com/junior1p/AbDev.git
cd AbDev && pip install -r requirements.txt
```

## Three-Layer Assessment

### Layer 1: Chemical Liability Scanning

Six liability categories with IMGT numbering:

| Liability | Pattern | High-Risk Location |
|---|---|---|
| Deamidation | N[ST], Asn alone | CDR |
| Oxidation | Met, Trp, His | CDR |
| Isomerization | DG, DS motifs | CDR |
| Unpaired Cys | Odd cysteine count | Any |
| RGD Motif | Arg-Gly-Asp | Any |
| N-Glyc Site | Asn-X-Ser/Thr (X≠P) | CDR |

### Layer 2: TAP Profiling

Five metrics from Raybould et al. PNAS 2019, benchmarked against 242 clinical antibodies:

| Metric | Method | Amber Threshold |
|---|---|---|
| CDR Length | IMGT numbering | VH>17 or VL>17 |
| Hydrophobicity | Kyte-Doolittle GRAVY | z > 1.5 |
| Charge Asymmetry | \|net charge VH - VL\| | > 4.0 |
| Isoelectric Point | Theoretical pI | pI < 6.5 or > 9.0 |
| Proline Density | Framework prolines | > 8% |

### Layer 3: Thera-SAbDab Benchmarking

- k-mer ($k=3$) sequence similarity search
- ~30s per query against thousands of clinical antibodies
- Flags: IP risk (similarity > 90%) or novelty risk (similarity < 40%)

## Output

```json
{
  "antibody_name": "Trastuzumab",
  "developability_score": 0.585,
  "warnings": [
    "Low germline identity may increase immunogenicity"
  ],
  "layer1": { "aggregation_risk": "LOW", "liability_count": 0 },
  "layer2": { "tap_z_scores": {...}, "amber_flags": ["Proline Density"] },
  "layer3": { "nearest_ab": "Trastuzumab", "similarity": 1.0 }
}
```

## Limitations

- Layer 1 is sequence-only, no structural context
- TAP thresholds from 2019 data, may not fit novel formats
- k-mer similarity cannot replace structural similarity analysis
- First run requires Thera-SAbDab download (~50MB)

06 — Reproduce

Clone and Run

Install AbDev, then run the built-in Trastuzumab demo or your own sequences. AbDev handles everything from sequence input to Traffic-Light scorecard.

# Install via pip (recommended)
pip install abdev

# Or clone the repository
git clone https://github.com/junior1p/AbDev.git
cd AbDev
pip install -r requirements.txt

# Run with Trastuzumab as demo
python abdev_pipeline.py

# Analyze your own antibody
python abdev_pipeline.py --seq "EVQLV..." --name my_antibody

# Batch mode from FASTA
python abdev_pipeline.py --fasta antibodies.fasta

# Start web interface
python abdev_api.py --port 5001

Junior Yu & Max Zhang

Antibody developability assessment, pipeline development, and skill authoring
