The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities

A comprehensive critical review of the MM/PBSA and MM/GBSA methods. It covers method formulation, precision, accuracy, solvation, electrostatics, entropy treatment, and performance across systems, and is the authoritative citation for characterizing the error envelope of these methods.

TL;DR

MM/PBSA sits between docking (fast, low accuracy) and alchemical free-energy perturbation (slow, high accuracy). With 20 snapshots, the standard error of the mean is 2.6–3.3 kcal/mol. In the avidin test case, bringing the standard error down to ~1 kJ/mol required 20–50 independent simulations. Performance is system-dependent, and the method cannot reliably rank ligand pairs whose ΔG_bind values differ by less than ~2.9 kcal/mol.
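One way to apply the ~2.9 kcal/mol limit when comparing estimates is a simple absolute-difference rule. This is a minimal sketch under that assumption; the function names, constant name and example values are hypothetical illustrations, not part of the review:

```python
# Sketch: flag ligand pairs whose MM/PBSA estimates sit inside the ~2.9 kcal/mol
# discrimination limit. Names and example values are hypothetical.
from itertools import combinations

MIN_DISCRIMINABLE_KCAL = 2.9  # limit quoted in the review summary (kcal/mol)


def distinguishable(dg_a: float, dg_b: float,
                    limit: float = MIN_DISCRIMINABLE_KCAL) -> bool:
    """True if two ΔG_bind estimates (kcal/mol) differ by at least `limit`."""
    return abs(dg_a - dg_b) >= limit


def unresolved_pairs(estimates: dict[str, float],
                     limit: float = MIN_DISCRIMINABLE_KCAL) -> list[tuple[str, str]]:
    """Return ligand pairs (sorted by name) whose ordering is not resolvable."""
    return [(a, b) for a, b in combinations(sorted(estimates), 2)
            if not distinguishable(estimates[a], estimates[b], limit)]


if __name__ == "__main__":
    dg = {"lig1": -9.1, "lig2": -8.0, "lig3": -4.7}  # illustrative, not real data
    print(unresolved_pairs(dg))  # [('lig1', 'lig2')]
```

A stricter variant could also propagate each estimate's own standard error; the fixed threshold here simply mirrors the quoted limit.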

Key finding

The method works for ranking compounds within a congeneric series but not for predicting absolute binding affinities. It improves virtual-screening enrichment over raw docking scores, but not consistently across targets.

Numbers that matter

Parameter | Value | Units | Context
--- | --- | --- | ---
SD of ΔG_bind (20 snapshots, avidin) | 47–62 | kJ/mol (11–15 kcal/mol) | Worst case, charged ligands
Standard error of mean (20 snapshots) | 11–14 | kJ/mol (2.6–3.3 kcal/mol) | With 20 MD snapshots
Simulations needed for a 1 kJ/mol standard error | 20–50 | independent simulations | Avidin 7-ligand series
MAD vs experiment (across solvation methods) | 10–43 | kJ/mol (2.4–10.3 kcal/mol) | After systematic error removal
Correlation coefficient (r²) range | 0.59–0.93 | (dimensionless) | Depending on solvation model
Minimum discriminable ΔG_bind difference | ~2.9 | kcal/mol (~12 kJ/mol) | Below this, ranking is unreliable
Optimal dielectric constant ε | 2–4 | (dimensionless) | For most systems
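The unit conversions and the 20-snapshot standard error in the table can be cross-checked with a few lines of arithmetic. This sketch assumes SEM = SD/√N with statistically independent snapshots (MD snapshots are usually correlated, so this is a best case) and 1 kcal = 4.184 kJ; the function names are illustrative:

```python
# Sketch: check the table's unit conversions and the 20-snapshot standard error.
# Assumes statistically independent snapshots (a best case for correlated MD data).
import math

KJ_PER_KCAL = 4.184  # thermochemical calorie


def kj_to_kcal(x_kj: float) -> float:
    """Convert kJ/mol to kcal/mol."""
    return x_kj / KJ_PER_KCAL


def sem(sd: float, n: int) -> float:
    """Standard error of the mean of n independent samples with standard deviation sd."""
    return sd / math.sqrt(n)


if __name__ == "__main__":
    for sd_kj in (47.0, 62.0):  # SD range from the table, kJ/mol
        s = sem(sd_kj, 20)
        print(f"SD {sd_kj:.0f} kJ/mol -> SEM(20) {s:.1f} kJ/mol = {kj_to_kcal(s):.1f} kcal/mol")
    # SD 47 kJ/mol -> SEM(20) 10.5 kJ/mol = 2.5 kcal/mol
    # SD 62 kJ/mol -> SEM(20) 13.9 kJ/mol = 3.3 kcal/mol
```

The resulting ~10.5–13.9 kJ/mol is in line with the tabulated 11–14 kJ/mol after rounding.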

What this means for the h01 Phase 5 gate

The MM-PBSA ΔG_bind ≤ −6.0 kcal/mol gate in Phase 5 is a pipeline-specific empirical threshold, not a literature-derived universal cutoff. Given the 2.6–3.3 kcal/mol standard error, the −6 kcal/mol gate sits about 1.8–2.3σ (~2σ) from the ~0 kcal/mol non-binder baseline. This is the correct framing: the gate tests for nominally favorable binding and filters out obvious non-binders; it does not predict absolute affinity.
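The σ separation is the gap between the gate and the non-binder baseline divided by the standard error. A minimal sketch of that arithmetic plus the gate test itself; the −6.0 kcal/mol threshold and ~0 kcal/mol baseline come from the paragraph above, while the constant and function names are hypothetical:

```python
# Sketch of the Phase 5 gate arithmetic. The -6.0 kcal/mol gate and ~0 kcal/mol
# non-binder baseline come from the note; constant and function names are hypothetical.
GATE_KCAL = -6.0
NONBINDER_BASELINE_KCAL = 0.0


def separation_in_sigma(gate: float, baseline: float, sem: float) -> float:
    """Gap between gate and baseline, in units of the standard error."""
    return abs(baseline - gate) / sem


def passes_gate(dg_bind: float, gate: float = GATE_KCAL) -> bool:
    """True if the estimate is at or below the gate (i.e. at least as favorable)."""
    return dg_bind <= gate


if __name__ == "__main__":
    for sem in (2.6, 3.3):  # 20-snapshot SEM range, kcal/mol
        z = separation_in_sigma(GATE_KCAL, NONBINDER_BASELINE_KCAL, sem)
        print(f"SEM {sem} kcal/mol -> {z:.1f} sigma")
    # SEM 2.6 kcal/mol -> 2.3 sigma
    # SEM 3.3 kcal/mol -> 1.8 sigma
```

This separation reflects statistical noise only; it does not account for the ≥2 kcal/mol systematic uncertainty in absolute ΔG_bind values.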

Limitations

  • Conformational entropy is generally not treated: the term is omitted in most practical applications.
  • Performance varies strongly across systems — no universal threshold exists.
  • Absolute ΔG_bind values carry ≥2 kcal/mol systematic uncertainty.
  • Not accurate enough for predictive drug design; best used for relative ranking within a series.
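For reference, the standard MM/PBSA (and MM/GBSA) free-energy expression shows where the omitted entropy term enters. Here PL, P and L denote the complex, receptor and ligand, angle brackets are averages over MD snapshots, G_pol is the polar solvation term from Poisson–Boltzmann (PB) or generalized Born (GB), G_np is the non-polar solvation term, and −TS is the entropy term that most applications drop:

```latex
\begin{aligned}
\Delta G_\text{bind} &= \langle G_\text{PL} \rangle - \langle G_\text{P} \rangle - \langle G_\text{L} \rangle \\
G &= E_\text{bnd} + E_\text{el} + E_\text{vdW} + G_\text{pol} + G_\text{np} - TS
\end{aligned}
```

When the complex, receptor and ligand energies are all taken from a single complex trajectory, E_bnd cancels exactly, which reduces the statistical noise.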

Connections