2014 Schneider (Ed.) — De novo Molecular Design

Edited monograph, 21 chapters, Wiley-VCH 2014. Provides the layer-above complement to 2007-chipot-free-energy-calculations-book: where Chipot is canonical for binding-free-energy computation, Schneider is canonical for compound construction (de novo, fragment-based, peptide), scoring (receptor-based, ligand-based, multiobjective), and peptide / protein redesign (SME, PSO, ACO; DEE; bioisosterism).

Citation

Schneider, G. (Ed.). De novo Molecular Design. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2014. ISBN-13 978-3-527-33461-2. xxiv + 545 pp.

TL;DR

Twenty-one expert chapters covering the algorithmic side of computational compound construction. Three load-bearing themes for STRC:

  1. Scoring families and their valid use. Receptor-based scoring divides cleanly into force-field (Eq. 1.10), empirical (Eq. 1.11) and knowledge-based (Eq. 1.12) classes. Each has a distinct error mode; consensus or method-choice based on system properties is now standard. Ligand-based scoring uses pharmacophore / pseudoreceptor / shape similarity. Free-energy methods (Ch. 16) are reserved for late-stage congeneric optimization.
  2. Fragment-based discovery as a property-headroom strategy. Lipinski rule of 5 (drugs) and Congreve rule of 3 (fragments) — verbatim in Table 5.1. Ligand efficiency (LE) is the core metric: optimization should aim at ≥0.3 kcal/mol per heavy atom; a plethora of derived metrics (FQ, SILE, GE, LLE, LELP, LLE_AT, KE) handle context (size, lipophilicity, kinetics). A diverse 1000–20000-member fragment library samples fragment-like chemical space better than millions of HTS compounds sample drug-like space.
  3. Sequence-space search for peptide design. For h09’s peptide-hydrogel hypothesis, Ch. 18 is the methods spine: Shannon-entropy library diversity (Eqs. 18.2–18.4), modified Grantham amino-acid distance matrix (Table 18.1), Simulated Molecular Evolution (SME) with Gaussian mutation (Eq. 18.5), Particle Swarm Optimization, and a worked Ant Colony Optimization example (verbatim pseudocode) producing MHC-I octapeptides at 89%/95% accuracy. Peptide stability modifications (cyclization, stapling, end-capping, glycosylation, PASylation) are catalogued in §18.4.
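The library-diversity measure in item 3 can be illustrated with a minimal sketch. It assumes the standard per-position Shannon entropy, H = Σ p·log₂(1/p); the exact form of Eqs. 18.2–18.4 is not reproduced in this note, so the function name `position_entropies` and the toy library below are illustrative assumptions rather than the book's notation.

```python
import math
from collections import Counter

# Upper bound per position for the 20-residue alphabet: log2(20) ~ 4.32 bit.
MAX_ENTROPY_BITS = math.log2(20)


def position_entropies(peptides):
    """Return the Shannon entropy (in bits) at each position of an
    equal-length peptide library. Standard definition; assumed, not
    copied from the book."""
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    entropies = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        # H = sum_i p_i * log2(1 / p_i) over residues observed at this position
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    # Hypothetical toy library of octapeptides, for illustration only.
    library = ["SIINFEKL", "SIYNFEKL", "AIINFEKL", "SIINFEKM"]
    for pos, h in enumerate(position_entropies(library), start=1):
        print(f"position {pos}: {h:.2f} bit ({h / MAX_ENTROPY_BITS:.0%} of max)")
```

A fully conserved position scores 0 bit, and a position where all 20 residues are equally frequent reaches the 4.32-bit ceiling quoted in the table.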

Numbers that matter

The book is methods-heavy; load-bearing numbers are filter cutoffs, illustrative success rates, and one calibrated benchmark. Force-field parameters and binding constants are not tabulated here — cite the original force-field / SAR papers.

Parameter | Value | Units | Source (page/§/table) | Notes
Lipinski rule of 5: molecular mass | ≤500 | Da | Table 5.1 (Ch.5) | drug-like cutoff
Lipinski rule of 5: H-bond acceptors | ≤10 |  | Table 5.1 | drug-like
Lipinski rule of 5: H-bond donors | ≤5 |  | Table 5.1 | drug-like
Lipinski rule of 5: partition coefficient (clogP) | ≤5.0 |  | Table 5.1 | drug-like
Congreve rule of 3: molecular mass | ≤300 | Da | Table 5.1 | fragment-like
Congreve rule of 3: H-bond acceptors | ≤3 |  | Table 5.1 | fragment-like
Congreve rule of 3: H-bond donors | ≤3 |  | Table 5.1 | fragment-like
Congreve rule of 3: clogP | ≤3.0 |  | Table 5.1 | fragment-like
Congreve rule of 3: rotatable bonds | ≤3 |  | Table 5.1 | fragment-like (optional)
Congreve rule of 3: polar surface area | ≤60 | Å² | Table 5.1 | fragment-like (optional)
LE optimization target (Hopkins 2004 convention) | ≥0.3 | kcal/mol per heavy atom | §6.4.1 | typical for fragments selected for elaboration
LE plateau onset (Kuntz max-affinity ceiling) | ≈ −1.5 (LE per HA at 15+ HA) | kcal/mol per heavy atom | §6.4.1 [221, 226] | binding-energy contribution levels off
Optimized-drug LE expectation (MW 500 Da, IC50 10 nM) | 0.3 | kcal/mol per heavy atom | §6.4.1 | implies ~38 heavy atoms
Mean per-atom binding-affinity contribution during optimization | 0.29 | kcal/mol per non-H atom | §6.4.1 [146, 222] | linearity assumption baseline
Target LLE (LiPE) range | 5–7 (or higher) | log units | §6.4.1 (4) | optimization goal
LELP “Lipinski-compliant” ceiling | <16.5 |  | §6.4.1 (5) | log P / LE ceiling
LELP lead range | −10 to +10 |  | §6.4.1 (5) | optimize toward 0
FQ_Scale formula | LE_Scale = −0.064 + 0.873·exp(−0.026·HAC) |  | §6.4.1 (1) [226] | size-corrected LE rescaling
SILE formula | affinity / HAC^0.3 |  | §6.4.1 (2) [227] | size-independent LE
LLE_AT formula | (0.11·ln10·RT·(LogP − Log(activity)))/HAC | kcal/mol per HA | §6.4.1 (6) [230] | Astex size-corrected LLE
KE formula | t½ / (0.693·HAC) | (time per HA) | §6.4.1 [233] | kinetic efficiency
Astex generic fragment library size | 327 | compounds | §6.2.1 [53] | drug-fragment library
Mazanetz et al. biochemical fragment library size | 20,000 | compounds | §6.2.5 | high-concentration FCS+plus screen
Aqueous solubility cutoff (Vernalis fragment library) | ≥2 | mM | §6.2.2 [66, 69] | removes >50% of vendor fragments
Aqueous solubility cutoff (Mazanetz library) | ≥1 | mM | §6.2.2 [68] | in-house QSAR filter
Vemurafenib starting library size | 20,000 | fragments | §6.1 [31, 39–41] | screened at 200 µM
Vemurafenib hit-call threshold | ≥30% inhibition at 200 µM |  | §6.1 | initial 7-azaindole
Drug-like compound count, MW 300–500 Da (Bohacek estimate) | 10²⁰–10²⁰⁰ | molecules | §6.2.5 [86] | combinatorial estimate
Reymond chemical universe (GDB-17, ≤17 heavy atoms) | 166 × 10⁹ | molecules | §6.2.5 [87], §1.5 | enumerated
GDB-13 virtual library | 970 × 10⁶ | molecules | §6.2.1 [64, 65] | for MPO de novo design
Drug-like 30-atom space (Durrant ch.5) | 10⁶³ | molecules | §5.2.2 [19] | combinatorial estimate
Fragment 12-atom-or-less space | ~10⁷ | molecules | §5.3.1.1 [66] | combinatorial estimate
Fragment hit rate (LE > 0.3 in TSA/functional screen) | ~5% |  | §5.3.3 [8] | typical
HTS hit rate (Pilzulkil case study) | 0.1% |  | §5.2.3 [26] | typical, low
Fragment soaking concentration (X-ray crystallography) | 25–100 | mM | §5.3.2.6 [18, 64] | cocktail soaking
SPR fragment detectability lower bound | ≥100 | Da | §5.3.2.3 [8] | mass change limit
NMR fragment detectability protein-mass upper bound | ≤40 | kDa | §5.3.2.5 [23] | protein-detected NMR
MS fragment detectability protein-mass upper bound | ≤100 | kDa | §5.3.2.4 [23] | electrospray
Receptor concentration for protein-NMR fragment screen | >2 | mg | §5.3.2.5 [23, 73] | unless cryoprobes
Ligand concentration for ligand-detected NMR | 1–5 | mM | §5.3.2.5 [1, 18] | cocktail screening
Phenprocoumon as Astex-rule-of-3 starting fragment | MW≤300, clogP≤3, HBD≤3 |  | §6.1 [28, 30] | first FBDD-derived drug (Tipranavir, 2005)
Vemurafenib (PLX4032) reached market | 2011 | year | §6.1 [31] | first drug developed de novo from fragment screen
HCV helicase inhibitor 5 IC50 | 260 | nM | §1.7 [206] | LigBuilder optimization
TOPAS CB1 inverse agonist 7 Ki | 4 | nM | §1.7 [207, 209] | from Ki=1500 nM design 6
Plk1 inhibitor compound 16 EC50 | 4 | µM | §1.7 [217, 218] | LE = 0.66 (Eq. 1.8); DOGS scaffold-hop
Aurora A inhibitor 18 IC50 | 3 | µM | §1.7 [219] | DOGS molecule-grow from 17 (~10 µM)
ER de novo design: explicit-solvent FE hit-recovery rate | 83.3% |  | §16.6 Example 16.1 [4] | 5/6 actives top-ranked (vs 37.5% for de novo scoring)
ER FE std-dev (explicit / implicit solvent) | 1.0 / 0.7 | kcal/mol | §16.6 Example 16.1 [4] | comparable accuracy
HIV-RT optimization endpoint potency | 55 | pM | §16.6 Example 16.2 [261] | from 5 µM docking hit (Jorgensen group)
FEP precision in well-localized perturbation (STA vs DTA) | STA 8–10× more precise |  | §16.6 Example 16.3 [268] | for congeneric series
T4 lysozyme L99A/M102Q absolute-FE RMS error | 1.8 | kcal/mol | §16.6 [219] | ITC reference
T4 lysozyme L99A/M102Q relative-FE RMS error (catechol) | 1.1 | kcal/mol | §16.6 [219] | most accurate class
Predicted-vs-experimental pose RMSD (catechol class) | 1.2 | Å | §16.6 [219] | post-hoc X-ray check
TI relative-FE precision (Westermaier outlook) | <0.1 | kcal/mol | §16.9 [101] | maximum reachable
Shannon entropy peptide-library cardinality | 4.32 | bit (max for 20 residues) | §18.2.1 Fig.18.7 | log₂(20)
MHC-I octapeptide ACO sequence-space | 25.6 × 10⁹ (= 20⁸) | sequences | §18.3 | decision space
ACO H-2K^b stabilizing accuracy | 89% |  | §18.3 | designed peptides confirmed
ACO H-2K^b nonstabilizing accuracy | 95% |  | §18.3 | designed peptides confirmed
ACO pheromone initialization | 0.05 | per residue position | §18.3 pseudocode | uniform prior
ACO pheromone bounds | [0.1, 0.9] |  | §18.3 pseudocode | escape early convergence
ACO update factor formula | (Fitness − 0.5)/100 |  | §18.3 pseudocode | linear with fitness
α-conotoxin MII cyclic-derivative plasma stability gain | +15–20% |  | §18.4.1 [76] | EndoGluc protease test
α-conotoxin cyclic distance to bridge (N-to-C terminus) | ~11 (≤15) | Å | §18.4.1 | cyclization geometry constraint
α-conotoxin cMII-6/7 IC50 (nicotinic acetylcholine receptor) | ~1 | µM | §18.4.1 [76] | activity preserved post-cyclization
Bioster database transformation count (v12.1, 2014) | ~26,000 | bioisosteric pairs | §17.3.1.1 | bioisostere knowledge base
Cambridge Structure Database (CSD) entries (2014) | 541,748 | crystal structures | §17.3.1.2 [13] | drug-like subset ~60,000
ChEMBL distinct compounds (2014) | 1,213,239 | compounds | §17.3.1.3 [16] | bioactivity database
ChEMBL bioactivity measurements (2014) | 10,129,256 | data points | §17.3.1.3 | over 9,003 targets
CATS pharmacophore-pair vector length | 150 | bits | §17.3.2.2 [25] | 1–10 bond distances × 15 type pairs
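
Several of the efficiency-metric rows above reduce to one-line formulas, so a small calculator may help when checking numbers by hand. The sketch below assumes the usual conventions: LE = RT·ln10·pActivity / HAC (Hopkins), LLE = pActivity − clogP, LELP = clogP / LE, and FQ = LE / LE_Scale. The "affinity" in SILE is taken to be pActivity, which is an assumption. LE_Scale, SILE and KE follow the table rows as transcribed; LLE_AT is left out because its transcription could not be checked. Function names, argument names and the 300 K default are illustrative, not from the book.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def efficiency_metrics(p_activity, heavy_atoms, clogp, temperature=300.0):
    """Ligand-efficiency metrics from pActivity (e.g. pIC50 or pKi),
    heavy-atom count (HAC) and clogP. Conventions are assumptions; see lead-in."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    if p_activity <= 0:
        raise ValueError("p_activity must be positive")

    rt_ln10 = R_KCAL * temperature * math.log(10)  # ~1.37 kcal/mol at 300 K
    le = rt_ln10 * p_activity / heavy_atoms         # kcal/mol per heavy atom
    le_scale = -0.064 + 0.873 * math.exp(-0.026 * heavy_atoms)
    return {
        "LE": le,
        "LLE": p_activity - clogp,                  # log units
        "LELP": clogp / le,
        "SILE": p_activity / heavy_atoms ** 0.3,
        "LE_Scale": le_scale,
        "FQ": le / le_scale,                        # fit quality, ~1 for an optimal binder
    }


def kinetic_efficiency(half_life, heavy_atoms):
    """KE = t1/2 / (0.693 * HAC), in the time unit of half_life per heavy atom."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return half_life / (0.693 * heavy_atoms)


if __name__ == "__main__":
    # The "optimized drug" row: IC50 = 10 nM (pIC50 = 8) and ~38 heavy atoms.
    # The clogP value here is a hypothetical placeholder.
    for name, value in efficiency_metrics(8.0, 38, clogp=3.0).items():
        print(f"{name:9s} {value:7.3f}")
```

With pIC50 = 8 and 38 heavy atoms the LE comes out near 0.29 kcal/mol per heavy atom, consistent with the 0.29 to 0.3 values in the table. The exponential LE_Scale fit turns negative above roughly 100 heavy atoms, so FQ is only meaningful for smaller ligands.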

Method essentials

Per-chapter takeaways (only what STRC needs to either use or cite):

  • Ch. 1 (Schneider, Baringhaus): receptor-based scoring decomposes into three classes — physically motivated FFs (Eq. 1.10 LJ + Coulomb), empirical regression scoring (Eq. 1.11 weighted sum of interaction-type counts), and knowledge-based scoring (Eq. 1.12 Boltzmann inversion of atom-pair frequencies). Ligand-based scoring uses pharmacophore/pseudoreceptor/shape descriptors. Table 1.3 catalogs ~40 named de novo design programs by year and scoring class — useful provenance for any “we used X-style scoring” claim. Fragment-based assembly relies on additivity (linker bond ≈ free) — but Fig. 1.22 documents non-additivity to −14 kJ/mol (factor Xa Ki=2 nM). Reaction-driven assembly (RECAP, DOGS, SYNOPSIS) suggests synthesis routes alongside structures.
  • Ch. 5 (Durrant, Amaro): Pilzulkil/Goode tutorial. Distills HTS vs FBDD into protocols. Three fragment-optimization strategies: linking (rare success — linker rigidity is hard), merging (requires overlap), and growing (most reliable — anchor fragment + medicinal-chemistry expansion). Click-chemistry (azide-alkyne Huisgen → 1,2,3-triazole) is the prototypical synthesis route for combinatorial fragment-grow. Detection-method matrix in §5.3.2 covers six biophysical assays with sensitivity/protein-consumption/MW-limit profiles.
  • Ch. 6 (Mazanetz, Law, Whittaker): the FBDD primer. Library-design constraints: rule of 3, ≥1–2 mM solubility, REOS substructure filter (PAINS), 2D fingerprint diversity selection (hole-filling or iterative removal). Screening-method choice: X-ray (highest information, ≤1000 fragments), NMR (40-kDa protein limit), SPR (KD + binding kinetics), thermal-shift (cheap, noisy), ITC (full thermodynamics, expensive in protein), high-concentration biochemical (functional readout, false-positive-prone, ≥20,000 compounds). Efficiency-metrics catalog §6.4.1 is the cross-cutting decision tool — captured verbatim in Ligand Efficiency Metrics Catalog.
  • Ch. 16 (Westermaier, Hubbard): decision matrix for FE methods. MM-PBSA / MM-GBSA: VS rescore, large structural change tolerated, 5–8 kcal/mol error band. LIE: empirical, depends on training set, treats electrostatics well. TI / FEP: best for congeneric small-modification series, ≤0.5 kcal/mol achievable on toy systems, ~1–2 kcal/mol on protein systems. PMF: needed when reaction coordinate matters (binding pathway, induced fit). Best practices §16.8: soft-core potentials (Shirts–Pande parameters); never leave partial charges on an atom whose LJ terms are switched off; transform electrostatics and LJ separately; insert/delete is less efficient than mutate. The single-topology approach (STA) is 8–10× more precise than the dual-topology approach (DTA) for small congeneric perturbations; DTA is mandatory when geometries differ (e.g., size-changing mutations).
  • Ch. 17 (Firth, Blagg, Brown): bioisostere = “groups or molecules with chemical and physical similarities producing broadly similar biological properties” (Thornber 1979; Burger). Three replacement classes: knowledge-based (Bioster, CSD, ChEMBL/MMP, SwissBioisostere); descriptor-based (CATS pharmacophore pairs §17.3.2.2; Hammett σ + Hansch π Craig plot §17.3.2.1); shape-based. Drug Guru fully enumerates SMIRKS-rule space; IADE iteratively searches.
  • Ch. 18 (Hiss, Schneider): the spine for h09 peptide design. Shannon entropy (Eqs. 18.2–18.4) quantifies library diversity in bits; max for 20-residue alphabet is log₂(20) = 4.32 bit per position. Diversity vs hit-rate is monotonic — Fig. 18.4(b) shows 10 antibody-binding libraries, increasing entropy → decreasing hit count. Three nature-inspired algorithms: SME (Gaussian-distance mutation, Eq. 18.5, Table 18.1 distance matrix), PSO (social + personal memory; not yet applied to peptides), ACO (verbatim pseudocode in §18.3, MHC-I octapeptide design with ANN fitness — accuracies 89% / 95%). Peptide stability mods §18.4: backbone cyclization (≤15 Å termini distance, +15–20% plasma stability), all-hydrocarbon stapling (Verdine; (i,i+3) cross-links least helix-distorting), end-capping (N-acetylation / C-amidation; PASylation as PEG alternative), glycosylation (GlcNAc enzymatic transglycosylation).
  • Ch. 19 (Saven): sequence-search algorithms for protein design: Monte Carlo, Dead-End Elimination (DEE; Desmet 1992), Self-Consistent Mean Field (SCMF; Koehl–Delarue 1994), probabilistic / FASTER (Allen–Mayo 2010). Application zoo: Baker’s Top7 fold (Kuhlman 2003), DeGrado’s Due Ferri / four-helix metalloproteins, Baker’s Kemp eliminase / Diels-Alder enzyme / α-gliadin peptidase, water-soluble KcsA / nicotinic-AChR analogs. Useful as method survey for h26 cysteine engineering and any future de novo bridge-domain design.
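
The ACO loop summarized under Ch. 18 can be sketched as follows. This is not the book's verbatim pseudocode: only the pheromone constants (initial value 0.05, bounds [0.1, 0.9], update (Fitness − 0.5)/100) come from the table in this note. The fitness function `toy_fitness`, the ant and iteration counts, and sampling in proportion to pheromone are all assumptions. As transcribed, the lower bound (0.1) exceeds the initial value (0.05), so the sketch clamps only entries that have just been updated; how the original pseudocode reconciles the two cannot be recovered from this note.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(peptide, target="SIINFEKL"):
    """Hypothetical stand-in for the ANN fitness used in the book:
    fraction of positions matching a toy target, always in [0, 1]."""
    return sum(a == b for a, b in zip(peptide, target)) / len(target)


def aco_design(fitness, length=8, n_ants=20, n_iter=200,
               tau_init=0.05, tau_min=0.1, tau_max=0.9, seed=0):
    """Minimal ant-colony search over peptide sequence space.

    pheromone[pos][residue] weights the chance of picking `residue` at `pos`.
    Returns (best_peptide, best_score, pheromone).
    """
    rng = random.Random(seed)
    pheromone = [{aa: tau_init for aa in AMINO_ACIDS} for _ in range(length)]
    best_peptide, best_score = None, float("-inf")

    for _ in range(n_iter):
        for _ in range(n_ants):
            # Each ant builds one peptide, position by position.
            peptide = "".join(
                rng.choices(AMINO_ACIDS,
                            weights=[row[aa] for aa in AMINO_ACIDS])[0]
                for row in pheromone
            )
            score = fitness(peptide)
            if score > best_score:
                best_peptide, best_score = peptide, score

            # Reinforce (score > 0.5) or weaken (score < 0.5) the visited entries.
            delta = (score - 0.5) / 100
            for row, aa in zip(pheromone, peptide):
                row[aa] = min(tau_max, max(tau_min, row[aa] + delta))

    return best_peptide, best_score, pheromone


if __name__ == "__main__":
    peptide, score, _ = aco_design(toy_fitness)
    print(peptide, f"fitness={score:.2f}")
```

The floor on updated pheromone values keeps every visited residue selectable, which is one plausible reading of the "escape early convergence" note in the table.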

Limitations

  • Methods anthology, not a parameter handbook. For force-field constants, water models, and FF benchmarks, cite the original force-field papers.
  • Pre-AlphaFold (2014); zero coverage of structure prediction by deep learning. For h26 / h01 modern work, AF3 / RoseTTAFold / Boltz are NOT in this book — use 2024+ literature.
  • Chapter 1 software catalog (Table 1.3) ends in 2013. Modern de novo programs (REINVENT, Bidd, MolGAN, Pocket2Mol, DiffDock, etc.) absent.
  • Free-energy chapter (16) cites Chipot 2007 as reference and is complementary, not redundant. Use both: Chipot for theory, Westermaier-Hubbard for the LO-stage decision matrix and pharmaceutical case studies.
  • No quantitative benchmark tables for binding-affinity prediction methods. RMS errors quoted are illustrative single-system values (e.g., T4 lysozyme L99A/M102Q absolute FE = 1.8 kcal/mol).
  • Bioster, CSD, ChEMBL counts are 2014 snapshots. ChEMBL is now ChEMBL34+ (2024) with millions more activities.

Relevance to STRC

  • index — primary consumer.
    • Phase 3c v4 fragment-grow on 3-amino-benzofuran-2-COOH scaffold should follow §6.5–6.7 (linking/merging/growing) decision tree → see Recipe — Fragment Optimization Linking Merging Growing. Growing is the right strategy for a single-anchor pharmacophore.
    • Phase 4 receptor-based scoring (Vina, AutoDock) spans two classes: AutoDock 4 uses a semi-empirical force-field function (Eq. 1.10 family), whereas AutoDock Vina’s function is empirical (Eq. 1.11 family). Phase 4f MM-GBSA is a physically motivated force-field energy with a continuum-solvent correction, distinct from Phase 4b Vina. Citing Schneider 2014 §1.6.1 + Westermaier-Hubbard §16.3 in phase4f.py docstring would close a long-standing literature gap.
    • Phase 3b fragment filter pipeline already uses Congreve rule-of-3 (per pharmacochaperone row 11, audit “fixed”). Now the citation backs it: Schneider-Baringhaus Ch. 1 + Durrant-Amaro Ch. 5 Table 5.1.
    • Hit-prioritization from v4 fragment library should track LE and LLE_AT (Astex size-corrected LiPE) per §6.4.1. See Ligand Efficiency Metrics Catalog — LE alone biases toward small fragments; LLE filters out lipophilic-promiscuity traps.
    • Phase 7 / 8 cross-target panel: §17.4 Drug Guru + IADE workflow templates are useful for proposing bioisostere replacements when off-target hits force scaffold modification.
  • index — primary consumer for peptide design.
  • index — secondary consumer.
    • Cysteine-engineering bioisostere search: §17.3.2 descriptor methods (CATS, Wagener-Lommerse fragment pharmacophore) provide a literature-first basis for proposing alternative crosslinking residues if the AF3-predicted A1078C/S1080C/S1579C disulfides do not stabilize the dimer.
    • Side-chain repacking strategy for the cysteine triple mutant: §19.2.2 + §19.2.6 (DEE / SCMF / probabilistic search) — useful method-list when justifying which Rosetta protocol to use.
  • STRC Computational Scripts Inventory — every receptor-based scoring script (phase4*, phase5*) can now claim a textbook citation for its scoring-function class via this paper note.
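
For the Phase 3b fragment filter mentioned above, the Table 5.1 cutoffs translate directly into a predicate. The sketch below is not the existing pipeline code: the `Descriptors` container, the function names and the example values are hypothetical, and descriptor values are assumed to be computed upstream by a cheminformatics toolkit. Thresholds are the ones transcribed in this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # Da
    hbd: int                # H-bond donors
    hba: int                # H-bond acceptors
    clogp: float
    rotatable_bonds: int = 0
    psa: float = 0.0        # polar surface area, Å²


def passes_rule_of_5(d: Descriptors) -> bool:
    """Lipinski drug-like cutoffs as listed in Table 5.1."""
    return (d.mol_weight <= 500 and d.hba <= 10
            and d.hbd <= 5 and d.clogp <= 5.0)


def passes_rule_of_3(d: Descriptors, strict: bool = False) -> bool:
    """Congreve fragment-like cutoffs as listed in Table 5.1.

    strict=True also applies the two optional criteria
    (rotatable bonds <= 3, polar surface area <= 60 Å²).
    """
    core = (d.mol_weight <= 300 and d.hba <= 3
            and d.hbd <= 3 and d.clogp <= 3.0)
    if not strict:
        return core
    return core and d.rotatable_bonds <= 3 and d.psa <= 60.0


if __name__ == "__main__":
    # Illustrative values only; not measured data for any real compound.
    candidates = {
        "fragment_like": Descriptors(150.0, 1, 2, 1.2, 2, 45.0),
        "drug_like": Descriptors(450.0, 2, 6, 4.1, 7, 95.0),
    }
    for name, d in candidates.items():
        print(name, passes_rule_of_3(d, strict=True), passes_rule_of_5(d))
```

Keeping the optional criteria behind a `strict` flag mirrors the "(optional)" note on the rotatable-bond and polar-surface-area rows.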

Connections