2014 Schneider (Ed.) — De novo Molecular Design
Edited monograph, 21 chapters, Wiley-VCH 2014. Provides the layer-above complement to 2007-chipot-free-energy-calculations-book: where Chipot is canonical for binding-free-energy computation, Schneider is canonical for compound construction (de novo, fragment-based, peptide), scoring (receptor-based, ligand-based, multiobjective), and peptide / protein redesign (SME, PSO, ACO; DEE; bioisosterism).
Citation
Schneider, G. (Ed.). De novo Molecular Design. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2014. ISBN-13 978-3-527-33461-2. xxiv + 545 pp.
TL;DR
Twenty-one expert chapters covering the algorithmic side of computational compound construction. Three load-bearing themes for STRC:
- Scoring families and their valid use. Receptor-based scoring divides cleanly into force-field (Eq. 1.10), empirical (Eq. 1.11) and knowledge-based (Eq. 1.12) classes. Each has a distinct error mode; consensus or method-choice based on system properties is now standard. Ligand-based scoring uses pharmacophore / pseudoreceptor / shape similarity. Free-energy methods (Ch. 16) are reserved for late-stage congeneric optimization.
- Fragment-based discovery as a property-headroom strategy. Lipinski rule of 5 (drugs) and Congreve rule of 3 (fragments) — verbatim in Table 5.1. Ligand efficiency (LE) is the core metric: optimization should aim at ≥0.3 kcal/mol per heavy atom; a plethora of derived metrics (FQ, SILE, GE, LLE, LELP, LLE_AT, KE) handle context (size, lipophilicity, kinetics). A diverse 1000–20000-member fragment library samples fragment-like chemical space better than millions of HTS compounds sample drug-like space.
- Sequence-space search for peptide design. For h09’s peptide-hydrogel hypothesis, Ch. 18 is the methods spine: Shannon-entropy library diversity (Eqs. 18.2–18.4), modified Grantham amino-acid distance matrix (Table 18.1), Simulated Molecular Evolution (SME) with Gaussian mutation (Eq. 18.5), Particle Swarm Optimization, and a worked Ant Colony Optimization example (verbatim pseudocode) producing MHC-I octapeptides at 89%/95% accuracy. Peptide stability modifications (cyclization, stapling, end-capping, glycosylation, PASylation) are catalogued in §18.4.
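The entropy measure named in the last bullet can be illustrated with a small sketch. It computes per-position Shannon entropy in bits for a toy peptide library using the standard form H = Σ p·log₂(1/p); it is not a transcription of Eqs. 18.2–18.4. The function name `position_entropies` and the example sequences are hypothetical.

```python
import math
from collections import Counter

def position_entropies(library):
    """Per-position Shannon entropy (bits) for equal-length peptide sequences."""
    if not library:
        raise ValueError("library is empty")
    length = len(library[0])
    if any(len(seq) != length for seq in library):
        raise ValueError("all sequences must have the same length")
    entropies = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in library)
        total = sum(counts.values())
        # H = sum over residues of p * log2(1/p), with p = n / total
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies

# Hypothetical toy library: position 0 is fixed, position 1 splits evenly over two residues.
toy_library = ["AK", "AK", "AE", "AE"]
print(position_entropies(toy_library))  # position 0 carries 0 bits, position 1 carries 1 bit
print(math.log2(20))                    # per-position maximum for 20 residues, about 4.32 bit
print(20 ** 8)                          # octapeptide sequence space, 25.6 x 10^9
```

A fully randomized position reaches log₂(20) bits, so the per-position values show how far a library sits from that ceiling.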
Numbers that matter
The book is methods-heavy; load-bearing numbers are filter cutoffs, illustrative success rates, and one calibrated benchmark. Force-field parameters and binding constants are not tabulated here — cite the original force-field / SAR papers.
| Parameter | Value | Units | Source (page/§/table) | Notes |
|---|---|---|---|---|
| Lipinski rule of 5 — molecular mass | ≤500 | Da | Table 5.1 (Ch.5) | drug-like cutoff |
| Lipinski rule of 5 — H-bond acceptors | ≤10 | — | Table 5.1 | drug-like |
| Lipinski rule of 5 — H-bond donors | ≤5 | — | Table 5.1 | drug-like |
| Lipinski rule of 5 — partition coefficient (clogP) | ≤5.0 | — | Table 5.1 | drug-like |
| Congreve rule of 3 — molecular mass | ≤300 | Da | Table 5.1 | fragment-like |
| Congreve rule of 3 — H-bond acceptors | ≤3 | — | Table 5.1 | fragment-like |
| Congreve rule of 3 — H-bond donors | ≤3 | — | Table 5.1 | fragment-like |
| Congreve rule of 3 — clogP | ≤3.0 | — | Table 5.1 | fragment-like |
| Congreve rule of 3 — rotatable bonds | ≤3 | — | Table 5.1 | fragment-like (optional) |
| Congreve rule of 3 — polar surface area | ≤60 | Ų | Table 5.1 | fragment-like (optional) |
| LE optimization target (Hopkins 2004 convention) | ≥0.3 | kcal/mol per heavy atom | §6.4.1 | typical for fragments selected for elaboration |
| LE plateau onset (Kuntz max-affinity ceiling) | ≈ −1.5 (LE per HA at 15+ HA) | kcal/mol per heavy atom | §6.4.1 [221, 226] | binding-energy contribution levels off |
| Optimized-drug LE expectation (MW 500 Da, IC50 10 nM) | 0.3 | kcal/mol per heavy atom | §6.4.1 | implies ~38 heavy atoms |
| Mean per-atom binding-affinity contribution during optimization | 0.29 | kcal/mol per non-H atom | §6.4.1 [146, 222] | linearity assumption baseline |
| Target LLE (LiPE) range | 5–7 (or higher) | log units | §6.4.1 (4) | optimization goal |
| LELP “Lipinski-compliant” ceiling | <16.5 | — | §6.4.1 (5) | log P / LE ceiling |
| LELP lead range | −10 to +10 | — | §6.4.1 (5) | optimize toward 0 |
| FQ scaling term (FQ = LE / LE_Scale) | LE_Scale = −0.064 + 0.873·exp(−0.026·HAC) | — | §6.4.1 (1) [226] | size-corrected LE rescaling |
| SILE formula | affinity / HAC^0.3 | — | §6.4.1 (2) [227] | size-independent LE |
| LLE_AT formula | (0.11·ln10·RT·(LogP − Log(activity)))/HAC | kcal/mol per HA | §6.4.1 (6) [230] | Astex size-corrected LLE |
| KE formula | t½ / (0.693·HAC) | (time per HA) | §6.4.1 [233] | kinetic efficiency |
| Astex generic fragment library size | 327 | compounds | §6.2.1 [53] | drug-fragment library |
| Mazanetz et al. biochemical fragment library size | 20,000 | compounds | §6.2.5 | high-concentration FCS+plus screen |
| Aqueous solubility cutoff (Vernalis fragment library) | ≥2 | mM | §6.2.2 [66, 69] | removes >50% of vendor fragments |
| Aqueous solubility cutoff (Mazanetz library) | ≥1 | mM | §6.2.2 [68] | in-house QSAR filter |
| Vemurafenib starting library size | 20,000 | fragments | §6.1 [31, 39–41] | screened at 200 µM |
| Vemurafenib hit-call threshold | ≥30% inhibition at 200 µM | — | §6.1 | initial 7-azaindole |
| Drug-like compound count, MW 300–500 Da (Bohacek estimate) | 10²⁰–10²⁰⁰ | molecules | §6.2.5 [86] | combinatorial estimate |
| Reymond chemical universe (GDB-17, ≤17 heavy atoms) | 166 × 10⁹ | molecules | §6.2.5 [87], §1.5 | enumerated |
| GDB-13 virtual library | 970 × 10⁶ | molecules | §6.2.1 [64, 65] | for MPO de novo design |
| Drug-like 30-atom space (Durrant ch.5) | 10⁶³ | molecules | §5.2.2 [19] | combinatorial estimate |
| Fragment 12-atom-or-less space | ~10⁷ | molecules | §5.3.1.1 [66] | combinatorial estimate |
| Fragment hit rate (LE > 0.3 in TSA/fxnal screen) | ~5 | % | §5.3.3 [8] | typical |
| HTS hit rate (Pilzulkil case study) | 0.1 | % | §5.2.3 [26] | typical, low |
| Fragment soaking concentration (X-ray crystallography) | 25–100 | mM | §5.3.2.6 [18, 64] | cocktail soaking |
| SPR fragment detectability lower bound | ≥100 | Da | §5.3.2.3 [8] | mass change limit |
| NMR fragment detectability protein-mass upper bound | ≤40 | kDa | §5.3.2.5 [23] | protein-detected NMR |
| MS fragment detectability protein-mass upper bound | ≤100 | kDa | §5.3.2.4 [23] | electrospray |
| Receptor concentration for protein-NMR fragment screen | >2 | mg | §5.3.2.5 [23, 73] | unless cryoprobes |
| Ligand concentration for ligand-detected NMR | 1–5 | mM | §5.3.2.5 [1, 18] | cocktail screening |
| Phenprocoumon as Astex-rule-of-3 starting fragment | MW≤300, clogP≤3, HBD≤3 | — | §6.1 [28, 30] | first FBDD-derived drug (Tipranavir, 2005) |
| Vemurafenib (PLX4032) reached market | 2011 | year | §6.1 [31] | first drug developed de novo from fragment screen |
| HCV helicase inhibitor 5 IC50 | 260 | nM | §1.7 [206] | LigBuilder optimization |
| TOPAS CB1 inverse agonist 7 Ki | 4 | nM | §1.7 [207, 209] | from Ki=1500 nM design 6 |
| Plk1 inhibitor compound 16 EC50 | 4 | µM | §1.7 [217, 218] | LE = 0.66 (Eq. 1.8) — DOGS scaffold-hop |
| Aurora A inhibitor 18 IC50 | 3 | µM | §1.7 [219] | DOGS molecule-grow from 17 (~10 µM) |
| ER de novo design — explicit-solvent FE hit-recovery rate | 83.3 | % | §16.6 Example 16.1 [4] | 5/6 actives top-ranked (vs 37.5% for de novo scoring) |
| ER FE std-dev (explicit / implicit solvent) | 1.0 / 0.7 | kcal/mol | §16.6 Example 16.1 [4] | comparable accuracy |
| HIV-RT optimization endpoint potency | 55 | pM | §16.6 Example 16.2 [261] | from 5 µM docking hit (Jorgensen group) |
| FEP precision in well-localized perturbation (STA vs DTA) | STA 8–10× more precise | — | §16.6 Example 16.3 [268] | for congeneric series |
| T4 lysozyme L99A/M102Q absolute-FE RMS error | 1.8 | kcal/mol | §16.6 [219] | ITC reference |
| T4 lysozyme L99A/M102Q relative-FE RMS error (catechol) | 1.1 | kcal/mol | §16.6 [219] | most accurate class |
| Predicted-vs-experimental pose RMSD (catechol class) | 1.2 | Å | §16.6 [219] | post-hoc X-ray check |
| TI relative-FE precision (Westermaier outlook) | <0.1 | kcal/mol | §16.9 [101] | maximum reachable |
| Shannon entropy peptide-library cardinality | 4.32 | bit (max for 20 residues) | §18.2.1 Fig.18.7 | log₂(20) |
| MHC-I octapeptide ACO sequence-space | 25.6 × 10⁹ (= 20⁸) | sequences | §18.3 | decision space |
| ACO H-2K^b stabilizing accuracy | 89 | % | §18.3 | designed peptides confirmed |
| ACO H-2K^b nonstabilizing accuracy | 95 | % | §18.3 | designed peptides confirmed |
| ACO pheromone initialization | 0.05 | per residue position | §18.3 pseudocode | uniform prior |
| ACO pheromone bounds | [0.1, 0.9] | — | §18.3 pseudocode | escape early convergence |
| ACO update factor formula | (Fitness − 0.5)/100 | — | §18.3 pseudocode | linear with fitness |
| α-conotoxin MII cyclic-derivative plasma stability gain | +15–20 | % | §18.4.1 [76] | EndoGluc protease test |
| α-conotoxin cyclic distance to bridge (N-to-C terminus) | ~11 (≤15) | Å | §18.4.1 | cyclization geometry constraint |
| α-conotoxin cMII-6/7 IC50 (nicotinic acetylcholine receptor) | ~1 | µM | §18.4.1 [76] | activity preserved post-cyclization |
| Bioster database transformation count (v12.1, 2014) | ~26,000 | bioisosteric pairs | §17.3.1.1 | bioisostere knowledge base |
| Cambridge Structural Database (CSD) entries (2014) | 541,748 | crystal structures | §17.3.1.2 [13] | drug-like subset ~60,000 |
| ChEMBL distinct compounds (2014) | 1,213,239 | compounds | §17.3.1.3 [16] | bioactivity database |
| ChEMBL bioactivity measurements (2014) | 10,129,256 | data points | §17.3.1.3 | over 9,003 targets |
| CATS pharmacophore-pair vector length | 150 | bits | §17.3.2.2 [25] | 1–10 bond distances × 15 type pairs |
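The efficiency rows in the table can be checked with a few lines of arithmetic. The sketch below assumes T = 298.15 K, treats IC50 as a stand-in for Kd, and takes "affinity" in SILE as pIC50; the LLE definition (pIC50 − logP) is the usual convention and is not quoted from the book. Function names and the example logP of 3.0 are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15   # assumed temperature

def binding_free_energy(affinity_molar):
    """dG = RT ln(K) in kcal/mol; IC50 is used in place of Kd (assumption)."""
    return R_KCAL * T_KELVIN * math.log(affinity_molar)

def ligand_efficiency(affinity_molar, heavy_atoms):
    """LE = -dG / heavy-atom count, in kcal/mol per heavy atom."""
    return -binding_free_energy(affinity_molar) / heavy_atoms

def le_scale(heavy_atoms):
    """Size-dependent reference curve as tabulated in this note."""
    return -0.064 + 0.873 * math.exp(-0.026 * heavy_atoms)

def sile(p_activity, heavy_atoms):
    """Size-independent LE: p-activity divided by HAC**0.3."""
    return p_activity / heavy_atoms ** 0.3

def lle(p_activity, log_p):
    """Lipophilic ligand efficiency (LiPE): p-activity minus logP."""
    return p_activity - log_p

def lelp(log_p, le):
    """LELP = logP / LE."""
    return log_p / le

# Table case: 10 nM compound with 38 heavy atoms (roughly MW 500 Da).
le = ligand_efficiency(10e-9, 38)
print(round(le, 2))                # about 0.29, i.e. the ~0.3 target
print(round(le / le_scale(38), 2)) # fit quality FQ = LE / LE_Scale
print(lle(8.0, 3.0))               # pIC50 8 with a hypothetical logP of 3
print(round(lelp(3.0, le), 1))     # stays below the 16.5 ceiling for this example
```

Keeping each metric as its own function makes it easy to rank the same hit list by LE, LLE and LELP side by side.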
Method essentials
Per-chapter takeaways (only what STRC needs to either use or cite):
- Ch. 1 (Schneider, Baringhaus): receptor-based scoring decomposes into three classes: physically motivated FFs (Eq. 1.10 LJ + Coulomb), empirical regression scoring (Eq. 1.11 weighted sum of interaction-type counts), and knowledge-based scoring (Eq. 1.12 Boltzmann inversion of atom-pair frequencies). Ligand-based scoring uses pharmacophore/pseudoreceptor/shape descriptors. Table 1.3 catalogs ~40 named de novo design programs by year and scoring class, which is useful provenance for any “we used X-style scoring” claim. Fragment-based assembly relies on additivity (linker bond ≈ free), but Fig. 1.22 documents non-additivity of up to −14 kJ/mol (factor Xa Ki=2 nM). Reaction-driven assembly (RECAP, DOGS, SYNOPSIS) suggests synthesis routes alongside structures.
- Ch. 5 (Durrant, Amaro): Pilzulkil/Goode tutorial. Distills HTS vs FBDD into protocols. Three fragment-optimization strategies: linking (rare success — linker rigidity is hard), merging (requires overlap), and growing (most reliable — anchor fragment + medicinal-chemistry expansion). Click-chemistry (azide-alkyne Huisgen → 1,2,3-triazole) is the prototypical synthesis route for combinatorial fragment-grow. Detection-method matrix in §5.3.2 covers six biophysical assays with sensitivity/protein-consumption/MW-limit profiles.
- Ch. 6 (Mazanetz, Law, Whittaker): the FBDD primer. Library-design constraints: rule of 3, ≥1–2 mM solubility, REOS substructure filter (PAINS), 2D fingerprint diversity selection (hole-filling or iterative removal). Screening-method choice: X-ray (highest information, ≤1000 fragments), NMR (40-kDa protein limit), SPR (KD + binding kinetics), thermal-shift (cheap, noisy), ITC (full thermodynamics, expensive in protein), high-concentration biochemical (functional readout, false-positive-prone, ≥20,000 compounds). Efficiency-metrics catalog §6.4.1 is the cross-cutting decision tool — captured verbatim in Ligand Efficiency Metrics Catalog.
- Ch. 16 (Westermaier, Hubbard): decision matrix for FE methods. MM-PBSA / MM-GBSA: VS rescore, large structural change tolerated, 5–8 kcal/mol error band. LIE: empirical, depends on training set, treats electrostatics well. TI / FEP: best for congeneric small-modification series, ≤0.5 kcal/mol achievable on toy systems, ~1–2 kcal/mol on protein systems. PMF: needed when reaction coordinate matters (binding pathway, induced fit). Best practices §16.8: soft-core potentials (Shirts–Pande parameters); never leave partial charge while turning off LJ; transform electrostatics and LJ separately; insert/delete is less efficient than mutate. STA is 8–10× more precise than DTA for small congeneric perturbations; DTA is mandatory when geometries differ (e.g., size-changing mutations).
- Ch. 17 (Firth, Blagg, Brown): bioisostere = “groups or molecules with chemical and physical similarities producing broadly similar biological properties” (Thornber 1979; Burger). Three replacement classes: knowledge-based (Bioster, CSD, ChEMBL/MMP, SwissBioisostere); descriptor-based (CATS pharmacophore pairs §17.3.2.2; Hammett σ + Hansch π Craig plot §17.3.2.1); shape-based. Drug Guru fully enumerates SMIRKS-rule space; IADE iteratively searches.
- Ch. 18 (Hiss, Schneider): the spine for h09 peptide design. Shannon entropy (Eqs. 18.2–18.4) quantifies library diversity in bits; max for 20-residue alphabet is log₂(20) = 4.32 bit per position. Diversity vs hit-rate is monotonic — Fig. 18.4(b) shows 10 antibody-binding libraries, increasing entropy → decreasing hit count. Three nature-inspired algorithms: SME (Gaussian-distance mutation, Eq. 18.5, Table 18.1 distance matrix), PSO (social + personal memory; not yet applied to peptides), ACO (verbatim pseudocode in §18.3, MHC-I octapeptide design with ANN fitness — accuracies 89% / 95%). Peptide stability mods §18.4: backbone cyclization (≤15 Å termini distance, +15–20% plasma stability), all-hydrocarbon stapling (Verdine; (i,i+3) cross-links least helix-distorting), end-capping (N-acetylation / C-amidation; PASylation as PEG alternative), glycosylation (GlcNAc enzymatic transglycosylation).
- Ch. 19 (Saven): sequence-search algorithms for protein design: Monte Carlo, Dead-End Elimination (DEE; Desmet 1992), Self-Consistent Mean Field (SCMF; Koehl–Delarue 1994), probabilistic / FASTER (Allen–Mayo 2010). Application zoo: the Top7 fold (Kuhlman 2003, Baker lab), DeGrado’s Due Ferri / four-helix metalloproteins, Baker’s Kemp eliminase / Diels-Alder enzyme / α-gliadin peptidase, water-soluble KcsA / nicotinic-AChR analogs. Useful as method survey for h26 cysteine engineering and any future de novo bridge-domain design.
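The error bands quoted for Ch. 16 in the list above are easier to judge as affinity ratios, via ΔΔG = RT·ln(K₂/K₁). The sketch below assumes T = 298.15 K and treats the reported potencies as if they were dissociation constants; the function names are hypothetical.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at an assumed 298.15 K

def ddg_from_affinities(k_start, k_end):
    """Relative binding free energy (kcal/mol) when going from k_start to k_end."""
    return RT * math.log(k_end / k_start)

def fold_error(ddg_error):
    """Multiplicative uncertainty in affinity implied by a free-energy error."""
    return math.exp(ddg_error / RT)

# HIV-RT example from the table: 5 uM docking hit optimized to 55 pM.
print(round(ddg_from_affinities(5e-6, 55e-12), 1))  # about -6.8 kcal/mol

# What the quoted error bands mean as fold-changes in affinity.
for err in (0.5, 1.0, 2.0, 5.0):
    print(err, round(fold_error(err), 1))
```

A 1 kcal/mol error already corresponds to roughly a five-fold uncertainty in affinity, which is one way to see why the 5–8 kcal/mol band of MM-PBSA / MM-GBSA limits it to rescoring rather than ranking congeneric analogs.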
Limitations
- Methods anthology, not a parameter handbook. For force-field constants, water models, and FF benchmarks, cite the original force-field papers.
- Pre-AlphaFold (2014); zero coverage of structure prediction by deep learning. For h26 / h01 modern work, AF3 / RoseTTAFold / Boltz are NOT in this book — use 2024+ literature.
- Chapter 1 software catalog (Table 1.3) ends in 2013. Modern de novo programs (REINVENT, Bidd, MolGAN, Pocket2Mol, DiffDock, etc.) absent.
- Free-energy chapter (16) cites Chipot 2007 as reference and is complementary, not redundant. Use both: Chipot for theory, Westermaier-Hubbard for the LO-stage decision matrix and pharmaceutical case studies.
- No quantitative benchmark tables for binding-affinity prediction methods. RMS errors quoted are illustrative single-system values (e.g., T4 lysozyme L99A/M102Q absolute FE = 1.8 kcal/mol).
- Bioster, CSD, ChEMBL counts are 2014 snapshots. ChEMBL is now ChEMBL34+ (2024) with millions more activities.
Relevance to STRC
- index — primary consumer.
- Phase 3c v4 fragment-grow on `3-amino-benzofuran-2-COOH` scaffold should follow §6.5–6.7 (linking/merging/growing) decision tree → see Recipe — Fragment Optimization Linking Merging Growing. Growing is the right strategy for a single-anchor pharmacophore.
- Phase 4 receptor-based scoring (Vina, AutoDock) belongs to the force-field family (Eq. 1.10). Phase 4f MM-GBSA is a physically motivated continuum-corrected empirical scoring, distinct from Phase 4b Vina. Citing Schneider 2014 §1.6.1 + Westermaier-Hubbard §16.3 in `phase4f.py` docstring would close a long-standing literature gap.
- Phase 3b fragment filter pipeline already uses Congreve rule-of-3 (per pharmacochaperone row 11, audit “fixed”). Now the citation backs it: Schneider-Baringhaus Ch. 1 + Durrant-Amaro Ch. 5 Table 5.1.
- Hit-prioritization from v4 fragment library should track LE and LLE_AT (Astex size-corrected LiPE) per §6.4.1. See Ligand Efficiency Metrics Catalog — LE alone biases toward small fragments; LLE filters out lipophilic-promiscuity traps.
- Phase 7 / 8 cross-target panel: §17.4 Drug Guru + IADE workflow templates are useful for proposing bioisostere replacements when off-target hits force scaffold modification.
- index — primary consumer for peptide design.
- For Phase 2c WH2-bundling and any RADA16 / EAK16 sequence variation, ACO with ANN fitness (Ch. 18.3) is a literature-first approach — captured verbatim in Recipe — Ant Colony Optimization for Peptide Sequence Design.
- Library-diversity decisions (e.g., how many sequences to test for self-assembly) should cite the Shannon entropy framework; see Fragment Additivity Assumption and Superadditivity (relevance: peptide-blocked self-assembly is non-additive; treat carefully) and the diversity discussion in the ACO recipe.
- Therapeutic-stability path: any peptide entering rabbit / mouse delivery experiments needs §18.4 modifications — see Recipe — Therapeutic Peptide Stability Modifications.
- Amino-acid mutation severity for SME / ACO uses the modified Grantham matrix from Table 18.1 — verbatim in Amino Acid Physicochemical Distance Matrix Grantham Modified. Useful for h09 ablation series (Glu → Gln, Lys → Arg conservative swaps).
- index — secondary consumer.
- Cysteine-engineering bioisostere search: §17.3.2 descriptor methods (CATS, Wagener-Lommerse fragment pharmacophore) provide a literature-first basis for proposing alternative crosslinking residues if the AF3-predicted A1078C/S1080C/S1579C disulfides do not stabilize the dimer.
- Side-chain repacking strategy for the cysteine triple mutant: §19.2.2 + §19.2.6 (DEE / SCMF / probabilistic search) — useful method-list when justifying which Rosetta protocol to use.
- STRC Computational Scripts Inventory: every receptor-based scoring script (`phase4*`, `phase5*`) can now claim a textbook citation for its scoring-function class via this paper note.
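For the rule-of-3 filtering mentioned above, a minimal sketch of a cutoff check is shown here. It uses the Table 5.1 values recorded in this note and works on a plain dictionary of precomputed descriptors; the key names, the `strict` flag and the example fragment are hypothetical, and this is not the actual Phase 3b code.

```python
# Cutoffs as tabulated in this note (Table 5.1); all are upper bounds.
RULE_OF_5 = {"mw": 500.0, "hba": 10, "hbd": 5, "clogp": 5.0}
RULE_OF_3 = {"mw": 300.0, "hba": 3, "hbd": 3, "clogp": 3.0}
RULE_OF_3_OPTIONAL = {"rotb": 3, "psa": 60.0}  # rotatable bonds, polar surface area

def violations(descriptors, cutoffs):
    """Return the names of descriptors that exceed their cutoff."""
    return [name for name, limit in cutoffs.items() if descriptors[name] > limit]

def passes_rule_of_3(descriptors, strict=False):
    """Check the four core criteria; strict=True adds the two optional ones."""
    cutoffs = dict(RULE_OF_3)
    if strict:
        cutoffs.update(RULE_OF_3_OPTIONAL)
    return not violations(descriptors, cutoffs)

# Hypothetical fragment descriptors, for illustration only.
fragment = {"mw": 180.0, "hba": 3, "hbd": 2, "clogp": 1.5, "rotb": 1, "psa": 75.0}
print(passes_rule_of_3(fragment))               # core criteria only
print(passes_rule_of_3(fragment, strict=True))  # fails here: PSA exceeds 60
print(violations(fragment, RULE_OF_5))          # empty list means drug-like by rule of 5
```

Returning the list of violated descriptors, instead of a bare boolean, keeps the reason for each rejection available for audit.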
Connections
- [part-of] pharmacochaperone
- [part-of] free-energy-methods
- [part-of] rada16-geometry
- [source] 2014-schneider-baringhaus-de-novo-molecular-design-book
- [applies] index
- [applies] index
- [applies] index
- [see-also] 2007-chipot-free-energy-calculations-book
- [see-also] Lipinski Rule of Fives vs Congreve Rule of Threes Reference Table
- [see-also] De Novo Design Software Scoring Strategy Catalog
- [see-also] Ligand Efficiency Metrics Catalog
- [see-also] Amino Acid Physicochemical Distance Matrix Grantham Modified
- [see-also] Recipe — Receptor-Based Scoring Function Selection
- [see-also] Recipe — Fragment Library Filtering Pipeline
- [see-also] Recipe — Fragment Optimization Linking Merging Growing
- [see-also] Recipe — Ant Colony Optimization for Peptide Sequence Design
- [see-also] Recipe — Therapeutic Peptide Stability Modifications
- [see-also] Fragment Additivity Assumption and Superadditivity
- [see-also] Druggability vs Ligandability Distinction