Recipe — Fragment Library Filtering Pipeline
P1 recipe — distilled from 2014-schneider-de-novo-molecular-design-book §5.3.1, §5.3.1.1, §6.2 (Durrant-Amaro and Mazanetz-Law-Whittaker chapters). The full filter chain that takes a vendor catalog (10⁶–10⁷ compounds) down to a screening-ready fragment library (10³–10⁴ compounds). Use this when h01 needs to refresh its fragment library beyond the current phase3b.py set, or when a new pharmacochaperone subpocket is opened up.
The filter chain (apply in order; the coarsest steps remove 50–95%, see the STRC parameter table for per-step estimates)
Step 1 — Vendor consolidation
- ZINC15/22 (~21M purchasable) is the canonical free source (Schneider 2014 §5.3.1 [31]).
- Evotec EVOsource 2014: 21.1 M from 182 suppliers. Of these, applying rule-of-three + TPSA ≤60 Ų + rotatable bonds ≤3 yields ~94,000 compounds (Schneider 2014 §6.2.5).
- Zuegg & Cooper 2009: 8.2 M from 102 suppliers → 432,000 rule-of-three compliant.
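Consolidating many supplier catalogs implies collapsing duplicate structures sold by several vendors. The sketch below shows one way to do that, assuming each catalog has already been reduced to `(structure_key, smiles)` records where the key is a canonical identifier such as an InChIKey. The record layout, the placeholder keys, and the function name `consolidate` are hypothetical and do not come from Schneider 2014.

```python
from collections import defaultdict


def consolidate(catalogs):
    """Merge supplier catalogs, keeping one entry per structure key.

    catalogs: dict mapping supplier name -> iterable of (structure_key, smiles).
    Returns a dict: structure_key -> {"smiles": str, "suppliers": sorted list}.
    """
    smiles_by_key = {}
    suppliers_by_key = defaultdict(set)
    for supplier, records in catalogs.items():
        for key, smiles in records:
            # Keep the first SMILES seen for a key; later duplicates only
            # add to the list of suppliers offering the compound.
            smiles_by_key.setdefault(key, smiles)
            suppliers_by_key[key].add(supplier)
    return {
        key: {"smiles": smiles_by_key[key], "suppliers": sorted(suppliers_by_key[key])}
        for key in smiles_by_key
    }


# Illustrative input with placeholder keys (not real InChIKeys).
example_catalogs = {
    "vendor_a": [("KEY-1", "c1ccccc1O"), ("KEY-2", "C1CCNCC1")],
    "vendor_b": [("KEY-1", "Oc1ccccc1")],
}
merged = consolidate(example_catalogs)
```

Keeping the supplier list per structure makes it easy to prefer compounds with more than one source when purchasing.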
Step 2 — Rule of three filter (verbatim Table 5.1)
See Lipinski Rule of Fives vs Congreve Rule of Threes Reference Table:
- Molecular mass ≤300 Da
- H-bond acceptors ≤3
- H-bond donors ≤3
- clogP ≤3.0
- Rotatable bonds ≤3 (optional but recommended)
- Polar surface area ≤60 Ų (optional but recommended)
Cite: Congreve, Carr, Murray, Jhoti. Drug Discov. Today 8 (2003) 876–877; Schneider 2014 Table 5.1.
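The thresholds above can be applied as a simple predicate. This is a minimal sketch that assumes the descriptors have already been computed by a cheminformatics toolkit and are passed in as a dict; the key names (`mw`, `hba`, `hbd`, `clogp`, `rot_bonds`, `tpsa`) and the `strict` flag are illustrative choices, not part of the cited rule.

```python
# Core rule-of-three caps (Congreve et al. 2003).
RO3_LIMITS = {
    "mw": 300.0,   # molecular mass, Da
    "hba": 3,      # H-bond acceptors
    "hbd": 3,      # H-bond donors
    "clogp": 3.0,  # calculated logP
}
# Optional but recommended caps.
RO3_OPTIONAL_LIMITS = {
    "rot_bonds": 3,  # rotatable bonds
    "tpsa": 60.0,    # polar surface area, square angstroms
}


def passes_rule_of_three(desc, strict=True):
    """Return True if every descriptor is at or below its cap.

    desc: dict of precomputed descriptor values.
    strict: also enforce the optional rotatable-bond and TPSA caps.
    """
    limits = dict(RO3_LIMITS)
    if strict:
        limits.update(RO3_OPTIONAL_LIMITS)
    return all(desc[name] <= cap for name, cap in limits.items())


# Made-up descriptor values for illustration only.
fragment_ok = {"mw": 177.2, "hba": 3, "hbd": 2, "clogp": 1.4, "rot_bonds": 1, "tpsa": 58.0}
fragment_polar = {"mw": 210.0, "hba": 3, "hbd": 3, "clogp": 0.2, "rot_bonds": 2, "tpsa": 85.0}
```

With these inputs, `fragment_polar` fails only the optional TPSA cap, so it passes when `strict=False` and is rejected otherwise.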
Step 3 — Aqueous solubility filter
- Schneider 2014 §6.2.2: a QSAR-predicted solubility of ≥1 mM (Mazanetz / Vernalis) serves as an in-house cutoff that removes >50% of vendor fragments. ≥2 mM is more stringent (Vernalis lab).
- Use an in-silico QSAR model (DataWarrior / RDKit MolLogS / proprietary) when wet measurement is impractical.
- Cite: Schneider 2014 §6.2.2 [68, 69].
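QSAR solubility models usually report logS rather than a concentration, so the mM cutoffs need converting. The sketch below assumes logS is the base-10 logarithm of solubility in mol/L, which is the common convention but should be checked against whichever model is used. The function names are illustrative.

```python
import math


def logs_cutoff(min_solubility_mM):
    """Convert a solubility floor in mM to logS (log10 of mol/L)."""
    return math.log10(min_solubility_mM / 1000.0)


def passes_solubility(predicted_logs, min_solubility_mM=1.0):
    """True if the predicted logS meets the solubility floor."""
    return predicted_logs >= logs_cutoff(min_solubility_mM)


# 1 mM corresponds to logS = -3; 2 mM corresponds to roughly logS = -2.70.
cutoff_1mM = logs_cutoff(1.0)
cutoff_2mM = logs_cutoff(2.0)
```

A fragment with predicted logS of -2.8 therefore passes the 1 mM floor but not the 2 mM one.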
Step 4 — REOS / PAINS substructure filter
REOS = Rapid Elimination Of Swill (Walters & Murcko / Vertex). PAINS = pan-assay interference compounds (Baell & Holloway 2010).
- Remove epoxides, sulfonate / phosphonate esters, Michael acceptors, aldehydes (frequently false-positive in fragment screens), heteroatom–heteroatom single bonds (Schneider 2014 §5.2.2 [13–15]).
- Bruns & Watson (Eli Lilly, 2012; cited in §6.2.4) provide a comprehensive structural-rule set.
- Cite: Schneider 2014 §6.2.4; Baell & Holloway 2010 J. Med. Chem.
Step 5 — 3D-character filter (optional)
- Fraction of sp3 carbons (Fsp3) > 0.3 — favors three-dimensionality (Schneider 2014 §6.2.2 [73]).
- Plane of best fit (PBF) score: mean distance of the heavy atoms from their best-fit plane, in ångströms; recommended for larger fragments only.
- For h01 (E1659A subpocket, 159 ų), Fsp3 is more relevant than PBF (small monocyclic fragments dominate).
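Fsp3 is the number of sp3-hybridized carbons divided by the total number of carbons. A minimal sketch of the filter, assuming the two carbon counts are supplied by a toolkit or counted by hand (the function names and the strict `>` comparison against 0.3 follow the bullet above and are otherwise illustrative):

```python
def fsp3(n_sp3_carbons, n_carbons):
    """Fraction of sp3 carbons; defined as 0.0 for molecules without carbon."""
    if n_carbons == 0:
        return 0.0
    return n_sp3_carbons / n_carbons


def passes_fsp3(n_sp3_carbons, n_carbons, threshold=0.3):
    """True if Fsp3 is strictly above the threshold."""
    return fsp3(n_sp3_carbons, n_carbons) > threshold


# Toluene: 1 sp3 carbon out of 7. Piperidine: 5 sp3 carbons out of 5.
toluene_fsp3 = fsp3(1, 7)
piperidine_fsp3 = fsp3(5, 5)
```

Flat aromatic fragments such as toluene fall well below the cutoff, while saturated rings such as piperidine pass easily.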
Step 6 — Diversity selection (cluster-or-pick)
Two interchangeable strategies (Schneider 2014 §6.2.5 [42, 99–103]):
- Hole-filling: iteratively add the candidate fragment maximally distant from those already in the library. Maximizes coverage.
- Iterative removal: start with the full set and iteratively remove the fragment with the most near-neighbors. Maximizes density-of-cluster representation.
Use 2D fingerprints (MDL MACCS keys, ECFP4, UNITY) + Tanimoto distance. Library size target = 1,000–20,000 depending on screening throughput (§6.2.3):
- X-ray crystal soaking: 100–1,000 fragments
- NMR cocktail: 1,000–10,000 fragments
- High-concentration biochemical (FCS+plus, Mazanetz 20K library): 20,000+ fragments
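The hole-filling strategy can be sketched as a greedy MaxMin pick: at each round, add the candidate whose distance to its nearest already-picked fragment is largest. This reading of "maximally distant" is an assumption, as are the representation of fingerprints as Python sets of on-bit indices and the function names. In practice the bit sets would come from a fingerprint such as ECFP4.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def hole_filling(fingerprints, n_pick, seed_index=0):
    """Greedy MaxMin selection; returns indices in the order picked."""
    if not fingerprints:
        return []
    picked = [seed_index]
    # Distance from each fragment to its nearest picked fragment so far.
    nearest = [tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints]
    target = min(n_pick, len(fingerprints))
    while len(picked) < target:
        candidate = max(
            (i for i in range(len(fingerprints)) if i not in picked),
            key=lambda i: nearest[i],
        )
        picked.append(candidate)
        # Update nearest-picked distances with the newly added fragment.
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fp, fingerprints[candidate])
            if d < nearest[i]:
                nearest[i] = d
    return picked


# Toy fingerprints: indices 0, 1, 3 are similar; index 2 is unrelated.
toy_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
selection = hole_filling(toy_fps, n_pick=3)  # [0, 2, 1]
```

Starting from fragment 0, the unrelated fragment 2 is picked first (distance 1.0), then fragment 1 (distance 0.5), leaving the near-duplicate fragment 3 out. This version is quadratic in library size, so it is only a sketch for small candidate sets.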
Step 7 — Hypothesis-driven focused subset
For h01 (known pocket geometry, mutant hotspot), supplement the diverse library with a focused subset:
- Substructure search around known anchors (e.g., 3-amino-benzofuran-2-carboxylic acid for h01).
- Pharmacophore-based virtual screen of the diverse set against the receptor structure.
- Docking pre-filter at top 10% by score (Schneider 2014 §5.3.1.1 [22, 55–63]).
- Restrict to compounds carrying synthetic handles for click chemistry (azide / alkyne) or amide coupling (acid / amine) to ease downstream fragment growing.
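The docking pre-filter amounts to ranking by score and keeping the best tenth. A minimal sketch, assuming scores follow the AutoDock Vina convention of kcal/mol with more negative values being better; the fragment IDs, score values, and function name are hypothetical.

```python
import math


def top_fraction_by_score(scores, fraction=0.10):
    """Return the IDs of the best-scoring fraction of fragments.

    scores: dict mapping fragment ID -> docking score (lower is better).
    At least one fragment is kept when the input is non-empty.
    """
    if not scores:
        return []
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # ascending, best score first
    return ranked[:n_keep]


# Ten made-up scores; the best single fragment survives a 10% cut.
example_scores = {f"frag{i:02d}": -4.0 - 0.3 * i for i in range(10)}
kept = top_fraction_by_score(example_scores)
```

Because fragment docking scores correlate with size, it may be worth ranking by a size-normalized score instead; that choice is left open here.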
STRC parameter table
| Filter step | h01 cutoff | Source | Expected output / effect |
|---|---|---|---|
| Rule-of-three MW | ≤300 Da | Congreve 2003 / Schneider 2014 Table 5.1 | ~5% of vendor catalog |
| Rule-of-three clogP | ≤3.0 | Congreve 2003 | rejects ~30% of remaining |
| Aqueous solubility | ≥1 mM (Mazanetz QSAR) | Schneider 2014 §6.2.2 [68] | rejects ~50% of remaining |
| Reactive substructures (REOS/PAINS) | exclude | Bruns & Watson 2012; Baell 2010 | rejects ~10–20% |
| Diversity selection | hole-filling, ECFP4 + Tanimoto | Schneider 2014 §6.2.5 [42] | 1,000–20,000 fragments |
| Pocket-fit MW (h01-specific) | 180–350 Da | h01 phase3b.py | secondary filter on 159 ų subpocket |
| Pocket-fit volume score | tunable | h01 in-silico | LE-coupled prioritization |
| Docking pre-filter | top 10% Vina score | Schneider 2014 §5.3.1.1 [22] | optional |
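To see what the per-step estimates in the table imply, the funnel can be multiplied through. The sketch below starts from the 21.1 M Evotec catalog size quoted in Step 1 and treats the table's rejection rates as independent keep-fractions; both choices are assumptions made for illustration, and the result is an order-of-magnitude estimate only. As a sanity check on the ~5% figure, the Zuegg & Cooper numbers give 432,000 / 8.2 M ≈ 5.3%.

```python
def funnel(start, steps):
    """Apply keep-fractions in order.

    steps: list of (step name, fraction kept).
    Returns a list of (step name, rounded compound count after the step).
    """
    counts = []
    n = float(start)
    for name, keep in steps:
        n *= keep
        counts.append((name, round(n)))
    return counts


base_steps = [
    ("rule-of-three MW", 0.05),  # ~5% of vendor catalog remains
    ("clogP", 0.70),             # rejects ~30% of remaining
    ("solubility", 0.50),        # rejects ~50% of remaining
]
# REOS/PAINS rejects ~10-20%, so bracket the outcome with both ends.
low = funnel(21_100_000, base_steps + [("REOS/PAINS", 0.80)])
high = funnel(21_100_000, base_steps + [("REOS/PAINS", 0.90)])
```

Under these assumptions roughly 295,000 to 332,000 compounds reach diversity selection, which then has to reduce the set by one to two orders of magnitude to hit the 1,000–20,000 target.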
Citation pattern for phase3b.py docstring
# Fragment library filtering pipeline.
# Filters: Congreve rule-of-three (MW≤300, clogP≤3, HBD≤3, HBA≤3, rot≤3, TPSA≤60)
# per Schneider 2014 §5.2.2 / Congreve, Carr, Murray, Jhoti 2003 Drug Discov Today.
# Aqueous solubility ≥1 mM (in-silico QSAR) per Schneider 2014 §6.2.2.
# REOS / PAINS substructure exclusion per Bruns & Watson 2012; Baell & Holloway 2010.
# Pocket-fit MW window 180–350 Da is h01-specific (159 ų subpocket); combined with
# the rule-of-three cap (MW≤300) the effective window is 180–300 Da.
# See [[literature/pharmacochaperone]] row 47.
Related recipes
- After filtering: Recipe — Fragment Optimization Linking Merging Growing for hit elaboration.
- During scoring: Ligand Efficiency Metrics Catalog (LE / LLE / LLE_AT).
- For peptide-fragment libraries (h09 self-assembly motifs): Recipe — Ant Colony Optimization for Peptide Sequence Design.
Connections
- [part-of] pharmacochaperone
- [source] 2014-schneider-de-novo-molecular-design-book
- [applies] index
- [see-also] Lipinski Rule of Fives vs Congreve Rule of Threes Reference Table
- [see-also] Ligand Efficiency Metrics Catalog
- [see-also] Recipe — Fragment Optimization Linking Merging Growing
- [see-also] Recipe — Receptor-Based Scoring Function Selection