Recipe — Fragment Library Filtering Pipeline

P1 recipe, distilled from 2014-schneider-de-novo-molecular-design-book §5.3.1, §5.3.1.1, §6.2 (Durrant-Amaro and Mazanetz-Law-Whittaker chapters). It describes the full filter chain that takes a vendor catalog (10⁶–10⁷ compounds) down to a screening-ready fragment library (10³–10⁴ compounds). Use this when h01 needs to refresh its fragment library beyond the current phase3b.py set, or when a new pharmacochaperone subpocket opens up.

The filter chain (apply in order; individual steps remove anywhere from ~10% to ~95% of what remains, see the parameter table)

Step 1 — Vendor consolidation

  • ZINC15/22 (~21M purchasable) is the canonical free source (Schneider 2014 §5.3.1 [31]).
  • Evotec EVOsource 2014: 21.1 M from 182 suppliers. Of these, applying rule-of-three + TPSA ≤60 Ų + rotatable bonds ≤3 yields ~94,000 compounds (Schneider 2014 §6.2.5).
  • Zuegg & Cooper 2009: 8.2 M from 102 suppliers → 432,000 rule-of-three compliant.

Step 2 — Rule of three filter (verbatim Table 5.1)

See Lipinski Rule of Fives vs Congreve Rule of Threes Reference Table:

  • Molecular mass ≤300 Da
  • H-bond acceptors ≤3
  • H-bond donors ≤3
  • clogP ≤3.0
  • Rotatable bonds ≤3 (optional but recommended)
  • Polar surface area ≤60 Ų (optional but recommended)

Cite: Congreve, Carr, Murray, Jhoti. Drug Discov. Today 8 (2003) 876–877; Schneider 2014 Table 5.1.
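
The limits above can be sketched as a single predicate. This is a minimal illustration, assuming the descriptors have already been computed by a cheminformatics toolkit and stored in a plain dict; the key names (`mw`, `hba`, `hbd`, `clogp`, `rot_bonds`, `tpsa`), the `strict` flag, and the example values are hypothetical and not taken from phase3b.py.

```python
# Rule-of-three limits from the list above; dict keys are hypothetical names.
CORE_LIMITS = {"mw": 300.0, "hba": 3, "hbd": 3, "clogp": 3.0}
OPTIONAL_LIMITS = {"rot_bonds": 3, "tpsa": 60.0}


def passes_rule_of_three(desc, strict=True):
    """Return True if every applicable descriptor is at or below its limit.

    desc   -- dict of precomputed descriptors for one compound
    strict -- also enforce the optional rotatable-bond and TPSA limits
    """
    limits = dict(CORE_LIMITS)
    if strict:
        limits.update(OPTIONAL_LIMITS)
    return all(desc[name] <= cap for name, cap in limits.items())


# Made-up descriptor values, for illustration only.
fragment = {"mw": 177.2, "hba": 3, "hbd": 2, "clogp": 1.4,
            "rot_bonds": 1, "tpsa": 76.5}
print(passes_rule_of_three(fragment, strict=False))  # True: core limits only
print(passes_rule_of_three(fragment, strict=True))   # False: TPSA exceeds 60
```

Keeping the optional limits behind a flag makes it easy to compare library sizes with and without the rotatable-bond and TPSA criteria.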

Step 3 — Aqueous solubility filter

  • Schneider 2014 §6.2.2: a QSAR-predicted solubility of ≥1 mM (Mazanetz / Vernalis) serves as the in-house cutoff and removes >50% of vendor fragments. The Vernalis lab's ≥2 mM cutoff is more stringent.
  • Use an in-silico QSAR model (DataWarrior cLogS, an ESOL-style model built on RDKit descriptors, or a proprietary model) when wet measurement is impractical.
  • Cite: Schneider 2014 §6.2.2 [68, 69].
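
The millimolar cutoffs map onto logS thresholds if the QSAR model reports logS as log10 of molar solubility (mol/L). That convention is common but is an assumption here, and the function names below are illustrative only.

```python
import math


def logs_to_millimolar(log_s):
    """Convert logS (assumed log10 of solubility in mol/L) to mM."""
    return (10.0 ** log_s) * 1000.0


def logs_threshold(cutoff_mm):
    """logS value that corresponds to a solubility cutoff given in mM."""
    return math.log10(cutoff_mm / 1000.0)


def is_soluble(log_s, cutoff_mm=1.0):
    """True if the predicted solubility meets the cutoff (default 1 mM)."""
    return logs_to_millimolar(log_s) >= cutoff_mm


# 1 mM corresponds to logS of about -3.0, 2 mM to about -2.7.
print(round(logs_threshold(1.0), 2), round(logs_threshold(2.0), 2))
print(is_soluble(-2.5), is_soluble(-3.5))
```

Under this convention a compound predicted at logS = -2.5 (about 3.2 mM) passes both cutoffs, while one at logS = -3.5 (about 0.32 mM) fails both.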

Step 4 — REOS / PAINS substructure filter

REOS = Rapid Elimination Of Swill (Walters & Murcko / Vertex). PAINS = pan-assay interference compounds (Baell & Holloway 2010).

  • Remove epoxides, sulfonate / phosphonate esters, Michael acceptors, aldehydes (frequently false-positive in fragment screens), heteroatom–heteroatom single bonds (Schneider 2014 §5.2.2 [13–15]).
  • Bruns & Watson Eli Lilly 2012 (cited in §6.2.4) provides a comprehensive structural-rule set.
  • Cite: Schneider 2014 §6.2.4; Baell & Holloway 2010 J. Med. Chem.

Step 5 — 3D-character filter (optional)

  • Fraction of sp3 carbons (Fsp3) > 0.3 — favors three-dimensionality (Schneider 2014 §6.2.2 [73]).
  • Plane of best fit (PBF): mean distance of the heavy atoms from their best-fit plane, in ångströms; recommended for larger fragments only.
  • For h01 (E1659A subpocket, 159 ų), Fsp3 is more relevant than PBF (small monocyclic fragments dominate).
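
Fsp3 is the number of sp3-hybridized carbons divided by the total carbon count. A minimal sketch of the threshold check, assuming both counts come from a toolkit's hybridization perception; the function names and example counts are hypothetical.

```python
def fraction_sp3(n_sp3_carbons, n_carbons):
    """Fsp3 = sp3 carbons / all carbons; defined as 0.0 for carbon-free input."""
    if n_carbons == 0:
        return 0.0
    return n_sp3_carbons / n_carbons


def passes_fsp3(n_sp3_carbons, n_carbons, threshold=0.3):
    """True if Fsp3 is strictly above the threshold (0.3 in this recipe)."""
    return fraction_sp3(n_sp3_carbons, n_carbons) > threshold


# Illustrative counts: a 9-carbon fragment with 3 vs. 2 sp3 carbons.
print(passes_fsp3(3, 9))  # True: 3/9 is about 0.33
print(passes_fsp3(2, 9))  # False: 2/9 is about 0.22
```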

Step 6 — Diversity selection (cluster-or-pick)

Two interchangeable strategies (Schneider 2014 §6.2.5 [42, 99–103]):

  • Hole-filling: iteratively add the candidate fragment maximally distant from those already in the library. Maximizes coverage.
  • Iterative removal: start with the full set and iteratively remove the fragment with the most near-neighbors. Thins the most crowded clusters first, so dense chemotypes stay represented without dominating the library.
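
Both strategies can be sketched in pure Python. This is an illustration under stated assumptions, not the procedure from the cited chapter: fingerprints are represented as sets of on-bit indices, distance is 1 minus Tanimoto similarity, and the `seed` and `radius` parameters are hypothetical choices.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def hole_filling(fps, n_pick, seed=0):
    """Greedy MaxMin: repeatedly add the candidate farthest from the picks."""
    picked = [seed]
    # Distance from each candidate to its nearest already-picked fragment.
    nearest = [tanimoto_distance(fp, fps[seed]) for fp in fps]
    nearest[seed] = -1.0  # mark as picked so it is never chosen again
    while len(picked) < min(n_pick, len(fps)):
        candidate = max(range(len(fps)), key=lambda i: nearest[i])
        picked.append(candidate)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_distance(fp, fps[candidate]))
        nearest[candidate] = -1.0
    return picked


def iterative_removal(fps, n_keep, radius=0.5):
    """Repeatedly drop the fragment with the most neighbors within radius."""
    kept = set(range(len(fps)))
    neighbors = {
        i: {j for j in kept
            if j != i and tanimoto_distance(fps[i], fps[j]) <= radius}
        for i in kept
    }
    while len(kept) > n_keep:
        crowded = max(sorted(kept), key=lambda i: len(neighbors[i]))
        kept.remove(crowded)
        for j in neighbors.pop(crowded):
            neighbors[j].discard(crowded)
    return sorted(kept)


# Toy fingerprints: three close analogs plus two singletons.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {10, 11, 12}, {20, 21}]
print(hole_filling(fps, 3))        # [0, 3, 4]
print(iterative_removal(fps, 3))   # [2, 3, 4]
```

On this toy set both strategies keep one analog and both singletons. The removal sketch builds an all-pairs neighbor table, so it scales quadratically; for catalog-scale input a toolkit picker (RDKit ships a MaxMin picker, for example) is likely the better choice.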

Use 2D fingerprints (MDL MACCS keys, ECFP4, UNITY) with Tanimoto distance (1 − Tanimoto similarity). Library size target = 1,000–20,000 for NMR and biochemical screens, smaller for crystal soaking, depending on screening throughput (§6.2.3):

  • X-ray crystal soaking: 100–1,000 fragments
  • NMR cocktail: 1,000–10,000 fragments
  • High-concentration biochemical (FCS+plus, Mazanetz 20K library): 20,000+ fragments

Step 7 — Hypothesis-driven focused subset

For h01 (known pocket geometry, mutant hotspot), supplement the diverse library with a focused subset:

  • Substructure search around known anchors (e.g., 3-amino-benzofuran-2-carboxylic acid for h01).
  • Pharmacophore-based virtual screen of the diverse set against the receptor structure.
  • Docking pre-filter: keep the top 10% by score (Schneider 2014 §5.3.1.1 [22, 55–63]).
  • Restrict to compounds carrying synthetic handles (azide / alkyne for click chemistry; acid / amine for amide coupling) to ease downstream fragment growing.
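
The docking pre-filter reduces to a rank-and-cut. A minimal sketch, assuming scores follow the AutoDock Vina convention (kcal/mol, more negative is better) and are held in a dict keyed by compound ID; the function name and the IDs are hypothetical.

```python
import math


def top_fraction_by_score(scores, fraction=0.10):
    """Return the IDs of the best-scoring compounds, best first.

    scores   -- dict mapping compound ID to docking score (lower is better)
    fraction -- share of the input to keep; at least one compound is kept
    """
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # ascending: most negative first
    return ranked[:n_keep]


# Made-up scores for eight hypothetical fragments.
scores = {"f1": -4.2, "f2": -6.1, "f3": -5.0, "f4": -3.9,
          "f5": -5.8, "f6": -4.7, "f7": -4.4, "f8": -5.3}
print(top_fraction_by_score(scores, fraction=0.25))  # ['f2', 'f5']
```

Rounding up with `ceil` keeps at least one compound from small sets; whether to round up or down at the boundary is a design choice the source does not specify.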

STRC parameter table

| Filter step | h01 cutoff | Source | Output count target |
|---|---|---|---|
| Rule-of-three MW | ≤300 Da | Congreve 2003 / Schneider 2014 Table 5.1 | ~5% of vendor catalog |
| Rule-of-three clogP | ≤3.0 | Congreve 2003 | rejects ~30% remaining |
| Aqueous solubility | ≥1 mM (Mazanetz QSAR) | Schneider 2014 §6.2.2 [68] | rejects ~50% of remaining |
| Reactive substructures (REOS/PAINS) | exclude | Bruns & Watson 2012; Baell 2010 | rejects ~10–20% |
| Diversity selection | hole-filling, ECFP4 + Tanimoto | Schneider 2014 §6.2.5 [42] | 1,000–20,000 fragments |
| Pocket-fit MW (h01-specific) | 180–350 Da | h01 phase3b.py | secondary filter on 159 ų subpocket |
| Pocket-fit volume score | tunable | h01 in-silico | LE-coupled prioritization |
| Docking pre-filter | top 10% Vina score | Schneider 2014 §5.3.1.1 [22] | optional |
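
As a back-of-envelope check, the approximate rates in the table can be chained to estimate how many compounds reach diversity selection. The sketch below assumes a 21.1 M starting catalog (the consolidated figure quoted in Step 1), treats each rate as independent, and uses 15% as the midpoint of the 10–20% REOS/PAINS range; all of these are simplifying assumptions.

```python
def funnel(start, steps):
    """Apply (label, surviving_fraction) steps in order; return running counts."""
    counts = []
    remaining = start
    for label, keep in steps:
        remaining = round(remaining * keep)
        counts.append((label, remaining))
    return counts


# Surviving fractions derived from the table's approximate rates.
steps = [
    ("rule-of-three MW (keeps ~5%)", 0.05),
    ("rule-of-three clogP (rejects ~30%)", 0.70),
    ("aqueous solubility (rejects ~50%)", 0.50),
    ("REOS/PAINS (rejects ~15%)", 0.85),
]

for label, count in funnel(21_100_000, steps):
    print(f"{label:40s} {count:>12,d}")
```

Under these assumptions roughly 3 × 10⁵ compounds survive the property and substructure filters, which leaves ample room for diversity selection to pick the 1,000–20,000 target.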

Citation pattern for phase3b.py docstring

# Fragment library filtering pipeline.
# Filters: Congreve rule-of-three (MW≤300, clogP≤3, HBD≤3, HBA≤3, rot≤3, TPSA≤60)
#   per Schneider 2014 Table 5.1 / Congreve, Carr, Murray, Jhoti 2003 Drug Discov Today.
# Aqueous solubility ≥1 mM (in-silico QSAR) per Schneider 2014 §6.2.2.
# REOS / PAINS substructure exclusion per Bruns & Watson 2012; Baell & Holloway 2010.
# Pocket-fit MW window 180–350 Da is h01-specific (159 ų subpocket); its 180 Da floor
# is stricter than rule-of-three, but its 350 Da ceiling exceeds the MW≤300 cap.
# See [[literature/pharmacochaperone]] row 47.

Connections