Identifying and characterizing binding sites and assessing druggability
SiteMap is a computational binding-site identification and druggability scoring tool. It combines a grid-based site detection algorithm with SiteScore (site quality) and Dscore (druggability) composite metrics. The paper establishes SiteMap as the reference for physics-based druggability prediction.
TL;DR
SiteMap ranks the known binding site as top-1 in 86% of test cases; this rises to >98% for sites binding subnanomolar ligands. The method uses a scoring function that combines volume, hydrophobicity/polarity balance, and H-bond donor/acceptor character.
Key finding
SiteMap’s Dscore accurately reproduces Cheng et al.’s druggability classification of 63 sites (27 proteins): in the Dscore ranking, the undruggable sites take ranks 1-10 and the difficult sites take ranks 11-15, 17, 18, 21, 25, and 45. It outperforms MAP_POD at separating “difficult” from “undruggable” sites. Key physical drivers: undruggable sites are small, solvent-exposed, and strongly hydrophilic (p > ~1.7) with little hydrophobic character; druggable sites have reasonable size, good enclosure, and moderate hydrophobicity.
Numbers that matter
Extracted from full text (p. 377-389, J Chem Inf Model 49(2), 2009). MinerU parse: ~/BookLibrary/mineru-output/2009-halgren-sitemap-druggability-jcim/auto/.
SiteScore formula (eq. from Methods §6, used for binding-site identification):
SiteScore = 0.0733·√n + 0.6688·e − 0.20·p
- n = number of site points, capped at 100
- e = enclosure score (average fraction of 110 radial rays striking protein within 10 Å; higher = more enclosed)
- p = hydrophilic score, capped at 1.0 for SiteScore
- Average SiteScore for submicromolar sites: 1.01
- SiteScore threshold for binding-site classification: ≥ 0.80 (~80% of average submicromolar SiteScore)
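A minimal sketch of the SiteScore arithmetic, assuming n, e, and p have already been produced by a SiteMap run. The function and argument names are hypothetical; only the coefficients, the two caps, and the 0.80 threshold come from the notes above.

```python
import math

def site_score(n_site_points, enclosure, hydrophilic):
    """SiteScore = 0.0733*sqrt(n) + 0.6688*e - 0.20*p (hypothetical helper)."""
    n = min(n_site_points, 100)   # n is capped at 100 site points
    p = min(hydrophilic, 1.0)     # p is capped at 1.0 for SiteScore only
    return 0.0733 * math.sqrt(n) + 0.6688 * enclosure - 0.20 * p

def passes_site_threshold(score, threshold=0.80):
    """True when a site meets the >= 0.80 binding-site classification cutoff."""
    return score >= threshold

# Illustrative inputs only, not values reported for any specific protein.
example = site_score(n_site_points=132, enclosure=0.76, hydrophilic=1.3)
print(round(example, 3), passes_site_threshold(example))
```

Because both n and p are capped, a site with 132 points scores the same as one with 100, and any hydrophilic score above 1.0 contributes the same −0.20 penalty.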
Dscore formula (eq. 2, used for druggability classification):
Dscore = 0.094·√n + 0.60·e − 0.324·p
- n = number of site points, capped at 100
- e = enclosure score (same definition as SiteScore)
- p = hydrophilic score, NOT capped (key difference from SiteScore — allows larger penalty for polar sites)
- The larger uncapped hydrophilic penalty (−0.324·p vs −0.20·p) is the critical distinction enabling druggability classification
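The effect of the uncapped hydrophilic term can be sketched as follows, assuming n, e, and p are already available from a SiteMap run. The function name and the example inputs are hypothetical; the coefficients and the n cap are the ones listed above.

```python
import math

def dscore(n_site_points, enclosure, hydrophilic):
    """Dscore = 0.094*sqrt(n) + 0.60*e - 0.324*p (hypothetical helper)."""
    n = min(n_site_points, 100)   # n is capped at 100 site points
    # p is deliberately NOT capped here, unlike in SiteScore
    return 0.094 * math.sqrt(n) + 0.60 * enclosure - 0.324 * hydrophilic

# Same size and enclosure, increasing hydrophilic score (illustrative inputs).
for p in (1.0, 1.7, 2.5):
    print(p, round(dscore(100, 0.76, p), 4))
```

With size and enclosure held fixed, every additional unit of p lowers Dscore by 0.324, so strongly polar sites keep losing score past the point where SiteScore would stop penalizing them.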
Druggability thresholds (Table 8, footnote g/h/i):
| Category | Dscore range | Physical profile |
|---|---|---|
| Druggable | > 0.98 | Reasonable size, good enclosure, moderate hydrophobicity, unexceptional hydrophilicity |
| Difficult | 0.83 – 0.98 | Good size/enclosure but hydrophilic enough to require prodrug; less hydrophobic than druggable |
| Undruggable | < 0.83 | Very small OR very shallow OR strongly hydrophilic (p > ~1.7) with little/no hydrophobic character; OR requires covalent binding |
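
The bands in the table can be applied to a precomputed Dscore as in this minimal sketch. The function name is hypothetical, and treating both boundary values (0.83 and 0.98) as “difficult” is an assumption read off the table's ranges.

```python
def classify_dscore(d, difficult_min=0.83, druggable_min=0.98):
    """Map a Dscore value to the three druggability bands (hypothetical helper)."""
    if d > druggable_min:
        return "druggable"
    if d >= difficult_min:
        # Assumption: the boundary values 0.83 and 0.98 fall in this band.
        return "difficult"
    return "undruggable"

# Illustrative values only.
for value in (1.05, 0.90, 0.70):
    print(value, classify_dscore(value))
```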
Site point geometry (Methods §1, 8):
| Property | Value | Notes |
|---|---|---|
| Site-point grid spacing | 1 Å | Grid placed around entire protein |
| Enclosure sampling | 110 evenly spaced radial rays | Fraction striking protein within 10 Å |
| Site merging gap | ≤ 6.5 Å (exposed regions) | Default merge threshold |
| Volume grid spacing | 0.7 Å | “Shrink-wrap” approximation |
| Volume ray cutoff | < 60% rays within 8 Å → discard | Excludes solvent-protruding regions |
| Average site points (submicromolar) | 132 (median 124) | Rule of thumb: ~2-3 site points per ligand heavy atom |
| n cap for scoring | 100 | Sites larger than 100 site points get capped |
| Average enclosure (submicromolar) | 0.76 | |
| Average exposure (submicromolar) | 0.52 | Lower = more buried |
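
The enclosure idea (fraction of 110 radial rays that strike protein within 10 Å, averaged over site points) can be illustrated with a rough sketch. This is not SiteMap's implementation: representing the protein as hard spheres, using a Fibonacci lattice for the ray directions, and all function names are assumptions made for illustration.

```python
import math

def fibonacci_directions(n_rays=110):
    """Roughly evenly spaced unit vectors (assumed layout, not SiteMap's)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n_rays):
        z = 1.0 - (2.0 * i + 1.0) / n_rays
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden_angle
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs

def ray_hits_sphere(origin, direction, center, radius, max_dist):
    """True if a ray from origin enters the sphere within max_dist."""
    m = tuple(c - o for c, o in zip(center, origin))
    m2 = sum(x * x for x in m)
    if m2 <= radius * radius:
        return True                      # origin already inside the atom
    t = sum(a * b for a, b in zip(m, direction))
    if t <= 0.0:
        return False                     # atom lies behind the ray
    perp2 = m2 - t * t                   # squared ray-to-center distance
    if perp2 > radius * radius:
        return False
    return t - math.sqrt(radius * radius - perp2) <= max_dist

def enclosure_at_point(point, atoms, n_rays=110, max_dist=10.0):
    """Fraction of rays from one site point that strike any atom."""
    dirs = fibonacci_directions(n_rays)
    hits = sum(
        1 for d in dirs
        if any(ray_hits_sphere(point, d, c, r, max_dist) for c, r in atoms)
    )
    return hits / n_rays

def site_enclosure(points, atoms):
    """Average the per-point fraction over all site points."""
    return sum(enclosure_at_point(p, atoms) for p in points) / len(points)

# Toy input: one atom (center, radius in Å) above a single site point.
print(site_enclosure([(0.0, 0.0, 0.0)], [((0.0, 0.0, 5.0), 2.0)]))
```

Here `atoms` is a list of `(center, radius)` pairs; a real comparison against SiteMap's e would need its actual surface definition and ray layout, which these notes do not specify.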
Benchmark validation (Tables 1-2, 538 PDBbind proteins):
| Binding affinity | Top-1 success | Notes |
|---|---|---|
| All sites | 85.9% | 538 proteins |
| < 1 nM (subnanomolar) | 98.5% | 67 proteins |
| 1 µM – 1 nM | 87.6% | 275 proteins |
| 1 mM – 1 µM | 81.2% | 170 proteins |
| > 1 mM (millimolar) | 65.4% | 26 proteins |
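
As a consistency check on the table, the per-band rates and protein counts can be recombined into the overall rate. The success counts below are reconstructed by rounding (count × rate) and are an inference, not figures quoted from the paper.

```python
# (affinity band, number of proteins, top-1 success in %), from the table above
strata = [
    ("< 1 nM", 67, 98.5),
    ("1 uM - 1 nM", 275, 87.6),
    ("1 mM - 1 uM", 170, 81.2),
    ("> 1 mM", 26, 65.4),
]

total = sum(n for _, n, _ in strata)

# Reconstructed success counts (assumption: rates are rounded from whole counts).
implied = [round(n * pct / 100) for _, n, pct in strata]

overall = 100 * sum(implied) / total
print(total, implied, round(overall, 1))
```

The bands sum to 538 proteins, and the recombined rate rounds to the 85.9% reported for all sites.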
Reference PDB IDs benchmarked for druggability (Cheng set, 63 sites/27 proteins):
- Undruggable: 1qs4 (HIV integrase), 1pty/1onz/1g1f/1nny/1q1m/1jf7 (PTP1B), 1nlj/1mem (cathepsin K), 1bmq (caspase 1/ICE-1)
- Difficult: 1a4g/1nnc/1f8b/1mwe/2qwk (neuraminidase), 1nf7 (IMPDH), 1t03 (HIV-rt nucleotide site), 1o86 (ACE-1), 1qmf (penicillin binding protein), 1kts (thrombin)
- Druggable: remaining 43 sites (17 proteins, marketed drugs or advanced candidates)
No v_opt (optimal pocket volume) threshold stated. Halgren’s approach uses number of site points (n, capped at 100) as the size term — not directly convertible to ų volume without the 0.7 Å grid and protein-specific site-finding geometry. The paper provides average site-point count (132) but no ų volume thresholds for druggability bands.
The paper gives no feature-weight scheme for a volume/hydrophobicity/polarity breakdown comparable to h01’s in-house 0.5/0.3/0.2 or 0.4/0.25/0.2/0.15 schemes. Halgren’s coefficients apply to composite grid scores (enclosure, hydrophilic score), not to decomposed property weights.
Access status
Full text retrieved via Anna’s Archive (standalone paper, 261 KB PDF). Parsed by MinerU 2026-04-25: ~/BookLibrary/mineru-output/2009-halgren-sitemap-druggability-jcim/auto/2009-halgren-sitemap-druggability-jcim.md.
For h01 audit purposes: The druggability scripts in h01 phases 1, 2, 2b use custom heuristic scoring inspired by SiteMap concepts but do NOT implement the Halgren 2009 formula. Halgren uses grid-based site-point counts + enclosure + hydrophilic composite scores requiring Schrödinger Maestro; h01 uses open-source geometry-based approximations. The h01 scripts’ docstrings correctly label these as “SiteMap-inspired heuristic, in-house approximation” with Halgren PMID 19434839 as conceptual source. Key deltas documented in the Delta section below.
Delta: Halgren 2009 vs h01 script values
| Parameter | Halgren 2009 canonical | h01 scripts (in-house) | Status |
|---|---|---|---|
| Druggable threshold | Dscore > 0.98 | ~0.83+ (implicit from phase2 scoring) | h01 not using Dscore directly |
| Difficult band | Dscore 0.83–0.98 | N/A | h01 uses continuous composite score |
| Undruggable threshold | Dscore < 0.83 | N/A | h01 uses continuous composite score |
| Pocket size metric | n (site points, grid-based, cap 100) | Volume in ų | Incommensurable — different method |
| v_opt (optimal volume) | Not specified in paper | 250 ų (subpocket) / 300 ų (full pocket) | h01 values are in-house heuristics; no Halgren basis |
| Weight scheme (volume/phobic/philic) | 0.094/0.60/−0.324 on √n/e/p | 0.5/0.3/0.2 or 0.4/0.25/0.2/0.15 | Incommensurable — different property definitions |
| Enclosure | Fraction of 110 radial rays within 10 Å | Burial fraction (26-direction, ≥16/26 threshold) | Conceptually analogous, numerically different |
| SiteScore vs Dscore | Separate functions required | Single composite used | h01 uses single scorer for all purposes |
Conclusion: The h01 in-house druggability scores (phase1, phase2, phase2b) cannot be numerically compared to Halgren Dscore thresholds (0.83/0.98). They measure related but technically incommensurable quantities. The labeling “SiteMap-inspired heuristic” is accurate. The 250/300 ų subpocket/full-pocket v_opt values in h01 scripts have no direct basis in Halgren 2009 — they are geometry-based heuristics. The Halgren paper does not state an ų volume threshold for druggability.
Limitations
- SiteMap requires Schrödinger Suite license; Dscore formula itself is published but property definitions (enclosure, hydrophilic score) depend on proprietary grid-energy calculations.
- Halgren’s n and e terms are not directly replicable without the SiteMap grid algorithm.
- The Cheng validation set used in 2009 covers only 63 sites across 27 proteins; druggability thresholds may not generalize perfectly beyond it.
Connections
- [source] STRC h01 Parameter Provenance Audit 2026-04-25: druggability scoring attribution
- [see-also] STRC Pharmacochaperone K1141 Fragment Pocket: pocket characterization context
- [see-also] pharmacochaperone: druggability parameter table