Identifying and characterizing binding sites and assessing druggability

SiteMap is a computational tool for identifying binding sites and scoring their druggability. It combines a grid-based site detection algorithm with two composite metrics: SiteScore (site quality) and Dscore (druggability). The paper (Halgren, J Chem Inf Model 2009) is the primary reference for SiteMap's physics-based druggability prediction.

TL;DR

SiteMap ranks the known binding site as top-1 in 86% of test cases; this rises to >98% for sites binding subnanomolar ligands. Both of its scores are linear combinations of site size (√n, from the site-point count), enclosure (e), and a hydrophilic penalty (p).

Key finding

SiteMap’s Dscore closely reproduces Cheng et al.’s druggability classification of 63 sites (27 proteins). In the Dscore ranking, the undruggable sites occupy ranks 1-10 and the difficult sites occupy ranks 11-15, 17, 18, 21, 25, and 45, which separates “difficult” from “undruggable” better than MAP_POD does. Key physical drivers: undruggable sites are small, solvent-exposed, and strongly hydrophilic (p > ~1.7) with little hydrophobic character; druggable sites have reasonable size, good enclosure, and moderate hydrophobicity.

Numbers that matter

Extracted from full text (pp. 377-389, J Chem Inf Model 49(2), 2009). MinerU parse: ~/BookLibrary/mineru-output/2009-halgren-sitemap-druggability-jcim/auto/.

SiteScore formula (eq. from Methods §6, used for binding-site identification):

SiteScore = 0.0733·√n + 0.6688·e − 0.20·p
  • n = number of site points, capped at 100
  • e = enclosure score (average fraction of 110 radial rays striking protein within 10 Å; higher = more enclosed)
  • p = hydrophilic score, capped at 1.0 for SiteScore
  • Average SiteScore for submicromolar sites: 1.01
  • SiteScore threshold for binding-site classification: ≥ 0.80 (~80% of average submicromolar SiteScore)
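
As a quick illustration of the formula and its two caps, here is a minimal Python sketch. The function names and the example hydrophilic value (1.3) are assumptions made for illustration; the sketch only evaluates the published linear combination and does not compute n, e, or p from a structure, which requires SiteMap itself.

```python
import math

N_CAP = 100    # site-point cap stated in the notes above
P_CAP = 1.0    # hydrophilic cap that applies to SiteScore only


def site_score(n_site_points: float, enclosure: float, hydrophilic: float) -> float:
    """Evaluate SiteScore = 0.0733*sqrt(n) + 0.6688*e - 0.20*p with both caps."""
    n = min(n_site_points, N_CAP)
    p = min(hydrophilic, P_CAP)
    return 0.0733 * math.sqrt(n) + 0.6688 * enclosure - 0.20 * p


def is_binding_site(score: float, threshold: float = 0.80) -> bool:
    """Apply the >= 0.80 classification threshold quoted above."""
    return score >= threshold


if __name__ == "__main__":
    # n and e are the submicromolar averages from the notes; p = 1.3 is hypothetical.
    s = site_score(132, 0.76, 1.3)
    print(round(s, 3), is_binding_site(s))
```

With these inputs n is capped at 100 and p at 1.0, so the score comes out at about 1.04, above the 0.80 threshold.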

Dscore formula (eq. 2, used for druggability classification):

Dscore = 0.094·√n + 0.60·e − 0.324·p
  • n = number of site points, capped at 100
  • e = enclosure score (same definition as SiteScore)
  • p = hydrophilic score, NOT capped (key difference from SiteScore — allows larger penalty for polar sites)
  • The larger uncapped hydrophilic penalty (−0.324·p vs −0.20·p) is the critical distinction enabling druggability classification
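
To make the effect of the uncapped hydrophilic term concrete, the sketch below evaluates Dscore for the same site size and enclosure at two hydrophilic scores. The function name and the input values are illustrative assumptions (n and e are the submicromolar averages from these notes; p = 1.0 and p = 1.7 are chosen to bracket the "strongly hydrophilic" regime).

```python
import math


def dscore(n_site_points: float, enclosure: float, hydrophilic: float) -> float:
    """Evaluate Dscore = 0.094*sqrt(n) + 0.60*e - 0.324*p.

    n is capped at 100; p is deliberately left uncapped.
    """
    n = min(n_site_points, 100)
    return 0.094 * math.sqrt(n) + 0.60 * enclosure - 0.324 * hydrophilic


if __name__ == "__main__":
    for p in (1.0, 1.7):
        # The penalty term grows linearly with p because no cap is applied.
        print(p, round(dscore(132, 0.76, p), 4), round(0.324 * p, 4))
```

Raising p from 1.0 to 1.7 lowers the score from about 1.07 to about 0.85, whereas the capped SiteScore penalty would stay at 0.20 for both values.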

Druggability thresholds (Table 8, footnote g/h/i):

| Category | Dscore range | Physical profile |
|---|---|---|
| Druggable | > 0.98 | Reasonable size, good enclosure, moderate hydrophobicity, unexceptional hydrophilicity |
| Difficult | 0.83 – 0.98 | Good size/enclosure but hydrophilic enough to require prodrug; less hydrophobic than druggable |
| Undruggable | < 0.83 | Very small OR very shallow OR strongly hydrophilic (p > ~1.7) with little/no hydrophobic character; OR requires covalent binding |
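
The three bands can be expressed as a small lookup, sketched below in Python. The function name is hypothetical, and assigning the exact boundary values 0.83 and 0.98 to "difficult" is an assumption read off the ranges as written here.

```python
def classify_dscore(d: float) -> str:
    """Map a Dscore value to the druggability bands listed above.

    Assumption: the boundary values 0.83 and 0.98 themselves count as "difficult".
    """
    if d > 0.98:
        return "druggable"
    if d >= 0.83:
        return "difficult"
    return "undruggable"


if __name__ == "__main__":
    for value in (1.07, 0.90, 0.60):
        print(value, classify_dscore(value))
```

This only applies to scores on the Dscore scale; values from other scoring schemes should not be passed through these cutoffs.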

Site point geometry (Methods §1, 8):

| Property | Value | Notes |
|---|---|---|
| Site-point grid spacing | 1 Å | Grid placed around entire protein |
| Enclosure sampling | 110 evenly spaced radial rays | Fraction striking protein within 10 Å |
| Site merging gap | ≤ 6.5 Å (exposed regions) | Default merge threshold |
| Volume grid spacing | 0.7 Å | “Shrink-wrap” approximation |
| Volume ray cutoff | < 60% rays within 8 Å → discard | Excludes solvent-protruding regions |
| Average site points (submicromolar) | 132 (median 124) | Rule of thumb: ~2-3 site points per ligand heavy atom |
| n cap for scoring | 100 | Sites larger than 100 site points get capped |
| Average enclosure (submicromolar) | 0.76 | |
| Average exposure (submicromolar) | 0.52 | Lower = more buried |
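
The enclosure idea (fraction of 110 radial rays that strike protein within 10 Å) can be sketched in plain Python as below. This is not SiteMap's implementation: the Fibonacci-lattice ray layout, the treatment of atoms as hard spheres given as (center, radius) pairs, and all function names are assumptions made for illustration.

```python
import math


def fibonacci_sphere(n_rays: int = 110):
    """Roughly evenly spaced unit vectors (an assumed ray layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n_rays):
        z = 1.0 - (2.0 * i + 1.0) / n_rays
        r = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        dirs.append((r * math.cos(phi), r * math.sin(phi), z))
    return dirs


def ray_hits_atom(origin, direction, center, radius, max_len):
    """True if the ray enters the sphere within max_len of its origin."""
    oc = [c - o for c, o in zip(center, origin)]
    dist_sq = sum(a * a for a in oc)
    if dist_sq <= radius * radius:
        return True  # origin already inside the sphere
    t_closest = sum(a * b for a, b in zip(oc, direction))
    if t_closest < 0.0:
        return False  # sphere lies behind the ray origin
    perp_sq = dist_sq - t_closest * t_closest
    if perp_sq > radius * radius:
        return False  # ray passes beside the sphere
    t_entry = t_closest - math.sqrt(radius * radius - perp_sq)
    return t_entry <= max_len


def enclosure(point, atoms, n_rays: int = 110, max_len: float = 10.0) -> float:
    """Fraction of rays from one site point that strike any atom within max_len."""
    hits = 0
    for d in fibonacci_sphere(n_rays):
        if any(ray_hits_atom(point, d, c, r, max_len) for c, r in atoms):
            hits += 1
    return hits / n_rays


def site_enclosure(points, atoms) -> float:
    """Average the per-point enclosure over all site points of a site."""
    return sum(enclosure(p, atoms) for p in points) / len(points)
```

A fully buried point scores 1.0 and a point with no protein within 10 Å scores 0.0; the submicromolar average of 0.76 quoted in the table sits between these extremes.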

Benchmark validation (Tables 1-2, 538 PDBbind proteins):

| Binding affinity | Top-1 success | Set size |
|---|---|---|
| All sites | 85.9% | 538 proteins |
| < 1 nM (subnanomolar) | 98.5% | 67 proteins |
| 1 µM – 1 nM | 87.6% | 275 proteins |
| 1 mM – 1 µM | 81.2% | 170 proteins |
| > 1 mM (millimolar) | 65.4% | 26 proteins |

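As a consistency check on the extracted numbers, the per-bin success rates can be weighted by bin size and compared with the "All sites" row. The short Python sketch below uses only the values from the table; the variable names are arbitrary.

```python
# (affinity bin, top-1 success in percent, number of proteins), copied from the table
bins = [
    ("< 1 nM", 98.5, 67),
    ("1 uM - 1 nM", 87.6, 275),
    ("1 mM - 1 uM", 81.2, 170),
    ("> 1 mM", 65.4, 26),
]

total_proteins = sum(n for _, _, n in bins)
weighted_success = sum(pct * n for _, pct, n in bins) / total_proteins

if __name__ == "__main__":
    print(total_proteins, round(weighted_success, 1))
```

The bin sizes sum to 538 and the size-weighted success rate rounds to 85.9%, matching the "All sites" row.
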
Reference PDB IDs benchmarked for druggability (Cheng set, 63 sites/27 proteins):

  • Undruggable: 1qs4 (HIV integrase), 1pty/1onz/1g1f/1nny/1q1m/1jf7 (PTP1B), 1nlj/1mem (cathepsin K), 1bmq (caspase 1/ICE-1)
  • Difficult: 1a4g/1nnc/1f8b/1mwe/2qwk (neuraminidase), 1nf7 (IMPDH), 1t03 (HIV-rt nucleotide site), 1o86 (ACE-1), 1qmf (penicillin binding protein), 1kts (thrombin)
  • Druggable: remaining 43 sites (17 proteins, marketed drugs or advanced candidates)

No v_opt (optimal pocket volume) threshold stated. Halgren’s approach uses number of site points (n, capped at 100) as the size term — not directly convertible to ų volume without the 0.7 Å grid and protein-specific site-finding geometry. The paper provides average site-point count (132) but no ų volume thresholds for druggability bands.

No feature-weight schemes for volume/hydrophobicity/polarity breakdown comparable to h01’s 0.5/0.3/0.2 or 0.4/0.25/0.2/0.15 in-house schemes — Halgren’s coefficients apply to composite grid scores (enclosure, hydrophilic score), not decomposed property weights.

Access status

Full text retrieved via Anna’s Archive (standalone paper, 261 KB PDF). Parsed by MinerU 2026-04-25: ~/BookLibrary/mineru-output/2009-halgren-sitemap-druggability-jcim/auto/2009-halgren-sitemap-druggability-jcim.md.

For h01 audit purposes: The druggability scripts in h01 phases 1, 2, 2b use custom heuristic scoring inspired by SiteMap concepts but do NOT implement the Halgren 2009 formula. Halgren uses grid-based site-point counts + enclosure + hydrophilic composite scores requiring Schrödinger Maestro; h01 uses open-source geometry-based approximations. The h01 scripts’ docstrings correctly label these as “SiteMap-inspired heuristic, in-house approximation” with Halgren PMID 19434839 as conceptual source. Key deltas documented in the Delta section below.

Delta: Halgren 2009 vs h01 script values

| Parameter | Halgren 2009 canonical | h01 scripts (in-house) | Status |
|---|---|---|---|
| Druggable threshold | Dscore > 0.98 | ~0.83+ (implicit from phase2 scoring) | h01 not using Dscore directly |
| Difficult band | Dscore 0.83–0.98 | N/A | h01 uses continuous composite score |
| Undruggable threshold | Dscore < 0.83 | N/A | h01 uses continuous composite score |
| Pocket size metric | n (site points, grid-based, cap 100) | Volume in ų | Incommensurable: different method |
| v_opt (optimal volume) | Not specified in paper | 250 ų (subpocket) / 300 ų (full pocket) | h01 values are in-house heuristics; no Halgren basis |
| Weight scheme (volume/phobic/philic) | 0.094/0.60/−0.324 on √n/e/p | 0.5/0.3/0.2 or 0.4/0.25/0.2/0.15 | Incommensurable: different property definitions |
| Enclosure | Fraction of 110 radial rays within 10 Å | Burial fraction (26-direction, ≥16/26 threshold) | Conceptually analogous, numerically different |
| SiteScore vs Dscore | Separate functions required | Single composite used | h01 uses single scorer for all purposes |

Conclusion: The h01 in-house druggability scores (phase1, phase2, phase2b) cannot be numerically compared to Halgren Dscore thresholds (0.83/0.98). They measure related but technically incommensurable quantities. The labeling “SiteMap-inspired heuristic” is accurate. The 250/300 ų subpocket/full-pocket v_opt values in h01 scripts have no direct basis in Halgren 2009 — they are geometry-based heuristics. The Halgren paper does not state an ų volume threshold for druggability.

Limitations

  • SiteMap requires Schrödinger Suite license; Dscore formula itself is published but property definitions (enclosure, hydrophilic score) depend on proprietary grid-energy calculations.
  • Halgren’s n and e terms are not directly replicable without the SiteMap grid algorithm.
  • The druggability thresholds rest on the 63-site (27-protein) Cheng validation set used in 2009 and may not generalize beyond it.

Connections