program.md — Phase 5e-v2 ρ optimizer

Adapted from karpathy/autoresearch. Single goal, single scalar, single editable file. Your job: improve phase5e_v2_mutant_redock.py so its Vina ranking on the v5.2 shortlist agrees with τRAMD residence-time ranking (ground truth). Currently they are anti-correlated (ρ = −0.9). Move ρ toward +0.5 or higher.

Problem context (read once, don’t argue with it)

Today (2026-04-26) we discovered that the Vina rigid-receptor mutant-ensemble re-dock on the STRC E1659A K1141 pocket has two stacked artifacts that make its ranking the opposite of the true biological ranking on anion-leads:

  1. Gasteiger zero-anion: obabel --partialcharge gasteiger zeros the formal anion charge. Already fixed in phase5e_v2_mutant_redock.py via dimorphite-DL + meeko default. Phosphonate now has q = −2.000 e in PDBQT. ρ moved from −1.0 to −0.9. Not enough.

  2. Rigid-receptor on per-snapshot static MD frames — the second artifact. Bulky anionic ligands (phosphonate, sulfonamide) need the K1141 pocket to exhale ~0.5–1 Å around them; flexible-MD (τRAMD) gets that for free, rigid Vina does not. NOT yet fixed. This is the load-bearing optimization target.
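A quick guard against artifact 1 creeping back in is to sum the partial charges in each prepared ligand PDBQT and compare against the expected formal charge (−2 for the phosphonates). A minimal sketch, assuming standard PDBQT fixed-width records with the partial charge in columns 71–76; the function names are hypothetical and not part of phase5e_v2_mutant_redock.py:

```python
def pdbqt_net_charge(pdbqt_text: str) -> float:
    """Sum the partial charges over all ATOM/HETATM records of a PDBQT string."""
    total = 0.0
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Assumption: standard PDBQT layout, charge in columns 71-76
            # (0-based slice 70:76), atom type after it.
            total += float(line[70:76])
    return total


def has_formal_charge(pdbqt_text: str, expected: int, tol: float = 0.01) -> bool:
    """True if the summed partial charges match the expected formal charge."""
    return abs(pdbqt_net_charge(pdbqt_text) - expected) <= tol
```

Calling `has_formal_charge(open(path).read(), -2)` on a phosphonate ligand right after ligand prep would catch a silently zeroed anion before any docking time is spent.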

Full story: see ~/STRC/hypotheses/h01-pharmacochaperone/phases/STRC h01 Phase 5e-v2 v5.2 Shortlist Vina vs tauRAMD 2026-04-26.md.

Ground truth (DO NOT MODIFY)

τRAMD median dissociation time on the v5.2 shortlist (Phase 5m production, 5 lig × 5 replicas; 25 replicas total, all UNBOUND):

| Ligand | τ_med (ps) |
|---|---|
| v5.2__aq3__adamantyl__CONHOH__-Cl | 15.7 |
| v5.2__aq3__adamantyl__CONHOMe__-Cl | 15.8 |
| v5.2__aq3__adamantyl__CONHOMe__-CN | 18.7 |
| v5.2__aq3__1-indanyl__phosphonate__-CF3 | 19.7 |
| v5.2__aq3__4-F-biphenyl__phosphonate__-CN | 20.0 |

Source: ~/STRC/hypotheses/h01-pharmacochaperone/artifacts/phase5m_production/aggregate_partial.json.

Higher τ ⇒ longer residence ⇒ tighter binding ⇒ should rank lower (more negative) on Vina ΔG. A perfect Vina would give Spearman ρ(Vina mean ΔG, −τ_med) = +1.0. We currently have −0.9.
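The sign convention is easy to get backwards, so here is a dependency-free sketch of it. It is not the code in eval.py (which stays authoritative); the function names are hypothetical, and the ΔG values in the usage lines are made up purely to show the two extremes against the τ_med values above.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rho_vina_vs_tau(vina_dg, tau_med):
    """rho(Vina mean dG, -tau_med): +1 when longer residence <=> more negative dG."""
    return spearman_rho(vina_dg, [-t for t in tau_med])


tau_med = [15.7, 15.8, 18.7, 19.7, 20.0]            # ps, same order as the table
ideal_dg = [-1.0, -2.0, -3.0, -4.0, -5.0]           # hypothetical: longest tau scores best
inverted_dg = [-5.0, -4.0, -3.0, -2.0, -1.0]        # hypothetical: ranking fully reversed
print(rho_vina_vs_tau(ideal_dg, tau_med), rho_vina_vs_tau(inverted_dg, tau_med))
```

The first value is the +1.0 target; the second is the fully anti-correlated case.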

Baseline (state of the world before you started)

File: phase5e_v2_mutant_redock.py at /Users/egorlyfar/STRC/hypotheses/h01-pharmacochaperone/scripts/
Wall: ~17 min for v5.2-shortlist (5 lig × 20 snap × ~10s, exh=8 cpu=4)
Vina mean ΔG (kcal/mol):
  adamantyl_CONHOH_-Cl: −4.87 (rank 1)
  1-indanyl_phosphonate_-CF3: −2.46 (rank 2)
  adamantyl_CONHOMe_-Cl: −1.61 (rank 3)
  adamantyl_CONHOMe_-CN: +1.07 (rank 4)
  4-F-biphenyl_phosphonate_-CN: +13.35 (rank 5)
ρ(Vina, −τ_med) = −0.9

What you may edit

Allowed:

  • phase5e_v2_mutant_redock.py (the orchestrator — full freedom on ligand prep, receptor prep, Vina invocation, scoring, aggregation)
  • New helper files in ~/STRC/autoresearch/phase5e_v2_rho_optimizer/ (this directory)
  • pip install <package> (NAGL, AmberTools, prepare_flexreceptor.py, etc.)
  • Shell out to obabel, antechamber, prepare_flexreceptor.py, mk_prepare_ligand.py
  • Modify how Vina is called (flex residues, exhaustiveness, rescoring, MMGBSA correction)

Not allowed:

  • Modify the ground-truth τRAMD numbers in this file or in program.md
  • Modify eval.py
  • Delete history.jsonl entries
  • Touch any other STRC vault files outside this optimizer’s scope

How experiments work

Every experiment is one round of:

  1. Edit phase5e_v2_mutant_redock.py (or add a helper in this directory)
  2. Run python eval.py --experiment-id <slug> (eval.py runs the script + computes ρ + appends to history.jsonl)
  3. Read the printed {"rho": ..., "wall_seconds": ..., "n_failures": ...}
  4. Decide: keep, revert, or refine
  5. Update history.jsonl with your reasoning + decision

eval.py enforces a 30-minute wall budget (1800 sec). If your script doesn’t finish in that, the experiment is marked timed-out and ρ = −∞.
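If you want to dry-run a change under the same budget before spending a real experiment on it, something like the following works. This is a sketch of the idea only: how eval.py actually enforces the limit is not shown here, and `run_with_budget` is a hypothetical helper.

```python
import subprocess
import sys
import time

WALL_BUDGET_S = 1800  # 30-minute budget stated above


def run_with_budget(cmd, budget_s=WALL_BUDGET_S):
    """Run cmd; on timeout the child is killed and rho is reported as -inf."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, timeout=budget_s)
        timed_out, returncode = False, proc.returncode
    except subprocess.TimeoutExpired:
        timed_out, returncode = True, None
    return {
        "timed_out": timed_out,
        "returncode": returncode,
        "wall_seconds": time.monotonic() - start,
        # rho is only decided here for the timeout case; otherwise it is
        # left for the real evaluation to compute.
        "rho": float("-inf") if timed_out else None,
    }
```

For example, `run_with_budget([sys.executable, "phase5e_v2_mutant_redock.py"])` returns a dict whose `wall_seconds` tells you how much headroom is left.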

Decision rules (gating)

  • ρ < −0.5: discard, revert (worse than baseline, you broke something)
  • ρ ∈ [−0.5, +0.3): log + revert (no improvement, but study what failed and try different angle)
  • ρ ∈ [+0.3, +0.5): KEEP, this is the new baseline (PASS-1, paper-figure-grade)
  • ρ ∈ [+0.5, +0.7): KEEP + checkpoint (PASS-2, publishable methodology)
  • ρ ≥ +0.7: KEEP + STOP, write summary.md (PASS-3, screen-scale Vina now usable on anion-leads)
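The gate can be written as one small function. This sketch assumes a value sitting exactly on a boundary (+0.3, +0.5, +0.7) goes to the higher tier; the function name and the returned labels are hypothetical.

```python
def gate(rho: float) -> str:
    """Map an experiment's rho onto the decision tiers listed above."""
    if rho < -0.5:
        return "discard"               # also catches timed-out runs (rho = -inf)
    if rho < 0.3:
        return "log+revert"
    if rho < 0.5:
        return "keep"                  # PASS-1, new baseline
    if rho < 0.7:
        return "keep+checkpoint"       # PASS-2
    return "keep+stop"                 # PASS-3, write summary.md
```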

Stop after 5 consecutive non-improving experiments. Write summary.md describing the best three approaches tried, what they imply about the artifact, and what the next human-driven step should be (probably Glide/HADDOCK induced-fit, or full-MD MMGBSA).
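One way to check the stop condition mechanically is sketched below. It assumes "non-improving" means ρ did not exceed the best ρ seen so far (starting from the −0.9 baseline), and that each history.jsonl line carries a `rho` field; both the interpretation and the helper names are assumptions.

```python
import json


def load_rhos(path="history.jsonl"):
    """Read the rho of every logged experiment, in file order."""
    with open(path) as fh:
        return [json.loads(line)["rho"] for line in fh if line.strip()]


def non_improving_streak(rhos, baseline=-0.9):
    """Length of the current run of experiments that failed to beat the best rho."""
    best, streak = baseline, 0
    for rho in rhos:
        if rho > best:
            best, streak = rho, 0
        else:
            streak += 1
    return streak


def should_stop(rhos, baseline=-0.9, patience=5, target=0.7):
    """Stop once the target is reached or `patience` experiments in a row stalled."""
    reached = any(rho >= target for rho in rhos)
    return reached or non_improving_streak(rhos, baseline) >= patience
```

Run `should_stop(load_rhos())` after each experiment; a `True` means it is time to write summary.md.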

What’s likely to work (suggestions, not commands)

You are smart. Generate your own hypotheses. But to seed:

  1. NAGL am1bcc-class charges: pip install openff-nagl proper (the current 0.0.0 install is a placeholder; check ~/STRC/tools/openff-nagl.md if it exists). Replaces meeko’s default Gasteiger-on-skeleton charges with NAGL’s machine-learned approximation of AM1-BCC charges.
  2. Soft-receptor Vina: prepare_flexreceptor.py -r receptor.pdbqt -s "K1141,K1135,K1137,K1165,K1167,K1175" (check the residue-selection syntax the script actually expects), then pass --receptor rigid.pdbqt --flex flex.pdbqt to Vina.
  3. AutoDock4 Meeko atom typing: mk_prepare_ligand.py --add-charge ad4 (might already be the meeko default; verify with mk_prepare_ligand.py --help).
  4. MMGBSA rescoring on Vina top-1 pose per snap — Phase 4f tooling.
  5. GBNeck2 implicit-solvent rescoring — single-point energy on each Vina pose with a more accurate potential.
  6. Pose-cluster ranking — k-means top-N Vina poses per snap, rank ligands by best-cluster mean ΔG instead of grand mean.
  7. Rank by best-pose ΔG instead of mean — ignores the clash outliers.
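  8. Increase exhaustiveness 8 → 32: roughly 4× wall (about 68 min at the ~17 min baseline, which is over the 30-minute budget unless you cut elsewhere, e.g. fewer snapshots), but it might tighten the rank.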

Order them by your judgment of expected ρ-gain ÷ wall-cost. Try the cheap+plausible first.
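Seed 7 is the cheapest to try because it only touches aggregation. A minimal sketch of mean versus best-pose aggregation follows; the `per_snapshot_dg` structure and the ligand names are hypothetical, and the numbers are invented to show how one clash outlier can flip a ranking.

```python
from statistics import mean


def aggregate(per_snapshot_dg, mode="mean"):
    """Collapse {ligand: [dG per snapshot]} to one score per ligand."""
    if mode == "mean":
        return {lig: mean(vals) for lig, vals in per_snapshot_dg.items()}
    if mode == "best":
        # Most negative dG across snapshots; ignores clash outliers.
        return {lig: min(vals) for lig, vals in per_snapshot_dg.items()}
    raise ValueError(f"unknown mode: {mode}")


def ranking(scores):
    """Ligands ordered from most to least negative score."""
    return sorted(scores, key=scores.get)


# Hypothetical data: lig_a docks well in two frames and clashes badly in one.
demo = {"lig_a": [-5.0, -4.0, 30.0], "lig_b": [-3.0, -3.0, -3.0]}
print(ranking(aggregate(demo, "mean")), ranking(aggregate(demo, "best")))
```

With the grand mean the single clash frame pushes lig_a behind lig_b; with best-pose aggregation lig_a ranks first.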

Logging contract

history.jsonl is append-only. Each line is one experiment:

{"experiment_id": "exp_001_nagl_charges", "timestamp": "2026-04-27T03:14:22Z", "diff_summary": "swapped meeko default charges for openff-nagl am1bcc", "rho": -0.6, "wall_seconds": 1245, "n_failures": 0, "kept": false, "reason": "improvement from -0.9 to -0.6 but still anti-correlated; reverted; NAGL alone insufficient"}

eval.py writes the {rho, wall_seconds, n_failures} triple. You are responsible for the rest of the JSON object (diff_summary, kept, reason). Be honest. The goal is a useful experiment log for the human to read, not a self-flattering narrative.
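A cheap self-check before moving on to the next experiment is to confirm the last history.jsonl line is complete. The sketch below takes the field names and types from the example record above; the checker itself is hypothetical and not part of eval.py.

```python
import json

# Field names come from the example record; the expected types are inferred from it.
REQUIRED_FIELDS = {
    "experiment_id": str,
    "timestamp": str,
    "diff_summary": str,
    "rho": (int, float),
    "wall_seconds": (int, float),
    "n_failures": int,
    "kept": bool,
    "reason": str,
}


def record_problems(line: str) -> list:
    """Return human-readable problems with one history.jsonl line ([] if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["not a JSON object"]
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in record:
            problems.append(f"missing {key}")
            continue
        value = record[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        wrong_bool = expected is not bool and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"bad type for {key}")
    return problems
```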

When in doubt

Read recent history.jsonl. If you’re stuck after 3 attempts, write a notes.md describing what you’ve ruled out and what you’d want a human to investigate. Then continue trying.

When ρ ≥ +0.7 or 5 non-improving experiments in a row, STOP and write summary.md for the human. Don’t keep grinding.

Local-LLM mode

If running on local Ollama (qwen3:8b primary, qwen3:32b as fallback for harder edits), the loop should still work: context per experiment is small (~3 KB program.md + ~10 KB phase5e_v2_mutant_redock.py + ~5 KB history). If the local model can’t make progress for 3 consecutive iterations, escalate to the Claude API (Anthropic) for the next iteration.