program.md — Phase 5e-v2 ρ optimizer
Adapted from karpathy/autoresearch. Single goal, single scalar, single editable file. Your job: improve phase5e_v2_mutant_redock.py so its Vina ranking on the v5.2 shortlist agrees with the τRAMD residence-time ranking (ground truth). Currently they are anti-correlated (ρ = −0.9). Move ρ toward +0.5 or higher.
Problem context (read once, don’t argue with it)
Today (2026-04-26) we discovered that the Vina rigid-receptor mutant-ensemble re-dock on the STRC E1659A K1141 pocket has two stacked artifacts that make its ranking the opposite of the true biological ranking on anion-leads:
- Gasteiger zero-anion: `obabel --partialcharge gasteiger` zeros the formal anion charge. Already fixed in phase5e_v2_mutant_redock.py via dimorphite-DL + meeko default. Phosphonate now has q = −2.000 e in PDBQT. ρ moved from −1.0 to −0.9. Not enough.
- Rigid-receptor on per-snapshot static MD frames: the second artifact. Bulky anionic ligands (phosphonate, sulfonamide) need the K1141 pocket to exhale ~0.5–1 Å around them; flexible-MD (τRAMD) gets that for free, rigid Vina does not. NOT yet fixed. This is the load-bearing optimization target.
Full story: see ~/STRC/hypotheses/h01-pharmacochaperone/phases/STRC h01 Phase 5e-v2 v5.2 Shortlist Vina vs tauRAMD 2026-04-26.md.
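To confirm the first fix is still in place after any ligand-prep change, sum the partial charges in the ligand PDBQT. A minimal sketch, assuming the standard fixed-column PDBQT layout (partial charge in columns 71-76); `net_charge` and `make_atom_line` are hypothetical helpers and the per-atom charges are illustrative, not taken from a real ligand file.

```python
def make_atom_line(serial, name, charge, atom_type):
    """Format one PDBQT ATOM record with dummy coordinates (illustration only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} LIG A{1:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}    "
        f"{charge:6.3f} {atom_type:<2}"
    )


def net_charge(pdbqt_text):
    """Sum partial charges over ATOM/HETATM records.

    Assumes the fixed-column PDBQT layout: charge in columns 71-76,
    which is the 0-based slice [70:76].
    """
    total = 0.0
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            total += float(line[70:76])
    return round(total, 3)


# Illustrative phosphonate-like fragment; charges chosen to sum to -2.
demo_pdbqt = "\n".join([
    make_atom_line(1, "P1", 0.500, "P"),
    make_atom_line(2, "O1", -0.900, "OA"),
    make_atom_line(3, "O2", -0.900, "OA"),
    make_atom_line(4, "O3", -0.700, "OA"),
])
demo_net = net_charge(demo_pdbqt)
```

A result near 0 for a phosphonate would mean the Gasteiger zero-anion artifact is back.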
Ground truth (DO NOT MODIFY)
τRAMD median dissociation time on the v5.2 shortlist (Phase 5m production, 5 lig × 5 replicas, 25 replicas total, all UNBOUND):
| Ligand | τ_med (ps) |
|---|---|
| v5.2__aq3__adamantyl__CONHOH__-Cl | 15.7 |
| v5.2__aq3__adamantyl__CONHOMe__-Cl | 15.8 |
| v5.2__aq3__adamantyl__CONHOMe__-CN | 18.7 |
| v5.2__aq3__1-indanyl__phosphonate__-CF3 | 19.7 |
| v5.2__aq3__4-F-biphenyl__phosphonate__-CN | 20.0 |
Source: ~/STRC/hypotheses/h01-pharmacochaperone/artifacts/phase5m_production/aggregate_partial.json.
Higher τ ⇒ longer residence ⇒ tighter binding ⇒ should rank lower (more negative) on Vina ΔG. A perfect Vina would give Spearman ρ(Vina mean ΔG, −τ_med) = +1.0. We currently have −0.9.
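The sign convention is easy to get backwards, so here is a minimal sketch of it. It uses the τ_med values from the table above and HYPOTHETICAL ΔG values (not real Vina output), with a hand-rolled tie-free Spearman so it runs without SciPy. How eval.py actually computes ρ is not shown here; `ranks` and `spearman_rho` are illustrative names.

```python
def ranks(values):
    """Return 1-based ascending ranks; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


tau_med = [15.7, 15.8, 18.7, 19.7, 20.0]  # ps, same order as the table
neg_tau = [-t for t in tau_med]

# HYPOTHETICAL Vina means: most negative dG for the longest residence time.
ideal_dg = [-5.0, -5.5, -6.0, -6.5, -7.0]
# Same numbers in reverse order: the fully inverted ranking.
inverted_dg = ideal_dg[::-1]

rho_ideal = spearman_rho(ideal_dg, neg_tau)        # +1.0
rho_inverted = spearman_rho(inverted_dg, neg_tau)  # -1.0
```

Correlating against −τ_med rather than τ_med is what makes "more negative ΔG for longer residence" come out as a positive ρ.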
Baseline (state of the world before you started)
File: phase5e_v2_mutant_redock.py at /Users/egorlyfar/STRC/hypotheses/h01-pharmacochaperone/scripts/
Wall: ~17 min for v5.2-shortlist (5 lig × 20 snap × ~10s, exh=8 cpu=4)
Vina mean ΔG (kcal/mol):
adamantyl_CONHOH_-Cl: −4.87 (rank 1)
1-indanyl_phosphonate_-CF3: −2.46 (rank 2)
adamantyl_CONHOMe_-Cl: −1.61 (rank 3)
adamantyl_CONHOMe_-CN: +1.07 (rank 4)
4-F-biphenyl_phosphonate_-CN: +13.35 (rank 5)
ρ(Vina, −τ_med) = −0.9
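The baseline wall time follows from the run shape, and the same arithmetic tells you up front whether a planned change fits the 30-minute budget. A rough sketch, assuming wall time scales linearly with exhaustiveness and that docking jobs run one after another; `estimate_wall_seconds` is a hypothetical helper.

```python
WALL_BUDGET_SECONDS = 1800  # the 30-minute budget enforced by eval.py


def estimate_wall_seconds(n_ligands, n_snapshots, sec_per_dock,
                          exhaustiveness, base_exhaustiveness=8):
    """Estimate total wall time for sequential docking jobs.

    Assumption: cost per dock is proportional to exhaustiveness,
    with sec_per_dock measured at base_exhaustiveness.
    """
    scale = exhaustiveness / base_exhaustiveness
    return n_ligands * n_snapshots * sec_per_dock * scale


baseline_wall = estimate_wall_seconds(5, 20, 10, 8)   # 1000.0 s, about 17 min
exh32_wall = estimate_wall_seconds(5, 20, 10, 32)     # 4000.0 s
fits_budget = exh32_wall <= WALL_BUDGET_SECONDS       # False
```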
What you may edit
Allowed:
- phase5e_v2_mutant_redock.py (the orchestrator: full freedom on ligand prep, receptor prep, Vina invocation, scoring, aggregation)
- New helper files in ~/STRC/autoresearch/phase5e_v2_rho_optimizer/ (this directory)
- `pip install <package>` (NAGL, AmberTools, prepare_flexreceptor.py, etc.)
- Shell out to obabel, antechamber, prepare_flexreceptor.py, mk_prepare_ligand.py
- Modify how Vina is called (flex residues, exhaustiveness, rescoring, MMGBSA correction)
Not allowed:
- Modify the ground-truth τRAMD numbers in this file or in program.md
- Modify eval.py
- Delete history.jsonl entries
- Touch any other STRC vault files outside this optimizer’s scope
How experiments work
Every experiment is one round of:
- Edit phase5e_v2_mutant_redock.py (or add a helper in this directory)
- Run `python eval.py --experiment-id <slug>` (eval.py runs the script + computes ρ + appends to history.jsonl)
- Read the printed `{"rho": ..., "wall_seconds": ..., "n_failures": ...}`
- Decide: keep, revert, or refine
- Update history.jsonl with your reasoning + decision
eval.py enforces a 30-minute wall budget (1800 sec). If your script doesn’t finish within that budget, the experiment is marked timed-out and ρ = −∞.
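For intuition about what a timeout does to your score, here is a sketch of one way such a budget could be enforced. This is an assumption about the mechanism, not the contents of eval.py (which you may not modify); `run_with_budget` and `rho_or_neg_inf` are hypothetical names.

```python
import math
import subprocess
import sys

WALL_BUDGET_SECONDS = 1800


def run_with_budget(cmd, budget=WALL_BUDGET_SECONDS):
    """Run cmd; return (timed_out, returncode or None).

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        proc = subprocess.run(cmd, timeout=budget)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        return True, None


def rho_or_neg_inf(timed_out, rho):
    """A timed-out experiment scores minus infinity regardless of partial results."""
    return -math.inf if timed_out else rho


# Tiny demonstration with a deliberately short budget.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
fast_cmd = [sys.executable, "-c", "pass"]
slow_result = run_with_budget(slow_cmd, budget=0.5)
fast_result = run_with_budget(fast_cmd, budget=30)
```

The practical consequence: a run that would have scored well but finishes at minute 31 is worth less than any run that finishes.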
Decision rules (gating)
- ρ < −0.5 → discard, revert (worse than baseline, you broke something)
- ρ ∈ [−0.5, +0.3) → log + revert (no improvement, but study what failed and try a different angle)
- ρ ∈ [+0.3, +0.5) → KEEP, this is the new baseline (PASS-1, paper-figure-grade)
- ρ ∈ [+0.5, +0.7) → KEEP + checkpoint (PASS-2, publishable methodology)
- ρ ≥ +0.7 → KEEP + STOP, write summary.md (PASS-3, screen-scale Vina now usable on anion-leads)
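The gating rules reduce to one function of ρ. A minimal sketch, assuming a ρ that lands exactly on a boundary counts toward the higher tier; `gate` and its return labels are hypothetical names.

```python
def gate(rho):
    """Map a Spearman rho to the decision the gating rules prescribe."""
    if rho >= 0.7:
        return "keep_and_stop"        # PASS-3: write summary.md
    if rho >= 0.5:
        return "keep_and_checkpoint"  # PASS-2
    if rho >= 0.3:
        return "keep"                 # PASS-1: new baseline
    if rho >= -0.5:
        return "log_and_revert"       # no improvement, study the failure
    return "discard_and_revert"       # below -0.5


decisions = {rho: gate(rho) for rho in (-0.9, -0.5, 0.3, 0.5, 0.7)}
```

Because the checks run from the top tier down, a timed-out run (ρ = −∞) falls through to the discard branch without special handling.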
Stop after 5 consecutive non-improving experiments. Write summary.md describing
the best three approaches tried, what they imply about the artifact, and what the next
human-driven step should be (probably Glide/HADDOCK induced-fit, or full-MD MMGBSA).
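The stop condition can be checked mechanically from the sequence of ρ values. A sketch under one stated assumption: an experiment counts as "improving" only if its ρ beats the best ρ seen so far, starting from the −0.9 baseline. `should_stop` is a hypothetical helper.

```python
def should_stop(rhos, baseline=-0.9, patience=5, target=0.7):
    """Return True once the target is hit or `patience` experiments
    in a row fail to beat the best rho seen so far."""
    best = baseline
    streak = 0
    for rho in rhos:
        if rho >= target:
            return True
        if rho > best:
            best = rho
            streak = 0
        else:
            streak += 1
        if streak >= patience:
            return True
    return False


# One improvement followed by five misses: stop.
stalled = should_stop([-0.6, -0.7, -0.8, -0.6, -0.9, -0.7])
# Still making progress: keep going.
progressing = should_stop([-0.6, -0.7, -0.5])
```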
What’s likely to work (suggestions, not commands)
You are smart. Generate your own hypotheses. But to seed:
- NAGL am1bcc-class charges: `pip install openff-nagl` proper (current 0.0.0 is a placeholder; check ~/STRC/tools/openff-nagl.md if it exists). Replaces meeko’s default Gasteiger-on-skeleton with quantum-derived charges.
- Soft-receptor Vina: `prepare_flexreceptor.py -r receptor.pdbqt -s "K1141,K1135,K1137,K1165,K1167,K1175"` then pass `--receptor rigid.pdbqt --flex flex.pdbqt` to Vina.
- AutoDock4 Meeko atom typing: `mk_prepare_ligand.py --add-charge ad4` (might already be the meeko default; verify with `mk_prepare_ligand.py --help`).
- MMGBSA rescoring on Vina top-1 pose per snap: Phase 4f tooling.
- GBNeck2 implicit-solvent rescoring: single-point energy on each Vina pose with a more accurate potential.
- Pose-cluster ranking: k-means top-N Vina poses per snap, rank ligands by best-cluster mean ΔG instead of grand mean.
- Rank by best-pose ΔG instead of mean: ignores the clash outliers.
- Increase exhaustiveness 8 → 32: 4× wall but might tighten the rank. At the ~17 min baseline that is ~68 min, over the 30-minute budget, so it needs a matching cut elsewhere (for example fewer snapshots).
Order them by your judgment of expected ρ-gain ÷ wall-cost. Try the cheap+plausible first.
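Changing the aggregation is the cheapest item on the list, since it needs no re-docking. The sketch below contrasts grand-mean ranking with best-pose ranking on HYPOTHETICAL per-snapshot ΔG values (not real Vina output), where one clash snapshot dominates the mean. `rank_ligands` is an illustrative name.

```python
from statistics import mean


def rank_ligands(per_snap_dg, aggregate):
    """Rank ligand names from most to least favourable aggregated dG."""
    scores = {lig: aggregate(vals) for lig, vals in per_snap_dg.items()}
    return sorted(scores, key=scores.get)


# HYPOTHETICAL data: lig_A docks well except for one clashing snapshot.
demo_dg = {
    "lig_A": [-6.0, -5.8, -6.1, 40.0],
    "lig_B": [-4.0, -4.2, -3.9, -4.1],
}

by_mean = rank_ligands(demo_dg, mean)  # clash drags lig_A below lig_B
by_best = rank_ligands(demo_dg, min)   # best pose ignores the clash
```

Here the mean puts lig_B first while the best pose puts lig_A first, which is the same kind of flip a single large positive ΔG can cause in a grand-mean ranking.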
Logging contract
history.jsonl is append-only. Each line is one experiment:
{"experiment_id": "exp_001_nagl_charges", "timestamp": "2026-04-27T03:14:22Z", "diff_summary": "swapped meeko default charges for openff-nagl am1bcc", "rho": -0.6, "wall_seconds": 1245, "n_failures": 0, "kept": false, "reason": "improvement from -0.9 to -0.6 but still anti-correlated; reverted; NAGL alone insufficient"}

eval.py writes the {rho, wall_seconds, n_failures} triple. You are responsible for the rest of the JSON object (diff_summary, kept, reason). Be honest. The goal is a useful experiment log for the human to read, not a self-flattering narrative.
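A sketch of how the two halves of a record fit together on one line. Whether eval.py writes a partial line that you later complete, or you write the full line yourself, is not specified here; this version simply builds the complete record and appends it. `append_experiment` is a hypothetical helper, and the demo writes to a temporary directory rather than the real history.jsonl.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_experiment(path, experiment_id, metrics, diff_summary, kept, reason):
    """Append one experiment as a single JSON line; never rewrites old lines."""
    record = {
        "experiment_id": experiment_id,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "diff_summary": diff_summary,
        "rho": metrics["rho"],
        "wall_seconds": metrics["wall_seconds"],
        "n_failures": metrics["n_failures"],
        "kept": kept,
        "reason": reason,
    }
    with open(path, "a", encoding="utf-8") as fh:  # "a" keeps the log append-only
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "history.jsonl")
    append_experiment(
        log_path, "exp_001_nagl_charges",
        {"rho": -0.6, "wall_seconds": 1245, "n_failures": 0},
        "swapped meeko default charges for openff-nagl am1bcc",
        kept=False, reason="still anti-correlated; reverted",
    )
    append_experiment(
        log_path, "exp_002_best_pose",
        {"rho": -0.5, "wall_seconds": 1010, "n_failures": 0},
        "rank by best-pose dG instead of mean",
        kept=False, reason="illustrative entry, not a real result",
    )
    with open(log_path, encoding="utf-8") as fh:
        log_lines = fh.read().splitlines()
```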
When in doubt
Read recent history.jsonl. If you’re stuck after 3 attempts, write a notes.md describing
what you’ve ruled out and what you’d want a human to investigate. Then continue trying.
When ρ ≥ +0.7 or 5 non-improving experiments in a row, STOP and write summary.md for
the human. Don’t keep grinding.
Local-LLM mode
If running on local Ollama (qwen3:8b primary, fallback qwen3:32b for harder edits), the loop should still work: context per experiment is small (~3 KB program.md + ~10 KB phase5e_v2_mutant_redock.py + ~5 KB history). If the local model can’t make progress for 3 consecutive iterations, escalate to Claude API (Anthropic) for the next iteration.
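The escalation rule is small enough to write down as a function. A sketch only: `pick_model` is a hypothetical helper, "claude-api" is a placeholder label rather than a real model identifier, and what counts as a "harder edit" is left to the caller.

```python
def pick_model(stalled_iterations, hard_edit=False):
    """Choose which model drives the next iteration.

    stalled_iterations: consecutive iterations without progress.
    """
    if stalled_iterations >= 3:
        return "claude-api"  # placeholder label for the escalation path
    return "qwen3:32b" if hard_edit else "qwen3:8b"


choices = [pick_model(n) for n in range(5)]
```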