program.md — Phase 5e-v2 ρ optimizer

Adapted from karpathy/autoresearch. Single goal, single scalar, single editable file. Your job: improve phase5e_v2_mutant_redock.py so its Vina ranking on the v5.2 shortlist agrees with τRAMD residence-time ranking (ground truth). Currently they are anti-correlated (ρ = −0.9). Move ρ toward +0.5 or higher.

Problem context (read once, don’t argue with it)

Today (2026-04-26) we discovered that the Vina rigid-receptor mutant-ensemble re-dock on the STRC E1659A K1141 pocket has two stacked artifacts that invert its ranking of anion-leads relative to the τRAMD ground truth:

Gasteiger zero-anion — obabel --partialcharge gasteiger zeros formal anion charge. Already fixed in phase5e_v2_mutant_redock.py via dimorphite-DL + meeko default. Phosphonate now has q = −2.000 e in PDBQT. ρ moved from −1.0 to −0.9. Not enough.
Rigid-receptor on per-snapshot static MD frames — the second artifact. Bulky anionic ligands (phosphonate, sulfonamide) need the K1141 pocket to exhale ~0.5–1 Å around them; flexible-MD (τRAMD) gets that for free, rigid Vina does not. NOT yet fixed. This is the load-bearing optimization target.

Full story: see ~/STRC/hypotheses/h01-pharmacochaperone/phases/STRC h01 Phase 5e-v2 v5.2 Shortlist Vina vs tauRAMD 2026-04-26.md.

Ground truth (DO NOT MODIFY)

τRAMD median dissociation time on the v5.2 shortlist (Phase 5m production, 5 lig × 5 replicas, 25 replicas total, all UNBOUND):

| Ligand | τ_med (ps) |
|---|---|
| `v5.2__aq3__adamantyl__CONHOH__-Cl` | 15.7 |
| `v5.2__aq3__adamantyl__CONHOMe__-Cl` | 15.8 |
| `v5.2__aq3__adamantyl__CONHOMe__-CN` | 18.7 |
| `v5.2__aq3__1-indanyl__phosphonate__-CF3` | 19.7 |
| `v5.2__aq3__4-F-biphenyl__phosphonate__-CN` | 20.0 |

Source: ~/STRC/hypotheses/h01-pharmacochaperone/artifacts/phase5m_production/aggregate_partial.json.

Higher τ ⇒ longer residence ⇒ tighter binding ⇒ should rank lower (more negative) on Vina ΔG. A perfect Vina would give Spearman ρ(Vina mean ΔG, −τ_med) = +1.0. We currently have −0.9.
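The sign convention is easy to get backwards, so here is a minimal pure-Python sketch of it. The `tau_med` values come from the table above; `ideal_dg` is a made-up set of Vina scores invented only to show what ρ = +1.0 looks like, and the helper names `average_ranks` and `spearman_rho` are illustrative, not taken from eval.py.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# tau_med (ps) from the ground-truth table, in table order.
tau_med = [15.7, 15.8, 18.7, 19.7, 20.0]
neg_tau = [-t for t in tau_med]

# HYPOTHETICAL Vina dG values (kcal/mol), invented for illustration only:
# the longest-residence ligand gets the most negative dG.
ideal_dg = [-3.0, -3.5, -5.0, -6.0, -7.0]

rho_ideal = spearman_rho(ideal_dg, neg_tau)
rho_inverted = spearman_rho(list(reversed(ideal_dg)), neg_tau)
print(rho_ideal, rho_inverted)
```

With five untied ligands, ρ can only take values in steps of 0.1, so small rank swaps move the score visibly.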

Baseline (state of the world before you started)

File: phase5e_v2_mutant_redock.py at /Users/egorlyfar/STRC/hypotheses/h01-pharmacochaperone/scripts/
Wall: ~17 min for v5.2-shortlist (5 lig × 20 snap × ~10s, exh=8 cpu=4)
Vina mean ΔG (kcal/mol):
  adamantyl_CONHOH_-Cl: −4.87 (rank 1)
  1-indanyl_phosphonate_-CF3: −2.46 (rank 2)
  adamantyl_CONHOMe_-Cl: −1.61 (rank 3)
  adamantyl_CONHOMe_-CN: +1.07 (rank 4)
  4-F-biphenyl_phosphonate_-CN: +13.35 (rank 5)
ρ(Vina, −τ_med) = −0.9
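The wall figure in the baseline is plain arithmetic, and it is worth redoing before any change that multiplies docking cost. A rough sketch follows; the helper name `estimated_wall_seconds` is hypothetical, and the linear scaling of per-dock time with exhaustiveness is an assumption, not a measurement.

```python
def estimated_wall_seconds(n_ligands, n_snapshots, seconds_per_dock,
                           exhaustiveness=8, baseline_exhaustiveness=8):
    """Rough wall estimate.

    ASSUMPTION: per-dock time scales linearly with exhaustiveness.
    """
    scale = exhaustiveness / baseline_exhaustiveness
    return n_ligands * n_snapshots * seconds_per_dock * scale

BUDGET_SECONDS = 1800  # the 30-minute limit enforced by eval.py

# Baseline: 5 ligands x 20 snapshots x ~10 s at exhaustiveness 8.
baseline = estimated_wall_seconds(5, 20, 10)
# Same run at exhaustiveness 32, under the linear-scaling assumption.
exh32 = estimated_wall_seconds(5, 20, 10, exhaustiveness=32)

print(baseline, baseline <= BUDGET_SECONDS)
print(exh32, exh32 <= BUDGET_SECONDS)
```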

What you may edit

Allowed:

phase5e_v2_mutant_redock.py (the orchestrator — full freedom on ligand prep, receptor prep, Vina invocation, scoring, aggregation)
New helper files in ~/STRC/autoresearch/phase5e_v2_rho_optimizer/ (this directory)
pip install <package> (NAGL, AmberTools, prepare_flexreceptor.py, etc.)
Shell out to obabel, antechamber, prepare_flexreceptor.py, mk_prepare_ligand.py
Modify how Vina is called (flex residues, exhaustiveness, rescoring, MMGBSA correction)

Not allowed:

Modify the ground-truth τRAMD numbers in this file or in program.md
Modify eval.py
Delete history.jsonl entries
Touch any other STRC vault files outside this optimizer’s scope

How experiments work

Every experiment is one round of:

Edit phase5e_v2_mutant_redock.py (or add a helper in this directory)
Run python eval.py --experiment-id <slug> (eval.py runs the script + computes ρ + appends to history.jsonl)
Read the printed {"rho": ..., "wall_seconds": ..., "n_failures": ...}
Decide: keep, revert, or refine
Complete the new history.jsonl entry with your reasoning + decision (diff_summary, kept, reason); never edit or delete earlier entries

eval.py enforces a 30-minute wall budget (1800 sec). If your script doesn’t finish within it, the experiment is marked timed-out and ρ is recorded as −∞ (a sentinel; a real Spearman ρ always lies in [−1, +1]).
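If you want to dry-run a change under the same limit before spending an experiment on it, a budgeted subprocess call is enough. This is a sketch of the idea, not eval.py's actual code: `run_with_budget` and `rho_or_sentinel` are hypothetical helper names.

```python
import subprocess
import sys
import time

def run_with_budget(cmd, budget_seconds=1800):
    """Run cmd and return (timed_out, wall_seconds).

    Hypothetical helper; it only mirrors the budget rule described above.
    subprocess.run kills the child process when the timeout expires.
    """
    start = time.monotonic()
    try:
        subprocess.run(cmd, check=False, timeout=budget_seconds)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return timed_out, time.monotonic() - start

def rho_or_sentinel(timed_out, rho):
    """Timed-out experiments get the -inf sentinel instead of a real rho."""
    return float("-inf") if timed_out else rho

if __name__ == "__main__":
    # Example: dry-run the orchestrator under the 30-minute budget.
    timed_out, wall = run_with_budget(
        [sys.executable, "phase5e_v2_mutant_redock.py"], budget_seconds=1800
    )
    print({"timed_out": timed_out, "wall_seconds": round(wall)})
```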

Decision rules (gating)

ρ < −0.5 → discard, revert (still strongly anti-correlated; if ρ falls below the −0.9 baseline, you broke something)
ρ ∈ [−0.5, +0.3) → log + revert (not enough improvement, but study what failed and try a different angle)
ρ ∈ [+0.3, +0.5) → KEEP, this is the new baseline (PASS-1, paper-figure-grade)
ρ ∈ [+0.5, +0.7) → KEEP + checkpoint (PASS-2, publishable methodology)
ρ ≥ +0.7 → KEEP + STOP, write summary.md (PASS-3, screen-scale Vina now usable on anion-leads)
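The gating rules reduce to a small lookup. The sketch below is one way to encode them; the function name `gate` and the action strings are hypothetical. It assumes that a value sitting exactly on a tier boundary (+0.3, +0.5) belongs to the higher tier, which matters because with five ligands ρ moves in steps of 0.1 and boundary values do occur.

```python
EPS = 1e-9  # tolerance so that a float like 0.30000000000000004 or 0.4999999999 lands on its tier

def gate(rho):
    """Map a rho value to the action named in the decision rules.

    ASSUMPTION: boundary values (+0.3, +0.5) are assigned to the higher tier.
    The -inf timeout sentinel falls through to the discard branch.
    """
    if rho >= 0.7 - EPS:
        return "keep_and_stop"        # PASS-3
    if rho >= 0.5 - EPS:
        return "keep_and_checkpoint"  # PASS-2
    if rho >= 0.3 - EPS:
        return "keep"                 # PASS-1
    if rho >= -0.5 - EPS:
        return "log_and_revert"
    return "discard_and_revert"

for value in (-0.9, -0.6, 0.0, 0.3, 0.5, 0.7):
    print(value, gate(value))
```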

Stop after 5 consecutive non-improving experiments. Write summary.md describing the best three approaches tried, what they imply about the artifact, and what the next human-driven step should be (probably Glide/HADDOCK induced-fit, or full-MD MMGBSA).

What’s likely to work (suggestions, not commands)

You are smart. Generate your own hypotheses. But to seed:

NAGL am1bcc-class charges: pip install openff-nagl proper (current 0.0.0 is a placeholder; check ~/STRC/tools/openff-nagl.md if it exists). Replaces meeko’s default Gasteiger-on-skeleton with charges from a graph neural network trained to reproduce AM1-BCC.
Soft-receptor Vina: prepare_flexreceptor.py -r receptor.pdbqt -s "K1141,K1135,K1137,K1165,K1167,K1175" (check the script’s usage text for the exact residue-spec format it expects), then pass --receptor rigid.pdbqt --flex flex.pdbqt to Vina.
AutoDock4 Meeko atom typing — mk_prepare_ligand.py --add-charge ad4 (might already be the meeko default; verify with mk_prepare_ligand.py --help).
MMGBSA rescoring on Vina top-1 pose per snap — Phase 4f tooling.
GBNeck2 implicit-solvent rescoring — single-point energy on each Vina pose with a more accurate potential.
Pose-cluster ranking — k-means top-N Vina poses per snap, rank ligands by best-cluster mean ΔG instead of grand mean.
Rank by best-pose ΔG instead of mean — ignores the clash outliers.
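Increase exhaustiveness 8 → 32: roughly 4× wall, but might tighten the rank. At ~17 min baseline that is ~68 min, well over the 30-minute budget, so pair it with fewer snapshots or more CPUs.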

Order them by your judgment of expected ρ-gain ÷ wall-cost. Try the cheap+plausible first.
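One of the cheapest seeds above, ranking by best-pose ΔG instead of the grand mean, is only an aggregation change. A sketch of the difference follows; the per-snapshot numbers are invented for illustration, and `aggregate` and `rank_ligands` are hypothetical helpers, not functions from phase5e_v2_mutant_redock.py.

```python
from statistics import mean

def aggregate(per_snapshot_dg, mode="mean"):
    """Collapse each ligand's per-snapshot dG list to one score.

    mode="mean" is the grand mean; mode="best" keeps the most negative dG.
    """
    reducers = {"mean": mean, "best": min}
    return {lig: reducers[mode](scores) for lig, scores in per_snapshot_dg.items()}

def rank_ligands(scores):
    """Ligand names ordered from most negative (best) to most positive dG."""
    return sorted(scores, key=scores.get)

# HYPOTHETICAL per-snapshot dG values (kcal/mol), invented for illustration:
# ligand_b docks well in most frames but clashes badly in one.
toy = {
    "ligand_a": [-4.0, -4.2, -3.9, -4.1],
    "ligand_b": [-6.0, -5.8, -6.1, 40.0],
}

rank_by_mean = rank_ligands(aggregate(toy, "mean"))
rank_by_best = rank_ligands(aggregate(toy, "best"))
print("mean:", rank_by_mean)
print("best:", rank_by_best)
```

A single clash frame is enough to flip the mean-based order, which is the pattern to look for in the real per-snapshot scores before trusting either aggregate.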

Logging contract

history.jsonl is append-only. Each line is one experiment:

{"experiment_id": "exp_001_nagl_charges", "timestamp": "2026-04-27T03:14:22Z", "diff_summary": "swapped meeko default charges for openff-nagl am1bcc", "rho": -0.6, "wall_seconds": 1245, "n_failures": 0, "kept": false, "reason": "improvement from -0.9 to -0.6 but still anti-correlated; reverted; NAGL alone insufficient"}

eval.py writes the {rho, wall_seconds, n_failures} triple. You are responsible for the rest of the JSON object (diff_summary, kept, reason). Be honest. The goal is a useful experiment log for the human to read, not a self-flattering narrative.
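A quick self-check before moving on keeps the log machine-readable. The field names below are taken from the example entry above; the validator itself (`validate_entry`, `REQUIRED_FIELDS`) is a hypothetical helper and the type expectations are inferred from that one example.

```python
import json

# Field names come from the example entry; the types are inferred from it.
REQUIRED_FIELDS = {
    "experiment_id": str,
    "timestamp": str,
    "diff_summary": str,
    "rho": (int, float),
    "wall_seconds": (int, float),
    "n_failures": int,
    "kept": bool,
    "reason": str,
}

def validate_entry(line):
    """Parse one history.jsonl line; return a list of problems (empty = OK)."""
    entry = json.loads(line)
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in entry:
            problems.append(f"missing field: {key}")
        elif not isinstance(entry[key], expected):
            problems.append(f"wrong type for {key}")
    return problems

example = json.dumps({
    "experiment_id": "exp_001_nagl_charges",
    "timestamp": "2026-04-27T03:14:22Z",
    "diff_summary": "swapped meeko default charges for openff-nagl am1bcc",
    "rho": -0.6,
    "wall_seconds": 1245,
    "n_failures": 0,
    "kept": False,
    "reason": "still anti-correlated; reverted",
})
print(validate_entry(example))
```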

When in doubt

Read recent history.jsonl. If you’re stuck after 3 attempts, write a notes.md describing what you’ve ruled out and what you’d want a human to investigate. Then continue trying.

When ρ ≥ +0.7 or 5 non-improving experiments in a row, STOP and write summary.md for the human. Don’t keep grinding.
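The stop rule can be checked mechanically from the ρ values in history.jsonl. The sketch below assumes "non-improving" means "did not beat the best ρ seen so far, starting from the −0.9 baseline"; that reading is an assumption, and `should_stop` is a hypothetical helper.

```python
def should_stop(rhos, baseline_rho=-0.9, target=0.7, patience=5):
    """Return True once the target is reached or the trailing streak of
    non-improving experiments hits `patience`.

    ASSUMPTION: an experiment is non-improving if its rho does not exceed
    the best rho seen so far (timeouts recorded as -inf count as such).
    """
    best, streak = baseline_rho, 0
    for rho in rhos:
        if rho > best:
            best, streak = rho, 0
        else:
            streak += 1
    return best >= target or streak >= patience

# rhos in chronological order, e.g. read from history.jsonl.
print(should_stop([-0.6, -0.7, -0.8, -0.9, -1.0]))        # four misses after one gain
print(should_stop([-0.6, -0.7, -0.8, -0.9, -1.0, -0.6]))  # five misses in a row
```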

Local-LLM mode

If running on local Ollama (qwen3:8b primary, fallback qwen3:32b for harder edits), the loop should still work: context per experiment is small (~3 KB program.md + ~10 KB phase5e_v2_mutant_redock.py + ~5 KB history). If the local model can’t make progress for 3 consecutive iterations, escalate to the Claude API (Anthropic) for the next iteration.
