STRC h03 Parameter Provenance Audit 2026-04-25

Closes the lit_audit: deferred flag on h03 hub (Mini-STRC Single-Vector AAV). Triggered by the 2026-04-23 cross-hypothesis audit finding that lit_audit was deferred by user directive while Holt lab independently worked on the same architecture. Audit now authorized.

Scripts audited: mini_strc_interface_preservation.py, cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py, ultra_mini_promoter_shortlist.py, ultramini_vector_cpg_audit.py, strc_homodimer_interface_from_cif.py, ultramini_homodimer_consensus.py, strc_aav_lnp_stack_pkpd.py, dual_vector_otof_calibration.py.

Parameter Table

| # | Constant | Value | Units | Script | Claimed source | PDF in papers/ or incoming/ | Verdict |
|---|---|---|---|---|---|---|---|
| 1 | AAV packaging capacity | 4700 | bp | ultra_mini_promoter_shortlist.py, ultramini_vector_cpg_audit.py | “Industry standard limit (ITR to ITR)”; no cite | PDF: Iranfar 2026 says “~4700 bp packaging limit”; Omichi 2020 says “no more than 5.0 kb” | |
| 2 | ITR overhead | 300 | bp | ultra_mini_promoter_shortlist.py | “2 × ~150 bp ITRs”; no cite | Samulski 1987 cited by name in ultramini_vector_cpg_audit.py but no PDF in papers/ or incoming/ | |
| 3 | AAV2 ITR size | 145 (131 proxy used) | bp | ultramini_vector_cpg_audit.py | “Samulski 1987, canonical inverted terminal repeat” | No PDF for Samulski 1987 in vault | |
| 4 | Kozak + IgK SP + polyA overhead | 450 | bp | ultra_mini_promoter_shortlist.py | Decomposed in comment: Kozak ~20, IgK SP ~60, stop 3, bGH polyA ~225, misc spacer ~140 | No single source; composite estimate | |
| 5 | bGH polyA size | 225 | bp | ultramini_vector_cpg_audit.py | “pAAV-MCS canonical bGH polyA signal”; no cite | No PDF. Standard molecular biology reference not provided | |
| 6 | Human genome CpG/kb | 9.7 | CpG/kb | cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py, ultramini_vector_cpg_audit.py | Comment in script: “Human genome CpG density ~9.7/kb (genome-wide avg)”; no paper cited | No PDF | |
| 7 | Kazusa Homo sapiens codon usage table | Per-codon frequencies | per thousand codons | cpg_depletion_mini_strc.py | “Source: https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606”; URL cited | URL is primary public database; no PDF required; acceptable as online primary source | |
| 8 | CAI method | Sharp & Li 1987 geometric mean of relative adaptiveness | | cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py | “Sharp 1987-style CAI” / “cai_method: Sharp & Li 1987” | No PDF for Sharp & Li 1987 | |
| 9 | CAI cost threshold | ≤5% | | cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py | No source. Design choice with no backing citation | No paper defines this threshold for AAV codon optimization | |
| 10 | STRC TMEM145 interface zone | aa 1603–1770 | canonical aa | mini_strc_interface_preservation.py | “Derstroff 2026 confirmed” in comment | 2026-04-17-derstroff-tmem145-ohc-stereocilia paper note exists ✅; PDF: see papers/ Derstroff | |
| 11 | GPI omega site | S1749 | aa position | mini_strc_interface_preservation.py | “NetGPI 1.1 predicts omega site S1749” | NetGPI 1.1 is a web tool (primary computational source); paper reference for method: Gíslason et al. 2021, not in vault | |
| 12 | RMSD thresholds (binding pocket) | < 0.5 Å hot contacts for “preserved” | Å | mini_strc_interface_preservation.py | No literature source for threshold. Internal design criterion. | No paper defines 0.5 Å or 80%-within-3Å verdict criteria for this context | |
| 13 | AF3 ipTM threshold for “GOLD” | 0.68 (homodimer) / 0.43 (TMEM145 full) | | Hub index.md references; scripts use CIF outputs | No script hardcodes ipTM threshold. CIF outputs from AF3 are primary data | ipTM thresholds reported from AF3 server output (primary), not a literature-defined cutoff | ✅ (AF3 primary output) |
| 14 | STRC × TMEM145 Kd | NOT USED in h03 scripts directly | nM | strc_aav_lnp_stack_pkpd.py does not hardcode Kd | Kd is used in h09 scripts; h03 AAV stack model does not include dose-occupancy with explicit Kd | N/A for h03 scripts | N/A |
| 15 | B8 enhancer size | 587 | bp | ultra_mini_promoter_shortlist.py, ultramini_vector_cpg_audit.py | “ARBITER synthetic panel” / “Yoshimura 2018” (in cpg_audit docstring) | Zhao 2025 PDF exists and MinerU parsed. BUT: B8 size (587 bp) is NOT explicitly stated in the Zhao 2025 text; the paper describes B8 as E1P3×2+E2P2×2+E2P3×2 without reporting total bp. Year mismatch: script says “Yoshimura 2018” but primary paper is Zhao 2025. | 🚨 ⚠ |
| 16 | WPRE3-compact size | 247 | bp | ultra_mini_promoter_shortlist.py | “Choi 2014” | No PDF for Choi 2014 (Cell 157) in vault or incoming/ | |
| 17 | Myo15 promoter sizes (956 bp, 1157 bp) | 956 / 1157 | bp | ultra_mini_promoter_shortlist.py | “Zhao 2024”; year wrong, paper is Zhao 2025 | Zhao 2025 PDF exists and is parsed. Myo15 variants are mentioned in script but the exact 956 bp / 1157 bp values for Myo15 are NOT in Zhao 2025 (which focuses on Slc26a5 enhancers, not Myo15 promoter sizes). Separate Myo15 promoter paper needed. | 🚨 ❌ |
| 18 | Myo15 1611 bp native promoter | 1611 | bp | ultra_mini_promoter_shortlist.py | “Liu 2024 Mol Ther Nucleic Acids; OTOF rescue in DFNB9 mice” | No PDF for Liu 2024 Mol Ther Nucleic Acids in vault | |

Detailed Findings by Category

AAV Payload Capacity — Constants 1–3

C1: 4700 bp capacity. Both scripts listed in row 1 use AAV_ITR_CAPACITY = 4700. The Iranfar 2026 paper (PDF present, MinerU parsed) explicitly writes “the ~4700 bp packaging limit of a single AAV.” Omichi 2020 says “no more than 5.0 kb,” a looser upper bound that does not conflict with 4700 bp. The 4700 value is therefore consistent with Iranfar 2026, a fully parsed, available paper. Classification upgraded from ⚠ to ✅ because Iranfar 2026 is the domain paper and confirms 4700 bp.

C2–C3: ITR size 150 bp / 300 bp total. Samulski 1987 is cited by name in ultramini_vector_cpg_audit.py as the source for the AAV2 ITR consensus (145 bp). The 131 bp proxy used in the script is an approximation noted as such. No Samulski 1987 PDF is in the vault. This is a well-established molecular biology fact but has no locally parsed primary source. Blocker-light: not clinically load-bearing but should have a paper note.
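The sizing constants in rows 1, 2, and 4 combine into a simple payload budget. The following is a minimal sketch, assuming the scripts subtract ITR and fixed overhead from the ITR-to-ITR capacity. Only AAV_ITR_CAPACITY is a name taken from the scripts; every other name is hypothetical.

```python
# Payload budget sketch using the audited constants (rows 1, 2, 4).
# All names except AAV_ITR_CAPACITY are hypothetical.
AAV_ITR_CAPACITY = 4700  # bp, ITR to ITR (Iranfar 2026)
ITR_OVERHEAD = 300       # bp, 2 x ~150 bp ITRs
FIXED_OVERHEAD = 450     # bp, Kozak + IgK SP + stop + bGH polyA + spacer

# Component estimates quoted in the script comment for the 450 bp composite.
OVERHEAD_PARTS = {"kozak": 20, "igk_sp": 60, "stop": 3, "bgh_polya": 225, "spacer": 140}


def remaining_budget(*element_sizes: int) -> int:
    """Return bp left after ITRs, fixed overhead, and the listed elements."""
    return AAV_ITR_CAPACITY - ITR_OVERHEAD - FIXED_OVERHEAD - sum(element_sizes)


print(sum(OVERHEAD_PARTS.values()))  # 448, against the 450 used as the composite
print(remaining_budget())            # 3950
print(remaining_budget(587, 247))    # 3116 with B8 (587) and WPRE3-compact (247)
```

The quoted components sum to 448 bp, so the 450 bp composite is a rounded estimate, in line with its "no single source" status in the table.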

CpG Depletion Rules — Constants 6–9

C6: Human genome CpG density 9.7/kb. Stated in three scripts with no citation. The value presumably derives from a genome-wide analysis such as Lander et al. 2001 (Human Genome Project), but this has not been verified, and no primary paper is cited anywhere in the scripts or vault.
🚨 PHANTOM CITE: value used without source. The 9.7/kb figure is referenced only in script comments as a comparative denominator (fold-vs-genome), not in a dose-critical calculation. However, any script result that reports “X× human genome average” is anchored to this number.
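To make the dependency on the 9.7/kb denominator concrete, here is a minimal sketch of a fold-vs-genome calculation. It assumes CpG density is counted as CG dinucleotides on one strand per 1000 bp; the function and constant names are hypothetical and the scripts may define the metric differently.

```python
# Hypothetical names; the 9.7/kb denominator is the uncited value flagged above.
HUMAN_GENOME_CPG_PER_KB = 9.7


def cpg_per_kb(seq: str) -> float:
    """Count CG dinucleotides on the given strand, normalized per 1000 bp."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count("CG") * 1000 / len(seq)


def fold_vs_genome(seq: str) -> float:
    """CpG density relative to the assumed genome-wide average."""
    return cpg_per_kb(seq) / HUMAN_GENOME_CPG_PER_KB


print(cpg_per_kb("ACGTCGAA"))  # 250.0 (2 CpG in 8 bp)
```

Any change to the denominator rescales every reported fold value by the same factor, which is why the number needs a source even though it is not dose-critical.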

C7: Kazusa codon table. URL cited directly in script source code — acceptable as primary online database. ✅

C8: Sharp & Li 1987 CAI. Paper cited by name in two scripts but no PDF present. This is the canonical CAI method paper and its algorithm is implemented correctly, but the primary source is not locally parsed.

C9: ≤5% CAI cost threshold. No source. This is a design choice made without literature backing. No paper defines 5% as the appropriate CAI penalty ceiling for AAV codon optimization. BLOCKER for CAI optimization decisions: the choice to maximize CpG depletion subject to this constraint has no published precedent cited.
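For reference, the Sharp & Li 1987 CAI is the geometric mean of per-codon relative adaptiveness values. The sketch below illustrates that calculation and one possible reading of the ≤5% cost check. The usage table holds round toy numbers, not Kazusa values, and the relative (rather than absolute) definition of "CAI cost" is an assumption; all names are hypothetical.

```python
import math

# Toy per-thousand frequencies for two amino acids. Placeholder numbers only,
# NOT the Kazusa values used by the scripts.
TOY_USAGE = {
    "A": {"GCC": 40.0, "GCG": 10.0, "GCT": 30.0, "GCA": 20.0},
    "P": {"CCC": 40.0, "CCG": 10.0, "CCT": 30.0, "CCA": 20.0},
}


def relative_adaptiveness(usage: dict) -> dict:
    """Map codon -> w = frequency / max frequency among its synonymous codons."""
    w = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            w[codon] = freq / top
    return w


def cai(cds: str, w: dict) -> float:
    """Geometric mean of w over complete codons found in the table; others are skipped."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def cai_cost(cai_before: float, cai_after: float) -> float:
    """Relative CAI loss. Assumption: the scripts may define cost differently."""
    return (cai_before - cai_after) / cai_before


w = relative_adaptiveness(TOY_USAGE)
before, after = cai("GCCCCC", w), cai("GCCCCT", w)
print(cai_cost(before, after) <= 0.05)  # False for this toy swap
```

Whatever definition the scripts use, the 0.05 ceiling itself remains an internal convention rather than a literature value.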

TMEM145 Interface — Constants 10–11

C10: STRC TMEM145 interface zone aa 1603–1770. Script comment says “Derstroff 2026 confirmed”; paper note at /papers/Derstroff et al 2026 TMEM145 Paper.md. Conditional ✅: the paper note exists, but the verdict holds only if Derstroff 2026 actually specifies these residues.

C11: GPI omega site S1749. NetGPI 1.1 tool output cited (primary computational result). Method paper (Gíslason 2021) not in vault but the tool itself is a primary source. Acceptable as-is.

TMEM145 Kd — Status in h03

h03-specific scripts do not hardcode a STRC×TMEM145 Kd. The strc_aav_lnp_stack_pkpd.py model partitions OHC populations using transduction fraction and fold-expression values (from Iranfar 2026 / Holt 2021) but does not compute dose-occupancy against Kd. The Kd gap documented in strc-tmem145-interactions affects h09 and h26 primarily; in h03 the transgene expression level is input as a fold-expression scalar from Iranfar 2026, not derived from Kd. No Kd blocker for h03 scripts specifically. However: if/when h03 advances to Phase 4 coIP quantification, Kd becomes load-bearing.

AF3 Confidence Thresholds — Constant 13

The hub mentions “ipTM 0.43, 23/41 contacts in GOLD zone” (TMEM145 complex) and “GOLD ipTM 0.68” (homodimer). These are reported values from AF3 server runs, not hardcoded thresholds in scripts. The verdict logic in mini_strc_interface_preservation.py uses algorithmic RMSD thresholds (0.5 Å, 3 Å, 80% criterion — constant 12) not sourced to literature. This is the honest weak point: the 80%-within-3Å rule and the 0.5 Å hot-contact criterion for “binding pocket preserved” have no primary citation.
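Because these thresholds are design choices, it helps to see them as explicit, named parameters. The sketch below is one possible reading of the verdict rule, not the logic in mini_strc_interface_preservation.py: it assumes "preserved" requires both that at least 80% of interface deviations fall within 3 Å and that every hot-contact deviation is below 0.5 Å. The function name, argument names, and the way the two criteria combine are all assumptions.

```python
from typing import Sequence


def pocket_preserved(
    interface_dev: Sequence[float],
    hot_contact_dev: Sequence[float],
    near_cutoff: float = 3.0,     # Å, internal criterion (no literature source)
    near_fraction: float = 0.80,  # internal criterion (no literature source)
    hot_cutoff: float = 0.5,      # Å, internal criterion (no literature source)
) -> bool:
    """Return True if both assumed criteria hold for the given deviations (in Å)."""
    if not interface_dev or not hot_contact_dev:
        raise ValueError("need at least one deviation in each set")
    frac_near = sum(d <= near_cutoff for d in interface_dev) / len(interface_dev)
    hot_ok = all(d < hot_cutoff for d in hot_contact_dev)
    return frac_near >= near_fraction and hot_ok


print(pocket_preserved([0.2, 1.0, 2.5, 2.9, 4.0], [0.1, 0.3]))  # True
print(pocket_preserved([0.2, 1.0, 2.5, 3.5, 4.0], [0.1, 0.3]))  # False
```

Exposing the cutoffs as defaults makes it straightforward to document them as internal conventions and to test how sensitive a verdict is to each one.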

B8 Enhancer — 🚨 Phantom cite cluster (Constant 15)

The script ultramini_vector_cpg_audit.py docstring cites “ARBITER synthetic (Yoshimura 2018)” for the B8 enhancer. This is wrong on two counts:

  1. The ARBITER workflow and the B8 enhancer are from Zhao et al. 2025 (Cell), not from a 2018 paper. Zhao et al. describe the ARBITER system and specifically engineer enhancer B8 as E1P3×2+E2P2×2+E2P3×2. Yoshimura 2018 is a different paper (not in vault) and does not describe B8.
  2. The 587 bp size for B8 is NOT explicitly stated in Zhao 2025. The paper gives the sizes of the constituent modules (E1P3 = 93 bp, E2P2 = 132 bp, E2P3 = 128 bp), which sum to 2×93 + 2×132 + 2×128 = 706 bp, 119 bp more than 587 bp. The size cited in the script therefore does not match the back-calculation from these module sizes; confirmation against Table S2 is still pending.
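The back-calculation in item 2 can be reproduced with a few lines. This is a sketch that assumes the modules are simply concatenated with no linkers or trimmed overlaps; the variable and function names are hypothetical.

```python
# Module sizes and composition as stated in this audit; names are hypothetical.
MODULE_BP = {"E1P3": 93, "E2P2": 132, "E2P3": 128}
B8_COMPOSITION = {"E1P3": 2, "E2P2": 2, "E2P3": 2}  # E1P3x2 + E2P2x2 + E2P3x2
SCRIPT_B8_BP = 587  # value hardcoded in the script


def concatenated_size(composition: dict, module_bp: dict) -> int:
    """Total bp assuming plain concatenation (no linkers, no overlap)."""
    return sum(copies * module_bp[name] for name, copies in composition.items())


expected = concatenated_size(B8_COMPOSITION, MODULE_BP)
print(expected, expected - SCRIPT_B8_BP)  # 706 119
```

A 119 bp gap is too large to be rounding, so either the script value comes from another source or the assembled B8 differs from a plain concatenation of the modules.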

🚨 PHANTOM CITE: no Yoshimura 2018 paper describing ARBITER/B8 exists. Zhao 2025 is the correct source, and the 587 bp size needs verification against Table S2, which was not available in the MinerU parse because SI tables were not extracted.

Action required: Retrieve Zhao 2025 SI (Table S2) to confirm exact B8 size in bp. If 587 bp is from another source, that source must be identified.

Promoter Sizes — 🚨 Wrong citations (Constants 17–18)

Myo15 956 bp and 1157 bp are cited as “Zhao 2024 Research.” The Zhao paper is 2025, not 2024. Furthermore, Zhao 2025 focuses on Slc26a5 / prestin enhancers (B8), not on Myo15 promoter sizes. The 956 bp / 1157 bp values for a truncated Myo15 promoter appear to come from a different paper (possibly Iizuka et al. or another Myo15 promoter characterization). No source has been identified.

🚨 PHANTOM CITE: Myo15 956 bp / 1157 bp attributed to “Zhao 2024” — paper does not describe these constructs.

Liu 2024 Mol Ther Nucleic Acids for the 1611 bp Myo15 native promoter: no PDF in vault. The paper is plausibly real (a Myo15 1.6 kb promoter has been described in the OTOF gene therapy context), but the 1611 bp value is unverified because no locally parsed source is present.

AAV Fold-Expression from Iranfar 2026 (Constants in strc_aav_lnp_stack_pkpd.py)

The sweep uses AAV_FOLD_WHEN_TRANSDUCED = [3.0, 5.0, 7.0, 10.0], attributed to “Iranfar 2026 range” in a comment. Iranfar 2026 reports near-normal hearing restoration (~0 dB from WT at P30 across 5–40 kHz) with 60% of OHCs transduced. That is a functional recovery endpoint, not a protein-expression fold. The 3–10× fold-expression range is an internal interpretation of the transduction outcome, not an explicit parameter from Iranfar 2026. The source claim is imprecise but not fabricated: Iranfar 2026 provides the biological endpoint that motivates the range, and the specific fold values are modeling choices. Flag as ⚠ with note.

Dual-Vector Recombination Efficiency — dual_vector_otof_calibration.py

recombination_efficiency = 0.50 (50%) anchored to Omichi 2020 model calibration. Omichi 2020 measures co-transduction (65.6% OHC for dual AAV2) but does not directly measure recombination efficiency as a separate parameter from co-transduction. The 50% figure is derived from the ratio of functional dual-vector expression to co-transduction rate — this is a model-internal inference, not a measured number from Omichi 2020. However, the paper does not provide a cleaner source, and the modeling approach is disclosed in the script. ⚠.
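The relation described above, recombination efficiency as functional dual-vector expression divided by co-transduction, can be written out explicitly. This is a sketch with hypothetical function names; it assumes the two factors multiply independently. The functional fraction it prints is implied by the 0.50 and 65.6% figures, not a value quoted from Omichi 2020.

```python
def implied_recombination_efficiency(functional_fraction: float,
                                     cotransduction_fraction: float) -> float:
    """Model-internal inference: functional expression / co-transduction."""
    if not 0 < cotransduction_fraction <= 1:
        raise ValueError("cotransduction_fraction must be in (0, 1]")
    return functional_fraction / cotransduction_fraction


def implied_functional_fraction(recombination_efficiency: float,
                                cotransduction_fraction: float) -> float:
    """Inverse of the relation above, under the same independence assumption."""
    return recombination_efficiency * cotransduction_fraction


# With the script's 0.50 and the 65.6% dual-AAV2 OHC co-transduction figure:
print(round(implied_functional_fraction(0.50, 0.656), 3))  # 0.328
```

Writing the inference this way makes clear that the 0.50 value moves whenever either input is revised, which is the reason it is flagged ⚠ rather than treated as a measured constant.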

The OTOF_CLINICAL_DATA block (DB-OTO cohort 1/2 titers and estimated transduction) cites “Lustig et al. 2024 (NEJM Evidence) + Oesterle 2024 (Laryngoscope)” and “Sun et al. 2024 (Lancet)” — none of these PDFs are in the vault. These are load-bearing calibration points for the dual-vector advantage calculation. ⚠ (lower priority than B8 and Myo15 phantom cites, but should be retrieved).

Human Cochlear Volume — dual_vector_otof_calibration.py

cochlear_volume_uL: 50 for human cochlear fluid volume. The cochlear-pkpd literature topic file (cochlear-pkpd) covers this parameter with primary sources from 2026-04-23 audit. Cross-hypothesis sourcing applies. ✅ (via cochlear-pkpd).

Phantom Citations Flagged 🚨

| # | Phantom claim | In script | What the script says | Reality |
|---|---|---|---|---|
| P1 | “Yoshimura 2018” as source for ARBITER/B8 | ultramini_vector_cpg_audit.py docstring | “B8 enhancer — ARBITER synthetic (Yoshimura 2018)” | Zhao et al. 2025 is the ARBITER/B8 paper. No Yoshimura 2018 ARBITER paper exists. |
| P2 | “Zhao 2024 Research” for Myo15 956/1157 bp | ultra_mini_promoter_shortlist.py | "Myo15_956": {"size": 956, "source": "Zhao 2024 Research; HC-exclusive in AAV-PHP.eB"} | Zhao 2025 describes Slc26a5/B8, not Myo15 promoter sizes. Year wrong, paper wrong. |
| P3 | B8 size 587 bp without explicit source | ultra_mini_promoter_shortlist.py | "B8_enhancer": {"size": 587, ...} | Zhao 2025 module back-calculation gives ~706 bp, not 587 bp. Table S2 not retrieved. |
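A small check along these lines could catch the flagged attributions if they reappear in script metadata. It is a sketch only: the Myo15_956 entry is copied from P2, the second entry is a made-up placeholder, and the function and constant names are hypothetical.

```python
# Citation strings flagged as phantom in this audit.
PHANTOM_SOURCES = ("Yoshimura 2018", "Zhao 2024")


def flag_phantom_sources(entries: dict) -> dict:
    """Return {entry name: [matched phantom strings]} for entries citing a flagged source."""
    flagged = {}
    for name, meta in entries.items():
        hits = [p for p in PHANTOM_SOURCES if p in meta.get("source", "")]
        if hits:
            flagged[name] = hits
    return flagged


shortlist = {
    # Entry as quoted in P2.
    "Myo15_956": {"size": 956, "source": "Zhao 2024 Research; HC-exclusive in AAV-PHP.eB"},
    # Made-up placeholder entry, for illustration only.
    "placeholder_element": {"size": 100, "source": "Hypothetical 2099"},
}

print(flag_phantom_sources(shortlist))  # {'Myo15_956': ['Zhao 2024']}
```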

Remaining Gaps After Agent Pass

The following constants have no primary paper currently in the vault or require SI retrieval:

  1. B8 exact size in bp — needs Zhao 2025 Table S2 from SI. PDF is in vault and MinerU parsed, but SI tables were not extracted. Next step: fetch SI PDF directly.
  2. Myo15 956/1157 bp promoter paper — needs identification of the actual primary source. Likely Iizuka et al. 2015 EMBO Mol Med or a specific Myo15-AAV paper. Not currently in vault.
  3. Liu 2024 Mol Ther Nucleic Acids for Myo15 1611 bp — PDF not in vault. Retrievable via PubMed (open access likely).
  4. Human genome CpG density 9.7/kb — any human genome analysis paper (Lander 2001 or equivalent). Low priority (comparative denominator only, not load-bearing for dosing calculations).
  5. Sharp & Li 1987 CAI paper — foundational method paper. Not load-bearing for current results (CAI is computed correctly) but needed for formal provenance.
  6. ≤5% CAI cost threshold — no primary source exists for this design choice. Should be documented as an internal convention, not a literature-derived number.
  7. Samulski 1987 ITR — for AAV2 ITR sequence provenance. Not load-bearing (ITR sequences are part of standard AAV molecular biology) but should be in vault.
  8. RMSD verdict thresholds (0.5 Å hot contacts, 80%-within-3Å) — no primary paper. Internal computational heuristics. Should be explicitly documented as design choices.
  9. DB-OTO / Sun et al. 2024 clinical titer data — “Lustig 2024 NEJM Evidence” and “Sun 2024 Lancet” PDFs not in vault. Needed to validate dual_vector_otof_calibration.py calibration. These are published clinical trials — open access.
  10. Choi 2014 WPRE3-compact — 247 bp size for WPRE3-compact. Choi 2014 Cell 157 paper not in vault.

Critical blockers for h03 advancement:

  • 🔴 B8 exact size (affects vector architecture sizing — determines if all regulatory elements fit in 4700 bp budget)
  • 🔴 Myo15 promoter sizes (affects candidate selection for vector design)

Non-blocking but should be retrieved before lab phase:

  • 🟡 Choi 2014 WPRE3-compact size verification
  • 🟡 DB-OTO / Sun et al. clinical data for dual-vector recombination efficiency model

Ranking Delta

h03: S HELD. No tier change.

This is a literature audit, not a mechanistic proof. The audit closes lit_audit: deferred and establishes lit_audit: partial.

What this audit closes:

  • Confirms Iranfar 2026 (PDF present, MinerU parsed) provides the 4700 bp packaging capacity claim and the 60% OHC transduction / near-normal hearing result used in the AAV stack model
  • Confirms Omichi 2020 (PDF present, MinerU parsed) provides the 83.9% single-vector and 65.6% dual-vector OHC transduction calibration data
  • Confirms Zhao 2025 (PDF present, MinerU parsed) is the correct ARBITER/B8 enhancer paper, which identifies the correction for the “Yoshimura 2018” phantom cite (the script docstring itself is still to be edited)
  • Confirms STRC×TMEM145 Kd is NOT a load-bearing constant in h03 scripts (Kd blocker is h09/h26 only)

What remains open (blockers for lab gate):

  • B8 exact size must be confirmed from Zhao 2025 SI before finalizing vector architecture bp budget
  • Myo15 source must be found before Myo15 candidates can be ordered

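Tier/score/next_step: unchanged. Docstring corrections to ultramini_vector_cpg_audit.py and ultra_mini_promoter_shortlist.py are needed: fix “Yoshimura 2018” → “Zhao 2025” for B8; replace the “Zhao 2024” attribution on the Myo15 956/1157 bp entries with a source-unidentified note, since Zhao 2025 does not describe those constructs and a year-only fix would leave the cite wrong; and note B8 size as unconfirmed from SI.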

Connections