STRC h03 Parameter Provenance Audit 2026-04-25
Closes the lit_audit: deferred flag on h03 hub (Mini-STRC Single-Vector AAV). Triggered by the 2026-04-23 cross-hypothesis audit finding that lit_audit was deferred by user directive while Holt lab independently worked on the same architecture. Audit now authorized.
Scripts audited: mini_strc_interface_preservation.py, cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py, ultra_mini_promoter_shortlist.py, ultramini_vector_cpg_audit.py, strc_homodimer_interface_from_cif.py, ultramini_homodimer_consensus.py, strc_aav_lnp_stack_pkpd.py, dual_vector_otof_calibration.py.
Parameter Table
| # | Constant | Value | Units | Script | Claimed source | PDF in papers/ or incoming/ | Verdict |
|---|---|---|---|---|---|---|---|
| 1 | AAV packaging capacity | 4700 | bp | ultra_mini_promoter_shortlist.py, ultramini_vector_cpg_audit.py | “Industry standard limit (ITR to ITR)”; no cite | PDF: Iranfar 2026 says “~4700 bp packaging limit”; Omichi 2020 says “no more than 5.0 kb” | ✅ (upgraded from ⚠; see C1 under Detailed Findings) |
| 2 | ITR overhead | 300 | bp | ultra_mini_promoter_shortlist.py | “2 × ~150 bp ITRs”; no cite | Samulski 1987 cited by name in ultramini_vector_cpg_audit.py but no PDF in papers/ or incoming/ | ⚠ |
| 3 | AAV2 ITR size | 145 (131 proxy used) | bp | ultramini_vector_cpg_audit.py | “Samulski 1987, canonical inverted terminal repeat” | No PDF for Samulski 1987 in vault | ⚠ |
| 4 | Kozak + IgK SP + polyA overhead | 450 | bp | ultra_mini_promoter_shortlist.py | Decomposed in comment: Kozak ~20, IgK SP ~60, stop 3, bGH polyA ~225, misc spacer ~140 | No single source; composite estimate | ⚠ |
| 5 | bGH polyA size | 225 | bp | ultramini_vector_cpg_audit.py | “pAAV-MCS canonical bGH polyA signal”; no cite | No PDF. Standard molecular biology reference not provided | ⚠ |
| 6 | Human genome CpG/kb | 9.7 | CpG/kb | cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py, ultramini_vector_cpg_audit.py | Comment in script: “Human genome CpG density ~9.7/kb (genome-wide avg)” — no paper cited | No PDF | ❌ |
| 7 | Kazusa Homo sapiens codon usage table | Per-codon frequencies | per thousand codons | cpg_depletion_mini_strc.py | “Source: https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606” (URL cited) | URL is a primary public database; no PDF required. Acceptable as an online primary source | ✅ |
| 8 | CAI method | Sharp & Li 1987 geometric mean of relative adaptiveness | — | cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py | “Sharp 1987-style CAI” / “cai_method: Sharp & Li 1987” | No PDF for Sharp & Li 1987 | ⚠ |
| 9 | CAI cost threshold | ≤5% | — | cpg_depletion_mini_strc.py, cpg_depletion_ultra_mini_strc.py | No source. Design choice with no backing citation | No paper defines this threshold for AAV codon optimization | ❌ |
| 10 | STRC TMEM145 interface zone | aa 1603–1770 | canonical aa | mini_strc_interface_preservation.py | “Derstroff 2026 confirmed” in comment | 2026-04-17-derstroff-tmem145-ohc-stereocilia paper note exists ✅; PDF: see papers/ Derstroff | ✅ |
| 11 | GPI omega site | S1749 | aa position | mini_strc_interface_preservation.py | “NetGPI 1.1 predicts omega site S1749” | NetGPI 1.1 is a web tool (primary computational source); the method paper, Gíslason et al. 2021, is not in vault | ⚠ |
| 12 | RMSD thresholds (binding pocket) | < 0.5 Å hot contacts for “preserved” | Å | mini_strc_interface_preservation.py | No literature source for threshold. Internal design criterion. | No paper defines 0.5 Å or 80%-within-3Å verdict criteria for this context | ❌ |
| 13 | AF3 ipTM threshold for “GOLD” | 0.68 (homodimer) / 0.43 (TMEM145 full) | — | Hub index.md references; scripts use CIF outputs | No script hardcodes ipTM threshold. CIF outputs from AF3 are primary data | ipTM thresholds reported from AF3 server output (primary), not a literature-defined cutoff | ✅ (AF3 primary output) |
| 14 | STRC × TMEM145 Kd | NOT USED in h03 scripts directly | nM | strc_aav_lnp_stack_pkpd.py does not hardcode Kd | Kd is used in h09 scripts; h03 AAV stack model does not include dose-occupancy with explicit Kd | N/A for h03 scripts | N/A |
| 15 | B8 enhancer size | 587 | bp | ultra_mini_promoter_shortlist.py, ultramini_vector_cpg_audit.py | “ARBITER synthetic panel” / “Yoshimura 2018” (in cpg_audit docstring) | Zhao 2025 PDF exists and MinerU parsed. BUT: B8 size (587 bp) is NOT explicitly stated in the Zhao 2025 text; the paper describes B8 as E1P3×2+E2P2×2+E2P3×2 without reporting total bp. Citation mismatch: script says “Yoshimura 2018” but the primary paper is Zhao 2025. | 🚨 ⚠ |
| 16 | WPRE3-compact size | 247 | bp | ultra_mini_promoter_shortlist.py | “Choi 2014” | No PDF for Choi 2014 (Cell 157) in vault or incoming/ | ⚠ |
| 17 | Myo15 promoter sizes (956 bp, 1157 bp) | 956 / 1157 | bp | ultra_mini_promoter_shortlist.py | “Zhao 2024” (year wrong; the paper is Zhao 2025) | Zhao 2025 PDF exists and is parsed. Myo15 variants are mentioned in the script, but the exact 956 bp / 1157 bp values are NOT in Zhao 2025 (which focuses on Slc26a5 enhancers, not Myo15 promoter sizes). A separate Myo15 promoter paper is needed. | 🚨 ❌ |
| 18 | Myo15 1611 bp native promoter | 1611 | bp | ultra_mini_promoter_shortlist.py | “Liu 2024 Mol Ther Nucleic Acids; OTOF rescue in DFNB9 mice” | No PDF for Liu 2024 Mol Ther Nucleic Acids in vault | ⚠ |
Detailed Findings by Category
AAV Payload Capacity — Constants 1–3
C1: 4700 bp capacity. Both scripts that set this constant (ultra_mini_promoter_shortlist.py and ultramini_vector_cpg_audit.py) use AAV_ITR_CAPACITY = 4700. The Iranfar 2026 paper (PDF present, MinerU parsed) explicitly writes “the ~4700 bp packaging limit of a single AAV.” Omichi 2020 gives a looser bound, “no more than 5.0 kb.” The 4700 value is therefore consistent with Iranfar 2026 and conservative relative to Omichi 2020. Classification upgraded from ⚠ to ✅, since Iranfar 2026 is the domain paper and confirms 4700 bp.
C2–C3: ITR size 150 bp / 300 bp total. Samulski 1987 is cited by name in ultramini_vector_cpg_audit.py as the source for the AAV2 ITR consensus (145 bp). The 131 bp proxy used in the script is an approximation noted as such. No Samulski 1987 PDF is in the vault. This is a well-established molecular biology fact but has no locally parsed primary source. Blocker-light: not clinically load-bearing but should have a paper note.
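The bp accounting behind constants 1, 2 and 4 can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the audited scripts: only AAV_ITR_CAPACITY and the numeric values are taken from this audit, and every other name is hypothetical.

```python
# Values are the ones quoted in the parameter table; names other than
# AAV_ITR_CAPACITY are hypothetical and chosen for illustration only.
AAV_ITR_CAPACITY = 4700      # bp, ITR to ITR (constant 1)
ITR_OVERHEAD = 300           # bp, "2 x ~150 bp ITRs" (constant 2)
EXPRESSION_OVERHEAD = 450    # bp, Kozak + IgK SP + polyA composite (constant 4)

# Decomposition of constant 4 as given in the script comment.
OVERHEAD_PARTS = {
    "kozak": 20,
    "igk_signal_peptide": 60,
    "stop_codon": 3,
    "bgh_polya": 225,
    "misc_spacer": 140,
}


def payload_budget(capacity=AAV_ITR_CAPACITY,
                   itr=ITR_OVERHEAD,
                   overhead=EXPRESSION_OVERHEAD):
    """Return the bp left for promoter, CDS and other regulatory elements."""
    return capacity - itr - overhead


parts_total = sum(OVERHEAD_PARTS.values())        # itemised overhead
rounding_slack = EXPRESSION_OVERHEAD - parts_total  # composite minus itemised
itr_slack = ITR_OVERHEAD - 2 * 145                # vs. the 145 bp AAV2 ITR
budget = payload_budget()

print(f"itemised overhead: {parts_total} bp (slack {rounding_slack} bp)")
print(f"ITR slack vs 2 x 145 bp: {itr_slack} bp")
print(f"remaining payload budget: {budget} bp")
```

The itemised parts fall 2 bp short of the 450 bp composite and the 300 bp ITR figure sits 10 bp above 2 × 145 bp, so both rounded constants err on the conservative side.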
CpG Depletion Rules — Constants 6–9
C6: Human genome CpG density 9.7/kb. Stated in three scripts with no citation. The value plausibly traces to Lander et al. 2001 (Human Genome Project) or an equivalent genome analysis paper, but that attribution has not been verified. No primary paper is cited anywhere in the scripts or vault.
🚨 PHANTOM CITE — value used without source. The 9.7/kb figure is referenced only in script comments as a comparative denominator (fold-vs-genome), not in a dose-critical calculation. However, any script result that reports “X× human genome average” is anchored to this number.
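To make the dependency on the 9.7/kb baseline concrete, the sketch below computes a CpG density and the fold-vs-genome ratio. It is an illustration under stated assumptions, not the audited scripts' code: the function names are hypothetical, and the baseline is the uncited value from the script comments.

```python
# Baseline as written in the script comments; it has no cited source.
HUMAN_GENOME_CPG_PER_KB = 9.7


def cpg_per_kb(seq: str) -> float:
    """CpG dinucleotides per 1000 bases on the given strand."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact number.
    return 1000.0 * seq.count("CG") / len(seq)


def fold_vs_genome(seq: str, baseline: float = HUMAN_GENOME_CPG_PER_KB) -> float:
    """Ratio of a sequence's CpG density to the assumed genome average."""
    return cpg_per_kb(seq) / baseline


# Toy sequence: 5 CpGs in exactly 1000 bases.
toy = "CG" * 5 + "A" * 990
print(cpg_per_kb(toy), round(fold_vs_genome(toy), 3))
```

Any reported “X× human genome average” scales inversely with the baseline, so a different sourced value would rescale every fold figure by the same factor without changing the ranking of constructs.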
C7: Kazusa codon table. URL cited directly in script source code — acceptable as primary online database. ✅
C8: Sharp & Li 1987 CAI. Paper cited by name in two scripts but no PDF present. This is the canonical CAI method paper and its algorithm is implemented correctly, but the primary source is not locally parsed.
C9: ≤5% CAI cost threshold. No source. This is a design choice made without literature backing. No paper defines 5% as the appropriate CAI penalty ceiling for AAV codon optimization. BLOCKER for CAI optimization decisions: the choice to maximize CpG depletion subject to this constraint has no published precedent cited.
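For reference, the Sharp & Li style CAI (geometric mean of relative adaptiveness) and a relative-cost check against the 5% ceiling can be sketched as follows. Everything here is illustrative: the codon frequencies are toy numbers rather than Kazusa values, the function names are hypothetical, and the definition of “cost” as a relative CAI drop is an assumption, since the audit does not show how the scripts compute it.

```python
import math

CAI_COST_CEILING = 0.05  # the uncited internal design threshold (constant 9)


def relative_adaptiveness(usage, synonyms):
    """w(codon) = frequency / frequency of the most used synonymous codon."""
    w = {}
    for codons in synonyms.values():
        best = max(usage[c] for c in codons)
        for c in codons:
            w[c] = usage[c] / best
    return w


def cai(codons, w):
    """Geometric mean of w over the codons that appear in the table.

    Simplification: single-codon amino acids are not excluded here.
    """
    scored = [w[c] for c in codons if c in w]
    if not scored:
        raise ValueError("no scorable codons")
    return math.exp(sum(math.log(x) for x in scored) / len(scored))


def cai_cost(cai_before, cai_after):
    """ASSUMED definition: relative CAI drop after CpG depletion."""
    return (cai_before - cai_after) / cai_before


# Toy table (NOT Kazusa data): alanine with one CpG-containing codon.
synonyms = {"Ala": ["GCC", "GCG", "GCT"]}
usage = {"GCC": 40.0, "GCG": 10.0, "GCT": 36.0}
w = relative_adaptiveness(usage, synonyms)

before = cai(["GCC", "GCG", "GCC", "GCC"], w)
after = cai(["GCC", "GCT", "GCC", "GCC"], w)  # GCG swapped for GCT
print(round(before, 3), round(after, 3), cai_cost(before, after) <= CAI_COST_CEILING)
```

In this toy case removing the CpG codon raises CAI, so the cost is negative and trivially within the ceiling; the constraint only bites when the CpG-free synonym is rarer than the codon it replaces.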
TMEM145 Interface — Constants 10–11
C10: STRC TMEM145 interface zone aa 1603–1770. Derstroff 2026 confirmed — paper note at /papers/Derstroff et al 2026 TMEM145 Paper.md. This is ✅ provided Derstroff 2026 actually specifies these residues (paper note exists).
C11: GPI omega site S1749. NetGPI 1.1 tool output cited (primary computational result). Method paper (Gíslason 2021) not in vault but the tool itself is a primary source. Acceptable as-is.
TMEM145 Kd — Status in h03
h03-specific scripts do not hardcode a STRC×TMEM145 Kd. The strc_aav_lnp_stack_pkpd.py model partitions OHC populations using transduction fraction and fold-expression values (from Iranfar 2026 / Holt 2021) but does not compute dose-occupancy against Kd. The Kd gap documented in strc-tmem145-interactions affects h09 and h26 primarily; in h03 the transgene expression level is input as a fold-expression scalar from Iranfar 2026, not derived from Kd. No Kd blocker for h03 scripts specifically. However: if/when h03 advances to Phase 4 coIP quantification, Kd becomes load-bearing.
AF3 Confidence Thresholds — Constant 13
The hub mentions “ipTM 0.43, 23/41 contacts in GOLD zone” (TMEM145 complex) and “GOLD ipTM 0.68” (homodimer). These are reported values from AF3 server runs, not hardcoded thresholds in scripts. The verdict logic in mini_strc_interface_preservation.py uses algorithmic RMSD thresholds (0.5 Å, 3 Å, 80% criterion — constant 12) not sourced to literature. This is the honest weak point: the 80%-within-3Å rule and the 0.5 Å hot-contact criterion for “binding pocket preserved” have no primary citation.
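One way the two criteria could combine into a verdict is sketched below. This is a hypothetical reading, not the logic of mini_strc_interface_preservation.py: the audit names the thresholds but not how they are combined, so the AND rule, the inclusive 3 Å comparison, and all identifiers are assumptions.

```python
# Thresholds as named in this audit (constant 12); they have no literature source.
HOT_CONTACT_MAX_A = 0.5        # Å, per hot-contact deviation
INTERFACE_MAX_A = 3.0          # Å, per interface-residue deviation
INTERFACE_MIN_FRACTION = 0.80  # share of interface residues within 3 Å


def interface_verdict(hot_deviations, interface_deviations):
    """ASSUMED rule: preserved only if both criteria hold."""
    if not hot_deviations or not interface_deviations:
        raise ValueError("deviation lists must be non-empty")
    hot_ok = all(d < HOT_CONTACT_MAX_A for d in hot_deviations)
    within = sum(1 for d in interface_deviations if d <= INTERFACE_MAX_A)
    fraction = within / len(interface_deviations)
    return {
        "hot_contacts_ok": hot_ok,
        "fraction_within_3A": fraction,
        "preserved": hot_ok and fraction >= INTERFACE_MIN_FRACTION,
    }


# Illustrative deviations in Å (made-up numbers, not AF3 output).
print(interface_verdict([0.2, 0.4], [0.2, 0.4, 1.0, 2.5, 3.5]))
print(interface_verdict([0.2, 0.6], [0.2, 0.6, 1.0, 2.5, 2.9]))
```

Writing the thresholds as named module-level constants with a comment stating they are design choices would address the documentation gap without changing any result.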
B8 Enhancer — 🚨 Phantom cite cluster (Constant 15)
The script ultramini_vector_cpg_audit.py docstring cites “ARBITER synthetic (Yoshimura 2018)” for the B8 enhancer. This is wrong on two counts:
- The ARBITER workflow and the B8 enhancer come from Zhao et al. 2025 (Cell), not from a 2018 paper. Zhao et al. describe the ARBITER system and specifically engineer enhancer B8 as E1P3×2+E2P2×2+E2P3×2. Yoshimura 2018 is a different paper (not in vault) and does not describe B8.
- The 587 bp size for B8 is NOT explicitly stated in Zhao 2025. The paper gives the sizes of the constituent modules (E1P3 = 93 bp, E2P2 = 132 bp, E2P3 = 128 bp) which would sum to 2×93 + 2×132 + 2×128 = 706 bp — not 587 bp. The size cited in the script does not match back-calculation from Zhao 2025 Table S2 modules.
🚨 PHANTOM CITE: the “Yoshimura 2018” attribution for B8 has no support. Zhao 2025 is the correct source, and the 587 bp size needs verification against Table S2 (not available in the MinerU parse; SI tables were not extracted).
Action required: Retrieve Zhao 2025 SI (Table S2) to confirm exact B8 size in bp. If 587 bp is from another source, that source must be identified.
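The back-calculation above is reproduced here so the discrepancy can be re-run once Table S2 is retrieved. A minimal sketch: the module sizes and composition are the ones quoted in this audit, the variable names are hypothetical, and any linker or spacer sequence between modules is ignored.

```python
# Module sizes (bp) and B8 composition as quoted in this audit.
MODULE_BP = {"E1P3": 93, "E2P2": 132, "E2P3": 128}
B8_COMPOSITION = {"E1P3": 2, "E2P2": 2, "E2P3": 2}  # E1P3x2 + E2P2x2 + E2P3x2

SCRIPT_B8_BP = 587  # value hardcoded in the scripts (constant 15)

# Sum of module copies only; inter-module junctions are not counted.
back_calculated_bp = sum(MODULE_BP[m] * n for m, n in B8_COMPOSITION.items())
discrepancy_bp = back_calculated_bp - SCRIPT_B8_BP

print(f"back-calculated B8: {back_calculated_bp} bp")
print(f"script value:       {SCRIPT_B8_BP} bp")
print(f"discrepancy:        {discrepancy_bp} bp")
```

If the larger figure turns out to be correct, every candidate architecture that includes B8 would need 119 bp more than the scripts currently budget.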
Promoter Sizes — 🚨 Wrong citations (Constants 17–18)
Myo15 956 bp and 1157 bp are cited as “Zhao 2024 Research.” The Zhao paper is 2025, not 2024. Furthermore, Zhao 2025 focuses on Slc26a5 / prestin enhancers (B8), not on Myo15 promoter sizes. The 956 bp / 1157 bp values for a truncated Myo15 promoter appear to come from a different paper (possibly Iizuka et al. or another Myo15 promoter characterization). No source has been identified.
🚨 PHANTOM CITE: Myo15 956 bp / 1157 bp attributed to “Zhao 2024” — paper does not describe these constructs.
Liu 2024 Mol Ther Nucleic Acids for the 1611 bp Myo15 native promoter: no PDF in vault. The paper likely exists (a ~1.6 kb Myo15 promoter has been described in the OTOF gene therapy context), but neither the citation nor the 1611 bp value has been checked against a locally parsed source.
AAV Fold-Expression from Iranfar 2026 (Constants in strc_aav_lnp_stack_pkpd.py)
The sweep uses AAV_FOLD_WHEN_TRANSDUCED = [3.0, 5.0, 7.0, 10.0] attributed to “Iranfar 2026 range” in a comment. Iranfar 2026 reports near-normal hearing restoration (~0 dB from WT at P30 across 5–40 kHz) in 60% of transduced OHCs. That is a functional endpoint (near-complete threshold recovery), not a protein-expression fold. The 3–10× fold-expression range is an internal interpretation of the transduction outcome, not an explicit parameter from Iranfar 2026. The source claim is imprecise but not fabricated: Iranfar 2026 provides the biological endpoint that motivates the range, and the specific fold values are modeling choices. Flag as ⚠ with note.
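The distinction between the sourced endpoint and the modeled fold values is easier to see in code. The sketch below is a hypothetical reconstruction, not strc_aav_lnp_stack_pkpd.py itself: only AAV_FOLD_WHEN_TRANSDUCED and the 60% figure come from this audit, while the zero baseline for untransduced cells and the function name are assumptions.

```python
# From the audited script (values are modeling choices, see text above).
AAV_FOLD_WHEN_TRANSDUCED = [3.0, 5.0, 7.0, 10.0]

# 60% figure quoted from Iranfar 2026 in this audit.
TRANSDUCED_FRACTION = 0.60

# ASSUMPTION: untransduced cells express no transgene product.
UNTRANSDUCED_FOLD = 0.0


def partition(fold, fraction=TRANSDUCED_FRACTION, baseline=UNTRANSDUCED_FOLD):
    """Split the OHC population into two groups and report the mean fold."""
    return {
        "fold_when_transduced": fold,
        "transduced_fraction": fraction,
        "untransduced_fraction": 1.0 - fraction,
        "population_mean_fold": fraction * fold + (1.0 - fraction) * baseline,
    }


sweep = [partition(fold) for fold in AAV_FOLD_WHEN_TRANSDUCED]
for row in sweep:
    print(row["fold_when_transduced"], round(row["population_mean_fold"], 2))
```

Under these assumptions the population mean spans 1.8× to 6.0×, which shows how strongly downstream conclusions depend on the unsourced fold list.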
Dual-Vector Recombination Efficiency — dual_vector_otof_calibration.py
recombination_efficiency = 0.50 (50%) anchored to Omichi 2020 model calibration. Omichi 2020 measures co-transduction (65.6% OHC for dual AAV2) but does not directly measure recombination efficiency as a separate parameter from co-transduction. The 50% figure is derived from the ratio of functional dual-vector expression to co-transduction rate — this is a model-internal inference, not a measured number from Omichi 2020. However, the paper does not provide a cleaner source, and the modeling approach is disclosed in the script. ⚠.
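The ratio described above can be written out explicitly. This is a sketch of the inference only: the 0.656 and 0.50 values are the ones quoted in this audit, the function names are hypothetical, and the implied functional fraction is back-derived from those two numbers rather than measured.

```python
CO_TRANSDUCTION_DUAL_AAV2 = 0.656  # OHC co-transduction, Omichi 2020 as quoted
RECOMBINATION_EFFICIENCY = 0.50    # model-internal inference in the script


def inferred_efficiency(functional_fraction, co_transduction):
    """Efficiency as functional dual-vector expression over co-transduction."""
    if not 0.0 < co_transduction <= 1.0:
        raise ValueError("co_transduction must be in (0, 1]")
    return functional_fraction / co_transduction


def implied_functional_fraction(co_transduction, efficiency):
    """Fraction of OHCs with functional expression implied by the model."""
    return co_transduction * efficiency


implied = implied_functional_fraction(CO_TRANSDUCTION_DUAL_AAV2,
                                      RECOMBINATION_EFFICIENCY)
print(f"implied functional fraction: {implied:.3f}")
print(f"round trip efficiency: {inferred_efficiency(implied, CO_TRANSDUCTION_DUAL_AAV2):.2f}")
```

Because the efficiency is defined relative to co-transduction, any revision to the co-transduction figure changes the implied functional fraction proportionally while leaving the 0.50 parameter untouched.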
The OTOF_CLINICAL_DATA block (DB-OTO cohort 1/2 titers and estimated transduction) cites “Lustig et al. 2024 (NEJM Evidence) + Oesterle 2024 (Laryngoscope)” and “Sun et al. 2024 (Lancet)” — none of these PDFs are in the vault. These are load-bearing calibration points for the dual-vector advantage calculation. ⚠ (lower priority than B8 and Myo15 phantom cites, but should be retrieved).
Human Cochlear Volume — dual_vector_otof_calibration.py
cochlear_volume_uL: 50 for human cochlear fluid volume. The cochlear-pkpd literature topic file (cochlear-pkpd) covers this parameter with primary sources from 2026-04-23 audit. Cross-hypothesis sourcing applies. ✅ (via cochlear-pkpd).
Phantom Citations Flagged 🚨
| # | Phantom claim | In script | What the script says | Reality |
|---|---|---|---|---|
| P1 | “Yoshimura 2018” as source for ARBITER/B8 | ultramini_vector_cpg_audit.py docstring | “B8 enhancer — ARBITER synthetic (Yoshimura 2018)” | Zhao et al. 2025 is the ARBITER/B8 paper. No Yoshimura 2018 ARBITER paper exists. |
| P2 | “Zhao 2024 Research” for Myo15 956/1157 bp | ultra_mini_promoter_shortlist.py | "Myo15_956": {"size": 956, "source": "Zhao 2024 Research; HC-exclusive in AAV-PHP.eB"} | Zhao 2025 describes Slc26a5/B8, not Myo15 promoter sizes. Year wrong, paper wrong. |
| P3 | B8 size 587 bp without explicit source | ultra_mini_promoter_shortlist.py | "B8_enhancer": {"size": 587, ...} | Zhao 2025 module back-calculation gives ~706 bp, not 587 bp. Table S2 not retrieved. |
Remaining Gaps After Agent Pass
The following constants have no primary paper currently in the vault or require SI retrieval:
- B8 exact size in bp — needs Zhao 2025 Table S2 from SI. PDF is in vault and MinerU parsed, but SI tables were not extracted. Next step: fetch SI PDF directly.
- Myo15 956/1157 bp promoter paper — needs identification of the actual primary source. Likely Iizuka et al. 2015 EMBO Mol Med or a specific Myo15-AAV paper. Not currently in vault.
- Liu 2024 Mol Ther Nucleic Acids for Myo15 1611 bp — PDF not in vault. Retrievable via PubMed (open access likely).
- Human genome CpG density 9.7/kb — any human genome analysis paper (Lander 2001 or equivalent). Low priority (comparative denominator only, not load-bearing for dosing calculations).
- Sharp & Li 1987 CAI paper — foundational method paper. Not load-bearing for current results (CAI is computed correctly) but needed for formal provenance.
- ≤5% CAI cost threshold — no primary source exists for this design choice. Should be documented as an internal convention, not a literature-derived number.
- Samulski 1987 ITR — for AAV2 ITR sequence provenance. Not load-bearing (ITR sequences are part of standard AAV molecular biology) but should be in vault.
- RMSD verdict thresholds (0.5 Å hot contacts, 80%-within-3Å) — no primary paper. Internal computational heuristics. Should be explicitly documented as design choices.
- DB-OTO / Sun et al. 2024 clinical titer data: “Lustig 2024 NEJM Evidence” and “Sun 2024 Lancet” PDFs not in vault. Needed to validate the dual_vector_otof_calibration.py calibration. These are published clinical trials (open access).
- Choi 2014 WPRE3-compact: 247 bp size for WPRE3-compact. Choi 2014 Cell 157 paper not in vault.
Critical blockers for h03 advancement:
- 🔴 B8 exact size (affects vector architecture sizing — determines if all regulatory elements fit in 4700 bp budget)
- 🔴 Myo15 promoter sizes (affects candidate selection for vector design)
Non-blocking but should be retrieved before lab phase:
- 🟡 Choi 2014 WPRE3-compact size verification
- 🟡 DB-OTO / Sun et al. clinical data for dual-vector recombination efficiency model
Ranking Delta
h03: S HELD. No tier change.
This is a literature audit, not a mechanistic proof. The audit closes lit_audit: deferred and establishes lit_audit: partial.
What this audit closes:
- Confirms Iranfar 2026 (PDF present, MinerU parsed) provides the 4700 bp packaging capacity claim and the 60% OHC transduction / near-normal hearing result used in the AAV stack model
- Confirms Omichi 2020 (PDF present, MinerU parsed) provides the 83.9% single-vector and 65.6% dual-vector OHC transduction calibration data
- Confirms Zhao 2025 (PDF present, MinerU parsed) is the correct ARBITER/B8 enhancer paper — fixes the “Yoshimura 2018” phantom cite in script
- Confirms STRC×TMEM145 Kd is NOT a load-bearing constant in h03 scripts (Kd blocker is h09/h26 only)
What remains open (blockers for lab gate):
- B8 exact size must be confirmed from Zhao 2025 SI before finalizing vector architecture bp budget
- Myo15 source must be found before Myo15 candidates can be ordered
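Tier/score/next_step: unchanged. Docstring corrections to ultramini_vector_cpg_audit.py and ultra_mini_promoter_shortlist.py are needed: fix “Yoshimura 2018” → “Zhao 2025” for B8, note the B8 size as unconfirmed pending the SI, and correct “Zhao 2024” → “Zhao 2025”. For the Myo15 entries the year fix alone is not enough, because Zhao 2025 does not describe those constructs; their source should be marked as unidentified until the primary paper is found.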
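A small scanner can locate the stale attributions before the docstrings are edited by hand. This is a minimal sketch with hypothetical names; it only reports matches and deliberately does not rewrite files, since the Myo15 entries need a source decision rather than a string substitution.

```python
import re
from pathlib import Path

# Stale attribution patterns flagged in this audit, with the audit's note.
STALE_CITES = {
    r"Yoshimura\s+2018": "B8/ARBITER source is Zhao 2025",
    r"Zhao\s+2024": "year is 2025; Myo15 sizes are not in Zhao 2025 at all",
}


def find_stale_cites(text):
    """Return (line_number, matched_text, note) for each stale citation."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, note in STALE_CITES.items():
            match = re.search(pattern, line)
            if match:
                hits.append((lineno, match.group(0), note))
    return hits


def scan_scripts(root):
    """Scan every .py file directly under root; return {filename: hits}."""
    report = {}
    for path in sorted(Path(root).glob("*.py")):
        hits = find_stale_cites(path.read_text(encoding="utf-8"))
        if hits:
            report[path.name] = hits
    return report


# Usage on an in-memory sample (hypothetical script fragment).
sample = 'SIZE = 956\nsource = "Zhao 2024 Research"\n'
for lineno, cite, note in find_stale_cites(sample):
    print(f"line {lineno}: {cite!r} -> {note}")
```

Calling scan_scripts on the directory that holds the h03 scripts would list every remaining occurrence by file and line.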
Connections
- [part-of] h03 hub
- [part-of] STRC Hypothesis Ranking
- [see-also] STRC Cross-Hypothesis Parameter Audit 2026-04-23
- [see-also] strc-tmem145-interactions
- [source] 2026-01-iranfar-dual-aav-strc-ctm
- [source] Omichi_2020_AAV_hair_cell_transduction
- [see-also] STRC Computational Scripts Inventory