STRC Ultra-Mini CpG Depletion

Ultra-Mini STRC (aa 1075-1775, 701 aa, ~2,100 bp CDS), fully codon-optimized to Kazusa max-frequency human codons, contains 105 CpG dinucleotides in 2,106 bp (49.86/kb, 5.14× human genome average). Iterative synonymous-codon depletion (the same pipeline used for the 700-1775 construct in STRC CpG Depletion Mini-STRC) eliminates 100% of CpG sites at a 3.65% CAI cost (1.000 → 0.9635), with GC% moving from 68.3% to 63.3%. The CpG-depleted CDS is clinical-grade with respect to CpG content and 1,125 bp shorter than the prior mini construct, which makes Ultra-Mini the preferred AAV payload if the parallel AF3 Ultra-Mini × TMEM145 jobs (submitted 2026-04-21) pass ipTM ≥ 0.40.

Context

After STRC Mini-STRC Truncation Interface Validation proved Ultra-Mini preserves the TMEM145 binding pocket at sub-Å RMSD, the clinical question shifts from structural viability to manufacturability: does the shorter CDS also admit complete CpG depletion at acceptable CAI cost? TLR9-sensing of unmethylated CpG drives anti-capsid immunity in AAV therapeutics (Shao 2018, Faust 2013, Chan 2021). The 700-1775 construct cleared this gate at 3.5% CAI cost (0 residual CpG). This note validates the same property for the aggressive 1075-1775 truncation.

Method

Identical to the sibling 700-1775 pipeline — same Kazusa codon frequencies, same iterative synonymous swap with per-swap CAI-cost threshold T, same sweep {0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.50, 0.60}. The only change: the 701-aa input protein comes from the verified Ultra-Mini sequence (extracted from two independent sources: mini_STRC_700_1775 FASTA slice [375:1076] and job-h-strc-cterm-only.cif chain A extraction; both sha256 d1e49a41a686; see af3_jobs_2026-04-21/MANIFEST.json for full provenance).
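The two-source provenance check described above can be sketched as follows. This is a minimal illustration, not the project's actual script: `seq_digest` and `slice_ultra_mini` are hypothetical helper names, and it assumes the hash was taken over the bare amino-acid string (the note does not say whether the sequence or the file was hashed). The toy sequence stands in for the real FASTA and CIF inputs.

```python
import hashlib


def seq_digest(protein: str, n: int = 12) -> str:
    """Short sha256 prefix of an amino-acid string (whitespace removed, uppercased)."""
    cleaned = "".join(protein.split()).upper()
    return hashlib.sha256(cleaned.encode("ascii")).hexdigest()[:n]


def slice_ultra_mini(mini_protein: str) -> str:
    """Take the 0-based slice [375:1076] of the 700-1775 construct.

    Index 0 is residue 700, so index 375 is residue 1075 and the slice
    holds the 701 residues 1075-1775.
    """
    return mini_protein[375:1076]


# Toy stand-in for the 1,076-aa mini-STRC protein (NOT the real sequence).
toy_mini = ("ACDEFGHIKLMNPQRSTVWY" * 54)[:1076]

from_fasta = slice_ultra_mini(toy_mini)
from_cif = toy_mini[375:]  # stand-in for the chain A extraction

assert len(from_fasta) == 701
assert seq_digest(from_fasta) == seq_digest(from_cif)
print(seq_digest(from_fasta))
```

Normalizing case and whitespace before hashing keeps the digest stable across FASTA line wrapping, which is the usual reason two extractions of the same protein disagree.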

Driver: cpg_depletion_ultra_mini_strc.py imports depletion logic from cpg_depletion_mini_strc.py to avoid code duplication. Deterministic, seed 42. Translation round-trip verified at every threshold.
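A minimal sketch of what an iterative synonymous swap with a per-swap threshold T could look like is given below. It is not the code in cpg_depletion_mini_strc.py: the greedy acceptance rule (take the highest-weight synonym that lowers the total CpG count), the function names, and the toy weights are all assumptions made for illustration. The weights are deliberately not Kazusa values.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def count_cpg(seq):
    return seq.count("CG")


def deplete_cpg(cds, weights, threshold):
    """Greedy sketch: swap codons for synonyms until no allowed swap removes a CpG.

    weights   -- codon -> relative adaptiveness in (0, 1]
    threshold -- maximum allowed adaptiveness drop for a single swap (T)
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    swaps = 0
    improved = True
    while improved:
        improved = False
        seq = "".join(codons)
        before = count_cpg(seq)
        for pos in (i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"):
            # A CpG sits inside one codon or straddles two neighbours.
            for idx in sorted({pos // 3, (pos + 1) // 3}):
                old, best = codons[idx], None
                for alt, aa in CODON_TO_AA.items():
                    if alt == old or aa != CODON_TO_AA[old]:
                        continue
                    if weights[old] - weights[alt] > threshold:
                        continue
                    trial = "".join(codons[:idx] + [alt] + codons[idx + 1:])
                    if count_cpg(trial) < before and (
                        best is None or weights[alt] > weights[best]
                    ):
                        best = alt
                if best is not None:
                    codons[idx] = best
                    swaps += 1
                    improved = True
                    break
            if improved:
                break
    return "".join(codons), swaps


# Toy weights (NOT Kazusa values): everything 0.5 except two alanine codons.
toy_w = {codon: 0.5 for codon in CODON_TO_AA}
toy_w["GCC"], toy_w["GCT"] = 1.0, 0.7

for t in (0.20, 0.35):
    out, n = deplete_cpg("GCCGAC", toy_w, t)
    assert translate(out) == translate("GCCGAC")  # translation round-trip
    print(t, out, n, count_cpg(out))
```

In the toy input the CpG straddles the Ala/Asp codon boundary. At T = 0.20 the only CpG-removing synonym (GCT, drop 0.3) is rejected and the site stays; at T = 0.35 it is accepted. Requiring the total count to fall on every swap guarantees termination and prevents a swap from creating a new CpG at a neighbouring junction.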

Results

Baseline v0 (max-CAI, CpG-naive)

| Metric | Value | vs mini-STRC 700-1775 |
|---|---|---|
| CDS length | 2,106 bp | -1,125 bp (-34.8%) |
| Codons | 702 | -375 |
| CpG count | 105 | -51 (-32.7%) |
| CpG density | 49.86 / kb | +1.58/kb (+3.3%); density very similar |
| CpG fold vs human genome (9.7/kb) | 5.14× | +0.16; density-matched |
| GC% | 68.3 | -0.3 |
| CAI | 1.000 | identical by construction |

The baseline CpG count reduction (156 → 105) scales almost exactly with CDS length (3,231 → 2,106 bp, -34.8%), indicating CpG is distributed roughly uniformly across the max-CAI codon assignments. No “CpG hotspot” in the N-terminal 700-1074 region.
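The density and fold figures in the baseline table follow directly from the counts and lengths quoted in this note. A short check, using only those numbers (the helper name `cpg_density_per_kb` is illustrative):

```python
HUMAN_GENOME_CPG_PER_KB = 9.7  # genome-average figure used in this note


def cpg_density_per_kb(cpg_count, length_bp):
    return 1000.0 * cpg_count / length_bp


ultra = cpg_density_per_kb(105, 2106)  # Ultra-Mini 1075-1775
mini = cpg_density_per_kb(156, 3231)   # mini-STRC 700-1775

print(f"Ultra-Mini: {ultra:.2f}/kb, {ultra / HUMAN_GENOME_CPG_PER_KB:.2f}x genome")
print(f"mini-STRC:  {mini:.2f}/kb, {mini / HUMAN_GENOME_CPG_PER_KB:.2f}x genome")
print(f"CpG count ratio:  {105 / 156:.3f}")
print(f"CDS length ratio: {2106 / 3231:.3f}")
```

The two ratios on the last lines (about 0.673 and 0.652) are the basis for the statement that CpG count scales with CDS length: the count falls slightly less than the length, which is why the density rises by 1.58/kb.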

Depletion sweep

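| Threshold T | Swaps | CpG left | ΔCpG | CAI | ΔCAI | GC % |
|---|---|---|---|---|---|---|
| 0.00 | 0 | 105 | 0% | 1.0000 | 0.0% | 68.28 |
| 0.10 | 0 | 105 | 0% | 1.0000 | 0.0% | 68.28 |
| 0.15 | 37 | 68 | -35.2% | 0.9933 | -0.7% | 66.52 |
| 0.20 | 40 | 65 | -38.1% | 0.9925 | -0.7% | 66.38 |
| 0.25 | 66 | 39 | -62.9% | 0.9836 | -1.6% | 65.15 |
| 0.30 | 80 | 25 | -76.2% | 0.9777 | -2.2% | 64.48 |
| 0.35 | 105 | 0 | -100.0% | 0.9635 | -3.65% | 63.3 |
| 0.40-0.60 | 105 | 0 | saturated | 0.9635 | -3.65% | 63.3 |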

Same saturation profile as the 700-1775 construct — every residual CpG admits a synonym within 35% per-swap adaptiveness drop. No CpG is structurally “stuck” in Ultra-Mini.

CpG-depleted Ultra-Mini v1 (T = 0.35): 0 CpG, CAI 0.9635 (3.65% cost), GC 63.3%. Translation round-trip verified identical to Ultra-Mini aa sequence.
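For reference, CAI is conventionally the geometric mean of per-codon relative adaptiveness, and the "CAI cost" quoted here is the relative drop from the baseline. The sketch below assumes that definition and assumes every codon is included in the mean (some implementations skip Met and Trp); the weights are toy values, not Kazusa frequencies.

```python
import math


def cai(cds, weights):
    """Geometric mean of relative adaptiveness over all codons of the CDS."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return math.exp(sum(math.log(weights[c]) for c in codons) / len(codons))


def cai_cost(cai_before, cai_after):
    """Relative CAI loss, e.g. 0.0365 for 1.000 -> 0.9635."""
    return 1.0 - cai_after / cai_before


toy_w = {"AAA": 1.0, "AAG": 0.25}        # toy weights for illustration only
print(round(cai("AAAAAG", toy_w), 4))    # geometric mean of 1.0 and 0.25
print(f"{cai_cost(1.000, 0.9635):.2%}")  # 3.65%
```

Because the mean is geometric, a handful of low-weight codons costs more than the same total drop spread across many swaps, which is one reason a per-swap cap is a sensible control knob.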

Output FASTAs:

  • cpg_depletion_ultra_mini_strc_v0.fasta — max-CAI baseline (105 CpG, CAI 1.000). Reference only.
  • cpg_depletion_ultra_mini_strc_max.fasta — CpG-depleted clinical candidate (0 CpG, CAI 0.9635). Order this for AAV cloning if Ultra-Mini × TMEM145 AF3 passes.

Head-to-head: 700-1775 vs 1075-1775

| Property | mini 700-1775 | Ultra-Mini 1075-1775 | Ultra-Mini advantage |
|---|---|---|---|
| Protein length (aa) | 1076 | 701 | -35% |
| CDS length (bp, incl stop) | 3,231 | 2,106 | -1,125 bp |
| CpG baseline | 156 | 105 | -51 CpG intrinsically |
| CpG after depletion | 0 | 0 | tied |
| CAI cost of depletion | 3.51% | 3.65% | -0.14% (tied for practical purposes) |
| Final CAI | 0.965 | 0.9635 | tied |
| AAV headroom (4,700 bp - CDS - 600 bp kozak/stop/polyA) | 869 bp | 1,994 bp | +1,125 bp |
| TLR9 substrate (baseline, RRCGYY count) | 156 | 105 | -33% |
| TLR9 substrate (depleted) | 0 | 0 | tied |
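The headroom row uses the budget stated in its label: 4,700 bp capacity minus the CDS minus a 600 bp allowance for Kozak, stop, and polyA. A small check of that arithmetic, with the approximate promoter sizes quoted in this note (constant and function names are illustrative):

```python
AAV_CAPACITY_BP = 4700    # packaging budget used in this note
FIXED_OVERHEAD_BP = 600   # Kozak / stop / polyA allowance from the table


def headroom(cds_bp):
    return AAV_CAPACITY_BP - cds_bp - FIXED_OVERHEAD_BP


mini_room = headroom(3231)
ultra_room = headroom(2106)
print(mini_room, ultra_room, ultra_room - mini_room)  # 869 1994 1125

# Approximate promoter sizes as quoted in this note.
for name, size_bp in (("Prestin", 1800), ("Myo15", 1200)):
    print(name, "left after promoter:", ultra_room - size_bp, "bp")
```

Under these round numbers the Myo15 option leaves ample room, while the Prestin option leaves only about 194 bp for any further elements, so that combination would need a tighter per-element budget before ordering.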

Two wins at zero cost:

  1. 1,125 bp of additional AAV headroom (1,994 bp vs 869 bp), which unlocks an OHC-specific Prestin (~1.8 kb) or Myo15 (~1.2 kb) promoter + WPRE3 + bGHpA.
  2. 33% lower TLR9 substrate burden at baseline. Even if the depleted CDS is re-contaminated via flanking regulatory elements, the starting floor is 51 fewer CpGs.

Zero losses:

  • CAI cost effectively identical (3.65 vs 3.51%).
  • GC% within 0.5 pp of each other (both in mammalian-vector range).
  • Depletion saturates at the same T threshold (0.35) — no Ultra-Mini-specific “sticky CpGs.”

Interpretation

  • Clinical CDS validated. Ultra-Mini clears the CpG clinical gate with numbers indistinguishable from the 700-1775 construct at 3.5% CAI cost. No new blocker introduced by the truncation.
  • Combined with structural validation (STRC Mini-STRC Truncation Interface Validation, sub-Å RMSD at the TMEM145 interface), both the therapeutic pocket and the AAV-manufacturable CDS are now green for Ultra-Mini.
  • The remaining gate is the direct AF3 multimer test (strc_ultramini_x_tmem145_full job, submitted 2026-04-21). If its ipTM is ≥ 0.40, consistent with the prior Job 2 value, Ultra-Mini becomes the preferred clinical construct.
  • Not a tier-change finding on its own — this is a confirmatory follow-on. The original breakthrough was the structural equivalence; this proof confirms the shorter CDS is also manufacturable.

Limitations

Same limitations as STRC CpG Depletion Mini-STRC (Kazusa is a proxy, CpG-only counting, no codon-pair bias optimization, UTR/WPRE/ITR audits pending). One new caveat specific to Ultra-Mini:

  • LRR domain loss not accounted for. The 700-1074 stretch removed in Ultra-Mini may carry glycosylation context and possibly a homodimerization interface (PCDH15 precedent, 2026-04-17-liang-pcdh15-cryo-em-tip-link). CDS-level validation says nothing about these protein-level risks; the parallel strc_ultramini_homodimer AF3 job is the relevant test.

Next steps

  1. Await AF3 Ultra-Mini × TMEM145 full result (submitted 2026-04-21). If ipTM ≥ 0.40 → Ultra-Mini becomes clinical candidate → order cpg_depletion_ultra_mini_strc_max.fasta as gBlock.
  2. Full-vector CpG audit — once OHC-specific promoter is chosen (Prestin vs Myo15 vs Pou4f3 vs Lhx3), scan complete ITR-promoter-Kozak-IgK-Ultra-Mini-WPRE3-bGHpA-ITR assembly for residual CpGs in flanking regions.
  3. IgK-SP + Ultra-Mini fusion AF3 job — confirm signal peptide does not perturb C-term ARM fold when prepended to 1075.
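For the full-vector audit in step 2, one detail worth building in is that a CpG can form at the junction between two elements even when neither element contains one. A minimal sketch of a per-element scan that also counts junction CpGs is shown below; `audit_assembly` is a hypothetical helper, it assumes every part is a non-empty DNA string, and the sequences are toy placeholders rather than real vector elements.

```python
def audit_assembly(parts):
    """Count CpGs per element, at element junctions, and in the full assembly.

    parts -- ordered list of (name, sequence) with non-empty sequences
    """
    rows = []
    junction_cpg = 0
    for i, (name, seq) in enumerate(parts):
        seq = seq.upper()
        rows.append((name, len(seq), seq.count("CG")))
        if i + 1 < len(parts):
            nxt = parts[i + 1][1].upper()
            # A part ending in C followed by a part starting with G creates a CpG.
            if seq.endswith("C") and nxt.startswith("G"):
                junction_cpg += 1
    full = "".join(seq.upper() for _, seq in parts)
    return rows, junction_cpg, full.count("CG")


# Toy placeholders, not real vector sequences.
toy_parts = [("element_a", "ACGTC"), ("element_b", "GTTCG")]
rows, junctions, total = audit_assembly(toy_parts)
for name, size, n in rows:
    print(f"{name}: {size} bp, {n} CpG")
print("junction CpG:", junctions, "| total:", total)
```

Checking that the per-element counts plus junction CpGs equal the count on the concatenated sequence is a cheap consistency test for any per-element table.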

Replication

cd ~/STRC/models
/opt/miniconda3/bin/python3 cpg_depletion_ultra_mini_strc.py
# outputs:
#   cpg_depletion_ultra_mini_strc.json         — sweep metrics + mini comparison
#   cpg_depletion_ultra_mini_strc_v0.fasta     — max-CAI baseline (105 CpG)
#   cpg_depletion_ultra_mini_strc_max.fasta    — CpG-depleted clinical candidate (0 CpG)

Files / Models

  • ~/STRC/models/cpg_depletion_ultra_mini_strc.py — driver (imports logic from cpg_depletion_mini_strc.py)
  • ~/STRC/models/cpg_depletion_ultra_mini_strc.json — sweep + mini comparison
  • ~/STRC/models/cpg_depletion_ultra_mini_strc_v0.fasta
  • ~/STRC/models/cpg_depletion_ultra_mini_strc_max.fasta

Update 2026-04-21 — full-vector CpG audit

CDS-only depletion is not the full story: regulatory elements, signal peptide, and AAV ITRs all carry their own CpG burden. The full recommended vector was assembled per STRC Ultra-Mini Promoter Shortlist and each element was scanned (ultramini_vector_cpg_audit.py → ultramini_vector_cpg_audit.json).

Full vector: 3,446 bp / 48 CpG / 13.93 CpG per kb / 1.44× human-genome average

| Element | Size bp | CpG | CpG/kb | Depletable? |
|---|---|---|---|---|
| 5’ ITR (AAV2) | 130 | 16 | 123 | FIXED (packaging signal) |
| B8 enhancer (proprietary) | 587 | ~5 | 8.5 | audit when exact seq in hand |
| Kozak + IgK-SP | 63 | 3 | 48 | yes, trivially |
| Ultra-Mini CDS (CpG-depleted) | 2,106 | 0 | 0.0 | ✓ already done |
| stop codon | 3 | 0 | 0 | |
| WPRE3-compact (Choi 2014) | 219 | 6 | 27 | partially (RNA-fold constrained) |
| bGH polyA | 208 | 2 | 10 | partially (AATAAA motif preserved) |
| 3’ ITR (AAV2) | 130 | 16 | 123 | FIXED (packaging signal) |
| Total | 3,446 | 48 | 13.9 | |
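The totals row can be re-derived from the per-element rows. The tally below uses only the table values; the B8 enhancer count is the approximate "~5" from the table, so the totals inherit that uncertainty.

```python
# (element, size in bp, CpG count) as listed in the audit table.
ELEMENTS = [
    ("5' ITR (AAV2)", 130, 16),
    ("B8 enhancer", 587, 5),        # approximate (~5), pending exact sequence
    ("Kozak + IgK-SP", 63, 3),
    ("Ultra-Mini CDS", 2106, 0),
    ("stop codon", 3, 0),
    ("WPRE3-compact", 219, 6),
    ("bGH polyA", 208, 2),
    ("3' ITR (AAV2)", 130, 16),
]


def tally(elements):
    bp = sum(size for _, size, _ in elements)
    cpg = sum(n for _, _, n in elements)
    return bp, cpg, 1000.0 * cpg / bp


total_bp, total_cpg, density = tally(ELEMENTS)
itr_cpg = sum(n for name, _, n in ELEMENTS if "ITR" in name)

print(total_bp, total_cpg, round(density, 2))    # 3446 48 13.93
print(f"ITR share: {itr_cpg / total_cpg:.0%}")   # ITR share: 67%
```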

The ITR floor

The two ITRs contribute 32 of 48 total CpGs (67%) and cannot be depleted. AAV ITR sequences form palindromic T-shaped hairpins required for Rep-mediated replication and packaging; their CpG density (123/kb) is a hard floor shared by every AAV therapeutic. This is not a defect of our design — it is the inherent baseline for AAV delivery.

Depletable residual

With IgK-SP depleted (trivial, 3→0), WPRE3 partially depleted (6→3), bGH partially depleted (2→1), and B8 assumed ~2 after proprietary audit, the itemized targets sum to 32 (ITRs) + 3 + 1 + 2 = 38 CpG, so the best-case post-depletion figure is ~38-40 CpG in 3,446 bp = 11.0-11.6 CpG/kb ≈ 1.1-1.2× human genome average. Even the current, not yet fully depleted vector (13.93 CpG/kb) is 2-12× cleaner per kb than common promoter elements (CMV alone: 167/kb; CAG: 30/kb). Our vector is near the AAV-inherent CpG floor.

Clinical implication

For TLR9-driven anti-capsid immunity, the dominant substrate is the payload CpG content at high genome-equivalent doses. Our payload (B8 + Kozak + IgK + Ultra-Mini + WPRE3 + bGH = 3,186 bp excluding ITRs) carries 16 CpG (5 CpG/kb, half the genome average). After residual depletion: ~8 CpG = 2.5 CpG/kb. Dose-for-dose, this vector presents ~20× less TLR9 substrate than a CMV-driven construct of equivalent size. The ITR CpGs are relatively inaccessible to TLR9 in packaged virions because they form intramolecular hairpins.
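The best-case and payload figures can be re-derived from the per-element targets listed in this update. The sketch below uses only numbers from the note. It also makes visible that the itemized targets sum to 38 CpG for the full vector and 6 for the payload, slightly below the rounded ~40 and ~8 quoted in the text, so the quoted densities are on the conservative side.

```python
GENOME_CPG_PER_KB = 9.7
VECTOR_BP = 3446
PAYLOAD_BP = VECTOR_BP - 2 * 130  # vector minus both ITRs = 3186 bp


def per_kb(cpg, bp):
    return 1000.0 * cpg / bp


# Post-depletion targets per element as stated in the note (B8 is an assumption there).
TARGETS = {
    "ITRs": 32, "B8 enhancer": 2, "Kozak + IgK-SP": 0,
    "Ultra-Mini CDS": 0, "WPRE3": 3, "bGH polyA": 1,
}
itemized_total = sum(TARGETS.values())
itemized_payload = itemized_total - TARGETS["ITRs"]

cases = [
    ("vector, itemized", itemized_total, VECTOR_BP),
    ("vector, quoted ~40", 40, VECTOR_BP),
    ("payload, current", 16, PAYLOAD_BP),
    ("payload, itemized", itemized_payload, PAYLOAD_BP),
    ("payload, quoted ~8", 8, PAYLOAD_BP),
]
for label, cpg, bp in cases:
    d = per_kb(cpg, bp)
    print(f"{label}: {cpg} CpG, {d:.2f}/kb, {d / GENOME_CPG_PER_KB:.2f}x genome")
```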

Files

  • ~/STRC/models/ultramini_vector_cpg_audit.py — per-element CpG accounting
  • ~/STRC/models/ultramini_vector_cpg_audit.json — full tabulation + depletion notes

Ranking delta

  • STRC Mini-STRC Single-Vector Hypothesis: S-tier, no change to tier; evidence depth +1 on the Ultra-Mini branch. This proof confirms the CpG-manufacturability gate for Ultra-Mini. The conditional delivery-score upgrade 4 → 5 is still gated on the parallel AF3 Ultra-Mini × TMEM145 job (not on this CpG work). Per STRC Hypothesis Ranking update protocol: no tier change this turn; ranking register updated for evidence column only.
  • STRC CpG Depletion Mini-STRC: no change. This is a sibling construct analysis, not a re-scoring of the 700-1775 depletion.
  • All other hypotheses: no change (CpG depletion is Mini-STRC-specific; no cross-impact).

Connections