Paralog Off-Target Rule

Any therapy that recognises its target by sequence match (antisense oligonucleotides, siRNA, guide RNAs for Cas nucleases, base editors and prime editors, mRNA-hybridising LNPs) must explicitly pass a paralog-discrimination filter. A near-identical paralog (≥95% nucleotide identity over the targeted window) is not a soft risk: it is an automatic off-target. Either the design stage disqualifies every candidate that cannot discriminate it, or every candidate fails at the validation stage.

The rule in one line

If paralog identity is ≥95% over the targeted window, a design without a paralog-discrimination filter is incomplete: not conservative, just incomplete. The off-target will surface in Phase 2; luck will not dodge it.
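
As a minimal sketch of the threshold check, assuming the target window and the corresponding paralog window have already been aligned to equal length without gaps (a simplification; real windows may need a gapped alignment). The function names and the constant are hypothetical, not part of any pipeline in this vault.

```python
# Hypothetical helper names; assumes pre-aligned, ungapped, equal-length windows.
PARALOG_IDENTITY_THRESHOLD = 0.95  # the ≥95% cut-off stated in the rule


def window_identity(target_window: str, paralog_window: str) -> float:
    """Fraction of identical positions between two equal-length windows."""
    if not target_window or len(target_window) != len(paralog_window):
        raise ValueError("windows must be non-empty and of equal length")
    pairs = zip(target_window.upper(), paralog_window.upper())
    matches = sum(a == b for a, b in pairs)
    return matches / len(target_window)


def needs_paralog_filter(target_window: str, paralog_window: str,
                         threshold: float = PARALOG_IDENTITY_THRESHOLD) -> bool:
    """True when the rule applies, i.e. identity over the window meets the cut-off."""
    return window_identity(target_window, paralog_window) >= threshold
```

Identity is computed over the targeted window only, not the whole gene, because a gene-wide average can hide a window that is identical in the paralog.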

Why this keeps catching us

Most sequence-design pipelines (ASO Tm/GC/hairpin scoring, pegRNA discrimination scoring, siRNA thermodynamic asymmetry) optimise for on-target hybridisation but treat off-target as a coarse genome-wide count. When a paralog is 97–99.6% identical (as STRCP1 is to STRC), it is effectively a second on-target site: the scoring function mistakes it for the real target, and the guide/ASO cannot tell the two apart without a seed-region mismatch strategy.
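
A seed-region mismatch check for guides can be sketched as below. It assumes a SpCas9-style convention in which the seed is the PAM-proximal stretch at the 3' end of the protospacer, with a hypothetical default of 10 nt; other nucleases and ASOs would need a different definition of the critical region. The function names are illustrative only.

```python
# Illustrative sketch; the seed definition (3' end, 10 nt) is an assumption.
def mismatch_positions(on_target: str, paralog_site: str) -> list[int]:
    """0-based positions at which the paralog site differs from the target site."""
    if len(on_target) != len(paralog_site):
        raise ValueError("sites must be of equal length")
    pairs = zip(on_target.upper(), paralog_site.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def discriminates_in_seed(on_target: str, paralog_site: str,
                          seed_len: int = 10) -> bool:
    """True if at least one paralog mismatch falls inside the assumed seed."""
    seed_start = len(on_target) - seed_len
    return any(i >= seed_start
               for i in mismatch_positions(on_target, paralog_site))
```

A candidate whose only differences from the paralog lie outside the seed would be counted as one genome-wide off-target by a coarse scorer, yet it offers little basis for discrimination.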

The four failure modes (tonight’s STRC evidence)

All four surfaced in the same vault on 2026-04-22:

  1. ASO splice-switch → paralog cross-hybridisation. 54/54 Phase-1 ASOs for STRC splice events also hybridise to STRCP1 with ≤2 mismatches. See STRC ASO Phase2 STRCP1 Paralog Cross-Hybridization.
  2. Prime editing → paralog protospacer collision. 61/61 Phase-3 PE3b pegRNAs have exactly one perfect off-target 100 kb downstream in STRCP1. See STRC PE Phase4 STRCP1 Paralog Off-Target.
  3. Variant pathogenicity tools → paralog contamination of MSA. SIFT/PolyPhen-2/CADD inflate apparent tolerance at STRC residues because STRCP1 variants bleed into the alignment. See STRC Pseudogene Problem.
  4. Expression-based safety → paralog confounds tissue read-out. GTEx cannot cleanly separate STRC vs STRCP1 expression. See STRC STRCP1 GTEx Expression Check.

Four distinct technologies, one shared failure mode. The rule is mechanism-agnostic.

Design-stage discrimination — the checklist

Before shipping any sequence-based candidate, answer all five:

  • What is the closest paralog of the target in the delivery genome?
  • What is the percent identity over the targeted window (not whole-gene)?
  • Where does the paralog differ? (ideally within the seed region for guides, within the target_sense for ASOs)
  • Does the scoring function reward using those mismatches — or treat them as indifferent?
  • What is the retained candidate count after the paralog filter, and is any retained candidate strong enough to carry forward on its own merits?

If the answer to the last question is “none,” the whole modality is disqualified for this gene and the right move is to pivot (see STRC OTOA Paralog Cross-Rescue as an example of pivoting to a paralog-leveraging strategy instead of a paralog-discriminating one).
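
The last checklist item can be sketched as a filter that reports the retained count. This is a simplified illustration, not the vault's pipeline: it assumes each candidate is represented by its target-site sequence on the sense strand, scans only one strand of the paralog, counts plain mismatches (no bulges, no thermodynamics), and treats ≤2 mismatches as cross-hybridising, following the threshold quoted for the ASO evidence. All names are hypothetical.

```python
# Hypothetical names; mismatch-count scan on one strand only (an assumption).
def min_mismatches(site: str, paralog_seq: str) -> int:
    """Fewest mismatches between the site and any same-length paralog window."""
    site, paralog_seq = site.upper(), paralog_seq.upper()
    n = len(site)
    if n == 0 or len(paralog_seq) < n:
        raise ValueError("site must be non-empty and no longer than the paralog")
    return min(
        sum(a != b for a, b in zip(site, paralog_seq[i:i + n]))
        for i in range(len(paralog_seq) - n + 1)
    )


def paralog_filter(sites: list[str], paralog_seq: str,
                   max_tolerated: int = 2) -> list[str]:
    """Keep only sites whose best paralog match exceeds the tolerated mismatches."""
    return [s for s in sites if min_mismatches(s, paralog_seq) > max_tolerated]


def modality_viable(sites: list[str], paralog_seq: str,
                    max_tolerated: int = 2) -> bool:
    """False when no candidate survives, i.e. the pivot condition in the text."""
    return len(paralog_filter(sites, paralog_seq, max_tolerated)) > 0
```

An empty retained list corresponds to the 54/54 and 61/61 outcomes described above, and is the signal to pivot rather than to re-score the same pool.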

Paralog as feature, not only bug

The same paralog identity that breaks sequence-specific therapy can enable rescue by a different mechanism: if the paralog is functional in a different cell (like OTOA in cochlear supporting cells vs STRC in outer hair cells), pharmacologically redirecting it covers the deficit without ever touching the target gene. See STRC OTOA Paralog Cross-Rescue.

The general principle: paralog identity is a property of the genome. Whether it helps or hurts depends on which therapeutic mechanism you choose. Always resolve it before the mechanism choice, not after.

Connections

Ranking delta

This note is not a proof of any STRC hypothesis, so it makes no direct ranking move. Individual hypothesis impacts (ASO branch, PE branch) are recorded in their own Phase-2/Phase-4 notes and their ranking entries. This note anchors the cross-modality pattern so future therapy-design branches (base-editors, mRNA-LNP targeting, miRNA therapy) run the check before Phase 1 rather than discovering it in Phase 4.