RDKit

Open-source cheminformatics toolkit. Provides SMILES parsing, 3D conformer generation, descriptor calculation (logP/TPSA/MW/HBD/HBA/rotbonds), Morgan/ECFP fingerprints, Tanimoto similarity, force-field optimization, substructure search, and a Python API (rdkit.Chem).

Reference: Landrum G — RDKit (2006-present), rdkit.org.

What It Does

  • Parse SMILES → Mol object (Chem.MolFromSmiles)
  • Generate 3D conformers (AllChem.EmbedMolecule, ETKDG)
  • Compute physicochemical descriptors (Crippen logP, TPSA, MW, etc.)
  • Generate Morgan / ECFP / MACCS / RDKit fingerprints
  • Compute Tanimoto / Dice / Tversky similarity between fingerprints
  • Substructure / SMARTS / scaffold operations
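
The parsing and substructure items above can be sketched as follows. This is a minimal illustration, not code from our scripts: the carboxylic-acid SMARTS pattern and the variable names are assumptions chosen for the example, and the aspirin SMILES is the one used elsewhere in this note.

```python
from rdkit import Chem

# Aspirin SMILES, reused from the fingerprint example in this note.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
if mol is None:  # MolFromSmiles returns None when the SMILES cannot be parsed
    raise ValueError("could not parse SMILES")

# Illustrative SMARTS for a carboxylic acid: carbonyl carbon bonded to an OH oxygen.
acid = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")

has_acid = mol.HasSubstructMatch(acid)        # True if the pattern occurs at least once
acid_matches = mol.GetSubstructMatches(acid)  # tuple of atom-index tuples, one per match
canonical = Chem.MolToSmiles(mol)             # canonical SMILES for the parsed molecule
```

Checking for None right after parsing is worth keeping in any batch script, since a single malformed SMILES otherwise surfaces later as a confusing error.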

Install

pip install rdkit              # pip-prebuilt wheel; needs Python 3.9-3.13

Verified 2026-04-26: installed on Python 3.11 (~/Library/Python/3.11/lib/python/site-packages). NOT available on Python 3.14 (default python3 on this Mac). Always invoke RDKit scripts as python3.11 ….

How to Use

Morgan fingerprint + Tanimoto

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
m1 = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
m2 = Chem.MolFromSmiles("OC(=O)c1ccccc1O")        # salicylic acid
fp1 = AllChem.GetMorganFingerprintAsBitVect(m1, radius=2, nBits=2048)
fp2 = AllChem.GetMorganFingerprintAsBitVect(m2, radius=2, nBits=2048)
DataStructs.TanimotoSimilarity(fp1, fp2)  # → 0.39

DEPRECATION: GetMorganFingerprintAsBitVect is deprecated in 2026.x; the replacement is GetMorganGenerator from rdkit.Chem.rdFingerprintGenerator. Our scripts still use the legacy call, which works but emits a deprecation warning.
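
For reference, a minimal sketch of the same comparison written against the generator API. It assumes the same radius and bit-vector size as the legacy call; our scripts have not been migrated, so treat this as an illustration rather than what currently runs.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

m1 = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
m2 = Chem.MolFromSmiles("OC(=O)c1ccccc1O")        # salicylic acid

# One generator object is built once and reused for every molecule.
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp1 = gen.GetFingerprint(m1)  # fixed-length bit vector
fp2 = gen.GetFingerprint(m2)

similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
```

Building the generator once and reusing it avoids repeating the radius and size arguments at every call site.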

Crippen logP + TPSA

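from rdkit import Chem
from rdkit.Chem import Descriptors
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin; any parsed Mol works
Descriptors.MolLogP(mol)       # Crippen logP
Descriptors.TPSA(mol)          # topological polar surface area
Descriptors.NumHDonors(mol), Descriptors.NumHAcceptors(mol)  # HBD, HBA
Descriptors.NumRotatableBonds(mol), Descriptors.MolWt(mol)   # rotatable bonds, MW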

STRC Research Usage

  • Phase 8d v5.2 library generation: pharmacochaperone_phase8d_v5_2_library_expanded.py uses RDKit for SMILES-from-fragment construction + descriptor calculation (MW, logP, TPSA, HBD/A, rotbonds) on the 384-compound v5.2 library.
  • Phase 8h-lite #5 Tanimoto vs ototoxins (STRC h01 Phase 8h-lite Light Computational Evidence Package 2026-04-26) — Morgan FP r=2, 2048 bits, lead vs 12-compound ototoxin panel. Max Tanimoto 0.127 (aspirin) → no chemical-class motif overlap.
  • Future Phase 8 v5.3 design — RDKit RGroupDecomposition + Reaction-SMARTS for tetrazole/quinoxaline scaffold mutations on the v5.2 winning core.
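
The lead-versus-panel comparison in the Phase 8h-lite item can be sketched as below. Everything concrete here is a placeholder: the lead SMILES is hypothetical, the two-compound panel stands in for the real 12-compound ototoxin panel, and the sketch uses the generator API rather than the legacy call our scripts use. Only the fingerprint settings (Morgan, r=2, 2048 bits) come from the note.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Hypothetical placeholders, NOT the actual lead or ototoxin panel.
lead_smiles = "c1ccc2[nH]cnc2c1"
panel = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "salicylic acid": "OC(=O)c1ccccc1O",
}

gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def fingerprint(smiles):
    """Parse a SMILES string and return its Morgan bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return gen.GetFingerprint(mol)

lead_fp = fingerprint(lead_smiles)
names = list(panel)
panel_fps = [fingerprint(panel[name]) for name in names]

# One similarity value per panel compound, in the same order as `names`.
sims = DataStructs.BulkTanimotoSimilarity(lead_fp, panel_fps)
best_name, max_sim = max(zip(names, sims), key=lambda pair: pair[1])
```

Reporting the panel member that gives the maximum, alongside the value itself, mirrors how the result is quoted above (a maximum similarity plus the compound it came from).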

Known Limitations

  • Crippen logP is an atom-contribution model with fitted parameters and takes no account of ionization state, so for ionizable / charged species it is only a weak approximation. Use ChemAxon or OPERA for production-grade pKa.
  • Morgan fingerprints capture local atomic environments; they miss long-range pharmacophore patterns (use 3D Pharmer or ROCS for those).
  • ETKDG conformer generation produces plausible conformers but does not run a force-field minimization automatically; chain AllChem.MMFFOptimizeMolecule after embedding for a relaxed pose.
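
A minimal sketch of the embed-then-minimize chain from the last item, assuming ETKDGv3 parameters, a fixed random seed, five conformers, and the batch variant MMFFOptimizeMoleculeConfs in place of per-conformer MMFFOptimizeMolecule calls. All of those choices are illustrative, not settings taken from our scripts.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Explicit hydrogens are needed for sensible 3D geometry and MMFF typing.
mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin

params = AllChem.ETKDGv3()
params.randomSeed = 42  # illustrative seed so repeated runs embed the same way

# Embed a small ensemble; returns the IDs of the conformers that were generated.
conf_ids = list(AllChem.EmbedMultipleConfs(mol, 5, params))
if not conf_ids:
    raise RuntimeError("embedding failed")

# Relax every conformer with MMFF94; each result is (status, energy),
# where status 0 means the minimization converged.
results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)

# Pick the conformer with the lowest minimized energy.
best_id, best_energy = min(
    ((cid, energy) for cid, (status, energy) in zip(conf_ids, results)),
    key=lambda pair: pair[1],
)
```

A nonzero status for a conformer suggests raising maxIters before trusting its energy.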

Connections