RDKit
Open-source cheminformatics toolkit. Provides SMILES parsing, 3D conformer generation, descriptor calculation (logP/TPSA/MW/HBD/HBA/rotbonds), Morgan/ECFP fingerprints, Tanimoto similarity, force-field optimization, substructure search, and a Python API (rdkit.Chem).
Reference: Landrum G — RDKit (2006-present), rdkit.org.
What It Does
- Parse SMILES → Mol object (Chem.MolFromSmiles)
- Generate 3D conformers (AllChem.EmbedMolecule, ETKDG)
- Compute physicochemical descriptors (Crippen logP, TPSA, MW, etc.)
- Generate Morgan / ECFP / MACCS / RDKit fingerprints
- Compute Tanimoto / Dice / Tversky similarity between fingerprints
- Substructure / SMARTS / scaffold operations
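A minimal sketch of the substructure and scaffold operations listed above, using aspirin as the input. The carboxylic-acid SMARTS pattern is an illustrative choice, not something taken from our scripts.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
if aspirin is None:
    # MolFromSmiles returns None (it does not raise) on unparsable input
    raise ValueError("could not parse SMILES")

# SMARTS for a protonated carboxylic acid (illustrative pattern):
# the hydroxyl oxygen must carry one H, so the ester group does not match
acid = Chem.MolFromSmarts("C(=O)[OX2H1]")
has_acid = aspirin.HasSubstructMatch(acid)
acid_matches = aspirin.GetSubstructMatches(acid)  # tuple of atom-index tuples

# Bemis-Murcko scaffold: ring systems plus linkers, side chains removed
scaffold = MurckoScaffold.GetScaffoldForMol(aspirin)
scaffold_smiles = Chem.MolToSmiles(scaffold)

print(has_acid, acid_matches, scaffold_smiles)
```

For aspirin the scaffold reduces to the bare benzene ring, since both substituents are side chains.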
Install
pip install rdkit # prebuilt wheel from PyPI; needs Python 3.9-3.13
Verified 2026-04-26: installed on Python 3.11 (~/Library/Python/3.11/lib/python/site-packages). NOT available on Python 3.14 (the default python3 on this Mac). Always invoke RDKit scripts as python3.11 ….
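A quick smoke test to confirm that the interpreter in use can actually import RDKit. This is a sketch; run it with the same interpreter that will run the scripts.

```python
import sys

import rdkit
from rdkit import Chem

# Report which interpreter and which RDKit build are in use
python_version = "{}.{}".format(*sys.version_info[:2])
rdkit_version = rdkit.__version__

# Parse a trivial molecule (ethanol) to confirm the compiled extension loads
smoke = Chem.MolFromSmiles("CCO")
if smoke is None:
    raise RuntimeError("RDKit imported but failed to parse a trivial SMILES")

print(f"Python {python_version}, RDKit {rdkit_version}, atoms in CCO: {smoke.GetNumAtoms()}")
```

An ImportError here usually means the script was started with an interpreter that has no RDKit wheel installed.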
How to Use
Morgan fingerprint + Tanimoto
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
m1 = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O") # aspirin
m2 = Chem.MolFromSmiles("OC(=O)c1ccccc1O") # salicylic acid
fp1 = AllChem.GetMorganFingerprintAsBitVect(m1, radius=2, nBits=2048)
fp2 = AllChem.GetMorganFingerprintAsBitVect(m2, radius=2, nBits=2048)
DataStructs.TanimotoSimilarity(fp1, fp2) # → 0.39
DEPRECATION: GetMorganFingerprintAsBitVect is deprecated in 2026.x; the new path is from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator. We still use the legacy call in our scripts; it works, just emits a warning.
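For reference, a sketch of the same comparison on the generator API named in the deprecation note. It assumes the same settings as the legacy call (radius 2, 2048 bits). Our scripts have not been migrated, so treat this as a starting point and not as a tested replacement.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# One generator object can be reused for every molecule
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

m1 = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
m2 = Chem.MolFromSmiles("OC(=O)c1ccccc1O")        # salicylic acid

# GetFingerprint returns a fixed-length bit vector, like the legacy AsBitVect call
fp1 = gen.GetFingerprint(m1)
fp2 = gen.GetFingerprint(m2)

similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
print(f"Tanimoto (Morgan r=2, 2048 bits): {similarity:.2f}")
```

Note that the size keyword changes from nBits to fpSize on the generator.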
Crippen logP + TPSA
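from rdkit import Chem
from rdkit.Chem import Descriptors
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O") # example input (aspirin); substitute your own molecule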
Descriptors.MolLogP(mol) # Crippen logP
Descriptors.TPSA(mol) # topological polar surface area
Descriptors.NumHDonors(mol), Descriptors.NumHAcceptors(mol)
Descriptors.NumRotatableBonds(mol), Descriptors.MolWt(mol)
STRC Research Usage
- Phase 8d v5.2 library generation: pharmacochaperone_phase8d_v5_2_library_expanded.py uses RDKit for SMILES-from-fragment construction and descriptor calculation (MW, logP, TPSA, HBD/HBA, rotbonds) on the 384-compound v5.2 library.
- Phase 8h-lite #5 Tanimoto vs ototoxins (STRC h01 Phase 8h-lite Light Computational Evidence Package 2026-04-26): Morgan FP r=2, 2048 bits, lead vs 12-compound ototoxin panel. Max Tanimoto 0.127 (aspirin) → no chemical-class motif overlap.
- Future Phase 8 v5.3 design: RDKit RGroupDecomposition + Reaction-SMARTS for tetrazole/quinoxaline scaffold mutations on the v5.2 winning core.
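The lead-vs-panel comparison in the Phase 8h-lite entry can be sketched as below. The helper max_tanimoto, the four-compound PLACEHOLDER_PANEL, and the caffeine query are all hypothetical stand-ins; the real 12-compound ototoxin panel and the lead SMILES live in the evidence package and are not reproduced here. The sketch uses the generator API, whereas our scripts use the legacy call.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Placeholder panel for illustration only; NOT the actual ototoxin panel
PLACEHOLDER_PANEL = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "salicylic acid": "OC(=O)c1ccccc1O",
    "paracetamol": "CC(=O)Nc1ccc(O)cc1",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
}


def max_tanimoto(query_smiles, panel, radius=2, fp_size=2048):
    """Return (name, similarity) of the panel member most similar to the query."""
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=fp_size)

    query = Chem.MolFromSmiles(query_smiles)
    if query is None:
        raise ValueError(f"unparsable query SMILES: {query_smiles!r}")

    names, fps = [], []
    for name, smiles in panel.items():
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"unparsable panel SMILES for {name!r}")
        names.append(name)
        fps.append(gen.GetFingerprint(mol))

    # One call compares the query against the whole panel
    sims = DataStructs.BulkTanimotoSimilarity(gen.GetFingerprint(query), fps)
    best = max(range(len(sims)), key=sims.__getitem__)
    return names[best], sims[best]


# Stand-in query (caffeine), used here only to exercise the function
best_name, best_sim = max_tanimoto("Cn1cnc2c1c(=O)n(C)c(=O)n2C", PLACEHOLDER_PANEL)
print(f"max Tanimoto {best_sim:.3f} against {best_name}")
```

Reporting the name alongside the maximum keeps the result traceable to a specific panel compound, as in the "0.127 (aspirin)" entry above.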
Known Limitations
- Crippen logP is parameter-fit; for ionizable / charged species it’s a weak approximation. Use chemaxon or OPERA for production-grade pKa.
- Morgan fingerprints capture local atomic environments; they miss long-range pharmacophore patterns (use 3D Pharmer or ROCS for those).
- ETKDG conformer generation gives an ensemble of low-energy conformers, but no force-field re-minimization is automatic; chain AllChem.MMFFOptimizeMolecule after embedding for a relaxed pose.
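A sketch of the embed-then-minimize chain from the ETKDG limitation, shown for a small ensemble. The conformer count, random seed, and iteration cap are arbitrary illustrative values. MMFFOptimizeMoleculeConfs is the multi-conformer counterpart of AllChem.MMFFOptimizeMolecule.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Explicit hydrogens are needed for sensible 3D geometry and force-field typing
mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin

params = AllChem.ETKDGv3()
params.randomSeed = 42  # arbitrary seed, fixed only for reproducibility

# Embed a small ensemble; returns the IDs of the conformers that were generated
conf_ids = list(AllChem.EmbedMultipleConfs(mol, 5, params))
if not conf_ids:
    raise RuntimeError("ETKDG embedding produced no conformers")

# Relax every conformer with MMFF94.
# Each result is (not_converged, energy); not_converged == 0 means converged.
results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)

for conf_id, (not_converged, energy) in zip(conf_ids, results):
    state = "converged" if not_converged == 0 else "not converged"
    print(f"conformer {conf_id}: {energy:.2f} kcal/mol ({state})")
```

A non-zero flag means that conformer needs more iterations, or that MMFF parameters were unavailable for the molecule.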
Connections
- [applies-to] h01-pharmacochaperone
- [see-also] ADMET-AI: RDKit feeds SMILES into ADMET-AI for tox prediction
- [see-also] STRC Computational Scripts Inventory
- [ref] Schneider 2014 De Novo Molecular Design: 2014-schneider-baringhaus-de-novo-molecular-design-book