Research Artifact Retention Policy
Brain stores evidence and reproducibility metadata, not every bulky intermediate file. Scripts, small summaries, hashes, and final tables stay in Brain. Raw generated directories get archived or ignored unless they are the actual evidence.
Problem
Research workbenches can generate thousands of files: docking poses, logs, model caches, raw JSON, and temporary outputs. If every artifact is tracked, Brain's git history becomes noisy and recovery becomes slow.
Keep in Brain
- Final scripts used to produce a result.
- Small JSON summaries that back an atomic proof note.
- Final tables copied into the proof note or linked from it.
- Parameter provenance tables.
- Hashes and paths for raw artifacts.
- A short note explaining what an artifact proves.
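The "hashes and paths" item can be produced with a few lines of standard-library Python. The sketch below is one possible approach, not an established Brain tool: `sha256_file` and `sha256_tree` are hypothetical helper names, and the way a directory is folded into a single hash (sorted relative paths mixed with per-file hashes) is an assumption chosen for determinism.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of one file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_tree(root: Path) -> str:
    """Return one hex SHA-256 for a whole directory (hypothetical scheme).

    Files are visited in sorted order, and each relative path is mixed in
    together with that file's hash, so both renames and content changes
    alter the result.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(sha256_file(path).encode("ascii"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Recording the directory hash next to the path lets a later reader check whether a local raw directory is still the one a summary was derived from, without tracking the directory itself.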
Keep outside Brain or ignore
- Docking run directories.
- Raw logs over a few hundred lines.
- Model caches and temporary outputs.
- Regenerable intermediate files.
- Repeated run folders where only the summary JSON changed.
Artifact registry pattern
Each research area with heavy compute should maintain an artifact registry:
| Date | Artifact | Path | Generated by | Hash | Proof note | Policy |
|---|---|---|---|---|---|---|
| 2026-04-24 | h01 5e mutant ensemble redock | models/... | script.py | sha256... | [[Proof Note]] | summary-only |

Policy values:
- tracked: small, canonical, useful to keep in git.
- summary-only: raw artifact exists locally, but only summary and hash are tracked.
- archive: keep in _archive/ if historically useful.
- delete-ok: regenerable and not worth preserving.
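A registry row can be generated rather than typed by hand, which keeps the column order and the policy vocabulary consistent. The following is a minimal sketch under stated assumptions: the `Policy` enum and `RegistryEntry` dataclass are hypothetical names, and the field values in the usage example are placeholders, not real artifacts.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """The four policy values used in the registry."""
    TRACKED = "tracked"
    SUMMARY_ONLY = "summary-only"
    ARCHIVE = "archive"
    DELETE_OK = "delete-ok"


@dataclass(frozen=True)
class RegistryEntry:
    """One row of the artifact registry (hypothetical structure)."""
    date: str
    artifact: str
    path: str
    generated_by: str
    sha256: str
    proof_note: str
    policy: Policy

    def to_row(self) -> str:
        """Render the entry as a markdown table row in registry column order."""
        cells = [
            self.date,
            self.artifact,
            self.path,
            self.generated_by,
            self.sha256,
            f"[[{self.proof_note}]]",  # proof note as an Obsidian wikilink
            self.policy.value,
        ]
        return "| " + " | ".join(cells) + " |"


# Placeholder values for illustration only.
example = RegistryEntry(
    date="2026-01-01",
    artifact="example run",
    path="models/example_run",
    generated_by="example_script.py",
    sha256="0" * 64,
    proof_note="Example Proof Note",
    policy=Policy("summary-only"),
)
print(example.to_row())
```

Looking the policy up by value, as in `Policy("summary-only")`, raises `ValueError` for any string outside the four allowed values, so a typo in a policy name fails early instead of landing in the registry.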
STRC immediate rule
For models/docking_runs/ and raw logs/, prefer summary-only unless the directory is small and directly cited. The proof note should carry the scientific conclusion; the artifact registry carries reproducibility breadcrumbs.
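The immediate rule can be written down as a small decision function. This is a sketch of one reading of the rule, with several assumptions: "small" is not defined in this note, so the thresholds below are hypothetical; the alternative to summary-only is taken to be tracked; and `logs/` is assumed to be the directory meant by "raw logs/".

```python
from pathlib import PurePosixPath
from typing import Optional

# Directories covered by the rule (logs/ is an assumed reading of "raw logs/").
HEAVY_PREFIXES = ("models/docking_runs", "logs")

# Hypothetical thresholds for "small"; the note does not define them.
SMALL_MAX_FILES = 20
SMALL_MAX_BYTES = 1_000_000


def suggest_policy(
    rel_path: str,
    file_count: int,
    total_bytes: int,
    directly_cited: bool,
) -> Optional[str]:
    """Suggest a policy for a path inside the vault.

    Returns None when the path is outside the directories the rule covers.
    """
    path = PurePosixPath(rel_path)
    if not any(path.is_relative_to(prefix) for prefix in HEAVY_PREFIXES):
        return None
    is_small = file_count <= SMALL_MAX_FILES and total_bytes <= SMALL_MAX_BYTES
    if is_small and directly_cited:
        return "tracked"
    return "summary-only"


print(suggest_policy("models/docking_runs/run1", 5000, 10**9, True))
print(suggest_policy("models/docking_runs/run1", 3, 1000, True))
```

Under these assumptions the first call suggests summary-only (large directory) and the second suggests tracked (small and directly cited). `PurePosixPath.is_relative_to` requires Python 3.9 or later.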
Connections
- [part-of] Vault Stack and Schemas
- [applies] STRC Computational Scripts Inventory
- [applies] STRC Hypothesis Ranking
- [see-also] Obsidian Infrastructure and Recovery