RTFM × Obsidian — Vault Integration Guide¶
The Problem¶
Andrej Karpathy's LLM Wiki pattern transforms Obsidian vaults into AI-readable knowledge bases with three layers: raw sources, a compiled wiki, and a schema. Projects like Claudesidian, claude-obsidian, and obsidian-second-brain implement this pattern — and they all hit the same wall.
When the wiki is small (~100 notes), the LLM reads index.md and finds what it needs. Once the wiki grows past ~500 notes, index.md becomes unmanageable: the LLM reads everything into context, and token usage explodes.
RTFM is the retrieval layer that removes this ceiling. It combines FTS5 full-text search, semantic search, graph-based ranking, and progressive disclosure, and it generates Obsidian-native navigation files your vault can read.
Quick Start¶
Run rtfm vault /path/to/vault. RTFM detects the vault, proposes corpus mappings based on your folder structure, indexes everything, and generates the _rtfm/ navigation files. Open Obsidian: the index is already there.
How It Works¶
What rtfm vault does¶
- Detects .obsidian/ to confirm it's a vault
- Scans top-level folders and proposes each as a corpus (Research → research, Publications → publications, etc.)
- Creates .rtfm/library.db, the search index
- Syncs all corpora with incremental indexing
- Resolves [[wikilinks]] into graph edges (following Obsidian's resolution rules)
- Generates _rtfm/: Obsidian-native navigation files with wikilinks, frontmatter, and Mermaid diagrams
- Configures .mcp.json and CLAUDE.md so Claude Code works immediately
What you get in Obsidian¶
_rtfm/
├── index.md # Hub: stats, corpus list, top connected documents
├── graph.md # Hub documents, orphans, broken links, Mermaid diagram
├── recent.md # 20 most recently modified files
└── corpus/
├── research.md # Per-corpus index
├── notes.md
└── ...
Every file uses:
- Wikilinks [[path/to/note]] — click to navigate
- YAML frontmatter — queryable by Dataview (TABLE generated_at FROM #rtfm)
- Callouts > [!info] — Obsidian-native stats blocks
- Mermaid diagrams — rendered natively by Obsidian
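To make these four conventions concrete, here is a hypothetical sketch of how a generator could render one per-corpus page. The frontmatter fields (generated_at, corpus, tags) and the page layout are assumptions inferred from the Dataview example above (TABLE generated_at FROM #rtfm), not RTFM's actual template.

```python
from datetime import datetime, timezone

FENCE = "`" * 3  # built indirectly so the example nests cleanly inside Markdown


def render_corpus_page(corpus: str, notes: list[str], links: list[tuple[str, str]]) -> str:
    """Render one navigation page: frontmatter, callout, wikilinks, Mermaid diagram."""
    generated_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "---",
        f"generated_at: {generated_at}",  # the field TABLE generated_at reads
        f"corpus: {corpus}",
        "tags: [rtfm]",  # the tag FROM #rtfm matches
        "---",
        f"# Corpus: {corpus}",
        "",
        "> [!info] Stats",
        f"> {len(notes)} notes, {len(links)} links",
        "",
    ]
    # Wikilinks use vault-relative paths without the .md extension
    lines += [f"- [[{note}]]" for note in notes]
    # Mermaid: give each note a stable node id, then draw one arrow per link
    lines += ["", FENCE + "mermaid", "graph LR"]
    ids: dict[str, str] = {}
    for src, dst in links:
        for name in (src, dst):
            ids.setdefault(name, f"n{len(ids)}")
        lines.append(f'    {ids[src]}["{src}"] --> {ids[dst]}["{dst}"]')
    lines.append(FENCE)
    return "\n".join(lines) + "\n"
```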
What the agent gets¶
In Claude Code, the agent uses rtfm_search instead of grepping your vault:
rtfm_search("formal grammars") → 5 results with scores, no content (300 tokens)
rtfm_expand("research--paper-42") → reads only the relevant section
Progressive disclosure: the context grows only by what's actually useful.
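The two-step pattern can be sketched with a toy in-memory corpus. The DOCS dictionary, the term-count scoring and the Hit type are illustrative assumptions; the real tools query the SQLite index, and rtfm_expand returns only the relevant section, whereas this toy expand returns the whole document. What matters is the shape of the responses: search returns ids and scores only, so document text enters the context only when the agent explicitly expands a hit.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str   # e.g. "research--paper-42"
    score: float  # relevance only; no document content


# Hypothetical toy corpus standing in for the SQLite index
DOCS = {
    "research--paper-42": "Formal grammars define languages with production rules.",
    "research--paper-7": "Neural parsers learn grammars from treebanks.",
    "notes--todo": "Buy coffee.",
}


def search(query: str, limit: int = 5) -> list[Hit]:
    """Step 1: cheap to put in context, because it returns ids and scores only."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in DOCS.items():
        words = text.lower().replace(".", " ").split()
        score = sum(words.count(t) for t in terms)  # naive term-frequency score
        if score:
            hits.append(Hit(doc_id, float(score)))
    return sorted(hits, key=lambda h: h.score, reverse=True)[:limit]


def expand(doc_id: str) -> str:
    """Step 2: load the text of the one document the agent chose."""
    return DOCS[doc_id]
```

A typical agent turn calls search("formal grammars"), inspects the two ids it gets back, and calls expand only on research--paper-42.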
Architecture: RTFM vs Karpathy¶
Karpathy's model¶
raw/ ← Human drops sources here (immutable)
wiki/ ← LLM compiles structured notes
index.md ← LLM maintains a flat navigation catalog
CLAUDE.md ← Schema: conventions, workflows
Limitation: index.md is maintained by the LLM, doesn't scale past ~500 notes, and the LLM reads everything into context.
RTFM-augmented model¶
raw/ ← Human drops sources here (unchanged)
wiki/ ← LLM compiles structured notes (unchanged)
_rtfm/ ← RTFM generates intelligent navigation (replaces index.md)
CLAUDE.md ← Schema: references rtfm_search for retrieval
.rtfm/ ← Search index (SQLite + FTS5, invisible to Obsidian)
What changes: The LLM still writes the wiki. But instead of reading a flat index.md, it searches via RTFM. And instead of a manually maintained catalog, _rtfm/ provides auto-generated navigation based on actual content analysis, link graphs, and relevance scoring.
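For readers new to FTS5, the following is a minimal sketch of a SQLite full-text index with BM25 ranking, built with Python's standard sqlite3 module. It assumes the bundled SQLite was compiled with FTS5, as most builds are. The table and column names are hypothetical and do not describe the actual schema of .rtfm/library.db.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one FTS5 row per note; path and corpus are stored but not tokenized
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(path UNINDEXED, corpus UNINDEXED, body)")
conn.executemany(
    "INSERT INTO notes (path, corpus, body) VALUES (?, ?, ?)",
    [
        ("Research/grammars.md", "research", "Formal grammars and parsing"),
        ("Research/llm.md", "research", "Large language models and context windows"),
        ("Projects/App/auth.md", "app", "Auth flow: login, token refresh"),
    ],
)


def search(query: str, corpus: Optional[str] = None, limit: int = 5) -> list[tuple[str, float]]:
    """Return (path, bm25) pairs; in FTS5, a lower bm25() value is a better match."""
    sql = "SELECT path, bm25(notes) FROM notes WHERE notes MATCH ?"
    params: list = [query]
    if corpus is not None:
        # Plain column filter alongside the full-text MATCH
        sql += " AND corpus = ?"
        params.append(corpus)
    sql += " ORDER BY bm25(notes) LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Only paths and scores come back, which keeps each search result small regardless of how long the matching notes are.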
Why RTFM scales where index.md doesn't¶
| | Karpathy index.md | RTFM |
|---|---|---|
| Navigation | Flat catalog, LLM-maintained | Auto-generated, graph-aware |
| Search | Read index → scan pages | FTS5 + semantic + hybrid |
| Token cost | Grows with wiki size | Constant (~300 tokens per search) |
| Wikilinks | Written by LLM | Resolved and indexed as graph edges |
| Multi-format | Markdown only | 15 parsers (MD, Python, PDF, LaTeX, SQLite, Jupyter, CSV, XLSX, ...) |
| Metrics | None | Backlink counts, hub detection, orphan detection |
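The table does not say how the hybrid mode merges keyword and semantic results. Reciprocal rank fusion (RRF) is one widely used technique for this, and it is shown here as an assumption rather than as RTFM's documented method: each ranked list contributes 1/(k + rank) to a document's score, and k = 60 is a conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids into one hybrid ranking.

    Each list adds 1 / (k + rank) to a document's score, with rank starting at 1.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents ranked well by both lists float to the top
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing an FTS5 ranking ["a", "b", "c"] with a semantic ranking ["c", "a", "d"] yields ["a", "c", "b", "d"]: a and c appear in both lists, so they outrank b and d. Because RRF uses ranks rather than raw scores, BM25 values and embedding similarities never have to be put on a common scale.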
Topologies: One Vault or Many?¶
Karpathy recommends one vault per project. RTFM's corpus system makes this a choice, not a constraint.
Option 1: One vault per project (Karpathy-style)¶
Each project is its own Obsidian vault, set up with its own rtfm vault run.
Best for: Isolated projects with distinct knowledge domains.
Option 2: One vault, multiple corpora¶
A single vault with folders mapped to corpora. RTFM scopes searches by corpus.
~/vault/
├── Research/ → corpus "research"
├── Projects/App/ → corpus "app"
├── Publications/ → corpus "publications"
└── _rtfm/ → navigation for everything
The agent searches with rtfm_search("auth flow", corpus="app") — only sees what's relevant.
Best for: Cross-domain work where connections between topics matter.
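Because a corpus can be a nested folder such as Projects/App/, each note must be assigned to the most specific mapped folder. Here is a minimal sketch that assumes the folder-to-corpus mapping shown in the tree above. The CORPORA dict and the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical folder -> corpus mapping mirroring the tree above
CORPORA = {
    "Research": "research",
    "Projects/App": "app",
    "Publications": "publications",
}


def corpus_for(note_path: str) -> Optional[str]:
    """Return the corpus of a vault-relative note path, or None if unmapped."""
    parts = PurePosixPath(note_path).parts
    # Check the deepest parent folder first, so the most specific mapping wins
    for depth in range(len(parts) - 1, 0, -1):
        folder = "/".join(parts[:depth])
        if folder in CORPORA:
            return CORPORA[folder]
    return None


def scoped(paths: list[str], corpus: str) -> list[str]:
    """Keep only the paths that belong to the requested corpus."""
    return [p for p in paths if corpus_for(p) == corpus]
```

A search scoped to "app" then never sees notes from Research/ or from unmapped folders such as Projects/Other/.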
Option 3: Code project + vault as external source¶
The code lives in a git repo. The vault is separate. RTFM bridges them.
~/code/my-app/ ← Git repo (Claude Code CWD)
├── src/
├── .rtfm/config.json ← sources: [project, vault/Research]
└── CLAUDE.md
~/vault/ ← Obsidian vault (rtfm vault)
├── Research/
├── _rtfm/
└── ...
The agent in my-app searches code AND vault notes in one query.
Best for: Code projects that need domain knowledge from a vault.
The rule of thumb¶
| The project is... | Recommended setup |
|---|---|
| Pure knowledge (research, wiki, notes) | rtfm vault in the vault directly |
| Pure code (app, library, tool) | rtfm init in the repo, vault as external source |
| Both (research + code) | Either works — vault with code as corpus, or separate + linked |
Wikilink Resolution¶
RTFM resolves [[wikilinks]] following Obsidian's rules:
- Exact filename match (case-insensitive, .md optional)
- Path-suffix match: [[folder/Note]] matches some/folder/Note.md
- Disambiguation: when multiple files share a name, prefer the one closest to the source file
Resolved links become graph edges in the database. This powers:
- Hub detection — notes with many backlinks rank higher in search
- Orphan detection — notes with no links (shown in _rtfm/graph.md)
- Centrality boost — rtfm_search can weight results by link density
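Once links are stored as (source, target) edges, all three metrics are simple aggregations. Here is a minimal sketch. The hub threshold of 3 backlinks and the logarithmic boost formula are hypothetical choices, because the guide does not specify how RTFM weights link density.

```python
import math
from collections import Counter


def link_metrics(
    notes: list[str], edges: list[tuple[str, str]], hub_min: int = 3
) -> tuple[Counter, list[str], list[str]]:
    """Compute backlink counts, hubs and orphans from resolved (source, target) edges."""
    real = [(s, t) for s, t in edges if s != t]  # ignore self-links
    backlinks = Counter(t for _, t in real)
    linked = {n for edge in real for n in edge}
    # Hubs: notes with at least hub_min backlinks, most linked first
    hubs = sorted((n for n in notes if backlinks[n] >= hub_min), key=lambda n: -backlinks[n])
    # Orphans: notes with neither incoming nor outgoing links
    orphans = sorted(n for n in notes if n not in linked)
    return backlinks, hubs, orphans


def boosted_score(text_score: float, backlinks: int, weight: float = 0.1) -> float:
    """Hypothetical centrality boost: scale the score up logarithmically with backlinks."""
    return text_score * (1.0 + weight * math.log1p(backlinks))
```

The logarithm keeps a heavily linked hub from drowning out a better textual match: going from 0 to 10 backlinks changes the score far more than going from 100 to 110.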
Recommended Obsidian Plugins¶
- Dataview: query RTFM-generated frontmatter across the vault
- Graph View (core): _rtfm/ pages appear as connected nodes in the graph
- Obsidian Web Clipper: save web articles as markdown into raw/ (Karpathy recommends this)
CLI Reference¶
# Initialize vault
rtfm vault /path/to/vault # Auto-detect corpora, index, generate _rtfm/
rtfm vault /path/to/vault --no-embeddings # Skip semantic embeddings
rtfm vault /path/to/vault --no-output # Skip _rtfm/ generation
rtfm vault /path/to/vault --regenerate # Regenerate _rtfm/ without re-syncing
# After initialization, standard RTFM commands work
rtfm search "query" # Search across all corpora
rtfm search "query" --corpus research # Search specific corpus
rtfm sync # Re-sync all sources
rtfm status # Check index health
Regeneration¶
- Full: rtfm vault --regenerate rebuilds all _rtfm/ files from the database
- Lightweight: after each sync, _rtfm/recent.md updates automatically (a single SQL query)
- Manual: delete _rtfm/ and run rtfm vault --regenerate for a clean rebuild
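The lightweight path can be sketched as follows, assuming a hypothetical documents table with a path column and a modification-time column (RTFM's real schema may differ). A single ORDER BY ... LIMIT 20 query, matching the 20 files listed in recent.md, yields the page body, so the refresh is cheap enough to run after every sync. Writing the string to _rtfm/recent.md is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; RTFM's real schema in .rtfm/library.db may differ
conn.execute("CREATE TABLE documents (path TEXT PRIMARY KEY, mtime REAL)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(f"Research/note-{i:02d}.md", float(i)) for i in range(30)],
)


def render_recent(conn: sqlite3.Connection, limit: int = 20) -> str:
    """Build recent.md from one query: the most recently modified files first."""
    rows = conn.execute(
        "SELECT path FROM documents ORDER BY mtime DESC LIMIT ?", (limit,)
    ).fetchall()
    lines = ["---", "tags: [rtfm]", "---", "# Recently modified", ""]
    # Strip ".md" so each entry is a clickable Obsidian wikilink
    lines += [f"- [[{path.removesuffix('.md')}]]" for (path,) in rows]
    return "\n".join(lines) + "\n"
```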
Never edit _rtfm/ files manually: they are generated from the index and can always be rebuilt.