RTFM × NotebookLM

NotebookLM caps you at 50 queries/day per notebook. RTFM removes that ceiling by indexing the answers locally: ask once, retrieve forever, offline, in milliseconds.

This guide shows how to plug notebooklm-mcp batch outputs into RTFM. Two paths, both work:

  • Markdown path — zero config, the default markdown parser already does the right thing
  • JSON path — typed metadata + faceted source filtering via a one-time YAML mapping

What notebooklm-mcp produces

Calling POST /batch-to-vault writes two files per question into a directory of your choice:

vault/
├── what-is-the-osbd-process.md       # frontmatter + answer + cited excerpts
└── what-is-the-osbd-process.json     # structured nblm-answer-v1 sidecar

The markdown is the human-friendly artifact. The JSON sidecar conforms to the nblm-answer-v1 JSON Schema (stable and semver-versioned; breaking changes ship as nblm-answer-v2). Full integration notes: notebooklm-mcp/deployment/docs/14-RTFM-INTEGRATION.md. The two files are always written together, so either path below can rely on both being present.
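For orientation, here is a representative sidecar. The field names are taken from the mapping template in Path B below (question, answer.text, notebook.id/url, asked_at, citations[].marker/source_name/source_text); the values are illustrative, not taken from a real notebook, and the schema itself is the authoritative reference:

```json
{
  "type": "nblm-answer",
  "schema_url": "https://schemas.roomi-fields.com/nblm-answer-v1.json",
  "question": "What is the OSBD process?",
  "asked_at": "2025-01-15T09:30:00Z",
  "notebook": {
    "id": "abc123",
    "url": "https://notebooklm.google.com/notebook/abc123"
  },
  "answer": {
    "text": "OSBD is ... [1]"
  },
  "citations": [
    {
      "marker": "[1]",
      "source_name": "keller-handbook.pdf",
      "source_text": "Cited excerpt ..."
    }
  ]
}
```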

Path A — Markdown (zero config)

cd /path/to/vault
rtfm init --no-embeddings
rtfm add . --corpus nblm
rtfm sync

That's it. RTFM's markdown parser groups by ## Answer and ### [N] source headers, so:

| Search query lands on… | Chunk header path |
| --- | --- |
| Answer body keywords | > Answer |
| Cited excerpt content | > [N] &lt;source name&gt; |
| Source filename | > [N] &lt;source name&gt; |
| Asked question | > Q: &lt;question&gt; |

YAML frontmatter (notebook_url, sources array, asked_at) is also indexed as plain text — full-text searchable but not typed.
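For reference, a Path A answer file has roughly this shape (contents illustrative; the frontmatter keys and headers match what the parser chunks on):

```markdown
---
notebook_url: https://notebooklm.google.com/notebook/abc123
sources:
  - keller-handbook.pdf
asked_at: 2025-01-15T09:30:00Z
---

# Q: What is the OSBD process?

## Answer

OSBD is ... [1]

### [1] keller-handbook.pdf

> Cited excerpt ...
```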

Path B — JSON sidecar (typed metadata + edges)

For typed metadata in chunks.metadata (queryable via SQL), faceted source names, and cites edges between answer and source files, drop the nblm-answer.yaml mapping into .rtfm/mappings/:

# .rtfm/mappings/nblm-answer.yaml
# yaml-language-server: $schema=https://schemas.roomi-fields.com/rtfm-mapping-v1.json
name: nblm-answer-v1

match:
  schema_url: "https://schemas.roomi-fields.com/nblm-answer-v1.json"
  discriminator:
    type: nblm-answer

chunks:
  - title: "Q: {{ question }}"
    content: "{{ answer.text }}"
    metadata:
      notebook_id: "{{ notebook.id }}"
      notebook_url: "{{ notebook.url }}"
      asked_at: "{{ asked_at }}"

  - foreach: citations
    title: "{{ marker }} {{ source_name }}"
    content: "{{ source_text }}"
    metadata:
      source_name: "{{ source_name }}"
      citation_marker: "{{ marker }}"

edges:
  - relation: cites
    foreach: citations
    target: "{{ source_name }}"
    target_kind: literal

After re-syncing, every .json answer file produces:

  • 1 chunk for question + answer.text (typed metadata: notebook id, URL, date)
  • N chunks for each citation, with source_name and citation_marker in typed metadata
  • cites edges linking the answer to each citation source by name

Now you can write SQL like:

SELECT chunk_id, content
FROM chunks
WHERE json_extract(metadata, '$.source_name') LIKE '%Keller%';

…or use the answer file's notebook_id to group everything from the same notebook.
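Assuming the metadata keys set by the mapping above, that grouping is a one-liner (sketch, not RTFM's actual schema documentation):

```sql
SELECT json_extract(metadata, '$.notebook_id') AS notebook,
       COUNT(*) AS answers
FROM chunks
WHERE json_extract(metadata, '$.notebook_id') IS NOT NULL
GROUP BY notebook;
```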

See the full JSON Schema Mappings reference for mapping syntax — target_kind, nested discriminators, multi-foreach chunks, etc.

The workflow

[Once per notebook, periodic]
  CLI agent generates exhaustive question set
    → notebooklm-mcp /batch-to-vault → vault/*.md + vault/*.json

[At will, unlimited, offline, ~ms]
  Agent → rtfm_search → rtfm_expand → answer

Generate batches when knowledge changes (new sources added to the notebook, or existing answers gone stale). Query through RTFM the rest of the time. NotebookLM's 50/day quota becomes irrelevant.

Which path should I use?

| If you… | Use |
| --- | --- |
| just want it to work | Path A (markdown) |
| want to filter results by source name | Path B (JSON + mapping) |
| want a graph of answers ↔ sources | Path B + index the cited PDFs too |
| are not sure | Path A first, then add Path B if you hit a limit |

Both paths can coexist — the markdown and JSON files are independent and search results are deduplicated by content hash.

Notes on the cites edge

The mapping above produces cites edge candidates. RTFM's edge resolver currently materializes import, link, and include relations into the edges table; custom relations like cites pass through extract_edges but are not yet stored. If the cited PDFs are themselves indexed in RTFM, the resolver will be extended to match cites targets by filename — track [issue #TBD] for status.

In the meantime, citation source names are queryable via chunks.metadata.source_name, which covers most retrieval needs without a graph.
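These metadata queries need nothing beyond SQLite's built-in json_extract. A self-contained sketch, with a deliberately simplified stand-in for the chunks table (the real RTFM schema has more columns; only chunk_id, content, and metadata matter here):

```python
import json
import sqlite3

# Minimal stand-in for RTFM's chunks table. Metadata is stored as a
# JSON string, which is what json_extract() operates on.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE chunks (chunk_id TEXT, content TEXT, metadata TEXT)")
con.execute(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    (
        "c1",
        "Cited excerpt ...",
        json.dumps({"source_name": "keller-handbook.pdf", "citation_marker": "[1]"}),
    ),
)

# Filter citation chunks by source name, as in the Path B example above.
# SQLite's LIKE is case-insensitive for ASCII by default.
rows = con.execute(
    "SELECT chunk_id FROM chunks "
    "WHERE json_extract(metadata, '$.source_name') LIKE '%keller%'"
).fetchall()
print(rows)
```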