Run 1 000 NotebookLM questions overnight
This is the pattern we use to run very long batches of citation-backed Q&A against Google NotebookLM — from PhD literature reviews to market intelligence pipelines. Single laptop, single account, eight hours, one thousand structured answers in a JSONL file.
The whole thing fits in one shell loop, because the project exposes a plain REST API on http://localhost:3000. There is no SDK to learn, no agent harness to configure, no MCP client to wire.
What you need
- This project running locally: npm run setup-auth (one-time Google login), then npm run start:http. Install guide.
- A list of questions in a text file, one per line.
- A notebook id. Either pick one from GET /notebooks/scrape or set a default with PUT /notebooks/:id/activate.
- Optionally: a second Google account for rotation. Multi-account guide.
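If you have more than a couple of notebooks, a small helper can pluck the id out of the GET /notebooks/scrape response by title. A sketch only: the response shape ({id, title} objects) is an assumption here, so check the HTTP API reference for the real field names.

```python
def pick_notebook_id(notebooks: list[dict], title_substring: str) -> str:
    """Return the id of the first notebook whose title contains the substring."""
    for nb in notebooks:
        if title_substring.lower() in nb.get("title", "").lower():
            return nb["id"]
    raise LookupError(f"no notebook title contains {title_substring!r}")

# Assumed shape of the GET /notebooks/scrape response body:
sample = [
    {"id": "abc-123", "title": "Thesis Chapter 3 sources"},
    {"id": "def-456", "title": "Market intel Q3"},
]
print(pick_notebook_id(sample, "market"))  # def-456
```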
The minimum viable batch (10 lines of bash)
NOTEBOOK_ID="paste-your-id-here"
INPUT="questions.txt"
OUTPUT="answers.jsonl"
while IFS= read -r question; do
  curl -sS -X POST http://localhost:3000/ask \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg q "$question" --arg n "$NOTEBOOK_ID" \
          '{question: $q, notebook_id: $n, source_format: "json"}')" \
    >> "$OUTPUT"
  echo >> "$OUTPUT"
done < "$INPUT"
That works. It does not survive a session expiry at hour 4, it does not throttle, it does not resume after a network blip, and it makes 1 000 sequential blocking calls. So for real batches we wrap it.
The production pattern
#!/usr/bin/env python3
"""Run a batch of NotebookLM questions through the local REST API.

Resumes safely on restart, handles re-auth, rotates accounts, throttles to
respect rate limits, and writes one JSON line per answer with citations.
"""
import json
import sys
import time
from pathlib import Path

import httpx

API = "http://localhost:3000"
NOTEBOOK_ID = "paste-your-id-here"
INPUT = Path("questions.txt")
OUTPUT = Path("answers.jsonl")
THROTTLE_SECONDS = 8  # average pace; tune to your account's quota
MAX_RETRIES = 3
ACCOUNTS = ["primary", "backup"]  # registered via `npm run accounts add`

account_idx = 0  # current account; persists across questions


def already_done() -> set[str]:
    """Resume support: skip questions already answered."""
    if not OUTPUT.exists():
        return set()
    done = set()
    for line in OUTPUT.read_text().splitlines():
        try:
            done.add(json.loads(line)["question"])
        except (json.JSONDecodeError, KeyError):
            continue
    return done


def switch_account(name: str) -> None:
    httpx.post(f"{API}/re-auth", json={"account": name}, timeout=120).raise_for_status()


def ask(question: str) -> dict:
    global account_idx
    payload = {
        "question": question,
        "notebook_id": NOTEBOOK_ID,
        "source_format": "json",
    }
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            r = httpx.post(f"{API}/ask", json=payload, timeout=180)
            r.raise_for_status()
            data = r.json()
            if data.get("success"):
                return data
            # rate-limited or quota — try the next account
            if "rate" in str(data.get("error", "")).lower():
                account_idx = (account_idx + 1) % len(ACCOUNTS)
                switch_account(ACCOUNTS[account_idx])
                continue
        except httpx.HTTPError as e:
            print(f"  attempt {attempt}: {e}", file=sys.stderr)
        time.sleep(2 ** attempt)  # back off before any retry
    raise RuntimeError(f"failed after {MAX_RETRIES} retries: {question}")


def main() -> None:
    done = already_done()
    questions = [q.strip() for q in INPUT.read_text().splitlines() if q.strip()]
    todo = [q for q in questions if q not in done]
    print(f"{len(done)} already answered · {len(todo)} to go")
    with OUTPUT.open("a") as f:
        for i, question in enumerate(todo, 1):
            t0 = time.time()
            answer = ask(question)
            row = {
                "question": question,
                "answer": answer["answer"],
                "citations": answer.get("citations", []),
                "session_id": answer.get("session_id"),
                "elapsed_s": round(time.time() - t0, 1),
            }
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            f.flush()
            print(f"[{i}/{len(todo)}] {row['elapsed_s']}s · {len(row['citations'])} cites")
            time.sleep(THROTTLE_SECONDS)


if __name__ == "__main__":
    main()
Save it as batch.py, drop your questions in questions.txt, and run python batch.py. Kill it at any time; on the next run it picks up where it left off.
Why this pattern works
One file in, one file out
Both ends are plain text. Your input is a questions.txt you can edit in any tool. Your output is answers.jsonl — JSON Lines, one answer per line, trivially loadable into pandas, jq, BigQuery, or another LLM for further processing:
jq '.answer' answers.jsonl | wc -l
jq -c '{q: .question, n: (.citations | length)}' answers.jsonl
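When jq is not enough and pandas is too much, the stdlib alone turns the file into a list of dicts. A minimal sketch, assuming the row shape written by batch.py above:

```python
import json
from pathlib import Path

def load_answers(path: str) -> list[dict]:
    """Parse a JSONL file into a list of row dicts, skipping blank lines."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append(json.loads(line))
    return rows

# e.g. rank questions by how many citations they attracted:
# rows = load_answers("answers.jsonl")
# rows.sort(key=lambda r: len(r.get("citations", [])), reverse=True)
```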
Resume on restart
Network drops, OS updates, batch scripts that get killed at 3am — they all happen. The first thing the script does is re-read the output file and skip questions whose text is already there. You lose the in-flight question and nothing else.
Auto-reauth across multi-hour runs
Google sessions don't survive the night. The REST API checks the URL ground truth on every call and logs back in automatically using credentials stored in the AES-256-GCM vault (npm run setup-auth puts them there). TOTP codes are computed on the fly, so 2FA-protected accounts work transparently. Multi-account configuration.
Account rotation when one quota saturates
Free Google accounts hit a daily NotebookLM Q&A quota. The script flips to the next registered account on rate-limit errors via POST /re-auth. With two accounts you can typically push 1 500–2 000 questions in a 24-hour window without manual intervention.
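The switch itself is just modular arithmetic over the registered account names; isolated here as a sketch for clarity (the names mirror the ACCOUNTS list used by the script above):

```python
def next_account(current: str, accounts: list[str]) -> str:
    """Round-robin: the registered account after `current`, wrapping at the end."""
    i = accounts.index(current)
    return accounts[(i + 1) % len(accounts)]

accounts = ["primary", "backup"]
print(next_account("primary", accounts))  # backup
print(next_account("backup", accounts))   # primary
```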
Citations come back structured
source_format: "json" returns a citations array of {id, source, excerpt} objects directly attached to the answer. You can join citations back to your sources for downstream processing — fact-checking, page-number resolution, LaTeX \cite{} generation for a thesis bibliography.
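As a sketch of that downstream join, here is one hypothetical way to render a citations array as a \cite{} command. The key scheme (a slugified source name) is our own convention for illustration, not something the API returns:

```python
import re

def cite_key(source: str) -> str:
    """Slugify a source name into a BibTeX-safe citation key (our convention)."""
    return re.sub(r"[^a-z0-9]+", "-", source.lower()).strip("-")

def to_latex_cites(citations: list[dict]) -> str:
    """Render a citations array ({id, source, excerpt} objects) as one \\cite{}."""
    keys = sorted({cite_key(c["source"]) for c in citations})
    return "\\cite{" + ",".join(keys) + "}"

citations = [
    {"id": 1, "source": "Smith 2021.pdf", "excerpt": "..."},
    {"id": 2, "source": "Lee & Park 2019.pdf", "excerpt": "..."},
]
print(to_latex_cites(citations))  # \cite{lee-park-2019-pdf,smith-2021-pdf}
```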
Sizing your throttle
NotebookLM does not document a public rate limit, so we picked 8 seconds between calls based on hundreds of overnight runs. Note that the throttle sits on top of the answer latency itself, so the naive ceiling of 3 600 s ÷ 8 s = 450 calls per hour is never reached in practice; once real answer times are included, sustained throughput lands closer to 125 questions per hour per account, which is where the one-thousand-overnight figure comes from. If you see rate-limit errors sooner, raise the throttle to 12–15 seconds or add a third account. If you can sustain 5 seconds without hitting limits, go for it.
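The arithmetic behind these figures is worth making explicit so you can plug in your own numbers; the ~21 s mean answer latency below is illustrative (use whatever elapsed_s reports for your notebook):

```python
def questions_per_hour(throttle_s: float, mean_answer_s: float) -> float:
    """Sequential throughput: one question every (answer + throttle) seconds."""
    return 3600 / (throttle_s + mean_answer_s)

# throttle alone (hypothetical instant answers) gives the 450/hour ceiling:
print(round(questions_per_hour(8, 0)))       # 450
# with an illustrative ~21 s answer latency, eight hours yields roughly 1 000:
print(round(questions_per_hour(8, 21) * 8))  # 993
```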
When to switch to MCP mode instead
If your driver is a coding agent (Claude Code, Cursor, Codex) rather than a script, the same operations are exposed as MCP tools. Use that surface when you want the agent to reason about which question to ask next; use the REST API when you have a flat list to grind through. Both modes ship from the same package.
What this gives you in practice
For one of our PhD use cases we run 100-200 questions per chapter across a thirty-chapter thesis library. That's 5 000+ structured answers with citations, computed overnight on a laptop, indexed back into the thesis as \cite{}-ready snippets. Total cost: zero (uses the user's own NotebookLM account), total infrastructure: one Node process and one Python script.
Next steps
- HTTP API reference — every endpoint, every parameter.
- n8n integration guide — same pattern but as a visual workflow.
- Multi-account guide — register a second Google account for rotation.
- Compare with PleasePrompto — when this project is the right pick over the upstream MCP-only server.