Run 1 000 NotebookLM questions overnight
This is the pattern we use to run very long batches of citation-backed Q&A against Google NotebookLM — from PhD literature reviews to market intelligence pipelines. Single laptop, single account, eight hours, one thousand structured answers in a JSONL file.
The whole thing fits in one shell loop, because the project exposes a plain REST API on http://localhost:3000. There is no SDK to learn, no agent harness to configure, no MCP client to wire.
What you need
- This project running locally: npm run setup-auth (one-time Google login), then npm run start:http. Install guide.
- A list of questions in a text file, one per line.
- A notebook id. Either pick one from GET /notebooks/scrape or set a default with PUT /notebooks/:id/activate.
- Optionally: a second Google account for rotation. Multi-account guide.
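If you have more than a couple of notebooks, a small helper can pluck the id out of the GET /notebooks/scrape response by title. A sketch only: the response shape ({id, title} objects) is an assumption here, so check the HTTP API reference for the real field names.

```python
def pick_notebook_id(notebooks: list[dict], title_substring: str) -> str:
    """Return the id of the first notebook whose title contains the substring."""
    for nb in notebooks:
        if title_substring.lower() in nb.get("title", "").lower():
            return nb["id"]
    raise LookupError(f"no notebook title contains {title_substring!r}")

# Assumed shape of the GET /notebooks/scrape response body:
sample = [
    {"id": "abc-123", "title": "Thesis Chapter 3 sources"},
    {"id": "def-456", "title": "Market intel Q3"},
]
print(pick_notebook_id(sample, "market"))  # def-456
```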
The minimum viable batch (10 lines of bash)
NOTEBOOK_ID="paste-your-id-here"
INPUT="questions.txt"
OUTPUT="answers.jsonl"
while IFS= read -r question; do
  curl -sS -X POST http://localhost:3000/ask \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg q "$question" --arg n "$NOTEBOOK_ID" \
          '{question: $q, notebook_id: $n, source_format: "json"}')" \
    >> "$OUTPUT"
  echo >> "$OUTPUT"
done < "$INPUT"
That works. It does not survive a session expiry at hour 4, it does not throttle, it does not resume after a network blip, and it makes 1 000 sequential blocking calls. So for real batches we wrap it.
The production pattern
#!/usr/bin/env python3
"""Run a batch of NotebookLM questions through the local REST API.

Resumes safely on restart, handles re-auth, rotates accounts, throttles to
respect rate limits, and writes one JSON line per answer with citations.
"""
import json
import sys
import time
from pathlib import Path

import httpx

API = "http://localhost:3000"
NOTEBOOK_ID = "paste-your-id-here"
INPUT = Path("questions.txt")
OUTPUT = Path("answers.jsonl")
THROTTLE_SECONDS = 8  # average pace; tune to your account's quota
MAX_RETRIES = 3
ACCOUNTS = ["primary", "backup"]  # registered via `npm run accounts add`

account_idx = 0  # current account; persists across questions


def already_done() -> set[str]:
    """Resume support: skip questions already answered."""
    if not OUTPUT.exists():
        return set()
    done = set()
    for line in OUTPUT.read_text().splitlines():
        try:
            done.add(json.loads(line)["question"])
        except (json.JSONDecodeError, KeyError):
            continue
    return done


def switch_account(name: str) -> None:
    httpx.post(f"{API}/re-auth", json={"account": name}, timeout=120).raise_for_status()


def ask(question: str) -> dict:
    global account_idx
    payload = {
        "question": question,
        "notebook_id": NOTEBOOK_ID,
        "source_format": "json",
    }
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            r = httpx.post(f"{API}/ask", json=payload, timeout=180)
            r.raise_for_status()
            data = r.json()
            if data.get("success"):
                return data
            # rate-limited or quota — try the next account
            if "rate" in str(data.get("error", "")).lower():
                account_idx = (account_idx + 1) % len(ACCOUNTS)
                switch_account(ACCOUNTS[account_idx])
                continue
        except httpx.HTTPError as e:
            print(f"  attempt {attempt}: {e}", file=sys.stderr)
        time.sleep(2 ** attempt)  # back off before any retry
    raise RuntimeError(f"failed after {MAX_RETRIES} retries: {question}")


def main() -> None:
    done = already_done()
    questions = [q.strip() for q in INPUT.read_text().splitlines() if q.strip()]
    todo = [q for q in questions if q not in done]
    print(f"{len(done)} already answered · {len(todo)} to go")
    with OUTPUT.open("a") as f:
        for i, question in enumerate(todo, 1):
            t0 = time.time()
            answer = ask(question)
            row = {
                "question": question,
                "answer": answer["answer"],
                "citations": answer.get("citations", []),
                "session_id": answer.get("session_id"),
                "elapsed_s": round(time.time() - t0, 1),
            }
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            f.flush()
            print(f"[{i}/{len(todo)}] {row['elapsed_s']}s · {len(row['citations'])} cites")
            time.sleep(THROTTLE_SECONDS)


if __name__ == "__main__":
    main()
Save it as batch.py, drop your questions in questions.txt, and run python batch.py. Kill it at any time; on the next run it picks up where it left off.
Why this pattern works
One file in, one file out
Both ends are plain text. Your input is a questions.txt you can edit in any tool. Your output is answers.jsonl — JSON Lines, one answer per line, trivially loadable into pandas, jq, BigQuery, or another LLM for further processing:
jq '.answer' answers.jsonl | wc -l
jq -c '{q: .question, n: (.citations | length)}' answers.jsonl
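When jq is not enough and pandas is too much, the stdlib alone turns the file into a list of dicts. A minimal sketch, assuming the row shape written by batch.py above:

```python
import json
from pathlib import Path

def load_answers(path: str) -> list[dict]:
    """Parse a JSONL file into a list of row dicts, skipping blank lines."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append(json.loads(line))
    return rows

# e.g. rank questions by how many citations they attracted:
# rows = load_answers("answers.jsonl")
# rows.sort(key=lambda r: len(r.get("citations", [])), reverse=True)
```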
Resume on restart
Network drops, OS updates, batch scripts that get killed at 3am — they all happen. The first thing the script does is re-read the output file and skip questions whose text is already there. You lose the in-flight question and nothing else.
Auto-reauth across multi-hour runs
Google sessions don't survive the night. The REST API checks the URL ground truth on every call and logs back in automatically using credentials stored in the AES-256-GCM vault (npm run setup-auth puts them there). TOTP codes are computed on the fly, so 2FA-protected accounts work transparently. Multi-account configuration.
Account rotation when one quota saturates
Free Google accounts hit a daily NotebookLM Q&A quota. The script flips to the next registered account on rate-limit errors via POST /re-auth. With two accounts you can typically push 1 500–2 000 questions in a 24-hour window without manual intervention.
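The switch itself is just modular arithmetic over the registered account names; isolated here as a sketch for clarity (the names mirror the ACCOUNTS list used by the script above):

```python
def next_account(current: str, accounts: list[str]) -> str:
    """Round-robin: the registered account after `current`, wrapping at the end."""
    i = accounts.index(current)
    return accounts[(i + 1) % len(accounts)]

accounts = ["primary", "backup"]
print(next_account("primary", accounts))  # backup
print(next_account("backup", accounts))   # primary
```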
Citations come back structured
source_format: "json" returns a citations array of {id, source, excerpt} objects directly attached to the answer. You can join citations back to your sources for downstream processing — fact-checking, page-number resolution, LaTeX \cite{} generation for a thesis bibliography.
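As a sketch of that downstream join, here is one hypothetical way to render a citations array as a \cite{} command. The key scheme (a slugified source name) is our own convention for illustration, not something the API returns:

```python
import re

def cite_key(source: str) -> str:
    """Slugify a source name into a BibTeX-safe citation key (our convention)."""
    return re.sub(r"[^a-z0-9]+", "-", source.lower()).strip("-")

def to_latex_cites(citations: list[dict]) -> str:
    """Render a citations array ({id, source, excerpt} objects) as one \\cite{}."""
    keys = sorted({cite_key(c["source"]) for c in citations})
    return "\\cite{" + ",".join(keys) + "}"

citations = [
    {"id": 1, "source": "Smith 2021.pdf", "excerpt": "..."},
    {"id": 2, "source": "Lee & Park 2019.pdf", "excerpt": "..."},
]
print(to_latex_cites(citations))  # \cite{lee-park-2019-pdf,smith-2021-pdf}
```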
Sizing your throttle
NotebookLM does not document a public rate limit, so we picked 8 seconds between calls based on hundreds of overnight runs. Note that the throttle sits on top of the answer latency itself, so the naive ceiling of 3 600 s ÷ 8 s = 450 calls per hour is never reached in practice; once real answer times are included, sustained throughput lands closer to 125 questions per hour per account, which is where the one-thousand-overnight figure comes from. If you see rate-limit errors sooner, raise the throttle to 12–15 seconds or add a third account. If you can sustain 5 seconds without hitting limits, go for it.
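The arithmetic behind these figures is worth making explicit so you can plug in your own numbers; the ~21 s mean answer latency below is illustrative (use whatever elapsed_s reports for your notebook):

```python
def questions_per_hour(throttle_s: float, mean_answer_s: float) -> float:
    """Sequential throughput: one question every (answer + throttle) seconds."""
    return 3600 / (throttle_s + mean_answer_s)

# throttle alone (hypothetical instant answers) gives the 450/hour ceiling:
print(round(questions_per_hour(8, 0)))       # 450
# with an illustrative ~21 s answer latency, eight hours yields roughly 1 000:
print(round(questions_per_hour(8, 21) * 8))  # 993
```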
When to switch to MCP mode instead
If your driver is a coding agent (Claude Code, Cursor, Codex) rather than a script, the same operations are exposed as MCP tools. Use that surface when you want the agent to reason about which question to ask next; use the REST API when you have a flat list to grind through. Both modes ship from the same package.
What this gives you in practice
For one of our PhD use cases we run 100-200 questions per chapter across a thirty-chapter thesis library. That's 5 000+ structured answers with citations, computed overnight on a laptop, indexed back into the thesis as \cite{}-ready snippets. Total cost: zero (uses the user's own NotebookLM account), total infrastructure: one Node process and one Python script.
Next steps
- HTTP API reference — every endpoint, every parameter.
- n8n integration guide — same pattern but as a visual workflow.
- Multi-account guide — register a second Google account for rotation.
- Compare with PleasePrompto — when this project is the right pick over the upstream MCP-only server.