[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80033":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":15,"starSnapshotCount":15,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},80033,"notebooklm-wiki-pipeline","capitalparser\u002Fnotebooklm-wiki-pipeline","capitalparser","Turn Google Drive PDFs into Obsidian wiki notes via NotebookLM MCP without loading full PDFs into Claude context",null,"Python",66,5,65,3,0,1,42.43,"MIT License",false,"main",true,[23,24,25,26,27,28,29,30],"claude-code","google-drive","knowledge-management","mcp","notebooklm","obsidian","pdf","token-optimization","2026-06-12 04:01:26","# NotebookLM Wiki Pipeline\n\n[한국어 README](README.ko.md)\n\nTurn large Google Drive PDFs into Obsidian wiki notes without loading the full PDF text into Claude or Codex context. 
NotebookLM reads the source, and the agent receives only the structured answer needed to create a reusable note.\n\n**vNext update:** reuse one NotebookLM notebook per topic, while each new note-generation query is scoped to the newly attached or selected PDF source.\n\nThat is the main product value:\n\n```text\nTopic notebook = reusable knowledge container\nMCP query with source_ids = target-PDF-only extraction\n```\n\n```bash\n\u002Fpdf-to-wiki https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002FYOUR_FILE_ID \"K-IFRS 1109 Financial Instruments\" --topic audit-accounting\n```\n\n![Actual NotebookLM source-scoped notebook screen](docs\u002Fassets\u002Fnotebooklm-source-scoped-public-demo.png)\n\nThe screenshot above is an actual NotebookLM notebook screen from a live MCP test using original public-safe demo PDFs. The topic notebook contains three related infrastructure sources: clean energy grid planning, urban water resilience, and public transit operations. For note generation, the MCP call used `source_ids=[target_source_id]`, and NotebookLM returned `sources_used` with only the target clean-energy source.\n\n---\n\n## Why This Matters\n\nThe old safe pattern was:\n\n```text\n1 PDF = 1 NotebookLM notebook\n```\n\nThat avoids source contamination, but it does not scale well. 
Users who process many PDFs about the same topic end up with scattered one-off notebooks.\n\nThe vNext pattern is:\n\n```text\n1 topic = 1 reusable NotebookLM notebook\n1 new wiki note = query only 1 target source inside that notebook\n```\n\nFor example, a `public-infrastructure` notebook can contain:\n\n- `clean_energy_grid_report.pdf`\n- `water_resilience_brief.pdf`\n- `public_transit_operations_note.pdf`\n\nWhen you want a wiki note for only the clean-energy grid PDF, call:\n\n```python\nnotebook_query(\n    notebook_id=\"public-infrastructure-topic-notebook\",\n    source_ids=[\"target:clean-energy-grid-report\"],\n    query=\"Summarize insights using only the Clean Energy Grid Planning Report.\"\n)\n```\n\nThe notebook remains reusable for future topic-level questions, but the note extraction remains grounded in one selected PDF.\n\n## User Benefits\n\n- Reuse topic notebooks instead of creating one notebook per PDF.\n- Keep related PDFs together for later cross-document questions.\n- Generate a new wiki note from only the newly attached or selected source.\n- Reduce answer contamination by passing `source_ids=[target_source_id]`.\n- Record `target_source_id`, `sources_used`, and `query_scope` in the completion report.\n\n## Architecture\n\n```text\nGoogle Drive PDF\n  |\n  | pass Drive URL or file ID only\n  v\nTopic registry\n  |\n  | choose topic notebook by --topic or routing keywords\n  v\nReusable NotebookLM topic notebook\n  |\n  | notebook_query(source_ids=[target_source_id], query=...)\n  v\nNotebookLM answer grounded in the target PDF\n  |\n  | agent formats Markdown and wikilinks\n  v\nObsidian wiki note\n```\n\n![Topic notebook routing flow](docs\u002Fassets\u002Ftopic-notebook-flow-routed.svg)\n\n## Installation\n\n1. Connect Google Drive in Claude.\n\n```text\nclaude.ai -> Settings -> Integrations -> Google Drive -> Connect\n```\n\n2. Install `notebooklm-mcp-cli`.\n\n```bash\nuv tool install notebooklm-mcp-cli\n```\n\n3. 
Log in to NotebookLM.\n\n```bash\nnlm login\n```\n\n4. Register the MCP server with Claude Code.\n\n```bash\nnlm setup add claude-code\n```\n\n5. Install the slash command.\n\n```bash\ncp commands\u002Fpdf-to-wiki.md ~\u002F.claude\u002Fcommands\u002Fpdf-to-wiki.md\n```\n\n6. Configure the output directory in `~\u002F.claude\u002Fcommands\u002Fpdf-to-wiki.md`.\n\n```text\nOUTPUT_DIR=~\u002Fyour-obsidian-vault\u002FAI_Generated\n```\n\n## Topic Registry\n\nCopy the example registry and fill in your own NotebookLM notebook IDs.\n\n```bash\ncp config\u002Fnotebooks.example.json config\u002Fnotebooks.local.json\n```\n\nExample:\n\n```json\n{\n  \"default_policy\": \"single_source_notebook\",\n  \"default_extraction_mode\": \"source_scoped_topic_query\",\n  \"topics\": [\n    {\n      \"id\": \"public-infrastructure\",\n      \"label\": \"Public Infrastructure\",\n      \"notebook_id\": \"NOTEBOOKLM_NOTEBOOK_ID_FOR_PUBLIC_INFRASTRUCTURE\",\n      \"routing_keywords\": [\"clean energy\", \"water resilience\", \"public transit\", \"infrastructure\"],\n      \"sources\": []\n    }\n  ]\n}\n```\n\nCheck the routing decision locally:\n\n```bash\npython3 scripts\u002Fnotebook_registry.py \\\n  \"https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002FYOUR_FILE_ID\u002Fview\" \\\n  --title \"Clean Energy Grid Planning Report\" \\\n  --topic public-infrastructure \\\n  --registry config\u002Fnotebooks.local.json\n```\n\nExpected decision shape:\n\n```json\n{\n  \"topic_id\": \"public-infrastructure\",\n  \"notebook_action\": \"reuse_topic_notebook\",\n  \"extraction_mode\": \"source_scoped_topic_query\",\n  \"extraction_notebook_action\": \"reuse_topic_notebook\",\n  \"topic_notebook_action\": \"query_target_source_in_topic\",\n  \"source_action\": \"add_source\"\n}\n```\n\n## MCP Calls\n\nAdd the PDF to the topic notebook:\n\n```text\nsource_add(\n  notebook_id=\"{topic_notebook_id}\",\n  source_type=\"drive\",\n  document_id=\"{drive_file_id}\",\n  doc_type=\"pdf\",\n  
wait=True,\n  wait_timeout=120.0\n)\n```\n\nThen query only the target source:\n\n```text\nnotebook_query(\n  notebook_id=\"{topic_notebook_id}\",\n  source_ids=[\"{target_source_id}\"],\n  query=\"{target-PDF-only prompt}\"\n)\n```\n\nThe prompt should also state the scope:\n\n```text\nprimary_scope: target PDF only\nsource_scoped_query: query only this target source_id\ntopic_notebook_context: other PDFs in the same notebook may be used only for a separated comparison section\n```\n\n## Source Verification Gate\n\nThe live test found an important operational gap: Drive search may return a wrong PDF when filenames are generic. Before generating the final note, run a short source verification query:\n\n```text\nUsing only target_source_id, verify the document title, author, and topic.\nIf it is not the requested document, stop and report source_mismatch.\n```\n\nThis prevents the pipeline from summarizing a wrong source that happened to match the search query.\n\n## Output Note Metadata\n\nGenerated notes should preserve the routing and query scope:\n\n```yaml\nsource: notebooklm\ndrive_url: {drive_url}\ndrive_file_id: {drive_file_id}\nnotebook_id: {topic_notebook_id}\ntarget_source_id: {target_source_id}\nsources_used: [{target_source_id}]\nnotebook_policy: reuse_topic_notebook\nextraction_mode: source_scoped_topic_query\nquery_scope: target_source_only\ntopic: {topic_id}\ncreated: {YYYY-MM-DD}\ntags: [ai-generated, pdf-analysis]\n```\n\n## Testing\n\nRun deterministic routing tests:\n\n```bash\npython3 -m unittest discover -v\n```\n\nRun a CLI smoke test:\n\n```bash\npython3 scripts\u002Fnotebook_registry.py \\\n  \"https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002FYOUR_FILE_ID\u002Fview\" \\\n  --title \"Clean Energy Grid Planning Report\" \\\n  --topic public-infrastructure\n```\n\n## Guardrails\n\n- Do not download full Drive PDF content into the agent context.\n- Do not paste extracted PDF text into prompts.\n- Use Drive only for metadata and source 
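IDs.\n- Use NotebookLM `source_add` for PDF ingestion.\n- Use `notebook_query(source_ids=[target_source_id])` for new note extraction.\n- Keep topic-level comparison questions separate from target-source note generation.\n\n## Project Structure\n\n```text\n.\n├── config\u002F\n│   └── notebooks.example.json\n├── commands\u002F\n│   └── pdf-to-wiki.md\n├── scripts\u002F\n│   └── notebook_registry.py\n├── tests\u002F\n│   └── test_notebook_registry.py\n├── docs\u002F\n│   ├── assets\u002F\n│   └── adr\u002F\n└── examples\u002F\n```\n","This project converts PDFs in Google Drive into Obsidian wiki notes via NotebookLM MCP, without loading the entire PDF into Claude's context. Its core features include generating reusable notes from structured answers and supporting a single reusable notebook per topic. Technically, it optimizes token usage by passing source IDs, ensuring that each query generates a note based only on the selected PDF, which reduces information contamination. It suits knowledge workers who need to manage large numbers of related documents efficiently, particularly in fields such as auditing and accounting, and helps users stay organized and consistent when processing multiple PDFs on the same topic.",2,"2026-06-11 03:59:00","CREATED_QUERY"]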