[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82910":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":15,"starSnapshotCount":15,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},82910,"dataroom","hanxiao\u002Fdataroom","hanxiao","Give a query, get a dataroom. Pi + self-hosted Qwen3.6 research harness on a single L4.","https:\u002F\u002Fdataroom.hanxiao.io",null,"Python",156,14,67,0,3,43,89,34,3.53,"MIT License",false,"main",[25,26,27],"harness","local-llm","pi","2026-06-12 02:04:29","# Dataroom\n\nGive it a query. 
A local model in a [Pi](https:\u002F\u002Fpi.dev) harness loops search-read-write until it has built a comprehensive, fully cited **dataroom** on disk - a `.zip` you hand to a frontier model for the long-horizon task.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fbanner.png\" width=\"860\"\n       alt=\"Give a query to a self-hosted pi + harness + local model loop; it loops search-read-write to build a dataroom and hands you a .zip\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cb>Live demo → \u003Ca href=\"https:\u002F\u002Fdataroom.hanxiao.io\">dataroom.hanxiao.io\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n## Why\n\n[For a long-horizon task, you need a grounded, well-organized knowledge dump before the real work can start.](https:\u002F\u002Fx.com\u002Fhxiao\u002Fstatus\u002F2044765001370701981?s=20) That upfront research is mostly a search-read-write loop, and a few things are usually wrong with how it gets done today.\n\n- **Research is mechanical, so don't pay frontier tokens for it.** Gathering and organizing sources is tool-calling, not deep reasoning - a small local model in a disciplined harness (search, dedup, cite, verify) does it fine. And because it runs on your own GPU at near-zero marginal token cost, it can keep going for hours until the dataroom is actually comprehensive, instead of stopping when a metered budget runs out.\n- **The output is context for a machine, not a report for a human.** A 2025-style deep-research run ends in a long PDF nobody reads. Dataroom ends in a structured `.zip` - `topics\u002F`, `sources\u002F`, `data\u002F`, a `SUMMARY.md`, every claim cited - built to be consumed by the next agent, not skimmed.\n- **It is stage one of a two-stage pipeline.** Unzip the dataroom into a frontier model's context and let it do the expensive second stage (usually implementation). 
The research does not have to be perfect - its consumer is intelligent and can spot gaps - it has to be comprehensive and grounded.\n\nEverything runs locally on your own GPU: the model is self-hosted (llama.cpp), and the only thing that leaves the box is the web search\u002Fread the agent chooses to do.\n\n## How it works\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fscreenshot-home.png\" width=\"800\"\n       alt=\"Dataroom homepage: a query box and a live list of jobs with status, file counts, and pause\u002Fresume\u002Fdownload controls\" \u002F>\n\u003C\u002Fp>\n\nSubmit a query and an async job spins up a headless Pi coding agent backed by a self-hosted Qwen3.6-35B-A3B (llama.cpp). The agent runs its own research loop: `pi --mode json --continue` resumes the same per-cwd session across turns, and on each turn it searches, reads, reranks, and writes sourced files into a `dataroom\u002F` directory on disk.\n\n- Autonomous loop: the agent is not micromanaged. It is handed tools and a one-page methodology, then drives itself - search, read, dedup, write, verify - until the work is done.\n- Outcome-based stopping: `DONE` is honored only once the dataroom holds enough substantive sourced files, all sub-questions are closed, and a `SUMMARY.md` exists. Turns \u002F seconds \u002F Jina-call caps are only hard backstops, and a premature `DONE` is rejected so the agent keeps going. The reason it stopped is surfaced on the dashboard.\n- jina CLI: the `jina` CLI is on PATH (search \u002F read \u002F rerank \u002F embed \u002F dedup), driven from bash and composable via pipes (`jina search Q | jina rerank R`, `cat urls.txt | jina read`, `xargs -P 8` for parallel fan-out) so bulky intermediates stay out of the LLM context. \n- Embedding dedup index: `jina-embeddings-v5-nano` is preloaded for the dataroom index (embed \u002F semantic search \u002F dedup), with server-side reconciliation so it never drifts from disk. 
The agent must search the index before adding anything, so it avoids duplicates and keeps the structure consistent.\n- Live dashboard: real-time context utilization, throughput, tool-call distribution, live activity feed, warnings\u002Ferrors, progress-to-floor, a stop-reason banner, and the dataroom file tree, at `GET \u002Fjobs\u002F{id}\u002Fdashboard`.\n\nThe [live dashboard](https:\u002F\u002Fdataroom.hanxiao.io) for a finished job - progress-to-floor, total tokens, tool-call distribution, throughput, the activity feed, and the dataroom file tree:\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fscreenshot-dashboard.png\" width=\"800\"\n       alt=\"Dataroom job dashboard: progress-to-floor, total tokens, tool-call distribution, throughput, live activity feed, and the dataroom file tree\" \u002F>\n\u003C\u002Fp>\n\n## Get Started\n\nAn NVIDIA Docker host runs two containers (llama-server + the app). `scripts\u002Fsetup.sh` installs Docker + the NVIDIA toolkit, downloads the model, and brings the stack up. The only value you must set is `JINA_API_KEY`; everything else in `.env.example` ships with working defaults. No NVIDIA GPU? Option C runs it on an Apple Silicon Mac.\n\nClone and set the key once:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhanxiao\u002Fdataroom.git\ncd dataroom\ncp .env.example .env\nsed -i 's\u002F^JINA_API_KEY=.*\u002FJINA_API_KEY=jina_your_real_key\u002F' .env\n```\n\nThen pick one (each brings the stack up and prints the API URL when done):\n\n\u003Cdetails open>\n\u003Csummary>\u003Cb>Option A: prebuilt image (fastest)\u003C\u002Fb>\u003C\u002Fsummary>\n\nPull the published app image from GHCR instead of building it locally (skips the ~14GB build). 
`setup.sh` still installs Docker + the toolkit and downloads the model, then pulls + starts the stack:\n\n```bash\nDAAS_PULL=1 bash scripts\u002Fsetup.sh\n```\n\nPulls `ghcr.io\u002Fhanxiao\u002Fdataroom:latest`.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Option B: build from source\u003C\u002Fb>\u003C\u002Fsummary>\n\nBuild the app image locally (no pull). Same one-shot, just slower the first time:\n\n```bash\nbash scripts\u002Fsetup.sh\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Option C: Apple Silicon (Mac, Metal)\u003C\u002Fb>\u003C\u002Fsummary>\n\nNo Docker, no NVIDIA. The model runs on Metal via Homebrew `llama.cpp`; the app, Pi agent, and embedder run in a local `uv` venv. Needs 32 GB+ unified memory (the Q4 model wires ~22 GB).\n\n```bash\nbrew install llama.cpp\nnpm install -g @earendil-works\u002Fpi-coding-agent@0.78.0\nuv venv --python 3.11 .venv\nuv pip install --python .venv\u002Fbin\u002Fpython torch -r server\u002Frequirements.txt jina-cli huggingface-hub hf_transfer\nmkdir -p models\u002Fmtp\nHF_HUB_ENABLE_HF_TRANSFER=1 HF_TOKEN=hf_... .venv\u002Fbin\u002Fpython -c \"from huggingface_hub import hf_hub_download; \\\nhf_hub_download('unsloth\u002FQwen3.6-35B-A3B-MTP-GGUF','Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf',local_dir='models\u002Fmtp')\"\ncp .env.example .env && sed -i '' 's\u002F^JINA_API_KEY=.*\u002FJINA_API_KEY=jina_your_real_key\u002F' .env\nbash scripts\u002Fmac-run.sh\n```\n\nGGUF and Metal-flag details (MTP needs `llama.cpp` >= 9430): [`docs\u002FMAC.md`](docs\u002FMAC.md).\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Option D: Windows (WSL2 + Docker Desktop)\u003C\u002Fb>\u003C\u002Fsummary>\n\nThe same Docker stack as A\u002FB, run from a WSL2 shell with Docker Desktop (GPU via WSL integration, no NVIDIA toolkit). 
`setup-win.sh` validates Docker + GPU, downloads the model, and starts the stack:\n\n```bash\nbash scripts\u002Fsetup-win.sh          # or DAAS_PULL=1 bash scripts\u002Fsetup-win.sh to pull the prebuilt image\n```\n\n\u003C\u002Fdetails>\n\nPrereqs:\n- Options A\u002FB\u002FD: an NVIDIA GPU with the driver installed (`nvidia-smi` must work). Options A\u002FB also need the `nvidia-container-toolkit` (Option D gets GPU access through Docker Desktop's WSL integration instead). `setup.sh` installs the toolkit on Debian\u002FUbuntu hosts; on RHEL-family hosts, install it yourself first. The llama-server needs the GPU; the app's v5-nano embedder runs on CPU by default (`EMBED_DEVICE=cpu`) to leave VRAM for the Q4 model (set `EMBED_DEVICE=cuda` to move it onto the GPU).\n- A Jina API key: https:\u002F\u002Fjina.ai\u002Fapi-dashboard\u002F\n- Disk for a ~22GB model download plus the CUDA + PyTorch base images and job data under `.\u002Fdata`. The model download alone can take several minutes on a slow link; it resumes from the Hugging Face cache if interrupted.\n\n## Skill & API usage\n\n### Skill\n\nAnother LLM\u002Fagent can commission a dataroom from a deployed instance with the `use-dataroom` skill ([`skills\u002Fuse-dataroom\u002FSKILL.md`](skills\u002Fuse-dataroom\u002FSKILL.md)): submit a query with a **minutes time-box** (like handing an intern a time-boxed task), poll until it finishes, then download and unzip the result. 
One-shot:\n\n```bash\nBASE=\"https:\u002F\u002Fdataroom.hanxiao.io\"          # the deployed instance\nQUERY=\"Competitive landscape of self-hosted small embedding models in 2026\"\nMINUTES=30                                  # time box: works up to this long, then hands over\n\nJOB=$(curl -s -X POST \"$BASE\u002Fjobs\" -H 'content-type: application\u002Fjson' \\\n  -d \"{\\\"query\\\": $(python3 -c 'import json,sys;print(json.dumps(sys.argv[1]))' \"$QUERY\"), \\\"max_seconds\\\": $((MINUTES*60))}\" \\\n  | python3 -c 'import sys,json; print(json.load(sys.stdin)[\"job_id\"])')\necho \"watch: $BASE\u002Fjobs\u002F$JOB\u002Fdashboard\"\n\nwhile :; do\n  S=$(curl -s \"$BASE\u002Fjobs\u002F$JOB\" | python3 -c 'import sys,json; print(json.load(sys.stdin).get(\"status\",\"?\"))')\n  case \"$S\" in done|stopped|failed) break;; esac; sleep 30\ndone\n\ncurl -s -OJ \"$BASE\u002Fjobs\u002F$JOB\u002Fresult\" && unzip -oq \"dataroom-$JOB.zip\" -d \"$JOB\"   # -> $JOB\u002Fdataroom\u002F\n```\n\n`stopped` (time box reached) is a success, not an error - you still get the dataroom built so far. See the skill file for status meanings, partial `\u002Fsnapshot` downloads, and the full endpoint table.\n\n### API\n\nOnce the stack is up, the API is on port 8000 (open, no auth). 
`$JOB` below is the 12-hex-digit job id returned by `POST \u002Fjobs`.\n\n```bash\n# submit a job -> {\"job_id\":\"\u003C12hex>\",\"status\":\"queued\"}\ncurl -s -X POST localhost:8000\u002Fjobs -H 'content-type: application\u002Fjson' \\\n  -d '{\"query\":\"Competitive landscape of self-hosted small embedding models in 2026\"}'\n# optionally cap work: -d '{\"query\":\"...\",\"max_turns\":50,\"max_seconds\":3600}'\n\nJOB=abc123def456   # replace with the job_id from the submit response\n\n# list all jobs (live + on-disk), newest first\ncurl -s localhost:8000\u002Fjobs\n\n# single job status\ncurl -s localhost:8000\u002Fjobs\u002F$JOB\n\n# per-job metrics feed (drives the dashboard)\ncurl -s localhost:8000\u002Fjobs\u002F$JOB\u002Fstats\n\n# tail of the Pi agent log (last 8000 chars)\ncurl -s localhost:8000\u002Fjobs\u002F$JOB\u002Flog\n\n# download the dataroom AS-IT-IS-NOW (works mid-run)\ncurl -s -OJ localhost:8000\u002Fjobs\u002F$JOB\u002Fsnapshot\n\n# download the FINAL dataroom zip (409 until the job stops)\ncurl -s -OJ localhost:8000\u002Fjobs\u002F$JOB\u002Fresult\n\n# read one dataroom file (path is relative to the job's dataroom\u002F dir)\ncurl -s 'localhost:8000\u002Fjobs\u002F'$JOB'\u002Ffile?path=SUMMARY.md'\n\n# open the live dashboard in a browser (macOS `open`; use `xdg-open` on Linux)\nopen http:\u002F\u002Flocalhost:8000\u002Fjobs\u002F$JOB\u002Fdashboard\n```\n\nThere is also a minimal submit page at `GET \u002F` and a liveness probe at `GET \u002Fhealth` (`{\"ok\":true}`).\n\n## Architecture\n\nTwo containers on a single GPU host: `daas-llama` serves the model, `daas-app` runs the FastAPI orchestrator + the Pi agent + the embedding index. 
The agent loops turns until the dataroom meets the outcome floor, then the orchestrator zips it.\n\n```mermaid\nflowchart LR\n    user([User \u002F curl \u002F Web UI]) -->|POST \u002Fjobs query| api\n\n    subgraph host[\"NVIDIA Docker host (single L4 24GB)\"]\n        subgraph appc[\"container: daas-app (:8000, GPU)\"]\n            api[\"FastAPI app.py\u003Cbr\u002F>\u002Fjobs \u002Fresult \u002Fsnapshot\u003Cbr\u002F>\u002Ffile \u002Fstats \u002Flog \u002Fdashboard \u002Fhealth\"]\n            orch[\"orchestrator\u003Cbr\u002F>run_dataroom.py\u003Cbr\u002F>floor + ceiling guard\"]\n            pi[\"Pi coding agent\u003Cbr\u002F>pi --mode json --continue\"]\n            jina[\"jina CLI on PATH\u003Cbr\u002F>search \u002F read \u002F rerank \u002F embed\"]\n            emb[\"v5-nano embedder\u003Cbr\u002F>EMBED_DEVICE=cpu (off-GPU)\u003Cbr\u002F>dataroom_index\"]\n            disk[(\"\u002Fdata\u002Fjobs\u002F&lt;id&gt;\u002F\u003Cbr\u002F>dataroom\u002F + meta + logs\")]\n        end\n\n        subgraph llamac[\"container: daas-llama (:8080, GPU)\"]\n            llama[\"llama-server\u003Cbr\u002F>Qwen3.6-35B-A3B UD-Q4_K_XL\u003Cbr\u002F>+ MTP draft, ctx 131072\"]\n        end\n    end\n\n    jinacloud([Jina API\u003Cbr\u002F>jina.ai]):::ext\n\n    api -->|spawn thread + subprocess| orch\n    orch -->|loop turns| pi\n    pi -->|OpenAI-compat \u002Fv1\u003Cbr\u002F>LLAMA_URL| llama\n    pi -->|bash| jina\n    jina -->|JINA_API_KEY| jinacloud\n    pi -->|index \u002F search \u002F dedup| emb\n    pi -->|read \u002F write \u002F edit| disk\n    emb -.reconcile.- disk\n    orch -->|zip dataroom\u002F| disk\n    disk -->|GET \u002Fresult or \u002Fsnapshot .zip| user\n\n    classDef ext fill:#eee,stroke:#999,stroke-dasharray:4 3;\n```\n\nBy default `llama-server` serves `Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf` (repo `unsloth\u002FQwen3.6-35B-A3B-MTP-GGUF`) with MTP draft flags, and the agent model id is `qwen3.6`.\n\n### Switching the model\n\n**One knob.** Set 
`MODEL=\u003Chf_repo>\u002F\u003Cfile.gguf>` in `.env` and re-run `scripts\u002Fsetup.sh`. It derives the download repo and filename from `MODEL`, pulls the GGUF, and persists the filename so `docker-compose` serves the same file - download and serve stay in sync.\n\n```bash\n# .env  (default)\nMODEL=unsloth\u002FQwen3.6-35B-A3B-MTP-GGUF\u002FQwen3.6-35B-A3B-UD-Q4_K_XL.gguf\n```\n\nThat is all you change to swap the LLM. The rest are **advanced overrides**, rarely needed (leave unset to use the defaults derived from `MODEL`):\n\n| Env var | Default | Role |\n| --- | --- | --- |\n| `MODEL_ID` | `qwen3.6` | Agent-facing model id (Pi `models.json` \u002F `defaultModel`). A free label; need not match the GGUF. |\n| `CHAT_TEMPLATE_FILE` | `\u002Ftemplates\u002Fchat_template.jinja` | Jinja chat template inside the llama-server container. |\n| `SPEC_ARGS` | `--spec-type draft-mtp --spec-draft-n-max 2` | MTP \u002F speculative-draft flags appended to `llama-server`. |\n\nNon-Qwen caveat: switching to a non-Qwen GGUF is not just a filename swap. The bundled chat template is Qwen3.6-specific - point `CHAT_TEMPLATE_FILE` at the new model's Jinja template (a wrong template silently corrupts tool-calling), or drop the flag to use the GGUF's embedded template. `--spec-type draft-mtp` needs a GGUF that ships an MTP draft head (the `...-MTP-GGUF` repo does); for a plain GGUF set `SPEC_ARGS=` (empty). The `CTX_SIZE` default of 131072 is tuned to Qwen3.6's hybrid GDN+MoE KV math; a dense model of similar size uses far more KV per token, so lower `CTX_SIZE` or it may OOM on the L4. Re-measure VRAM with `nvidia-smi` for any other weights. 
See `docs\u002FDEPLOY.md` for deeper reproducibility details.\n\n**The default L4 tune.** The shipped llama-server settings are the best we could squeeze out of a low-budget L4 (24GB VRAM) without sacrificing generation quality, tuned in [`Qwen3.6-35B-A3B-MTP-L4`](https:\u002F\u002Fgithub.com\u002Fhanxiao\u002FQwen3.6-35B-A3B-MTP-L4):\n\n| Setting | Value | Why |\n| --- | --- | --- |\n| Quant | `Qwen3.6-35B-A3B` **UD-Q4_K_XL** (~22GB) | best quality that still fits 24GB alongside MTP + KV |\n| MTP draft | `--spec-type draft-mtp --spec-draft-n-max 2` (no `--spec-draft-p-min`) | ~80-90% draft acceptance; `n-max 2` is the sweet spot on this MoE, and `p-min` hurts on MoE |\n| KV cache | `--cache-type-k q4_0 --cache-type-v q4_0` | this hybrid GDN+MoE has only 10\u002F40 KV-bearing layers, so q4_0 KV is tiny (~0.65GB at 131072 tokens) |\n| Batch | `-ub 256 -b 2048` | measured-best prefill throughput |\n| Context | `--ctx-size 131072` | full native window; fits with q4_0 KV |\n| Offload | `-ngl` unset (auto-fit) + mmap on | forcing all layers to GPU OOMs once MTP + KV load; auto-fit spills compute-light expert layers to CPU |\n| Cache reuse | omitted | GDN recurrent-state drift can silently corrupt digits (llama.cpp#21681) |\n\nMeasured ~22.2GB used at the full 131072 window, no OOM. A smaller **Q3_K_XL** (~17GB) would free enough VRAM to also put the v5-nano embedder on the GPU - but embedding is not the bottleneck (LLM decode is), so we keep the embedder on CPU and spend that VRAM headroom on **Q4 for slightly better generation quality** instead.\n\n## Local dev (no GPU)\n\nPoint the agent at any OpenAI-compatible endpoint (or a remote Qwen box) via `LLAMA_URL`, then run the harness directly:\n\n```bash\nuv venv && uv pip install -r server\u002Frequirements.txt\nJINA_API_KEY=... 
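LLAMA_URL=http:\u002F\u002F\u003Chost>:8080 \\\n  uv run python -m server.run_dataroom --query \"your query\" --out .\u002Fout\n```\n\n## License\n\nMIT\n","Dataroom builds a dataroom automatically from a query. Built on a local model and the Pi harness, it runs on a single L4 GPU and loops search-read-write on its own until it has assembled a comprehensive, fully cited dataroom, which it hands to the user as a ZIP file. It uses the Qwen3.6-35B-A3B model (self-hosted via llama.cpp), and apart from the web search\u002Fread calls everything runs locally, which keeps it cheap and efficient. It suits long-horizon tasks that need upfront research gathered and organized, such as the preparation stage of a deep-learning project or the early research phase of a complex problem.",2,"2026-06-11 04:09:36","CREATED_QUERY"]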