[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-922":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":14,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":14,"lastSyncTime":29,"discoverSource":30},922,"openduck","CITGuru\u002Fopenduck","CITGuru","Distributed DuckDB - dual execution and differential storage","",null,"C++",547,26,2,4,0,3,12,6,8.29,"MIT License",false,"main",true,[],"2026-06-12 02:00:20","# OpenDuck\n\nOpenDuck makes DuckDB work like a cloud database without giving up its embedded-DB feel. You attach a remote database in one line — `ATTACH 'openduck:mydb'` — tables resolve transparently, a single query can split its work across your laptop and a remote worker, and storage underneath is layered, snapshot-based, and concurrency-safe. It's a DuckDB extension plus a small Rust gateway\u002Fworker speaking an open gRPC + Arrow IPC protocol, so you self-host the whole thing or plug your own backend in.\n\nThe architecture follows the path [MotherDuck](https:\u002F\u002Fmotherduck.com) pioneered with [differential storage](https:\u002F\u002Fmotherduck.com\u002Fblog\u002Fdifferential-storage-building-block-for-data-warehouse\u002F), [dual execution](https:\u002F\u002Fmotherduck.com\u002Fvideos\u002Fbringing-duckdb-to-the-cloud-dual-execution-explained\u002F), and the `md:` attach scheme. 
OpenDuck reimplements those ideas as an open protocol and an open backend you can run yourself.\n\n```python\nimport duckdb\n\ncon = duckdb.connect(config={\"allow_unsigned_extensions\": \"true\"})\ncon.execute(\"LOAD '\u002Fpath\u002Fto\u002Fopenduck.duckdb_extension';\")\ncon.execute(\"ATTACH 'openduck:mydb?endpoint=http:\u002F\u002Flocalhost:7878&token=xxx' AS cloud;\")\n\ncon.sql(\"SELECT * FROM cloud.users\").show()                    # remote, transparent\ncon.sql(\"SELECT * FROM local.t JOIN cloud.t2 ON ...\").show()   # hybrid, one query\n```\n\n## What OpenDuck does\n\n### Differential storage\n\nAppend-only layers with PostgreSQL metadata. DuckDB sees a normal file; OpenDuck persists data as immutable sealed layers addressable from object storage. Snapshots give you consistent reads. One serialized write path, many concurrent readers.\n\n### Hybrid (dual) execution\n\nA single query can run partly on your machine and partly on a remote worker. The gateway splits the plan, labels each operator `LOCAL` or `REMOTE`, and inserts bridge operators at the boundaries. Only intermediate results cross the wire.\n\n```\n[LOCAL]  HashJoin(l.id = r.id)\n  [LOCAL]  Scan(products)          ← your laptop\n  [LOCAL]  Bridge(R→L)\n    [REMOTE] Scan(sales)           ← remote worker\n```\n\n### DuckDB-native catalog\n\nThe extension implements DuckDB's `StorageExtension` and `Catalog` interfaces. Remote tables are first-class catalog entries: they participate in JOINs, CTEs, and the optimizer like local tables.\n\n### Open protocol\n\nOpenDuck's protocol is intentionally minimal and defined in [`execution.proto`](proto\u002Fopenduck\u002Fv1\u002Fexecution.proto). The data plane is two RPCs: one to execute a query and stream Arrow IPC batches back, another to cancel a running execution. 
Two additional RPCs handle worker lifecycle (registration and heartbeat) so the gateway can route queries by database affinity and compute context.\n\nBecause the protocol is open and simple, you're not locked into a single backend. Any service that speaks gRPC and returns Arrow can serve as an OpenDuck-compatible backend. Run the included Rust gateway, replace it with your own implementation, or plug in an entirely different execution engine — the client and extension don't care what's on the other side.\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────┐\n│  DuckDB process (client)                    │\n│                                             │\n│  LOAD openduck                              │\n│  ATTACH 'openduck:mydb' AS cloud            │\n│                                             │\n│  ┌─────────────────────────────────────┐    │\n│  │ OpenDuckCatalog                     │    │\n│  │  └─ OpenDuckSchemaEntry             │    │\n│  │      └─ OpenDuckTableEntry (users)  │    │\n│  │      └─ OpenDuckTableEntry (events) │    │\n│  └──────────────┬──────────────────────┘    │\n│                 │ gRPC + Arrow IPC          │\n└─────────────────┼───────────────────────────┘\n                  │\n      ┌───────────▼───────────┐\n      │  Gateway (Rust)       │\n      │  - token auth         │\n      │  - worker registry    │\n      │  - affinity routing   │     ┌──────────────┐\n      │  - plan splitting     │────▶│  Worker 1    │\n      │  - backpressure       │◀────│  (DuckDB)    │\n      │                       │     │  RegisterWorker\n      │                       │     └──────────────┘\n      │                       │     ┌──────────────┐\n      │                       │────▶│  Worker N    │\n      │                       │◀────│  (DuckDB)    │\n      │                       │     │  Heartbeat   │\n      └───────────────────────┘     └──────────────┘\n              │\n    ┌─────────┴─────────┐\n    ▼                   
▼\n┌──────────┐    ┌──────────────┐\n│ Postgres │    │ Object store │\n│ metadata │    │ sealed layers│\n└──────────┘    └──────────────┘\n```\n\n## Quick start\n\n### 1. Build the backend\n\n```bash\ncargo build --workspace\n```\n\n### 2. Build the DuckDB extension\n\nThe openduck extension is not yet published to DuckDB's extension repository, so you need to build it from source. See [`extensions\u002Fopenduck\u002FREADME.md`](extensions\u002Fopenduck\u002FREADME.md) for full prerequisites (vcpkg, bison on macOS).\n\n```bash\ncd extensions\u002Fopenduck && make\n```\n\nThis produces the loadable binary at:\n\n```\nextensions\u002Fopenduck\u002Fbuild\u002Frelease\u002Fextension\u002Fopenduck\u002Fopenduck.duckdb_extension\n```\n\n### 3. Start the server\n\n```bash\nexport OPENDUCK_TOKEN=your-token\ncargo run -p openduck -- -d mydb --token your-token\n```\n\n### 4. Connect\n\nBecause the extension is unsigned, every DuckDB connection needs `allow_unsigned_extensions` enabled and an explicit `LOAD` with the full path to the built binary.\n\n**Python (DuckDB SDK directly):**\n\n```python\nimport duckdb\n\ncon = duckdb.connect(config={\"allow_unsigned_extensions\": \"true\"})\ncon.execute(\"LOAD 'extensions\u002Fopenduck\u002Fbuild\u002Frelease\u002Fextension\u002Fopenduck\u002Fopenduck.duckdb_extension';\")\ncon.execute(\"ATTACH 'openduck:mydb?endpoint=http:\u002F\u002Flocalhost:7878&token=your-token' AS cloud;\")\n\ncon.sql(\"SELECT * FROM cloud.users LIMIT 10\").show()\n```\n\nYou can also set `OPENDUCK_EXTENSION_PATH` to avoid hard-coding the path:\n\n```bash\nexport OPENDUCK_EXTENSION_PATH=extensions\u002Fopenduck\u002Fbuild\u002Frelease\u002Fextension\u002Fopenduck\u002Fopenduck.duckdb_extension\n```\n\n**Python (openduck wrapper — auto-detects the local build):**\n\n```bash\npip install -e clients\u002Fpython\nexport OPENDUCK_TOKEN=your-token\n```\n\n```python\nimport openduck\n\ncon = openduck.connect(\"mydb\")\ncon.sql(\"SELECT 1 AS x\").show()\n```\n\nThe 
wrapper finds the extension automatically from the build tree or `OPENDUCK_EXTENSION_PATH`.\n\n**CLI:**\n\n```bash\nduckdb -unsigned -c \"\n  LOAD 'extensions\u002Fopenduck\u002Fbuild\u002Frelease\u002Fextension\u002Fopenduck\u002Fopenduck.duckdb_extension';\n  ATTACH 'openduck:mydb?token=your-token' AS cloud;\n  SELECT * FROM cloud.users LIMIT 10;\n\"\n```\n\n**Rust:**\n\n```rust\nuse duckdb::Connection;\n\nlet conn = Connection::open_in_memory()?;\nconn.execute_batch(r\"\n    SET allow_unsigned_extensions = true;\n    LOAD 'extensions\u002Fopenduck\u002Fbuild\u002Frelease\u002Fextension\u002Fopenduck\u002Fopenduck.duckdb_extension';\n    ATTACH 'openduck:mydb?endpoint=http:\u002F\u002Flocalhost:7878&token=xxx' AS cloud;\n\")?;\n\nlet mut stmt = conn.prepare(\"SELECT * FROM cloud.users LIMIT 10\")?;\n```\n\n> **Note:** Once the extension is published to the DuckDB extension repository, `INSTALL openduck; LOAD openduck;` will work without building from source or enabling unsigned extensions.\n\nSee [`examples\u002Fpython\u002Fduckdb_sdk_ducklake.py`](examples\u002Fpython\u002Fduckdb_sdk_ducklake.py) for a comprehensive walkthrough including DuckLake integration and hybrid local+remote queries.\n\n## Layout\n\n```\ncrates\u002F\n  exec-gateway\u002F     Gateway — auth, worker registry, routing, hybrid plan splitting\n  exec-worker\u002F      Worker — embedded DuckDB, Arrow IPC streaming\n  exec-proto\u002F       Protobuf\u002Ftonic codegen + shared auth module\n  openduck-cli\u002F     Unified CLI (openduck [default]|gateway|worker|query|cancel|status|snapshot|gc)\n  openduck-metrics\u002F OpenTelemetry metrics (optional OTLP exporter)\n  diff-core\u002F        Core types and StorageBackend trait\n  diff-metadata\u002F    Postgres metadata repo, GC, PgStorageBackend\n  diff-layer-fs\u002F    Append-only on-disk segment files\n  diff-blob\u002F        Sealed layer upload to S3-compatible object storage\n  diff-bridge\u002F      C ABI static library for the DuckDB 
extension\n  diff-fuse\u002F        Linux FUSE adapter over StorageBackend\n\nextensions\u002F\n  openduck\u002F         DuckDB C++ extension (StorageExtension + Catalog)\n\nclients\u002F\n  python\u002F           openduck Python package (pip install -e clients\u002Fpython)\n\nproto\u002F\n  openduck\u002Fv1\u002F      Protocol definition (execution.proto)\n\n```\n\n## OpenDuck vs MotherDuck\n\nMotherDuck is a commercial cloud service. OpenDuck is an open-source project inspired by its architecture.\n\n\n|                          | MotherDuck            | OpenDuck                                 |\n| ------------------------ | --------------------- | ---------------------------------------- |\n| **What**                 | Managed cloud service | Self-hosted open-source                  |\n| **Attach scheme**        | `md:`                 | `openduck:` \u002F `od:`                      |\n| **Auth**                 | `motherduck_token`    | `OPENDUCK_TOKEN`                         |\n| **Differential storage** | Proprietary           | Open (Postgres metadata + object store)  |\n| **Hybrid execution**     | Proprietary planner   | Open (gateway + plan splitting)          |\n| **Protocol**             | Private wire format   | Open gRPC + Arrow IPC                    |\n| **Backend**              | MotherDuck's cloud    | Anything implementing `ExecutionService` |\n| **Extension**            | Bundled in DuckDB     | Separate loadable extension              |\n\n\nOpenDuck is not wire-compatible with MotherDuck. 
It reimplements the same architectural ideas as an open protocol.\n\n## OpenDuck vs Arrow Flight SQL\n\nArrow Flight SQL is a generic database protocol — \"JDBC\u002FODBC over Arrow.\" OpenDuck is a DuckDB-specific system with a narrower scope but deeper integration.\n\n\n|                      | Arrow Flight SQL                | OpenDuck                                     |\n| -------------------- | ------------------------------- | -------------------------------------------- |\n| **Scope**            | Any SQL database                | DuckDB-specific                              |\n| **Integration**      | Separate client driver          | DuckDB StorageExtension + Catalog            |\n| **Catalog**          | Server-side (`GetTables`, etc.) | Extension-side (DuckDB catalog entries)      |\n| **Execution**        | Full query on server            | Hybrid — split across local and remote       |\n| **Protocol surface** | ~15 RPCs                        | 4 RPCs (2 data plane + 2 worker lifecycle)   |\n| **Plan format**      | SQL only                        | SQL (M2), structured plan IR (M3)            |\n| **Optimizer**        | Client-side, unaware            | DuckDB optimizer sees remote tables natively |\n\n\n## OpenDuck vs DuckLake\n\nOpenDuck doesn't replace DuckLake — you use them together. They operate at different layers entirely.\n\nDuckLake is a **lakehouse catalog**: it manages tables as Parquet files in object storage with transactional metadata in Postgres (or SQLite\u002FDuckDB). It decides *where data lives* and *how tables are organized*.\n\nOpenDuck is a **storage and execution layer** for DuckDB's own compute engine. It provides differential storage (append-only layers with snapshot isolation), hybrid query execution (split a single query across local and remote), and transparent remote attach (`ATTACH 'openduck:mydb'`).\n\nIf you're using DuckLake but still fall back to a `.duckdb` file for things DuckLake doesn't support yet (e.g. 
indexes, full-text search, or workloads that need DuckDB-native storage), OpenDuck makes that file concurrency-safe with snapshot isolation. And when you want to query a DuckLake catalog running on a remote server, OpenDuck is the transport — a worker backed by DuckLake serves queries over gRPC, and clients attach via the openduck extension without knowing or caring what the backend storage is.\n\n|                      | DuckLake                              | OpenDuck                                    |\n| -------------------- | ------------------------------------- | ------------------------------------------- |\n| **Layer**            | Catalog (table → Parquet in S3)       | Storage + execution (DuckDB file I\u002FO, gRPC) |\n| **What it manages**  | Table metadata, Parquet data files    | DuckDB pages, layers, snapshots             |\n| **Concurrency**      | Parquet files are immutable           | Snapshot isolation on `.duckdb` files        |\n| **Remote access**    | Not built-in                          | `ATTACH 'openduck:...'` + hybrid execution  |\n| **Together**         | DuckLake catalog on a remote worker → OpenDuck streams results to the client |\n\n\n## Documentation\n\nFull docs live in [`docs\u002F`](docs\u002F):\n\n- [Overview](docs\u002Foverview.md) — what OpenDuck is, problems it solves, comparisons.\n- [Architecture](docs\u002Farchitecture.md) — components, protocol, data flow, security model.\n- [Configuration](docs\u002Fconfiguration.md) — every CLI flag, env var, TOML key, and DuckDB secret.\n- Guides:\n  - [Getting started](docs\u002Fguides\u002Fgetting-started.md) — clone → build → first query.\n  - [Python client](docs\u002Fguides\u002Fpython-client.md) — the `openduck` package API and patterns.\n  - [DuckDB extension](docs\u002Fguides\u002Fduckdb-extension.md) — `LOAD`, `ATTACH`, URI format, secrets, table functions.\n  - [Differential storage](docs\u002Fguides\u002Fdifferential-storage.md) — append-only layers, snapshots, the three 
storage modes.\n  - [Hybrid execution](docs\u002Fguides\u002Fhybrid-execution.md) — `--hybrid`, `openduck_run`, plan splitting.\n  - [Snapshots and garbage collection](docs\u002Fguides\u002Fsnapshots-and-gc.md) — sealing, point-in-time reads, retention.\n  - [Deployment](docs\u002Fguides\u002Fdeployment.md) — single-process, multi-worker, Docker, observability.\n  - [Troubleshooting](docs\u002Fguides\u002Ftroubleshooting.md) — common errors and fixes.\n\n\n## Acknowledgments\n\nOpenDuck's architecture draws heavily from MotherDuck's published work on [differential storage](https:\u002F\u002Fmotherduck.com\u002Fblog\u002Fdifferential-storage-building-block-for-data-warehouse\u002F), [dual execution](https:\u002F\u002Fmotherduck.com\u002Fvideos\u002Fbringing-duckdb-to-the-cloud-dual-execution-explained\u002F), and [cloud-native DuckDB](https:\u002F\u002Fmotherduck.com\u002Fduckdb-book-summary-chapter7\u002F). Credit to the MotherDuck team for pioneering these ideas.\n\n## License\n\nMIT","OpenDuck is an extension project that gives DuckDB cloud-database capabilities while keeping its embedded-database character. A single command attaches a remote database, queries can run in hybrid mode across local and remote, and the underlying storage is layered, snapshot-based, and concurrency-safe. The project is written in C++, paired with a small amount of Rust code for the gateway\u002Fworker, and communicates over an open gRPC + Arrow IPC protocol. This lets users self-host the whole system or plug in a custom backend. OpenDuck is particularly suited to scenarios where analytics work must be split between local and cloud execution, such as applications that process large datasets but want a consistent development experience.","2026-06-11 02:40:15","CREATED_QUERY"]