[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82755":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":9,"rankLanguage":9,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},82755,"rockduck","qianzii2\u002Frockduck","qianzii2","HTAP embedded database: transactional DeltaStore + columnar Vortex storage + DuckDB SQL engine",null,"Rust",101,4,1,0,42.1,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:38","# RockDuck\r\n\r\n### HTAP 嵌入式数据库 | Hybrid Transactional & Analytical Processing Database\r\n\r\n**事务**（DeltaStore 行级增量） + **分析**（Vortex 列式存储） + **引擎**（DuckDB SQL 执行）\r\n\r\n```\r\n版本 0.1.0 | Rust 1.91+ | Apache 2.0\r\n```\r\n\r\n---\r\n\r\n## 目录\r\n\r\n- [核心理念](#核心理念)\r\n- [快速开始](#快速开始)\r\n- [HTAP 端到端工作流](#htap-端到端工作流)\r\n- [架构总览](#架构总览)\r\n- [存储层次](#存储层次)\r\n- [MVCC 与 Time-Travel](#mvcc-与-time-travel)\r\n- [自适应删除掩码](#自适应删除掩码)\r\n- [自适应列编码](#自适应列编码)\r\n- [HTAP 双存储路由](#htap-双存储路由)\r\n- [写入路径](#写入路径)\r\n- [读取路径](#读取路径)\r\n- [Compaction 调度](#compaction-调度)\r\n- [PDT Merge Compaction](#pdt-merge-compaction)\r\n- [Query Feedback — 自适应 Compaction](#query-feedback--自适应-compaction)\r\n- [Iceberg v2 导出](#iceberg-v2-导出)\r\n- [DuckDB 集成](#duckdb-集成)\r\n- [配置参考](#配置参考)\r\n- [核心模块一览](#核心模块一览)\r\n\r\n---\r\n\r\n## 核心理念\r\n\r\nRockDuck 是一个用 Rust 构建的 **HTAP 嵌入式数据库**，同时支持事务型（OLTP）和分析型（OLAP）工作负载。它的名字取自三项核心能力的首字母组合：**事务**（行级增量存储） + **分析**（列式存储） + **引擎**（DuckDB SQL 执行）。\r\n\r\nRockDuck 的架构深受三个成熟系统的启发：\r\n\r\n\r\n| 灵感来源               
 | 借鉴的设计                                                                                   |\r\n| ------------------- | -------------------------------------------------------------------------------------- |\r\n| **Apache Iceberg**  | Shadow Column MVCC（通过 `created_by_txn` \u002F `deleted_by_txn` 实现多版本快照隔离）                     |\r\n| **ClickHouse MergeTree** | Segment \u002F Granule 分层组织，后台 Compaction 合并，将写入的活跃数据与冻结的分析数据分层管理               |\r\n| **Snowflake**       | Zone Map 谓词下推（per-granule min\u002Fmax 统计跳过无关数据块） + HTAP 双存储路由（DeltaStore 与列存分别处理 OLTP\u002FOLAP 查询） |\r\n\r\n\r\n---\r\n\r\n## 快速开始\r\n\r\n### 编译\r\n\r\n```bash\r\ncargo build --release\r\n```\r\n\r\n### Library API\r\n\r\n```rust\r\nuse rockduck::RockDuck;\r\nuse std::sync::Arc;\r\n\r\nlet db = RockDuck::open(\".\u002Fdata\")?;\r\n\r\nlet mut cols = std::collections::HashMap::new();\r\ncols.insert(\"id\".to_string(), Arc::new(arrow_array::Int64Array::from(vec![1i64])) as _);\r\ncols.insert(\"name\".to_string(), Arc::new(arrow_array::StringArray::from(vec![\"Alice\"])) as _);\r\n\r\nlet txn_id = db.insert(\"users\", b\"pk001\", &cols)?;\r\n\r\n\u002F\u002F Time-Travel 查询（TxnId = 10 时的快照）\r\nlet results = db.scan_as_of(\"users\", 10, None, None)?;\r\n\r\n\u002F\u002F Iceberg 导出\r\nlet path = db.export_iceberg(\"users\", \"\u002Ftmp\u002Ficeberg_table\", None).await?;\r\n```\r\n\r\n### CLI\r\n\r\n```bash\r\n# 插入\r\ncargo run -- insert pk001 --columns id=1 --columns age=30 -t users\r\n\r\n# 查询\r\ncargo run -- get pk001 -t users\r\n\r\n# 扫描\r\ncargo run -- scan --start pk001 --end pk100 -t users\r\n\r\n# 删除\r\ncargo run -- delete pk001 -t users\r\n\r\n# 统计\r\ncargo run -- stats -t users\r\n```\r\n\r\n---\r\n\r\n## HTAP 端到端工作流\r\n\r\nRockDuck 的读写路径通过 HTAP 双存储路由实现事务与分析的统一访问：\r\n\r\n```mermaid\r\nflowchart LR\r\n    subgraph Write[\"写入路径（OLTP）\"]\r\n        W1[\"INSERT \u002F UPDATE \u002F DELETE\"]\r\n        W2[\"DeltaStore 记录 cell 级 before\u002Fafter\"]\r\n        W3[\"Vortex 追加写入\"]\r\n        W4[\"WAL 
Begin → Commit\"]\r\n        W5[\"RocksDB 双索引更新\"]\r\n    end\r\n\r\n    subgraph Router[\"query\u002Frouter.rs 路由决策\"]\r\n        R1{\"has_updates?\"}\r\n        R2{\"query_type\"}\r\n    end\r\n\r\n    subgraph Read[\"读取路径（OLAP + OLTP 融合）\"]\r\n        R_VX[\"VortexOnly\\n列存全表扫描\"]\r\n        R_DS[\"DeltaStoreOnly\\n点查最新修改\"]\r\n        R_MG[\"Merge\\nDeltaStore overlay + Vortex\"]\r\n    end\r\n\r\n    W1 --> W2 --> W3 --> W4 --> W5\r\n    R1 -->|No| R_VX\r\n    R1 -->|Yes| R2\r\n    R2 -->|\"PointGet\"| R_DS\r\n    R2 -->|\"RangeScan\u002FAggregate\"| R_MG\r\n    R2 -->|\"FullScan w\u002F filter\"| R_MG\r\n```\r\n\r\n### DeltaStore — 事务写入与增量更新\r\n\r\nUpdate 路径（`write\u002Finsert.rs`）不覆盖原 Vortex 数据，而是将变更记录到 DeltaStore：\r\n\r\n```mermaid\r\nsequenceDiagram\r\n    participant App as 应用层\r\n    participant DS as DeltaStore\r\n    participant VX as Vortex\r\n    participant Scan as scan.rs\r\n\r\n    App->>DS: update(table, pk, columns)\r\n    DS->>VX: 读取旧值 (before image)\r\n    DS-->>DS: record_update(txn, col, offset, old, new)\r\n    DS->>DS: persist() → _upd_{col}.vortex\r\n\r\n    Note over DS: 下次读取时，DeltaStore 覆盖 Vortex 原始值\r\n\r\n    Scan->>VX: 读取 Vortex RecordBatch\r\n    Scan->>DS: get_visible_deltas()\r\n    DS-->>Scan: CellDelta overlay\r\n    Scan-->>App: 合并后的最新数据\r\n```\r\n\r\n---\r\n\r\n## 架构总览\r\n\r\n```mermaid\r\nflowchart TB\r\n    subgraph Core[\"RockDuck Core (db.rs)\"]\r\n        txn[\"txn_counter\\n(txn ID 生成器)\"]\r\n        wal_cache[\"segment_bloom_filters\\n(内存布隆过滤器缓存)\"]\r\n        mvcc_mgr[\"visibility_manager\\n(MVCC 可见性管理)\"]\r\n        delta_mgr[\"delta_store\\n(DeltaStoreManager)\"]\r\n    end\r\n\r\n    subgraph Write[\"写入层\"]\r\n        insert[\"write\u002Finsert.rs\"]\r\n        wal[\"write\u002Fwal.rs\\n(32KB Block + CRC32 WAL)\"]\r\n    end\r\n\r\n    subgraph Read[\"读取层\"]\r\n        scan[\"read\u002Fscan.rs\\n(Zone Map 裁剪 + Filter Pushdown)\"]\r\n        point_get[\"read\u002Fpoint_get.rs\"]\r\n        
router[\"query\u002Frouter.rs\\n(HTAP 读路径路由)\"]\r\n    end\r\n\r\n    subgraph Storage[\"storage\u002Fvortex.rs\"]\r\n        writer[\"VortexWriter\"]\r\n        reader[\"VortexReader\"]\r\n        mmap_cache[\"Arc\u003CMmap> Cache\\n(零拷贝 Frozen 数据)\"]\r\n    end\r\n\r\n    subgraph Metadata[\"metadata\u002Frocksdb.rs — 12 个 Column Family\"]\r\n        pk_idx[\"pk_idx\\n(Hash 索引 → IndexEntry)\"]\r\n        pk_skip[\"pk_skiplist\\n(Sorted 索引，范围扫描)\"]\r\n        seg_meta[\"seg_meta\\nSegment 元数据序列化\"]\r\n        stat[\"stat\\n(TableStats 表级统计)\"]\r\n        zone[\"zone\\n(Zone Map per-granule 统计)\"]\r\n        mvcc_cf[\"mvcc\\n(活跃事务追踪: active:{txn_id} → begin_ts)\"]\r\n        sys[\"sys\\n(committed_txn 持久化)\"]\r\n        proj[\"proj_meta\\n(Secondary Projection 元数据)\"]\r\n        layer[\"layer\\n(Immutable Layer 快照)\"]\r\n        lbf[\"lbf\\n(Learned Bloom Filter 预留)\"]\r\n        bf[\"bf\\n(Per-granule Bloom Filter)\"]\r\n        iceberg_cf[\"iceberg_manifest\\n(Native Iceberg 清单存储)\"]\r\n    end\r\n\r\n    subgraph Query[\"查询引擎\"]\r\n        vtab[\"query\u002Fvtab.rs\\n(RockDuckVTab 流式推送)\"]\r\n        duckdb_fn[\"query\u002Fduckdb_ext.rs\\n(docdb_scan \u002F docdb_iceberg_info)\"]\r\n    end\r\n\r\n    subgraph Compaction[\"compaction\u002F\"]\r\n        scheduler[\"scheduler.rs\"]\r\n        nonblocking[\"nonblocking.rs\"]\r\n        pdt_merge[\"pdt_merge.rs\"]\r\n        reencode[\"reencode.rs\"]\r\n    end\r\n\r\n    Core --> Write\r\n    Core --> Read\r\n    Core --> Storage\r\n    Core --> Metadata\r\n    Core --> Query\r\n    Core --> Compaction\r\n\r\n    Write --> Storage\r\n    Write --> Metadata\r\n    Read --> Metadata\r\n    Read --> Storage\r\n    Read --> router\r\n    scan --> reader\r\n    reader --> mmap_cache\r\n    vtab --> scan\r\n    Compaction --> Storage\r\n    Compaction --> Metadata\r\n```\r\n\r\n\r\n\r\n### 数据目录结构\r\n\r\n```mermaid\r\ngraph TD\r\n    root[\"rockduck_data\u002F\"]\r\n    root --> meta[\"meta\u002F\\n(RocksDB 元数据)\"]\r\n   
 root --> segments[\"segments\u002F\"]\r\n    root --> wal[\"wal\u002F\"]\r\n    root --> temp[\"temp\u002F\"]\r\n\r\n    segments --> active[\"active\u002F\"]\r\n    segments --> immutable[\"immutable\u002F\"]\r\n\r\n    active --> seg_active[\"{seg_id}\u002F\"]\r\n    immutable --> seg_imm[\"{seg_id}\u002F\"]\r\n\r\n    seg_imm --> col_vortex[\"{col}.vortex\\n列数据文件\"]\r\n    seg_imm --> del_vortex[\"_del.vortex\\n删除掩码\"]\r\n    seg_imm --> meta_vortex[\"_meta.vortex\\nSegment 元数据\"]\r\n    seg_imm --> upd_vortex[\"_upd.vortex\\n更新掩码\"]\r\n    seg_imm --> zm_vortex[\"_zm.json\\nZone Map\"]\r\n\r\n    wal --> wal_file[\"wal_000000.bin\\n(32KB Block + CRC32)\"]\r\n\r\n    meta --> rocksdb_sst[\"*.sst, MANIFEST\\n(RocksDB 数据文件)\"]\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## 存储层次\r\n\r\nRockDuck 的数据组织为 **Segment → Granule → Block** 三层结构，灵感来自 ClickHouse MergeTree。\r\n\r\n```mermaid\r\ngraph BT\r\n    DB[\"RockDuck 数据库\"]\r\n    DB --> Seg1[\"Segment #1 (seg_abc...)\"]\r\n    DB --> Seg2[\"Segment #2 (seg_def...)\"]\r\n    DB --> SegN[\"...\"]\r\n\r\n    Seg1 --> G1_1[\"Granule 0\\n~1MB, ~1024 rows\"]\r\n    Seg1 --> G1_2[\"Granule 1\\n~1MB, ~1024 rows\"]\r\n    Seg1 --> G1_M[\"...\"]\r\n\r\n    G1_1 --> B1_1[\"Block 0\\n1024 rows\\ncol stats: min\u002Fmax\"]\r\n    G1_1 --> B1_2[\"Block 1\\n1024 rows\\ncol stats: min\u002Fmax\"]\r\n    G1_1 --> B1_K[\"...\"]\r\n\r\n    G1_2 --> B2_1[\"Block 0\\n1024 rows\\ncol stats: min\u002Fmax\"]\r\n```\r\n\r\n\r\n\r\n### Segment 生命周期\r\n\r\n```mermaid\r\nstateDiagram-v2\r\n    [*] --> Active\r\n    Active --> Compactable : del_ratio 上升\r\n    Active --> Frozen : freeze_segment()\r\n    Compactable --> Active : 删除回收\r\n    Compactable --> Frozen : freeze_segment()\r\n    Frozen --> [*] : Compaction Merge\r\n\r\n    note right of Active\r\n        可写\r\n        BufReader 读取\r\n        Bloom Filter 更新中\r\n    end note\r\n\r\n    note right of Frozen\r\n        只读\r\n        mmap 零拷贝读取\r\n        可导出 Iceberg\r\n    end 
note\r\n```\r\n\r\n\r\n\r\n### 数据类型与默认编码\r\n\r\n```mermaid\r\nflowchart LR\r\n    subgraph Types[\"数据类型\"]\r\n        direction TB\r\n        Ints[\"整数类型\\nInt8 ~ Int64\\nUInt8 ~ UInt64\"]\r\n        Floats[\"浮点类型\\nFloat32, Float64\"]\r\n        Bools[\"布尔类型\\nBool\"]\r\n        Others[\"其他类型\\nUtf8, Binary, Timestamp...\"]\r\n    end\r\n\r\n    Types --> Delta[\"Delta 编码\\n(单调递增\u002F递减)\"]\r\n    Types --> Gorilla[\"Gorilla 编码\\n(浮点数压缩)\"]\r\n    Types --> RLE[\"RLE 编码\\n(重复值多)\"]\r\n    Types --> Raw[\"Raw (无编码)\"]\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## MVCC 与 Time-Travel\r\n\r\n### MVCC 设计（Shadow Column 方式）\r\n\r\n每个数据行记录两个事务 ID：\r\n\r\n```mermaid\r\nclassDiagram\r\n    class IndexEntry {\r\n        +String seg_id\r\n        +u32 granule_id\r\n        +u32 offset\r\n        +TxnId created_by_txn\r\n        +Option~TxnId~ deleted_by_txn\r\n    }\r\n```\r\n\r\n\r\n\r\n### 可见性判断\r\n\r\n```mermaid\r\nflowchart TD\r\n    START[\"is_visible(snapshot, created_txn, deleted_txn)\"]\r\n    START --> ISO{\"snapshot.isolation\"}\r\n\r\n    ISO -->|\"ReadCommitted\"| RC1{\"created_txn >\\ncommitted_txn?\"}\r\n    RC1 -->|Yes| RC_FALSE[\"return false\"]\r\n    RC1 -->|No| RC2{\"deleted_txn ≤\\ncommitted_txn?\"}\r\n    RC2 -->|Yes| RC_FALSE2[\"return false\"]\r\n    RC2 -->|No| RC_TRUE[\"return true\"]\r\n\r\n    ISO -->|\"RepeatableRead\\nor Snapshot\"| RR1{\"created_txn >\\nsnapshot_id?\"}\r\n    RR1 -->|Yes| RR_FALSE1[\"return false\"]\r\n    RR1 -->|No| RR2{\"deleted_txn ≤\\nsnapshot_id?\"}\r\n    RR2 -->|Yes| RR_FALSE2[\"return false\"]\r\n    RR2 -->|No| RR3{\"created_txn ∈\\nactive_txns?\"}\r\n    RR3 -->|Yes| RR_FALSE3[\"return false\"]\r\n    RR3 -->|No| RR_TRUE[\"return true\"]\r\n```\r\n\r\n\r\n\r\n### Time-Travel 查询\r\n\r\n```rust\r\n\u002F\u002F 在 TxnId = 10 的时间点查询数据\r\nlet snapshot = db.snapshot_at(10, IsolationLevel::Snapshot)?;\r\nlet results = db.scan_as_of(\"users\", 10, pk_range, filter)?;\r\n```\r\n\r\n### MVCC RocksDB 存储 — 12 个 Column 
Family\r\n\r\n```mermaid\r\ngraph LR\r\n    subgraph CF_Index[\"索引层\"]\r\n        direction TB\r\n        pk_idx[\"pk_idx\\nHash(table:pk) → IndexEntry\\n{seg_id, granule_id, offset,\\ncreated_by_txn, deleted_by_txn}\"]\r\n        pk_skip[\"pk_skiplist\\nSorted(table:pk) → IndexEntry\\n支持范围扫描\"]\r\n        lbf[\"lbf\\nLearned Bloom Filter\\n(预留)\"]\r\n        bf[\"bf\\nPer-granule Bloom Filter\\n快速判断 key 是否存在\"]\r\n    end\r\n\r\n    subgraph CF_Data[\"数据层\"]\r\n        direction TB\r\n        seg_meta[\"seg_meta\\nSegmentMeta 序列化\\n{status, row_count, del_ratio, upd_ratio}\"]\r\n        proj_meta[\"proj_meta\\nProjection 元数据\\n列子集映射\"]\r\n    end\r\n\r\n    subgraph CF_Stats[\"统计层\"]\r\n        direction TB\r\n        stat[\"stat\\nTableStats 表级统计\\n{row_count, del_count, last_*_txn}\"]\r\n        zone[\"zone\\nZoneMapStats per-granule\\n{min\u002Fmax per column}\"]\r\n    end\r\n\r\n    subgraph CF_MVCC[\"MVCC 层\"]\r\n        direction TB\r\n        mvcc_cf[\"mvcc\\nKey: active:{txn_id}\\nValue: begin_ts\\n追踪活跃事务\"]\r\n        sys[\"sys\\nKey: __system__:committed_txn\\nValue: max_committed_txn_id\"]\r\n    end\r\n\r\n    subgraph CF_Layer[\"分层存储\"]\r\n        direction TB\r\n        layer[\"layer\\nImmutable Layer 快照\\n支持历史数据查询\"]\r\n    end\r\n\r\n    subgraph CF_Iceberg[\"Iceberg\"]\r\n        direction TB\r\n        iceberg[\"iceberg_manifest\\nKey: iceberg:latest\\nValue: IcebergExport (bincode)\\n原生 Iceberg 清单\"]\r\n    end\r\n```\r\n\r\n**索引双写策略**：每条 Insert 同时写入 `pk_idx`（O(1) 点查）和 `pk_skiplist`（O(log n) 范围扫描），牺牲写入性能换取读取灵活性。\r\n\r\n```mermaid\r\ngraph TD\r\n    INSERT[\"INSERT (table, pk, cols)\"] --> IDX[\"双写索引\"]\r\n    IDX --> H[\"pk_idx CF\\nkey = table:pk → IndexEntry\"]\r\n    IDX --> S[\"pk_skiplist CF\\nkey = table:pk → IndexEntry\"]\r\n\r\n    H --> BF_U[\"bf CF 更新\\nPer-granule Bloom Filter\"]\r\n    S --> ZM_U[\"zone CF 更新\\nZone Map min\u002Fmax\"]\r\n\r\n    BF_U --> DONE[\"写入完成\"]\r\n    ZM_U --> DONE\r\n```\r\n\r\n---\r\n\r\n## 
自适应删除掩码\r\n\r\nDelMask 根据删除率自动选择最优存储格式，触发模式切换：\r\n\r\n```mermaid\r\nstateDiagram-v2\r\n    [*] --> Empty : new()\r\n    Empty --> SkipList : 第 1 次删除\r\n    SkipList --> SkipList : del_ratio \u003C 1%\r\n    SkipList --> Roaring : del_ratio 突破 1%\r\n\r\n    Roaring --> Roaring : 1% \u003C del_ratio \u003C 50%\r\n    Roaring --> FullBitmap : del_ratio 突破 50%\r\n    Roaring --> SkipList : del_ratio 回落到 \u003C 1%\r\n\r\n    FullBitmap --> FullBitmap : del_ratio > 50%\r\n    FullBitmap --> Compaction : 触发 Compaction\r\n\r\n    Compaction --> [*] : 删除行物理清除\r\n```\r\n\r\n\r\n\r\n```mermaid\r\ngraph LR\r\n    subgraph Threshold1[\"del_ratio \u003C 1%\"]\r\n        DS1[\"SkipList~Vec\u003Cu32>\\n只存已删除行号\"]\r\n    end\r\n\r\n    subgraph Threshold2[\"1% ≤ del_ratio \u003C 50%\"]\r\n        DS2[\"RoaringBitmap\\n位图压缩，范围查询快\"]\r\n    end\r\n\r\n    subgraph Threshold3[\"del_ratio ≥ 50%\"]\r\n        DS3[\"FullBitmap~Vec\u003Cu8>\\n每行 1 bit + Compaction 触发\"]\r\n    end\r\n\r\n    DS1 -.->|\"add_delete()\\n自动切换\"| DS2\r\n    DS2 -.->|\"add_delete()\\n自动切换\"| DS3\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## 自适应列编码\r\n\r\n`AdaptiveEncoder` 分析真实数据特征，推荐最优编码方案：\r\n\r\n```mermaid\r\nflowchart TD\r\n    START[\"analyze_column_array(array)\\n采样 10K 行\"] --> CARD{\"cardinality\"}\r\n\r\n    CARD -->|\"\u003C 1000\"| LOW_CARD[\"Dict 编码\\nconfidence=0.9\"]\r\n    CARD -->|\"≥ 1000\"| SORTED{\"is_sorted?\"}\r\n\r\n    SORTED -->|\"true\"| DELTA_SORT[\"Delta 编码\\nconfidence=0.85\\n单调递增\u002F递减\"]\r\n    SORTED -->|\"false\"| DTYPE{\"dtype\"}\r\n\r\n    DTYPE -->|\"Float32\u002FFloat64\"| FLOAT{\"compression_hint\"}\r\n    FLOAT -->|\"> 0.5\\n(低方差)\"| ALP[\"ALP 编码\\nconfidence=0.7\"]\r\n    FLOAT -->|\"≤ 0.5\\n(高方差)\"| GORILLA[\"Gorilla 编码\\nconfidence=0.75\"]\r\n\r\n    DTYPE -->|\"Int\u002FUInt\"| RANGE[\"min\u002Fmax 范围\\n\u003C cardinality × 2?\"]\r\n    RANGE -->|Yes| DELTA_RANGE[\"Delta 编码\\nconfidence=0.8\"]\r\n    RANGE -->|No| RAW[\"Raw (无编码)\\nconfidence=0.5\"]\r\n```\r\n\r\n\r\n\r\n### 
Block 级统计信息\r\n\r\n每 1024 行为一个 Block，记录热点列的 min\u002Fmax，用于 granule 内谓词下推：\r\n\r\n```mermaid\r\ngraph TB\r\n    G[\"Granule (1MB, ~1024 rows)\"]\r\n    G --> B1[\"Block 0\\nrows 0-1023\\nstats: col.age [10, 90]\"]\r\n    G --> B2[\"Block 1\\nrows 1024-2047\\nstats: col.age [20, 80]\"]\r\n\r\n    B1 --> Q1{\"查询: age > 85?\"}\r\n    Q1 -->|\"min=10, max=90\\n10 \u003C 85, 90 ≥ 85\\n→ 可能有结果, 不裁剪\"| KEEP1[\"保留 Block 0\"]\r\n    Q1 -->|\"max=80 \u003C 85\\n→ 整个 Block 裁剪\"| SKIP1[\"跳过 Block 0\"]\r\n\r\n    B2 --> Q2{\"查询: age > 85?\"}\r\n    Q2 -->|80 \u003C 85| SKIP2[\"跳过 Block 1\"]\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## HTAP 双存储路由\r\n\r\n### ReadPath 决策树\r\n\r\n```mermaid\r\nflowchart TD\r\n    START[\"choose_read_path(RouterParams)\"] --> UPDATES{\"has_updates\\ndelta_count > 0?\"}\r\n    UPDATES -->|No| VX[\"ReadPath::VortexOnly\\n全部走 Vortex\"]\r\n\r\n    UPDATES -->|Yes| QTYPE{\"query_type\"}\r\n\r\n    QTYPE -->|\"PointGet\"| DSO[\"ReadPath::DeltaStoreOnly\\n只读 DeltaStore\\n最新修改优先\"]\r\n\r\n    QTYPE -->|\"FullScan\\n+ has_updates\"| MERGE[\"ReadPath::Merge\\nDeltaStore overlay + Vortex\"]\r\n\r\n    QTYPE -->|\"Aggregate\"| SEL{\"filter_selectivity\"}\r\n    SEL -->|\"> 0.1\"| VX_A[\"VortexOnly\\n大量数据扫描\"]\r\n    SEL -->|\"≤ 0.1\"| DSO_A[\"DeltaStoreOnly\\n少量数据聚合\"]\r\n\r\n    QTYPE -->|\"RangeScan\"| SEL_R{\"filter_selectivity\\n+ delta_count\"}\r\n\r\n    SEL_R -->|\"sel \u003C 0.01\\ndelta_count \u003C 100\"| DSO_R[\"DeltaStoreOnly\\n高精度点查\"]\r\n\r\n    SEL_R -->|\"sel > 0.5\\n或 delta_count > 1000\"| MERGE_R[\"ReadPath::Merge\\n大范围扫描\"]\r\n\r\n    SEL_R -->|其他| MERGE_D[\"ReadPath::Merge\\n默认路径\"]\r\n```\r\n\r\n\r\n\r\n### DeltaStore 数据结构\r\n\r\n```mermaid\r\nclassDiagram\r\n    class DeltaStore {\r\n        +String seg_id\r\n        +BTreeMap~TxnId, HashMap~String, HashMap~u64, CellDelta~~ deltas\r\n        +Option~BTreeMap~TxnId, ()~ committed_txns\r\n    }\r\n\r\n    class CellDelta {\r\n        +u64 row\r\n        +String col\r\n        +DeltaOpType op\r\n      
  +Option~Vec~before\r\n        +Option~Vec~after\r\n        +TxnId txn_id\r\n    }\r\n\r\n    class DeltaOpType {\r\n        \u003C\u003Cenumeration>>\r\n        Update\r\n        Delete\r\n        Insert\r\n    }\r\n\r\n    DeltaStore \"1\" --> \"*\" CellDelta\r\n    CellDelta --> DeltaOpType\r\n```\r\n\r\n\r\n\r\n### DeltaStore overlay 合并过程\r\n\r\n```mermaid\r\nsequenceDiagram\r\n    participant Vx as Vortex 列存\r\n    participant DS as DeltaStore\r\n    participant Scan as read\u002Fscan.rs\r\n\r\n    Scan->>Vx: 读取 Vortex 原始数据\r\n    Vx-->>Scan: RecordBatch {id:[1,2,3], age:[20,30,40]}\r\n\r\n    Scan->>DS: get_all_visible_deltas()\r\n    DS-->>Scan: { \"age\" → { 1: CellDelta{before:20, after:25} } }\r\n\r\n    Note over Scan: apply_deltas_overlay()\r\n    Scan->>Scan: row 1 的 age 20 → 25\r\n\r\n    Scan-->>Result: RecordBatch {id:[1,2,3], age:[25,30,40]}\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## 写入路径\r\n\r\n```mermaid\r\nflowchart TD\r\n    A[\"insert() \u002F insert_batch()\"] --> B[\"txn_id = next_txn_id()\"]\r\n    B --> C[\"WAL — Begin 记录\"]\r\n    C --> D[\"columns → RecordBatch\"]\r\n    D --> E[\"allocate_position()\\n查找\u002F创建活跃 Segment\"]\r\n\r\n    E --> F[\"write_segment_batch()\\n追加到 {col}.vortex\"]\r\n    F --> G[\"_del.vortex 新增位置=false\"]\r\n    G --> H[\"双写 RocksDB 索引\"]\r\n\r\n    H --> H1[\"pk_idx CF\\npk:table:pk → IndexEntry\"]\r\n    H --> H2[\"pk_skiplist CF\\n有序键，支持范围扫描\"]\r\n\r\n    H2 --> I[\"Bloom Filter 更新\\nsegment_bloom_filters 缓存\"]\r\n    I --> J[\"WAL — Commit + flush\"]\r\n    J --> K[\"committed_txn 持久化到 sys CF\"]\r\n    K --> L[\"返回 TxnId\"]\r\n```\r\n\r\n\r\n\r\n### WAL Block 格式\r\n\r\n```mermaid\r\ngraph RL\r\n    subgraph WAL_File[\"wal_000000.bin (顺序追加)\"]\r\n        direction RL\r\n        B1[\"Block 1 (32KB)\"]\r\n        B2[\"Block 2 (32KB)\"]\r\n        B3[\"Block 3 (32KB)\"]\r\n    end\r\n\r\n    subgraph Block_N[\"Block Header (16 bytes)\"]\r\n        HDR[\"block_seq (8B) | used_bytes (4B) | header_crc (4B)\"]\r\n    
end\r\n\r\n    subgraph Records[\"Records (≤ 32752 bytes)\"]\r\n        R1[\"op_type (1B) | txn_id (8B) | payload_len (4B) | payload | crc32 (4B)\"]\r\n        R2[\"op_type (1B) | txn_id (8B) | payload_len (4B) | payload | crc32 (4B)\"]\r\n        R3[\"...\"]\r\n    end\r\n\r\n    Block_N --> Records\r\n    Records --> R1\r\n    Records --> R2\r\n    Records --> R3\r\n```\r\n\r\n\r\n\r\n### WAL 崩溃恢复流程\r\n\r\n```mermaid\r\nflowchart LR\r\n    START[\"RockDuck 启动\"] --> RECOVER[\"recover_from_wal()\"]\r\n    RECOVER --> LIST[\"list_wal_files()\\n扫描 wal_*.bin\"]\r\n    LIST --> SCAN[\"scan_committed_records()\"]\r\n    SCAN --> FSM[\"重建事务状态机\"]\r\n\r\n    FSM --> T1[\"Begin \u002F Insert \u002F Delete \u002F Update\"]\r\n    T1 --> COMMIT{\"OpType == Commit?\"}\r\n    T1 --> ROLLBACK{\"OpType == Rollback?\"}\r\n\r\n    COMMIT -->|\"收集该 txn 的所有记录\"| COMMIT_COLLECT\r\n    ROLLBACK -->|\"丢弃该 txn 的所有记录\"| ROLLBACK_DROP\r\n\r\n    COMMIT_COLLECT --> REPLAY[\"重放到 RocksDB\\npk_idx + pk_skiplist\"]\r\n    ROLLBACK_DROP --> DONE[\"忽略\"]\r\n    REPLAY --> DONE\r\n\r\n    DONE --> MAX[\"max_committed_txn 更新\"]\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## 读取路径\r\n\r\n```mermaid\r\nflowchart TD\r\n    A[\"get(pk) \u002F scan(pk_range, filter)\"] --> BF[\"Bloom Filter 检查\\n快速跳过不存在的 PK\"]\r\n    A --> ZM[\"Zone Map 裁剪\\n跳过不包含查询值的数据块\"]\r\n    A --> BK[\"Block Stats 裁剪\\nGranule 内谓词下推\"]\r\n\r\n    BF --> IDX[\"RocksDB pk_idx 查找\\n→ IndexEntry\\n{seg_id, granule_id, offset}\"]\r\n    ZM --> IDX\r\n    BK --> IDX\r\n\r\n    IDX --> STATUS{\"Segment 状态\"}\r\n\r\n    STATUS -->|\"Active \u002F Compactable\"| BUF[\"BufReader 读取\\n_arrow-ipc FileReader_\"]\r\n    STATUS -->|\"Frozen\"| MMAP[\"mmap 零拷贝\\nArc\u003CMmap> 缓存共享\"]\r\n\r\n    BUF --> MVCC[\"MVCC 可见性过滤\\nis_visible(snapshot, ...)\"]\r\n    MMAP --> MVCC\r\n\r\n    MVCC --> DM[\"Del Mask 应用\\n已删除行过滤\"]\r\n    DM --> FILT[\"Filter 表达式求值\\nArrow compute filter\"]\r\n    FILT --> RB[\"RecordBatch 返回\"]\r\n```\r\n\r\n\r\n\r\n### Vortex 
文件布局\r\n\r\n```mermaid\r\ngraph TD\r\n    subgraph Segment[\"segments\u002F{seg_id}\u002F\"]\r\n        direction TB\r\n        META[\"_meta.vortex\\nSegment 元数据\\nSegmentMeta (bincode)\"]\r\n        DEL[\"_del.vortex\\n删除掩码\\nBooleanArray\"]\r\n\r\n        COL1[\"{col1}.vortex\\nArrow IPC 列文件\"]\r\n        COL2[\"{col2}.vortex\\nArrow IPC 列文件\"]\r\n        COLN[\"...\"]\r\n\r\n        UPD[\"_upd_age.vortex\\n更新掩码\"]\r\n        ZM[\"_zm.json\\nZone Map\"]\r\n    end\r\n\r\n    subgraph VortexReader[\"VortexReader\"]\r\n        READER[\"read_column(seg_id, col)\\n自动选择读取路径\"]\r\n        READER --> META_CHECK{\"meta.status\"}\r\n        META_CHECK -->|Frozen| MMAP_READ[\"read_column_mmap_internal()\"]\r\n        META_CHECK -->|Active| BUF_READ[\"read_arrow_file()\"]\r\n        MMAP_READ --> ARC_CACHE[\"Arc\u003CMmap> 缓存\"]\r\n    end\r\n\r\n    VortexReader --> COL1\r\n    VortexReader --> COL2\r\n    VortexReader --> DEL\r\n```\r\n\r\n\r\n\r\n### Filter 表达式解析与求值\r\n\r\n`filter_expr.rs` 实现了一个手写的表达式解析器，不依赖外部 SQL 解析库：\r\n\r\n```mermaid\r\nflowchart TD\r\n    RAW[\"WHERE age > 30 AND name = 'Alice' OR NOT deleted\"]\r\n    RAW --> TOKEN[\"Tokenizer\"]\r\n    TOKEN --> PARSE[\"Recursive Descent Parser\"]\r\n    PARSE --> AST[\"Expr AST\"]\r\n    AST --> EVAL[\"evaluate(batch, &Expr) → BooleanArray\"]\r\n\r\n    subgraph AST[\"Expr AST\"]\r\n        OR[\"Or\"]\r\n        AND1[\"And\"]\r\n        GT[\"Compare(age > 30)\"]\r\n        EQ[\"Compare(name = 'Alice')\"]\r\n        NOT_DEL[\"Not\"]\r\n        DEL[\"Compare(deleted = true)\"]\r\n        OR --> AND1\r\n        OR --> NOT_DEL\r\n        AND1 --> GT\r\n        AND1 --> EQ\r\n        NOT_DEL --> DEL\r\n    end\r\n\r\n    EVAL --> MASK[\"BooleanMask → Arrow Compute Filter\"]\r\n```\r\n\r\n**支持的操作符**：\r\n\r\n| 类别 | 操作符 |\r\n| --- | --- |\r\n| 比较 | `>`, `>=`, `\u003C`, `\u003C=`, `=`, `!=` |\r\n| 逻辑 | `AND`, `OR`, `NOT` |\r\n| 括号 | `(`, `)` |\r\n| 字面量 | 整数、浮点、字符串、布尔 |\r\n\r\n**求值策略**：先序遍历 AST，短路求值（遇到 `false AND ...` 
直接返回，跳过后续列读取）。\r\n\r\n---\r\n\r\n## Compaction 调度\r\n\r\n### 优先级队列与评分公式\r\n\r\n```mermaid\r\nflowchart LR\r\n    subgraph Evaluate[\"evaluate() — 遍历所有 Segment\"]\r\n        direction TB\r\n        E1[\"读取 SegmentMeta\"]\r\n        E1 --> SCORE[\"calculate_priority()\"]\r\n        SCORE --> REASON[\"determine_reason()\"]\r\n    end\r\n\r\n    SCORE -->|\"del_score = del_ratio² × 10\\n+ size_score = log₂(MB) × 0.5\\n+ age_score = log₂(hours) × 0.3\"| FORMULA[\"priority = del_score\\n  + size_score\\n  + age_score\"]\r\n    FORMULA --> PUSH[\"BinaryHeap.push()\\n优先级队列\"]\r\n```\r\n\r\n\r\n\r\n### Compaction 原因判定\r\n\r\n```mermaid\r\nflowchart TD\r\n    M[\"SegmentMeta\"] --> DEL{\"del_ratio > 0.5?\"}\r\n    DEL -->|Yes| HDR[\"CompactionReason::HighDeleteRatio\"]\r\n    DEL -->|No| SIZE{\"size \u003C 1MB?\"}\r\n    SIZE -->|Yes| SF[\"CompactionReason::SmallFile\"]\r\n    SIZE -->|No| UPD{\"upd_ratio > 0.3\\n且 del_ratio \u003C 0.5?\"}\r\n    UPD -->|Yes| INC[\"CompactionReason::IncrementalMaterialize\"]\r\n    UPD -->|No| PER[\"CompactionReason::Periodic\"]\r\n```\r\n\r\n\r\n\r\n### Feature 5: 查询反馈增强的 Compaction 优先级\r\n\r\n```mermaid\r\nflowchart TD\r\n    BASE[\"base_score\\n= del²×10 + size×0.5 + age×0.3\"] --> FEEDBACK\r\n\r\n    subgraph FEEDBACK[\"Query Feedback 惩罚\"]\r\n        STALE[\"Zone Map 失准\\nstaleness_penalty × 5.0\"]\r\n        MISS[\"裁剪失效\\n(1 - prune_hit_ratio) × 3.0\"]\r\n    end\r\n\r\n    BASE --> PEN1[\"+ staleness_penalty\"]\r\n    PEN1 --> PEN2[\"+ miss_penalty\"]\r\n    PEN2 --> FINAL[\"final_priority\\n用于 BinaryHeap 排序\"]\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## PDT Merge Compaction\r\n\r\n### 位置删除（PDT）原理\r\n\r\n传统 Compaction 比较 key 值再决定去留。PDT（Positional Delete Tracking）只处理位置变化，不比较数据：\r\n\r\n```mermaid\r\ngraph LR\r\n    subgraph Old[\"旧 Segment\"]\r\n        O1[\"位置 0: alive\"]\r\n        O2[\"位置 1: deleted ✗\"]\r\n        O3[\"位置 2: alive\"]\r\n        O4[\"位置 3: deleted ✗\"]\r\n        O5[\"位置 4: alive\"]\r\n    end\r\n\r\n    subgraph PDT[\"PDT 
Merge\"]\r\n        DM[\"DelMask\\nSkipList \u002F RoaringBitmap\"]\r\n    end\r\n\r\n    subgraph New[\"新 Segment\"]\r\n        N1[\"位置 0: row0 的值\"]\r\n        N2[\"位置 1: row2 的值 ← 跳过 deleted\"]\r\n        N3[\"位置 2: row4 的值 ← 跳过 deleted\"]\r\n    end\r\n\r\n    Old -->|读取 DelMask| PDT\r\n    PDT -->|存活位置列表| New\r\n```\r\n\r\n**核心收益**：I\u002FO 量 = 有效数据量，而非总数据量。\r\n\r\n### 多路合并\r\n\r\n```mermaid\r\nflowchart LR\r\n    S1[\"Segment A\\n(del_ratio=60%)\"] --> M[\"PDT multiway_merge()\"]\r\n    S2[\"Segment B\\n(del_ratio=55%)\"] --> M\r\n    S3[\"Segment C\\n(del_ratio=70%)\"] --> M\r\n    M --> OUT[\"新 Segment\\n(del_ratio=0%)\"]\r\n```\r\n\r\n---\r\n\r\n## Query Feedback — 自适应 Compaction\r\n\r\n### 工作原理\r\n\r\n`QueryFeedbackCollector` 追踪 Zone Map 裁剪命中率，影响 Compaction 优先级：\r\n\r\n```mermaid\r\nflowchart TD\r\n    Q[\"查询 SELECT * WHERE age > 30\"]\r\n\r\n    Q --> ZM[\"Zone Map 估算\\nGranule 0-9 全部可能包含\"]\r\n    ZM --> COMPARE{\"实际匹配 Granule 数\"}\r\n\r\n    COMPARE -->|估算 10，实际 8+| HIT[\"prune_hit\\nstaleness_penalty -= 0.05\"]\r\n    COMPARE -->|估算 10，实际 1| MISS[\"prune_miss\\nstaleness_penalty += 0.1\\nmiss_penalty += 3.0\"]\r\n\r\n    HIT --> PRIORITY[\"优先级分数\"]\r\n    MISS --> PRIORITY\r\n    PRIORITY --> HEAP[\"CompactionHeap\"]\r\n```\r\n\r\n### staleness_penalty 与 prune_hit_ratio\r\n\r\n| 状态 | staleness_penalty | prune_hit_ratio |\r\n| --- | --- | --- |\r\n| 从未查询 | 0.0 | 0.5（中立） |\r\n| 多次 miss | → 1.0（封顶） | → 0.0 |\r\n| 多次 hit | → 0.0（下限） | → 1.0 |\r\n\r\n惩罚分数加入 Compaction 优先级：`priority += staleness_penalty × 5.0 + (1 - prune_hit_ratio) × 3.0`\r\n\r\n---\r\n\r\n## Iceberg v2 导出\r\n\r\n### 双层设计\r\n\r\n```mermaid\r\nflowchart LR\r\n    subgraph Hot[\"热路径 (Native in-RocksDB)\"]\r\n        direction TB\r\n        H1[\"IcebergExport\\n(bincode 序列化)\"]\r\n        H2[\"CF: iceberg_manifest\\nKey: iceberg:latest\"]\r\n        H3[\"freeze_segment()\\n自动更新清单\"]\r\n    end\r\n\r\n    subgraph Cold[\"冷路径 (On-Demand Spec-Compliant)\"]\r\n        direction TB\r\n        
C1[\"export_to_iceberg()\"]\r\n        C2[\"v{N}.metadata.json\\n(TableMetadata)\"]\r\n        C3[\"snap-*.avro\\n(Manifest List)\"]\r\n        C4[\"*-m0.avro\\n(Manifest)\"]\r\n        C5[\"data\u002Fsegments\u002F\"]\r\n    end\r\n\r\n    Hot -->|\"freeze_for_iceberg()\"| Cold\r\n```\r\n\r\n\r\n\r\n### Iceberg 导出目录结构\r\n\r\n```mermaid\r\ngraph TD\r\n    root[\"target\u002F (Iceberg Table Root)\"]\r\n    root --> vh[\"version-hint.txt\\n\\\"2\\\"\"]\r\n    root --> meta[\"metadata\u002F\"]\r\n    root --> data[\"data\u002F\"]\r\n\r\n    meta --> vm[\"v{snapshot_id}.metadata.json\"]\r\n    meta --> ml[\"snap-{id}-{seq}-{uuid}.avro\\n(Manifest List)\"]\r\n    meta --> vh2[\"version-hint.txt\\n\\\"2\\\"\"]\r\n\r\n    data --> segs[\"segments\u002F\"]\r\n    segs --> s1[\"{seg_id}\u002F\"]\r\n    segs --> s2[\"{seg_id}\u002F\"]\r\n\r\n    s1 --> c1[\"{col1}.vortex\\n(Vortex 数据文件)\"]\r\n    s1 --> c2[\"{col2}.vortex\"]\r\n    s1 --> dm[\"_del.vortex\"]\r\n    s1 --> m_[\"_meta.vortex\"]\r\n```\r\n\r\n\r\n\r\n### Iceberg 导出流程\r\n\r\n```mermaid\r\nflowchart TD\r\n    START[\"export_to_iceberg()\"] --> COLLECT[\"收集所有 Frozen segments\"]\r\n    COLLECT --> FIELDS[\"提取 field_id_map\\ncol → Iceberg field_id\"]\r\n    FIELDS --> ENTRIES[\"translate::\\nbuild_data_file_entries()\"]\r\n    ENTRIES --> MANIFEST[\"加载\u002F创建 IcebergExport\"]\r\n    MANIFEST --> DIRS[\"创建目录结构\\n(metadata\u002F, data\u002F)\"]\r\n    DIRS --> COPY[\"复制 Vortex 文件到 data\u002F\"]\r\n    COPY --> SCHEMA[\"translate::to_iceberg_schema()\"]\r\n    SCHEMA --> AVRO_M[\"write_manifest_avro_sync()\"]\r\n    AVRO_M --> AVRO_ML[\"write_manifest_list_avro_sync()\"]\r\n    AVRO_ML --> JSON[\"write TableMetadata JSON\"]\r\n    JSON --> FSYNC[\"sync_file() \u002F sync_dir()\\n(Windows: FlushFileBuffers)\"]\r\n    FSYNC --> SAVE[\"save_manifest() → RocksDB\"]\r\n    SAVE --> DONE[\"返回 metadata_path\"]\r\n```\r\n\r\n\r\n\r\n### RocksDB Iceberg Manifest 存储\r\n\r\n```mermaid\r\ngraph LR\r\n    subgraph 
iceberg_manifest_CF[\"iceberg_manifest Column Family\"]\r\n        K1[\"iceberg:latest\\n→ IcebergExport (bincode)\"]\r\n        K2[\"iceberg:history\\n→ Vec\u003CSnapshotRef> (bincode)\"]\r\n    end\r\n\r\n    subgraph SnapshotRef[\"SnapshotRef\"]\r\n        SR1[\"name: \\\"main\\\"\"]\r\n        SR2[\"snapshot_id: 12345\"]\r\n        SR3[\"type_: \\\"branch\\\"\"]\r\n    end\r\n\r\n    K1 --> IcebergExport[\"IcebergExport\\nsnapshot_id, sequence_number\\nentries: Vec\u003CDataFileEntry>\"]\r\n    IcebergExport --> DFE[\"DataFileEntry\\nfile_path, record_count\\nlower_bounds, upper_bounds\\nnull_counts, split_offsets\"]\r\n```\r\n\r\n\r\n\r\n### DataFileEntry 结构\r\n\r\n```mermaid\r\nclassDiagram\r\n    class DataFileEntry {\r\n        +String file_path\r\n        +String file_format = \"VORTEX\"\r\n        +u64 record_count\r\n        +u64 file_size\r\n        +HashMap~i32, Vec~u8~~ lower_bounds\r\n        +HashMap~i32, Vec~u8~~ upper_bounds\r\n        +HashMap~i32, u64~ null_counts\r\n        +Vec~u64~ split_offsets\r\n        +i32 sort_order_id = 1\r\n    }\r\n```\r\n\r\n\r\n\r\n---\r\n\r\n## DuckDB 集成\r\n\r\n### 自定义 VTab 流式推送\r\n\r\nDuckDB 自带的 `ArrowVTab` 会将所有 RecordBatch concat 成一个巨大批次，RockDuck 实现了自定义 `RockDuckVTab`：\r\n\r\n```mermaid\r\nsequenceDiagram\r\n    participant DuckDB as DuckDB 查询引擎\r\n    participant VTab as RockDuckVTab\r\n\r\n    Note over DuckDB,VTab: bind() 阶段（一次性）\r\n    DuckDB->>VTab: bind(path)\r\n    VTab->>RockDuck: RockDuck::open(path)\r\n    VTab->>VTab: scan(\"default\")\r\n    VTab-->>DuckDB: 注册 schema 列 + cardinality\r\n\r\n    Note over DuckDB,VTab: init() 阶段（一次性）\r\n    DuckDB->>VTab: init()\r\n    VTab-->>DuckDB: max_threads = 1\r\n\r\n    Note over DuckDB,VTab: func() 阶段（按需多次调用）\r\n    DuckDB->>VTab: func(output)\r\n    VTab->>VTab: batch_index.fetch_add(1, Relaxed)\r\n    alt 还有批次\r\n        VTab-->>DuckDB: record_batch_to_duckdb_data_chunk(batch[idx])\r\n    else 所有批次已推送\r\n        VTab-->>DuckDB: set_len(0)\r\n    end\r\n\r\n   
 DuckDB->>VTab: func(output)\r\n    Note right of VTab: batch_index = 1\r\n    VTab-->>DuckDB: batches[1]\r\n```\r\n\r\n\r\n\r\n### DuckDB SQL 表函数\r\n\r\n```mermaid\r\ngraph LR\r\n    subgraph TableFunctions[\"docdb_* 表函数\"]\r\n        F1[\"docdb_scan(path)\\n扫描 RockDuck 表\"]\r\n        F2[\"docdb_iceberg_info(path)\\n读取 metadata.json\"]\r\n        F3[\"docdb_iceberg_entries(data_dir)\\n列出 Vortex 数据文件\"]\r\n    end\r\n\r\n    F1 --> DuckDB_SQL[\"DuckDB SQL 查询\"]\r\n    F2 --> DuckDB_SQL\r\n    F3 --> DuckDB_SQL\r\n```\r\n\r\n\r\n\r\n```sql\r\n-- DuckDB 中使用示例\r\nSELECT * FROM docdb_scan('\u002Fpath\u002Fto\u002Frockduck\u002Fdata');\r\n\r\nINSTALL vortex;\r\nLOAD vortex;\r\nSELECT * FROM read_vortex('\u002Fpath\u002Fto\u002Fexported\u002Fsegments\u002F*\u002F*.vortex');\r\n```\r\n\r\n\r\n\r\n### 端到端测试覆盖\r\n\r\n`tests\u002Fintegration_tests.rs` 包含 30+ 个端到端测试，覆盖完整生命周期：\r\n\r\n```mermaid\r\nflowchart TD\r\n    subgraph Lifecycle[\"数据库生命周期\"]\r\n        L1[\"test_open_database_default\"]\r\n        L2[\"test_open_database_custom_config\"]\r\n        L3[\"test_data_persists_after_reopen\"]\r\n    end\r\n\r\n    subgraph Write[\"写入测试\"]\r\n        W1[\"test_insert_single_record_then_get\"]\r\n        W2[\"test_insert_batch_then_scan_all\"]\r\n        W3[\"test_batch_insert_individual_point_gets\"]\r\n        W4[\"test_large_batch_insert_and_scan\"]\r\n        W5[\"test_flush_succeeds\"]\r\n        W6[\"test_next_txn_id_incrementing\"]\r\n    end\r\n\r\n    subgraph Delete[\"删除测试\"]\r\n        D1[\"test_delete_then_point_get_returns_none\"]\r\n        D2[\"test_deleted_records_excluded_from_scan\"]\r\n        D3[\"test_double_delete_is_idempotent\"]\r\n        D4[\"test_delete_then_insert_same_key\"]\r\n        D5[\"test_delete_nonexistent_returns_error\"]\r\n    end\r\n\r\n    subgraph Scan[\"扫描测试\"]\r\n        S1[\"test_scan_all_records\"]\r\n        S2[\"test_scan_with_pk_range_half_open\"]\r\n        S3[\"test_scan_nonexistent_table\"]\r\n        
S4[\"test_scan_empty_range_returns_nothing\"]\r\n        S5[\"test_scan_with_filter_returns_correct_rows\"]\r\n    end\r\n\r\n    subgraph Stats[\"Statistics tests\"]\r\n        T1[\"test_table_stats_row_count_matches_scan\"]\r\n        T2[\"test_table_stats_alive_rows_after_delete\"]\r\n        T3[\"test_table_stats_basic\"]\r\n        T4[\"test_table_stats_del_ratio_zero\"]\r\n    end\r\n\r\n    subgraph Segment[\"Segment tests\"]\r\n        G1[\"test_list_segments\"]\r\n        G2[\"test_get_segment_meta\"]\r\n        G3[\"test_list_segments_returns_after_insert\"]\r\n        G4[\"test_mmap_read_returns_same_as_bufreader\"]\r\n    end\r\n\r\n    subgraph MultiTable[\"Multi-table isolation\"]\r\n        M1[\"test_multiple_tables_data_isolation\"]\r\n    end\r\n```\r\n\r\n**Testing principle**: every write test must verify that the data can be read back correctly, ensuring end-to-end data consistency.\r\n\r\n---\r\n\r\n## Configuration Reference\r\n\r\n\r\n| Option                  | Default           | Description                        |\r\n| ----------------------- | ----------------- | ---------------------------------- |\r\n| `data_dir`              | `.\u002Frockduck_data` | Root data directory                |\r\n| `granule_size`          | 1 MB              | Rows per Granule                   |\r\n| `segment_target_size`   | 1 GB              | Target Segment size                |\r\n| `num_threads`           | Number of CPU cores | Degree of parallelism            |\r\n| `enable_bloom_filter`   | `true`            | Bloom filter on the write path     |\r\n| `bloom_filter_fpp`      | `0.01`            | Bloom filter false-positive rate   |\r\n| `enable_zone_map`       | `true`            | Granule-level min\u002Fmax statistics |\r\n| `enable_compression`    | `true`            | Column compression                 |\r\n| `compression_algorithm` | `\"lz4\"`           | lz4 \u002F zstd \u002F snappy      |\r\n| `enable_wal`            | `true`            | Write-ahead log (crash recovery)   |\r\n| `wal_max_file_size`     | 128 MB            | WAL file rotation threshold        |\r\n\r\n\r\n---\r\n\r\n## Core Modules at a Glance\r\n\r\n\r\n| Module | File | Responsibility |\r\n| --- | --- | --- |\r\n| **Entry point** | `db.rs` | `RockDuck` main struct, all public APIs, WAL recovery orchestration |\r\n| **Storage** | `storage\u002Fvortex.rs` | 
VortexWriter\u002FVortexReader, supporting both BufReader and zero-copy mmap reads |\r\n| **Metadata** | `metadata\u002Frocksdb.rs` | RocksDB initialization, management of 12 Column Families |\r\n| **MVCC** | `mvcc\u002Fvisibility.rs` | Visibility management, three isolation levels, Time-Travel snapshots |\r\n| **WAL** | `write\u002Fwal.rs` | 32 KB blocks + CRC32, crash recovery, WAL rotation |\r\n| **Write** | `write\u002Finsert.rs` | Insert\u002FDelete\u002FUpdate, dual-index (Hash + Skiplist) writes |\r\n| **Read** | `read\u002Fscan.rs` | Range scans, ordered pk_skiplist traversal, DeltaStore overlay merge |\r\n| **Point lookup** | `read\u002Fpoint_get.rs` | Primary-key lookup, LBF (Learned Bloom Filter) prediction, Bloom Filter check |\r\n| **Delete mask** | `segment\u002Fdel_mask.rs` | Adaptive switching among SkipList \u002F RoaringBitmap \u002F FullBitmap |\r\n| **DeltaStore** | `segment\u002Fdelta_store.rs` | Cell-level update tracking, before\u002Fafter images, MVCC visibility filtering |\r\n| **Column encoding** | `segment\u002Fencoding.rs` | AdaptiveEncoder, sampling-based analysis of real data, adaptive encoding recommendations |\r\n| **Segment Layout** | `segment\u002Flayout.rs` | PAX directory structure, file naming conventions |\r\n| **Segment metadata** | `segment\u002Fmeta.rs` | SegmentMeta\u002FGranuleMeta\u002FBlockStats, Zone Map, CompareOp |\r\n| **Routing** | `query\u002Frouter.rs` | HTAP three-path selection (VortexOnly \u002F DeltaStoreOnly \u002F Merge) |\r\n| **VTab** | `query\u002Fvtab.rs` | RockDuckVTab streaming push, batch index tracked with AtomicUsize |\r\n| **DuckDB functions** | `query\u002Fduckdb_ext.rs` | `docdb_scan`, `docdb_iceberg_info`, `docdb_iceberg_entries` |\r\n| **Filter expressions** | `query\u002Ffilter_expr.rs` | Parser (tokenizer\u002Fparser) supporting AND\u002FOR\u002FNOT and comparison operators |\r\n| **Query feedback** | `query\u002Ffeedback.rs` | QueryFeedbackCollector, Zone Map hit-rate tracking, staleness penalty |\r\n| **Compaction scheduling** | `compaction\u002Fscheduler.rs` | BinaryHeap priority queue, scoring formula, CompactionReason determination |\r\n| **PDT Merge** | `compaction\u002Fpdt_merge.rs` | Positional delete merging, multi-way merge, MergeStats statistics |\r\n| **Iceberg export** | `iceberg\u002Fexport.rs` | Iceberg v2 export orchestrator |\r\n| **Iceberg Avro** | `iceberg\u002Favro_writer.rs` | Avro Manifest writing, 3 hard-coded Avro Schemas |\r\n| **Iceberg manifest** | `iceberg\u002Fcatalog.rs` | Native Iceberg manifest storage inside RocksDB |\r\n| **Iceberg translation** | `iceberg\u002Ftranslate.rs` | Arrow 
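 -> Iceberg Schema translation, DataFileEntry construction |\r\n| **Configuration** | `config.rs` | RockDuckConfig Builder, supports bloom\u002Fzone_map\u002Fcompression\u002FWAL configuration |\r\n\r\n\r\n---\r\n\r\n## License\r\n\r\nApache License 2.0\r\n","RockDuck is an HTAP embedded database built in Rust, designed to support both transactional (OLTP) and analytical (OLAP) workloads. It combines DeltaStore row-level delta storage, Vortex columnar storage, and the DuckDB SQL engine to provide efficient data processing. The project draws on Apache Iceberg's MVCC mechanism, ClickHouse MergeTree's tiered organization and background compaction, and Snowflake's Zone Map and HTAP dual-store routing techniques. These features let RockDuck deliver efficient analytical queries while preserving transactional consistency. It suits applications that need real-time transaction processing and complex data analysis within the same system.",2,"2026-06-11 04:09:07","CREATED_QUERY"]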