[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-73323":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":24,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":36,"lastSyncTime":37,"discoverSource":38},73323,"vortex","vortex-data\u002Fvortex","vortex-data","An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.","https:\u002F\u002Fvortex.dev",null,"Rust",2995,169,17,216,0,8,24,70,93.69,"Apache License 2.0",false,"develop",true,[26,27,28,29,30,31,32],"array","arrow","compression","file","multimodal","python","rust","2026-06-12 04:01:09","# 🌪️ Vortex\n\n[![Build Status](https:\u002F\u002Fgithub.com\u002Fvortex-data\u002Fvortex\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002Fvortex-data\u002Fvortex\u002Factions)\n[![OpenSSF Best Practices](https:\u002F\u002Fwww.bestpractices.dev\u002Fprojects\u002F10567\u002Fbadge)](https:\u002F\u002Fwww.bestpractices.dev\u002Fprojects\u002F10567)\n[![Documentation](https:\u002F\u002Fdocs.rs\u002Fvortex\u002Fbadge.svg)](https:\u002F\u002Fdocs.vortex.dev)\n[![CodSpeed 
Badge](https:\u002F\u002Fimg.shields.io\u002Fendpoint?url=https:\u002F\u002Fcodspeed.io\u002Fbadge.json)](https:\u002F\u002Fcodspeed.io\u002Fvortex-data\u002Fvortex)\n[![Crates.io](https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fvortex.svg)](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fvortex)\n[![PyPI - Version](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fvortex-data)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fvortex-data\u002F)\n[![Maven - Version](https:\u002F\u002Fimg.shields.io\u002Fmaven-central\u002Fv\u002Fdev.vortex\u002Fvortex-spark)](https:\u002F\u002Fcentral.sonatype.com\u002Fartifact\u002Fdev.vortex\u002Fvortex-spark)\n[![codecov](https:\u002F\u002Fcodecov.io\u002Fgithub\u002Fvortex-data\u002Fvortex\u002Fgraph\u002Fbadge.svg)](https:\u002F\u002Fcodecov.io\u002Fgithub\u002Fvortex-data\u002Fvortex)\n\n[Join the community on Slack!](https:\u002F\u002Fvortex.dev\u002Fslack) | [Documentation](https:\u002F\u002Fdocs.vortex.dev\u002F) | [Performance Benchmarks](https:\u002F\u002Fbench.vortex.dev)\n\nIf you are interested in closer collaboration, please email info@vortex.dev\n\n## Overview\n\nVortex is a next-generation columnar file format and toolkit designed for high-performance data processing.\nIt is the fastest and most extensible format for building data systems backed by object storage. It provides:\n\n- **Blazing Fast Performance**\n  - 100x faster random access reads (vs. 
modern Apache Parquet)\n  - 10-20x faster scans\n  - 5x faster writes\n  - Similar compression ratios\n  - Efficient support for wide tables with zero-copy\u002Fzero-parse metadata\n\n- **Extensible Architecture**\n  - Modeled after Apache DataFusion's extensible approach\n  - Pluggable encoding system, type system, compression strategy, & layout strategy\n  - Zero-copy compatibility with Apache Arrow\n\n- **Open Source, Neutral Governance**\n  - A Linux Foundation (LF AI & Data) Project\n  - Apache-2.0 Licensed\n\n- **Integrations**\n  - Arrow, DataFusion, DuckDB, Spark, Pandas, Polars, & more\n  - Apache Iceberg (coming soon)\n\n> 🟢 **Development Status**: Library APIs may change from version to version, but we now consider\n> the file format \u003Cins>_stable_\u003C\u002Fins>. From release 0.36.0, all future releases of Vortex should\n> maintain backwards compatibility of the file format (i.e., be able to read files written by\n> any earlier version >= 0.36.0).\n\n## Key Features\n\n### Core Capabilities\n\n- **Logical Types** - Clean separation between logical schema and physical layout\n- **Zero-Copy Arrow Integration** - Seamless conversion to\u002Ffrom Apache Arrow arrays\n- **Extensible Encodings** - Pluggable physical layouts with built-in optimizations\n- **Cascading Compression** - Support for nested encoding schemes\n- **High-Performance Computing** - Optimized compute kernels for encoded data\n- **Rich Statistics** - Lazy-loaded summary statistics for optimization\n\n### Technical Architecture\n\n#### Logical vs Physical Design\n\nVortex strictly separates logical and physical concerns:\n\n- **Logical Layer**: Defines data types and schema\n- **Physical Layer**: Handles encoding and storage implementation\n- **Built-in Encodings**: Compatible with Apache Arrow's memory format\n- **Extension Encodings**: Optimized compression schemes (RLE, dictionary, etc.)\n\n## Quick Start\n\n### Installation\n\n#### Rust Crate\n\nAll features are exported through the 
main `vortex` crate.\n\n```bash\ncargo add vortex\n```\n\n#### Python Package\n\n```bash\nuv add vortex-data\n```\n\n#### Command Line UI (vx)\n\nFor browsing the structure of Vortex files, you can use the `vx` command-line tool.\n\n```bash\n# Install pre-built binary (fast, recommended)\ncargo binstall vortex-tui\n\n# Or build from source\ncargo install vortex-tui --locked\n\n# Or run via Python without installing\nuvx --from vortex-data vx --help\n\n# Usage\nvx browse \u003Cfile>\n```\n\n### Development Setup\n\n#### Prerequisites (macOS)\n\n```bash\n# Optional but recommended dependencies\nbrew install flatbuffers protobuf  # For .fbs and .proto files\nbrew install duckdb               # For benchmarks\n\n# Install Rust toolchain\ncurl --proto '=https' --tlsv1.2 -sSf https:\u002F\u002Fsh.rustup.rs | sh\n# or\nbrew install rustup\n\n# Initialize submodules\ngit submodule update --init --recursive\n\n# Set up dependencies with uv\nuv sync --all-packages\n```\n\n### Benchmarking\n\nUse `vx-bench` to run benchmarks comparing engines (DataFusion, DuckDB) and formats (Parquet, Vortex):\n\n```bash\n# Install the benchmark orchestrator\nuv tool install \"bench_orchestrator @ .\u002Fbench-orchestrator\u002F\"\n\n# Run TPC-H benchmarks\nvx-bench run tpch --engine datafusion,duckdb --format parquet,vortex\n\n# Compare results\nvx-bench compare --run latest\n```\n\nSee [bench-orchestrator\u002FREADME.md](bench-orchestrator\u002FREADME.md) for full documentation.\n\n### Performance Optimization\n\nFor optimal performance, we suggest using [MiMalloc](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fmimalloc):\n\n```rust,ignore\n\u002F\u002F Assumes the `mimalloc` crate is listed as a dependency in Cargo.toml.\nuse mimalloc::MiMalloc;\n\n#[global_allocator]\nstatic GLOBAL_ALLOC: MiMalloc = MiMalloc;\n```\n\n## Project Information\n\n### License\n\nLicensed under the Apache License, Version 2.0.\n\n### Governance\n\nVortex is an independent open-source project and not controlled by any single company. The Vortex Project is a\nsub-project of the Linux Foundation Projects. 
The governance model is documented in\n[CONTRIBUTING.md](CONTRIBUTING.md) and is subject to the terms of\nthe [Technical Charter](https:\u002F\u002Fvortex.dev\u002Fcharter.pdf).\n\n### Contributing\n\nPlease **do** read [CONTRIBUTING.md](CONTRIBUTING.md) before you contribute.\n\n### Reporting Vulnerabilities\n\nIf you discover a security vulnerability, please email \u003Cvuln-report@vortex.dev>.\n\n### Trademarks\n\nCopyright © Vortex a Series of LF Projects, LLC.\nFor terms of use, trademark policy, and other project policies please see \u003Chttps:\u002F\u002Flfprojects.org>\n\n## Acknowledgments\n\nThe Vortex project benefits enormously from groundbreaking work from the academic & open-source communities.\n\n### Research in Vortex\n\n- [BtrBlocks](https:\u002F\u002Fwww.cs.cit.tum.de\u002Ffileadmin\u002Fw00cfj\u002Fdis\u002Fpapers\u002Fbtrblocks.pdf) - Efficient columnar compression\n- [FastLanes](https:\u002F\u002Fwww.vldb.org\u002Fpvldb\u002Fvol16\u002Fp2132-afroozeh.pdf) & [FastLanes on GPU](https:\u002F\u002Fdbdbd2023.ugent.be\u002Fabstracts\u002Ffelius_fastlanes.pdf) - High-performance integer compression\n- [FSST](https:\u002F\u002Fwww.vldb.org\u002Fpvldb\u002Fvol13\u002Fp2649-boncz.pdf) - Fast random access string compression\n- [ALP](https:\u002F\u002Fir.cwi.nl\u002Fpub\u002F33334\u002F33334.pdf) & [G-ALP](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3736227.3736242) - Adaptive lossless floating-point compression\n- [Procella](https:\u002F\u002Fdl.acm.org\u002Fcitation.cfm?id=3360438) - YouTube's unified data system\n- [Anyblob](https:\u002F\u002Fwww.durner.dev\u002Fapp\u002Fmedia\u002Fpapers\u002Fanyblob-vldb23.pdf) - High-performance access to object storage\n- [ClickHouse](https:\u002F\u002Fwww.vldb.org\u002Fpvldb\u002Fvol17\u002Fp3731-schulze.pdf) - Fast analytics for everyone\n- [MonetDB\u002FX100](https:\u002F\u002Fwww.cidrdb.org\u002Fcidr2005\u002Fpapers\u002FP19.pdf) - Hyper-Pipelining Query Execution\n- [Morsel-Driven 
Parallelism](https:\u002F\u002Fdb.in.tum.de\u002F~leis\u002Fpapers\u002Fmorsels.pdf): A NUMA-Aware Query Evaluation Framework for the Many-Core Age\n- [The FastLanes File Format](https:\u002F\u002Fgithub.com\u002Fcwida\u002FFastLanes\u002Fblob\u002Fdev\u002Fdocs\u002Fspecification.pdf) - Expression Operators\n\n### Vortex in Research\n\n- [Anyblox](https:\u002F\u002Fgienieczko.com\u002Fanyblox-paper) - A Framework for Self-Decoding Datasets\n- [F3](https:\u002F\u002Fdl.acm.org\u002Fdoi\u002Fpdf\u002F10.1145\u002F3749163) - Open-Source Data File Format for the Future\n\n### Open Source Inspiration\n\n- [Apache Arrow](https:\u002F\u002Farrow.apache.org)\n- [Apache DataFusion](https:\u002F\u002Fgithub.com\u002Fapache\u002Fdatafusion)\n- [parquet2](https:\u002F\u002Fgithub.com\u002Fjorgecarleitao\u002Fparquet2) by Jorge Leitao\n- [DuckDB](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fduckdb)\n- [Velox](https:\u002F\u002Fgithub.com\u002Ffacebookincubator\u002Fvelox) & [Nimble](https:\u002F\u002Fgithub.com\u002Ffacebookincubator\u002Fnimble)\n\n#### Thanks to all contributors who have shared their knowledge and code with the community! 🚀\n","Vortex is a high-performance columnar file format and toolkit designed for fast data processing. Its core features include very fast reads and writes (compared with modern Apache Parquet: 100x faster random-access reads, 10-20x faster scans, and 5x faster writes) at similar compression ratios; support for wide tables with zero-copy\u002Fzero-parse metadata; and a highly extensible architecture that lets users customize the encoding system, type system, compression strategy, and layout strategy. Vortex also integrates seamlessly with popular data processing libraries such as Arrow, DataFusion, and DuckDB. The project is written in Rust, licensed under Apache 2.0, and governed neutrally under the Linux Foundation. It suits applications that need to store and analyze large datasets efficiently, particularly data systems built on object storage.",2,"2026-06-11 03:45:01","high_star"]