[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72121":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":24,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":31,"discoverSource":32},72121,"smallpond","deepseek-ai\u002Fsmallpond","deepseek-ai","A lightweight data processing framework built on DuckDB and 3FS.","",null,"Python",4961,444,50,23,0,2,9,6,29.95,"MIT License",false,"main",true,[26,27],"data-processing","duckdb","2026-06-12 02:02:58","# smallpond\n\n[![CI](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fsmallpond\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg?branch=main)](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002Fsmallpond\u002Factions\u002Fworkflows\u002Fci.yml)\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fsmallpond)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fsmallpond\u002F)\n[![Docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdocs-latest-brightgreen.svg)](https:\u002F\u002Fdeepseek-ai.github.io\u002Fsmallpond\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue.svg)](LICENSE)\n\nA lightweight data processing framework built on [DuckDB] and [3FS].\n\n## Features\n\n- 🚀 High-performance data processing powered by DuckDB\n- 🌍 Scalable to handle PB-scale datasets\n- 🛠️ Easy operations with no long-running services\n\n## Installation\n\nPython 3.8 to 3.12 is supported.\n\n```bash\npip install smallpond\n```\n\n## Quick 
Start\n\n```bash\n# Download example data\nwget https:\u002F\u002Fduckdb.org\u002Fdata\u002Fprices.parquet\n```\n\n```python\nimport smallpond\n\n# Initialize session\nsp = smallpond.init()\n\n# Load data\ndf = sp.read_parquet(\"prices.parquet\")\n\n# Process data\ndf = df.repartition(3, hash_by=\"ticker\")\ndf = sp.partial_sql(\"SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker\", df)\n\n# Save results\ndf.write_parquet(\"output\u002F\")\n# Show results\nprint(df.to_pandas())\n```\n\n## Documentation\n\nFor detailed guides and API reference:\n- [Getting Started](docs\u002Fsource\u002Fgetstarted.rst)\n- [API Reference](docs\u002Fsource\u002Fapi.rst)\n\n## Performance\n\nWe evaluated smallpond using the [GraySort benchmark] ([script]) on a cluster comprising 50 compute nodes and 25 storage nodes running [3FS]. The benchmark sorted 110.5TiB of data in 30 minutes and 14 seconds, achieving an average throughput of 3.66TiB\u002Fmin.\n\nDetails can be found in [3FS - Gray Sort].\n\n[DuckDB]: https:\u002F\u002Fduckdb.org\u002F\n[3FS]: https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS\n[GraySort benchmark]: https:\u002F\u002Fsortbenchmark.org\u002F\n[script]: benchmarks\u002Fgray_sort_benchmark.py\n[3FS - Gray Sort]: https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002F3FS?tab=readme-ov-file#2-graysort\n\n## Development\n\n```bash\npip install .[dev]\n\n# run unit tests\npytest -v tests\u002Ftest*.py\n\n# build documentation\npip install .[docs]\ncd docs\nmake html\npython -m http.server --directory build\u002Fhtml\n```\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n","smallpond is a lightweight data processing framework built on DuckDB and 3FS. It uses DuckDB for high-performance data processing, scales to PB-scale datasets, and is easy to operate because it requires no long-running services. The framework suits workloads that need to process large datasets efficiently, such as big-data analytics and machine learning preprocessing. Through a simple Python API, users can quickly load, process, and save data while keeping operational costs low.","2026-06-11 03:40:27","high_star"]