[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-5681":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":46,"readmeContent":47,"aiSummary":48,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":49,"discoverSource":50},5681,"Daft","Eventual-Inc\u002FDaft","Eventual-Inc","High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale","https:\u002F\u002Fdaft.ai",null,"Rust",5562,488,33,249,0,2,17,91,12,85.67,"Apache License 2.0",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45],"ai-engineering","ai-pipeline","arrow","artificial-intelligence","big-data","data-engineering","distributed","distributed-computing","distributed-systems","embeddings","etl","huggingface","iceberg","machine-learning","multimodal","parquet","python","ray","rust","2026-06-12 04:00:26","|Banner|\n\n|CI| |PyPI| |Latest Tag| |Coverage| |Slack|\n\n`Website \u003Chttps:\u002F\u002Fwww.daft.ai>`_ • `Docs \u003Chttps:\u002F\u002Fdocs.daft.ai>`_ • `Installation \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Finstall\u002F>`_ • `Daft Quickstart \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Fquickstart\u002F>`_ • `Community and Support \u003Chttps:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft\u002Fdiscussions>`_\n\nDaft: High-Performance Data Engine for AI and Multimodal Workloads\n==================================================================\n\n|TrendShift|\n\n`Daft 
\u003Chttps:\u002F\u002Fwww.daft.ai>`_ is a high-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.\n\n* **Native multimodal processing:** Process images, audio, video, and embeddings alongside structured data in a single framework\n* **Built-in AI operations:** Run LLM prompts, generate embeddings, and classify data at scale using OpenAI, Transformers, or custom models\n* **Python-native, Rust-powered:** Skip the JVM complexity with Python at its core and Rust under the hood for blazing performance\n* **Seamless scaling:** Start local, scale to distributed clusters on `Ray \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Fdistributed\u002Fray\u002F>`_, `Kubernetes \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Fdistributed\u002Fkubernetes\u002F>`_\n* **Universal connectivity:** Access data anywhere (S3, GCS, Iceberg, Delta Lake, Hugging Face, Unity Catalog)\n* **Out-of-box reliability:** Intelligent memory management and sensible defaults eliminate configuration headaches\n\nGetting Started\n---------------\n\nInstallation\n^^^^^^^^^^^^\n\nInstall Daft with ``pip install daft``. Requires Python 3.10 or higher.\n\nFor more advanced installations (e.g. 
installing from source or with extra dependencies such as Ray and AWS utilities), please see our `Installation Guide \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Finstall\u002F>`_\n\nQuickstart\n^^^^^^^^^^\n\nGet started in minutes with our `Quickstart \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Fquickstart\u002F>`_ - load a real-world e-commerce dataset, process product images, and run AI inference at scale.\n\n\nMore Resources\n^^^^^^^^^^^^^^\n\n* `Examples \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Fexamples\u002F>`_ - see Daft in action with use cases across text, images, audio, and more\n* `User Guide \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002F>`_ - take a deep-dive into each topic within Daft\n* `API Reference \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Fapi\u002F>`_ - API reference for public classes\u002Ffunctions of Daft\n\nBenchmarks\n----------\n|Benchmark Image|\n\nTo see the full benchmarks, detailed setup, and logs, check out our `benchmarking page. \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Fbenchmarks>`_\n\nContributing\n------------\n\nWe ❤️ developers! To start contributing to Daft, please read `CONTRIBUTING.md \u003Chttps:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft\u002Fblob\u002Fmain\u002FCONTRIBUTING.md>`_. This document describes the development lifecycle and toolchain for working on Daft. It also details how to add new functionality to the core engine and expose it through a Python API.\n\nHere's a list of `good first issues \u003Chttps:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft\u002Fissues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22>`_ to get yourself warmed up with Daft. 
Comment in the issue to pick it up, and feel free to ask any questions!\n\nTelemetry\n---------\n\nTo help improve Daft, we collect non-identifiable data via Scarf (https:\u002F\u002Fscarf.sh).\n\nTo disable this behavior, set the environment variable ``DO_NOT_TRACK=true``.\n\nThe data that we collect is:\n\n1. **Non-identifiable:** No session IDs or user identifiers are collected\n2. **Metadata-only:** We do not collect any of our users’ proprietary code or data\n3. **For development only:** We do not buy or sell any user data\n\nPlease see our `documentation \u003Chttps:\u002F\u002Fdocs.daft.ai\u002Fen\u002Fstable\u002Ftelemetry\u002F>`_ for more details.\n\n.. image:: https:\u002F\u002Fstatic.scarf.sh\u002Fa.png?x-pxid=31f8d5ba-7e09-4d75-8895-5252bbf06cf6\n\nRelated Projects\n----------------\n\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n| Engine                                            | Query Optimizer | Multimodal    | Distributed | Arrow Backed    | Vectorized Execution Engine | Out-of-core |\n+===================================================+=================+===============+=============+=================+=============================+=============+\n| Daft                                              | Yes             | Yes           | Yes         | Yes             | Yes                         | Yes         |\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n| `Pandas \u003Chttps:\u002F\u002Fgithub.com\u002Fpandas-dev\u002Fpandas>`_  | No              | Python object | No          | optional >= 2.0 | Some(Numpy)                 | No          |\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n| `Polars 
\u003Chttps:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars>`_     | Yes             | Python object | No          | Yes             | Yes                         | Yes         |\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n| `Modin \u003Chttps:\u002F\u002Fgithub.com\u002Fmodin-project\u002Fmodin>`_ | Yes             | Python object | Yes         | No              | Some(Pandas)                | Yes         |\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n| `Ray Data \u003Chttps:\u002F\u002Fgithub.com\u002Fray-project\u002Fray>`_  | No              | Yes           | Yes         | Yes             | Some(PyArrow)               | Yes         |\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n| `PySpark \u003Chttps:\u002F\u002Fgithub.com\u002Fapache\u002Fspark>`_      | Yes             | No            | Yes         | Pandas UDF\u002FIO   | Pandas UDF                  | Yes         |\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n| `Dask DF \u003Chttps:\u002F\u002Fgithub.com\u002Fdask\u002Fdask>`_         | No              | Python object | Yes         | No              | Some(Pandas)                | Yes         |\n+---------------------------------------------------+-----------------+---------------+-------------+-----------------+-----------------------------+-------------+\n\nLicense\n-------\n\nDaft has an Apache 2.0 license - please see the LICENSE file.\n\n.. 
|Quickstart Image| image:: https:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft\u002Fassets\u002F17691182\u002Fdea2f515-9739-4f3e-ac58-cd96d51e44a8\n   :alt: Dataframe code to load a folder of images from AWS S3 and create thumbnails\n   :height: 256\n\n.. |Benchmark Image| image:: https:\u002F\u002Fraw.githubusercontent.com\u002FEventual-Inc\u002FDaft\u002Frefs\u002Fheads\u002Fmain\u002Fassets\u002Fbenchmark.png\n   :alt: AI Benchmarks\n\n.. |Banner| image:: https:\u002F\u002Fdaft.ai\u002Fimages\u002Fdiagram.png\n   :target: https:\u002F\u002Fwww.daft.ai\n   :alt: Daft dataframes can load any data such as PDF documents, images, protobufs, csv, parquet and audio files into a table dataframe structure for easy querying\n\n.. |CI| image:: https:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft\u002Factions\u002Fworkflows\u002Fpr-test-suite.yml\u002Fbadge.svg\n   :target: https:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft\u002Factions\u002Fworkflows\u002Fpr-test-suite.yml?query=branch:main\n   :alt: GitHub Actions tests\n\n.. |PyPI| image:: https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fdaft.svg?label=pip&logo=PyPI&logoColor=white\n   :target: https:\u002F\u002Fpypi.org\u002Fproject\u002Fdaft\n   :alt: PyPI\n\n.. |Latest Tag| image:: https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Ftag\u002FEventual-Inc\u002FDaft?label=latest&logo=GitHub\n   :target: https:\u002F\u002Fgithub.com\u002FEventual-Inc\u002FDaft\u002Ftags\n   :alt: latest tag\n\n.. |Coverage| image:: https:\u002F\u002Fcodecov.io\u002Fgh\u002FEventual-Inc\u002FDaft\u002Fbranch\u002Fmain\u002Fgraph\u002Fbadge.svg?token=J430QVFE89\n   :target: https:\u002F\u002Fcodecov.io\u002Fgh\u002FEventual-Inc\u002FDaft\n   :alt: Coverage\n\n.. 
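|Slack| image:: https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fslack-@distdata-purple.svg?logo=slack\n   :target: https:\u002F\u002Fjoin.slack.com\u002Ft\u002Fdist-data\u002Fshared_invite\u002Fzt-3rh9jr9iv-tmmTNOlQpfvhEy2NTMWS_w\n   :alt: slack community\n\n.. |TrendShift| image:: https:\u002F\u002Ftrendshift.io\u002Fapi\u002Fbadge\u002Frepositories\u002F8239\n   :target: https:\u002F\u002Ftrendshift.io\u002Frepositories\u002F8239\n   :alt: Eventual-Inc\u002FDaft | Trendshift\n   :width: 250px\n   :height: 55px\n","Daft is a high-performance data engine for AI and multimodal workloads that processes images, audio, video, and structured data at any scale. Its core features include native multimodal processing, which handles multiple data types within a single framework; built-in AI operations for running LLM prompts, generating embeddings, and classifying data at scale with OpenAI, Transformers, or custom models; a Python-first design with Rust under the hood, delivering high performance while avoiding JVM complexity; seamless scaling from local execution to distributed clusters (such as Ray and Kubernetes); broad connectivity, with access to data in S3, GCS, and other storage systems; and out-of-the-box reliability, where intelligent memory management and sensible defaults reduce configuration overhead. Daft suits AI use cases that need to process large multimedia and structured datasets efficiently, particularly when building complex multimodal analytics and machine learning pipelines.","2026-06-11 03:04:42","top_language"]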