[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9842":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":15,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":38,"readmeContent":39,"aiSummary":40,"trendingCount":16,"starSnapshotCount":16,"syncStatus":41,"lastSyncTime":42,"discoverSource":43},9842,"awesome-opensource-ai","alvinreal\u002Fawesome-opensource-ai","alvinreal","Curated list of the best truly open-source AI projects, models, tools, and infrastructure.","https:\u002F\u002Fawesomeosai.com",null,"Python",3847,436,36,12,0,108,285,47,109.92,"Other",false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37],"agents","ai","artificial-intelligence","awesome","awesome-list","generative-ai","llm","machine-learning","mlops","open-source","open-source-ai","rag","2026-06-12 04:00:47","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"assets\u002Fosai.png\" alt=\"Awesome Open Source AI\" width=\"120\" \u002F>\n\n\n\n# Awesome Open Source AI\n\n*A curated list of **battle-tested, production-proven** open-source AI models, libraries, infrastructure, and developer tools. Only elite-tier projects make this list. Updated May 6, 2026. 
CI verified - auto-fixed.*\n\n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\n[![PRs Welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg?style=flat-square)](.\u002FCONTRIBUTING.md)\n[![License: CC0-1.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-CC0--1.0-blue.svg?style=flat-square)](.\u002FLICENSE)\n\n\u003Csub>by **Boring Dystopia Development**\u003C\u002Fsub>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fboringdystopia.ai\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fboringdystopia.ai-111111?style=for-the-badge&logo=vercel&logoColor=white\" alt=\"boringdystopia.ai\" \u002F>\n  \u003C\u002Fa>&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002Falvinunreal\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-@alvinunreal-000000?style=for-the-badge&logo=x&logoColor=white\" alt=\"X @alvinunreal\" \u002F>\n  \u003C\u002Fa>&nbsp;\n  \u003Ca href=\"https:\u002F\u002Ft.me\u002Fboringdystopiadevelopment\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTelegram-Join%20channel-2CA5E0?style=for-the-badge&logo=telegram&logoColor=white\" alt=\"Telegram Join channel\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n---\n\u003Cdiv align=\"center\">\n\n**[ 🌱 Emerging ](.\u002FEMERGING.md)** • **[ Explore the List ](#-contents)** • **[ Submission Guidelines ](#contributing)** • **[ License ](#license)**\n\n\u003C\u002Fdiv>\n\n## 📋 Contents\n\n- [🧬 1. Core Frameworks & Libraries](#-1-core-frameworks--libraries)\n- [🧠 2. Open Foundation Models](#-2-open-foundation-models)\n- [⚡ 3. Inference Engines & Serving](#-3-inference-engines--serving)\n- [🤖 4. Agentic AI & Multi-Agent Systems](#-4-agentic-ai--multi-agent-systems)\n- [🔍 5. Retrieval-Augmented Generation (RAG) & Knowledge](#-5-retrieval-augmented-generation-rag--knowledge)\n- [🎨 6. 
Generative Media Tools](#-6-generative-media-tools)\n- [🛠️ 7. Training & Fine-tuning Ecosystem](#section-7)\n- [📊 8. MLOps \u002F LLMOps & Production](#-8-mlops--llmops--production)\n- [📈 9. Evaluation, Benchmarks & Datasets](#-9-evaluation-benchmarks--datasets)\n- [🛡️ 10. AI Safety, Alignment & Interpretability](#section-10)\n- [🧩 11. Specialized Domains](#-11-specialized-domains)\n- [🖥️ 12. User Interfaces & Self-hosted Platforms](#section-12)\n- [🧪 13. Developer Tools & Integrations](#-13-developer-tools--integrations)\n- [📚 14. Resources & Learning](#-14-resources--learning)\n\n---\n\n### 🧬 1. Core Frameworks & Libraries\n\n> Core libraries and frameworks used to build, train, and run AI and machine learning systems.\n\n#### Deep Learning Frameworks\n\n- **[PyTorch](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fpytorch)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpytorch\u002Fpytorch?style=social) - Dynamic computation graphs, Pythonic API, dominant in research and production. The current standard for most frontier AI work.\n- **[TensorFlow](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftensorflow)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftensorflow\u002Ftensorflow?style=social) - End-to-end platform with excellent production deployment, TPU support, and large-scale serving tools.\n- **[JAX](https:\u002F\u002Fgithub.com\u002Fjax-ml\u002Fjax)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fjax-ml\u002Fjax?style=social) + **[Flax](https:\u002F\u002Fgithub.com\u002Fgoogle\u002Fflax)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle\u002Fflax?style=social) - High-performance numerical computing with composable transformations (JIT, vmap, grad). 
Rising favorite for research and scientific ML.\n- **[dm-haiku](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fdm-haiku)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle-deepmind\u002Fdm-haiku?style=social) - JAX-based neural network library from Google DeepMind. Elegant functional API with state management, widely used in DeepMind's research. Apache 2.0 licensed.\n- **[Equinox](https:\u002F\u002Fgithub.com\u002Fpatrick-kidger\u002Fequinox)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpatrick-kidger\u002Fequinox?style=social) - Elegant, easy-to-use neural networks and scientific computing in JAX. Callable PyTrees with filtered transformations and seamless interoperability with the JAX ecosystem. Apache 2.0 licensed.\n- **[Diffrax](https:\u002F\u002Fgithub.com\u002Fpatrick-kidger\u002Fdiffrax)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpatrick-kidger\u002Fdiffrax?style=social) - Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable ODE\u002FSDE\u002FCDE solvers for scientific machine learning and neural differential equations. Apache 2.0 licensed.\n- **[vit-pytorch](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fvit-pytorch)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Flucidrains\u002Fvit-pytorch?style=social) - Comprehensive Vision Transformer (ViT) implementations in PyTorch. Reference implementations of many vision transformer variants, including ViT, DeiT, Swin, and more. MIT licensed.\n- **[NumPyro](https:\u002F\u002Fgithub.com\u002Fpyro-ppl\u002Fnumpyro)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpyro-ppl\u002Fnumpyro?style=social) - Probabilistic programming with NumPy powered by JAX for autograd and JIT compilation. 
Bayesian modeling and inference at scale.\n- **[Keras](https:\u002F\u002Fgithub.com\u002Fkeras-team\u002Fkeras)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkeras-team\u002Fkeras?style=social) - High-level, beginner-friendly API that now runs on multiple backends (TensorFlow, JAX, PyTorch). Perfect for rapid experimentation.\n- **[tinygrad](https:\u002F\u002Fgithub.com\u002Ftinygrad\u002Ftinygrad)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftinygrad\u002Ftinygrad?style=social) - Minimalist deep learning framework with tiny code footprint. The \"you like pytorch? you like micrograd? you love tinygrad!\" philosophy - simple yet powerful.\n- **[PaddlePaddle](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddle)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FPaddlePaddle\u002FPaddle?style=social) - Industrial deep learning platform from Baidu serving 23+ million developers and 760,000+ companies. China's first independent R&D framework with advanced distributed training and deployment capabilities.\n- **[PyTorch Geometric](https:\u002F\u002Fgithub.com\u002Fpyg-team\u002Fpytorch_geometric)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpyg-team\u002Fpytorch_geometric?style=social) - Library for deep learning on irregular input data such as graphs, point clouds, and manifolds. Part of the PyTorch ecosystem.\n- **[timm (PyTorch Image Models)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpytorch-image-models)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuggingface\u002Fpytorch-image-models?style=social) - The largest collection of PyTorch image encoders and backbones. 900+ pretrained models including ResNet, EfficientNet, Vision Transformer, ConvNeXt, and more with training and inference scripts. 
Apache 2.0 licensed.\n- **[Triton](https:\u002F\u002Fgithub.com\u002Ftriton-lang\u002Ftriton)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftriton-lang\u002Ftriton?style=social) - Language and compiler for writing highly efficient custom deep-learning primitives. Powers kernel optimizations in PyTorch, JAX, and other frameworks. MIT licensed.\n- **[GGML](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fggml)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fggml-org\u002Fggml?style=social) - Tensor library for machine learning. The foundational C\u002FC++ library powering llama.cpp and many on-device inference engines. MIT licensed.\n- **[MLX](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fml-explore\u002Fmlx?style=social) - Array framework for machine learning on Apple silicon. Efficient unified memory design with NumPy-like API, automatic differentiation, and multi-device support. MIT licensed.\n\n#### High-Performance Compute Libraries\n\n- **[oneDNN](https:\u002F\u002Fgithub.com\u002Fuxlfoundation\u002FoneDNN)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fuxlfoundation\u002FoneDNN?style=social) - oneAPI Deep Neural Network Library. Cross-platform performance library of basic building blocks for deep learning, optimized for Intel CPUs, GPUs, and Arm architectures. Apache 2.0 licensed.\n- **[ONNX](https:\u002F\u002Fgithub.com\u002Fonnx\u002Fonnx)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fonnx\u002Fonnx?style=social) - Open standard for machine learning interoperability. Open Neural Network Exchange provides an open ecosystem that empowers AI developers to choose the right tools as their project evolves. 
Apache 2.0 licensed.\n- **[IREE](https:\u002F\u002Fgithub.com\u002Firee-org\u002Firee)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Firee-org\u002Firee?style=social) - Retargetable MLIR-based machine learning compiler and runtime toolkit. Lowers ML models to a unified IR that scales from datacenter to mobile and edge deployments. Apache 2.0 licensed.\n\n#### Rust ML Frameworks\n\n- **[Burn](https:\u002F\u002Fgithub.com\u002Ftracel-ai\u002Fburn)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftracel-ai\u002Fburn?style=social) - Next-generation deep learning framework in Rust. Backend-agnostic, with CPU, GPU, and WebAssembly support.\n- **[Candle (Hugging Face)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fcandle)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuggingface\u002Fcandle?style=social) - Minimalist ML framework for Rust. PyTorch-like API with a focus on performance and simplicity.\n- **[linfa](https:\u002F\u002Fgithub.com\u002Frust-ml\u002Flinfa)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Frust-ml\u002Flinfa?style=social) - Comprehensive Rust ML toolkit with classical algorithms. A scikit-learn-style toolkit for Rust with clustering, regression, and preprocessing.\n\n#### Julia ML Frameworks\n\n- **[Flux.jl](https:\u002F\u002Fgithub.com\u002FFluxML\u002FFlux.jl)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FFluxML\u002FFlux.jl?style=social) - Pure-Julia ML stack with lightweight abstractions on top of Julia's native GPU and automatic differentiation (AD) support. 
Elegant, hackable, and fully integrated with Julia's scientific computing ecosystem.\n- **[MLJ.jl](https:\u002F\u002Fgithub.com\u002FJuliaAI\u002FMLJ.jl)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FJuliaAI\u002FMLJ.jl?style=social) - Comprehensive Julia machine learning framework providing a unified interface to 200+ models with meta-algorithms for selection, tuning, and evaluation. MIT licensed.\n- **[ModelingToolkit.jl](https:\u002F\u002Fgithub.com\u002FSciML\u002FModelingToolkit.jl)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSciML\u002FModelingToolkit.jl?style=social) - High-performance symbolic-numeric modeling framework for scientific machine learning. Automatically generates fast functions for model components like Jacobians and Hessians with automatic sparsification and parallelization. MIT licensed.\n\n#### NLP & Transformers\n\n- **[spaCy (Explosion AI)](https:\u002F\u002Fgithub.com\u002Fexplosion\u002FspaCy)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fexplosion\u002FspaCy?style=social) - Industrial-strength natural language processing with support for 75+ languages, transformer pipelines, and production-grade NER, parsing, and text classification.\n- **[Transformers (Hugging Face)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuggingface\u002Ftransformers?style=social) - The de facto standard library for pretrained NLP and multimodal models. 1M+ compatible model checkpoints on the Hugging Face Hub, 250,000+ downloads\u002Fday. 
BERT, GPT, Llama, Qwen, and hundreds more.\n- **[sentence-transformers](https:\u002F\u002Fgithub.com\u002FUKPLab\u002Fsentence-transformers)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FUKPLab\u002Fsentence-transformers?style=social) - Classic library for sentence and image embeddings.\n- **[tokenizers (Hugging Face)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftokenizers)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuggingface\u002Ftokenizers?style=social) - Fast state-of-the-art tokenizers for training and inference.\n- **[fairseq2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Ffairseq2)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ffacebookresearch\u002Ffairseq2?style=social) - FAIR Sequence Modeling Toolkit 2. Complete rewrite of fairseq with modern PyTorch APIs, native support for LLM training (70B+ models), vLLM integration, and first-party recipes for instruction finetuning and preference optimization. MIT licensed.\n\n#### Data Processing & Manipulation\n\n- **[Pandas](https:\u002F\u002Fgithub.com\u002Fpandas-dev\u002Fpandas)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpandas-dev\u002Fpandas?style=social) - The gold standard for data analysis and manipulation in Python.\n- **[Polars](https:\u002F\u002Fgithub.com\u002Fpola-rs\u002Fpolars)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpola-rs\u002Fpolars?style=social) - Blazing-fast DataFrame library (Rust backend) - modern alternative to pandas for large-scale workloads.\n- **[cuDF](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcudf)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Frapidsai\u002Fcudf?style=social) - GPU DataFrame library from RAPIDS. 
Accelerates pandas workflows on NVIDIA GPUs with zero code changes via the cudf.pandas accelerator mode.\n- **[Modin](https:\u002F\u002Fgithub.com\u002Fmodin-project\u002Fmodin)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmodin-project\u002Fmodin?style=social) - Parallel pandas DataFrames. Scale pandas workflows by changing a single line of code - distributes data and computation automatically.\n- **[Dask](https:\u002F\u002Fgithub.com\u002Fdask\u002Fdask)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdask\u002Fdask?style=social) - Parallel computing for big data - scales pandas\u002FNumPy\u002Fscikit-learn to clusters.\n- **[NumPy](https:\u002F\u002Fgithub.com\u002Fnumpy\u002Fnumpy)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fnumpy\u002Fnumpy?style=social) - Fundamental array computing library that powers almost every AI stack.\n- **[SciPy](https:\u002F\u002Fgithub.com\u002Fscipy\u002Fscipy)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fscipy\u002Fscipy?style=social) - Scientific computing algorithms (optimization, linear algebra, statistics, signal processing).\n- **[NetworkX](https:\u002F\u002Fgithub.com\u002Fnetworkx\u002Fnetworkx)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fnetworkx\u002Fnetworkx?style=social) - Creation, manipulation, and study of complex networks. The foundational graph analysis library for Python data science.\n- **[cuGraph](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcugraph)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Frapidsai\u002Fcugraph?style=social) - GPU graph analytics library with a NetworkX-compatible API. 10-100x faster than CPU-based libraries on large-scale graph algorithms. 
Apache 2.0 licensed.\n- **[Vaex](https:\u002F\u002Fgithub.com\u002Fvaexio\u002Fvaex)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvaexio\u002Fvaex?style=social) - Out-of-core hybrid Apache Arrow\u002FNumPy DataFrame for Python. Visualize and explore billion-row datasets at millions of rows per second. MIT licensed.\n- **[Datashader](https:\u002F\u002Fgithub.com\u002Fholoviz\u002Fdatashader)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fholoviz\u002Fdatashader?style=social) - High-performance large data visualization. Rasterizes billions of points into accurate images interactively, avoiding the overplotting and saturation artifacts of conventional plotting. BSD-3-Clause licensed.\n- **[Zarr](https:\u002F\u002Fgithub.com\u002Fzarr-developers\u002Fzarr-python)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fzarr-developers\u002Fzarr-python?style=social) - Chunked, compressed, N-dimensional array storage. Scalable tensor data format optimized for cloud and parallel computing. MIT licensed.\n- **[NVIDIA DALI](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FDALI)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVIDIA\u002FDALI?style=social) - GPU-accelerated data loading and augmentation library with highly optimized building blocks for deep learning applications. Apache 2.0 licensed.\n- **[Narwhals](https:\u002F\u002Fgithub.com\u002Fnarwhals-dev\u002Fnarwhals)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fnarwhals-dev\u002Fnarwhals?style=social) - Lightweight compatibility layer between DataFrame libraries. Write Polars-like code that works seamlessly across pandas, Polars, cuDF, Modin, and more. MIT licensed.\n- **[Ibis](https:\u002F\u002Fgithub.com\u002Fibis-project\u002Fibis)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fibis-project\u002Fibis?style=social) - Portable Python dataframe library with 20+ backends. 
Write pandas-like code that runs locally with DuckDB or scales to production databases (BigQuery, Snowflake, PostgreSQL) by changing one line. Apache 2.0 licensed.\n- **[skrub](https:\u002F\u002Fgithub.com\u002Fskrub-data\u002Fskrub)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fskrub-data\u002Fskrub?style=social) - Machine learning with dataframes, including dirty categorical data. Preprocessing and feature engineering for heterogeneous data with seamless pandas\u002FPolars integration. BSD-3-Clause licensed.\n- **[Oxen](https:\u002F\u002Fgithub.com\u002FOxen-AI\u002FOxen)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOxen-AI\u002FOxen?style=social) - Lightning-fast data version control for machine learning. Optimized for large datasets with efficient diffing, branching, and collaboration. Apache 2.0 licensed.\n- **[Pandera](https:\u002F\u002Fgithub.com\u002Funionai-oss\u002Fpandera)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Funionai-oss\u002Fpandera?style=social) - Statistical data testing and validation for dataframes. Pydantic-like API for pandas, Polars, and other dataframe libraries, with type hints and lazy validation. MIT licensed.\n- **[Snorkel](https:\u002F\u002Fgithub.com\u002Fsnorkel-team\u002Fsnorkel)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsnorkel-team\u002Fsnorkel?style=social) - System for quickly generating training data with weak supervision. Programmatically label, build, and manage training data using labeling functions and probabilistic consensus models. Powers Snorkel Flow and is used by Google, Apple, and Intel. Apache 2.0 licensed.\n- **[DuckDB](https:\u002F\u002Fgithub.com\u002Fduckdb\u002Fduckdb)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fduckdb\u002Fduckdb?style=social) - High-performance analytical in-process SQL database system. 
Fast, reliable, portable, and easy to use with rich SQL dialect support. Perfect for data processing and analytics workloads. MIT licensed.\n- **[FiftyOne](https:\u002F\u002Fgithub.com\u002Fvoxel51\u002Ffiftyone)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvoxel51\u002Ffiftyone?style=social) - Visual AI development toolkit for visualizing, labeling, and evaluating visual datasets and models. Supercharges computer vision workflows with dataset exploration and model analysis. Apache 2.0 licensed.\n- **[Label Studio](https:\u002F\u002Fgithub.com\u002FHumanSignal\u002Flabel-studio)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHumanSignal\u002Flabel-studio?style=social) - Multi-type data labeling and annotation tool with standardized output format. Configurable interface for images, text, audio, video, and time series with ML-assisted labeling. Apache 2.0 licensed.\n- **[Delta Lake](https:\u002F\u002Fgithub.com\u002Fdelta-io\u002Fdelta)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdelta-io\u002Fdelta?style=social) - Open-source storage framework enabling Lakehouse architecture with ACID transactions, scalable metadata handling, and unified batch\u002Fstreaming processing. Apache 2.0 licensed.\n- **[Apache Iceberg](https:\u002F\u002Fgithub.com\u002Fapache\u002Ficeberg)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fapache\u002Ficeberg?style=social) - High-performance open table format for huge analytic tables. Brings SQL table reliability to big data with time travel, hidden partitioning, and schema evolution. Works with Spark, Trino, Flink, Presto, Hive and Impala. 
Apache 2.0 licensed.\n- **[Apache Hudi](https:\u002F\u002Fgithub.com\u002Fapache\u002Fhudi)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fapache\u002Fhudi?style=social) - Open data lakehouse platform for ingesting, indexing, storing, serving, transforming, and managing data across cloud environments. Supports upserts, deletes, and incremental processing on big data with built-in ingestion tools for Spark and Flink. Apache 2.0 licensed.\n- **[lakeFS](https:\u002F\u002Fgithub.com\u002Ftreeverse\u002FlakeFS)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftreeverse\u002FlakeFS?style=social) - Data version control for your data lake that transforms object storage into Git-like repositories. Enables atomic, versioned data lake operations with branching, committing, and merging for data pipelines. Apache 2.0 licensed.\n- **[Apache Airflow](https:\u002F\u002Fgithub.com\u002Fapache\u002Fairflow)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fapache\u002Fairflow?style=social) - Platform to programmatically author, schedule, and monitor workflows. Industry-standard orchestration for data pipelines and ML workflows with 500+ integrations. Apache 2.0 licensed.\n- **[Apache Spark](https:\u002F\u002Fgithub.com\u002Fapache\u002Fspark)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fapache\u002Fspark?style=social) - Unified analytics engine for large-scale data processing. In-memory cluster computing with high-level APIs in Python, Scala, Java, and R. Includes MLlib for distributed machine learning and Structured Streaming for real-time data. Apache 2.0 licensed.\n- **[Apache Flink](https:\u002F\u002Fgithub.com\u002Fapache\u002Fflink)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fapache\u002Fflink?style=social) - Stream-first processing framework that also handles batch workloads. 
High-throughput, low-latency runtime with exactly-once processing guarantees. Ideal for real-time AI inference pipelines and event-driven ML applications. Apache 2.0 licensed.\n- **[Apache Beam](https:\u002F\u002Fgithub.com\u002Fapache\u002Fbeam)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fapache\u002Fbeam?style=social) - Unified programming model for batch and streaming data processing. Write pipelines once, run anywhere on Flink, Spark, or Google Cloud Dataflow. Portable, extensible, and enterprise-ready for AI data pipelines. Apache 2.0 licensed.\n- **[Scrapy](https:\u002F\u002Fgithub.com\u002Fscrapy\u002Fscrapy)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fscrapy\u002Fscrapy?style=social) - Fast, high-level web crawling and scraping framework for Python. Extract structured data from websites at scale with built-in support for handling common challenges like pagination, cookies, and concurrent requests. BSD-3-Clause licensed.\n- **[Temporal](https:\u002F\u002Fgithub.com\u002Ftemporalio\u002Ftemporal)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftemporalio\u002Ftemporal?style=social) - Durable execution platform for reliable workflow orchestration. Build resilient data pipelines and ML workflows that survive failures and continue execution exactly where they left off. MIT licensed.\n- **[Luigi](https:\u002F\u002Fgithub.com\u002Fspotify\u002Fluigi)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fspotify\u002Fluigi?style=social) - Python module for building complex pipelines of batch jobs. Handles dependency resolution, workflow management, visualization, and Hadoop integration. Built at Spotify and battle-tested in production. 
Apache 2.0 licensed.\n- **[Mage.ai](https:\u002F\u002Fgithub.com\u002Fmage-ai\u002Fmage-ai)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmage-ai\u002Fmage-ai?style=social) - Modern open-source data pipeline tool for integrating and transforming data. AI-native ETL\u002FELT platform with 100+ integrations, real-time monitoring, and collaborative features. Apache 2.0 licensed.\n- **[Hamilton](https:\u002F\u002Fgithub.com\u002Fapache\u002Fhamilton)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fapache\u002Fhamilton?style=social) - Declarative dataflow framework for building testable, modular, self-documenting data pipelines. Encode lineage and metadata directly in Python functions. Originally developed at Stitch Fix, now an Apache Incubator project. Apache 2.0 licensed.\n- **[D-Tale](https:\u002F\u002Fgithub.com\u002Fman-group\u002Fdtale)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fman-group\u002Fdtale?style=social) - Visualizer for pandas data structures with a Flask back-end and React front-end. Interactive data exploration with charting, filtering, and code export. LGPL-2.1 licensed.\n- **[Sweetviz](https:\u002F\u002Fgithub.com\u002Ffbdesignpro\u002Fsweetviz)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ffbdesignpro\u002Fsweetviz?style=social) - Beautiful, high-density visualizations for exploratory data analysis in two lines of code. Self-contained HTML reports for dataset comparison and target analysis. MIT licensed.\n- **[TextAttack](https:\u002F\u002Fgithub.com\u002FQData\u002FTextAttack)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQData\u002FTextAttack?style=social) - Python framework for adversarial attacks, data augmentation, and model training in NLP. Augment datasets to increase model robustness and generate adversarial examples. 
MIT licensed.\n\n#### Classical ML & Gradient Boosting\n\n- **[scikit-learn](https:\u002F\u002Fgithub.com\u002Fscikit-learn\u002Fscikit-learn)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fscikit-learn\u002Fscikit-learn?style=social) - Industry-standard library for traditional machine learning (classification, regression, clustering, pipelines).\n- **[XGBoost](https:\u002F\u002Fgithub.com\u002Fdmlc\u002Fxgboost)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdmlc\u002Fxgboost?style=social) - Scalable, high-performance gradient boosting library. Still dominates Kaggle and tabular competitions.\n- **[LightGBM](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FLightGBM)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002FLightGBM?style=social) - Microsoft's ultra-fast gradient boosting framework, optimized for speed and memory.\n- **[CatBoost](https:\u002F\u002Fgithub.com\u002Fcatboost\u002Fcatboost)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fcatboost\u002Fcatboost?style=social) - Gradient boosting that handles categorical features natively with great out-of-the-box performance.\n- **[sktime](https:\u002F\u002Fgithub.com\u002Fsktime\u002Fsktime)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsktime\u002Fsktime?style=social) - Unified framework for machine learning with time series. Scikit-learn compatible API for forecasting, classification, clustering, and anomaly detection.\n- **[StatsForecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fstatsforecast)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNixtla\u002Fstatsforecast?style=social) - Lightning-fast statistical forecasting with ARIMA, ETS, CES, and Theta models. 
Optimized for high-performance time series workloads.\n- **[MLForecast](https:\u002F\u002Fgithub.com\u002FNixtla\u002Fmlforecast)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNixtla\u002Fmlforecast?style=social) - Scalable machine learning for time series forecasting. Train any sklearn-compatible model on millions of time series with efficient feature engineering. Apache 2.0 licensed.\n- **[cuML](https:\u002F\u002Fgithub.com\u002Frapidsai\u002Fcuml)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Frapidsai\u002Fcuml?style=social) - GPU-accelerated machine learning algorithms with a scikit-learn-compatible API. 10-50x faster than CPU implementations for large datasets. Apache 2.0 licensed.\n- **[SynapseML](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FSynapseML)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002FSynapseML?style=social) - Distributed machine learning on Apache Spark. Scalable, composable APIs for text analytics, vision, and anomaly detection, with seamless Python\u002FScala\u002FR\u002F.NET integration. MIT licensed.\n- **[Darts](https:\u002F\u002Fgithub.com\u002Funit8co\u002Fdarts)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Funit8co\u002Fdarts?style=social) - User-friendly forecasting and anomaly detection for time series. Unifies classical statistical models (ARIMA, ETS) with modern neural networks (N-BEATS, TFT, DeepAR) behind a single scikit-learn-like fit\u002Fpredict API. Apache 2.0 licensed.\n- **[PyTorch Forecasting](https:\u002F\u002Fgithub.com\u002Fsktime\u002Fpytorch-forecasting)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsktime\u002Fpytorch-forecasting?style=social) - Time series forecasting with PyTorch. Multiple neural architectures (N-BEATS, TFT, DeepAR) with built-in interpretation capabilities, running on PyTorch Lightning for distributed training. 
MIT licensed.\n\n#### Data Engineering & Feature Stores\n\n- **[DataHub](https:\u002F\u002Fgithub.com\u002Fdatahub-project\u002Fdatahub)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdatahub-project\u002Fdatahub?style=social) - The #1 open-source metadata platform for data and AI. Data discovery, governance, and observability with 80+ connectors, column-level lineage, and AI assistant integration. Originally built at LinkedIn. Apache 2.0 licensed.\n- **[OpenMetadata](https:\u002F\u002Fgithub.com\u002Fopen-metadata\u002FOpenMetadata)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopen-metadata\u002FOpenMetadata?style=social) - Unified metadata platform for data discovery, observability, and governance. Column-level lineage, semantic search, and team collaboration with 70+ data service connectors. Apache 2.0 licensed.\n- **[Amundsen](https:\u002F\u002Fgithub.com\u002Famundsen-io\u002Famundsen)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Famundsen-io\u002Famundsen?style=social) - Data discovery and metadata engine from Lyft. PageRank-style search for data resources with usage-based ranking. LF AI & Data Foundation project. Apache 2.0 licensed.\n\n#### Data Transformation & Analytics Engineering\n\n- **[dbt-core](https:\u002F\u002Fgithub.com\u002Fdbt-labs\u002Fdbt-core)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdbt-labs\u002Fdbt-core?style=social) - Transform data using software engineering best practices. The industry-standard framework for analytics engineering with 15M+ monthly downloads. Enables version control, testing, and documentation for SQL transformations. 
Apache 2.0 licensed.\n- **[SQLMesh](https:\u002F\u002Fgithub.com\u002FTobikoData\u002Fsqlmesh)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTobikoData\u002Fsqlmesh?style=social) - Scalable and efficient data transformation framework with dbt compatibility. Features automatic data lineage, time travel, and virtual data environments for testing. Optimized for large-scale data warehouses. Apache 2.0 licensed.\n\n#### Data Quality & Validation\n\n- **[Deequ](https:\u002F\u002Fgithub.com\u002Fawslabs\u002Fdeequ)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fawslabs\u002Fdeequ?style=social) - Library built on top of Apache Spark for defining \"unit tests for data\". Measures data quality in large datasets with constraint verification, anomaly detection, and incremental validation. Used at Amazon for production data quality. Apache 2.0 licensed.\n- **[Great Expectations](https:\u002F\u002Fgithub.com\u002Fgreat-expectations\u002Fgreat_expectations)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgreat-expectations\u002Fgreat_expectations?style=social) - Always know what to expect from your data. Data validation, profiling, and documentation for data pipelines. Apache 2.0 licensed.\n- **[ydata-profiling](https:\u002F\u002Fgithub.com\u002Fydataai\u002Fydata-profiling)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fydataai\u002Fydata-profiling?style=social) - One line of code for comprehensive data quality profiling and exploratory data analysis. Generates detailed reports for Pandas and Spark DataFrames including statistics, correlations, missing values, and data quality alerts. MIT licensed.\n- **[Soda Core](https:\u002F\u002Fgithub.com\u002Fsodadata\u002Fsoda-core)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsodadata\u002Fsoda-core?style=social) - Data contracts engine for the modern data stack. 
Define data quality checks in YAML and automatically validate schema and data across your pipelines. Supports 20+ data sources including Snowflake, BigQuery, and PostgreSQL. Apache 2.0 licensed.\n- **[TFX (TensorFlow Extended)](https:\u002F\u002Fgithub.com\u002Ftensorflow\u002Ftfx)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftensorflow\u002Ftfx?style=social) - End-to-end platform for deploying production ML pipelines. Data validation, transformation, model training, and serving with TensorFlow. Powers Google's production ML infrastructure. Apache 2.0 licensed.\n\n#### Data Labeling & Annotation\n\n- **[Label Studio](https:\u002F\u002Fgithub.com\u002FHumanSignal\u002Flabel-studio)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHumanSignal\u002Flabel-studio?style=social) - Multi-type data labeling and annotation platform for computer vision, NLP, and audio. Supports image classification, object detection, named entity recognition, and more with customizable interfaces. Apache 2.0 licensed.\n- **[FiftyOne](https:\u002F\u002Fgithub.com\u002Fvoxel51\u002Ffiftyone)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvoxel51\u002Ffiftyone?style=social) - Open-source dataset curation and model analysis tool for computer vision. Visualize, explore, and improve image and video datasets with tight integration to annotation tools. Apache 2.0 licensed.\n- **[Doccano](https:\u002F\u002Fgithub.com\u002Fdoccano\u002Fdoccano)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdoccano\u002Fdoccano?style=social) - Open-source text annotation tool for machine learning practitioners. Features text classification, sequence labeling, and sequence-to-sequence tasks for sentiment analysis, NER, and summarization. 
MIT licensed.\n- **[Snorkel](https:\u002F\u002Fgithub.com\u002Fsnorkel-team\u002Fsnorkel)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsnorkel-team\u002Fsnorkel?style=social) - System for quickly generating training data with weak supervision. Programmatically label data using labeling functions instead of manual annotation. Apache 2.0 licensed.\n- **[OpenRefine](https:\u002F\u002Fgithub.com\u002FOpenRefine\u002FOpenRefine)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOpenRefine\u002FOpenRefine?style=social) - Free, open-source power tool for working with messy data. Clean, transform, and extend data with web services. Formerly Google Refine. BSD-3-Clause licensed.\n\n#### AutoML & Hyperparameter Optimization\n\n- **[Optuna](https:\u002F\u002Fgithub.com\u002Foptuna\u002Foptuna)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Foptuna\u002Foptuna?style=social) - Modern, define-by-run hyperparameter optimization with pruning and visualizations. Extremely popular in 2026.\n- **[AutoGluon](https:\u002F\u002Fgithub.com\u002Fautogluon\u002Fautogluon)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fautogluon\u002Fautogluon?style=social) - AWS AutoML toolkit for tabular, image, text, and multimodal data - state-of-the-art with almost zero code.\n- **[FLAML](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FFLAML)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002FFLAML?style=social) - Microsoft's fast & lightweight AutoML focused on efficiency and low compute.\n- **[Katib (Kubeflow)](https:\u002F\u002Fgithub.com\u002Fkubeflow\u002Fkatib)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkubeflow\u002Fkatib?style=social) - Kubernetes-native AutoML for hyperparameter tuning, early stopping, and neural architecture search. 
Framework-agnostic with support for TensorFlow, PyTorch, XGBoost, and custom training operators. Apache 2.0 licensed.\n- **[AutoKeras](https:\u002F\u002Fgithub.com\u002Fkeras-team\u002Fautokeras)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkeras-team\u002Fautokeras?style=social) - AutoML system built on Keras that uses neural architecture search to find models automatically.\n\n#### Interactive ML Apps & Notebooks\n\n- **[Streamlit](https:\u002F\u002Fgithub.com\u002Fstreamlit\u002Fstreamlit)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fstreamlit\u002Fstreamlit?style=social) - The fastest way to build and share data apps. Transform Python scripts into beautiful web applications with minimal code. Widely used for ML model demos, data visualization, and internal tools.\n- **[Gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgradio-app\u002Fgradio?style=social) - Build and share delightful machine learning apps, all in Python. The de facto standard for creating interactive ML demos by wrapping any Python function in a web UI with a few lines of code. Powers thousands of Hugging Face Spaces.\n- **[Marimo](https:\u002F\u002Fgithub.com\u002Fmarimo-team\u002Fmarimo)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmarimo-team\u002Fmarimo?style=social) - A reactive notebook for Python: run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Notebooks are stored as pure Python files. 
All in a modern, AI-native editor.\n\n#### Model Training & Optimization Utilities\n\n- **[Hugging Face Accelerate](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Faccelerate)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuggingface\u002Faccelerate?style=social) - Simple API to make training scripts run on any hardware (multi-GPU, TPU, mixed precision) with minimal code changes.\n- **[DeepSpeed](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FDeepSpeed)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002FDeepSpeed?style=social) - Microsoft's deep learning optimization library for extreme-scale training (ZeRO, offloading, MoE).\n- **[Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuggingface\u002Ftransformers?style=social) - Library of pretrained transformer models and utilities for text, vision, audio, and multimodal training and inference.\n- **[FlashAttention](https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FDao-AILab\u002Fflash-attention?style=social) - Fast exact attention kernels that reduce memory usage and accelerate transformer training and inference.\n- **[xFormers](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fxformers)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ffacebookresearch\u002Fxformers?style=social) - Optimized transformer building blocks and attention operators for PyTorch.\n- **[PyTorch Lightning](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Flightning)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FLightning-AI\u002Flightning?style=social) - High-level wrapper for PyTorch that removes boilerplate and adds best practices.\n- 
**[fastai](https:\u002F\u002Fgithub.com\u002Ffastai\u002Ffastai)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ffastai\u002Ffastai?style=social) - Deep learning library providing practitioners with high-level components for state-of-the-art results. Built on PyTorch with a focus on usability and transfer learning. Apache 2.0 licensed.\n- **[PyTorch Ignite](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fignite)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpytorch\u002Fignite?style=social) - High-level library for training and evaluating neural networks in PyTorch with an engine, events & handlers system for maximum flexibility. BSD-3-Clause licensed.\n- **[ONNX Runtime](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002Fonnxruntime)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002Fonnxruntime?style=social) - High-performance inference and training for ONNX models across hardware.\n- **[einops](https:\u002F\u002Fgithub.com\u002Farogozhnikov\u002Feinops)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Farogozhnikov\u002Feinops?style=social) - Flexible, powerful tensor operations for readable and reliable code. Supports PyTorch, JAX, TensorFlow, NumPy, MLX.\n- **[safetensors](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fsafetensors)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuggingface\u002Fsafetensors?style=social) - Simple, safe way to store and distribute tensors. Fast, secure alternative to pickle for model serialization.\n- **[torchmetrics](https:\u002F\u002Fgithub.com\u002FLightning-AI\u002Ftorchmetrics)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FLightning-AI\u002Ftorchmetrics?style=social) - Machine learning metrics for distributed, scalable PyTorch applications. 
80+ metrics with built-in distributed synchronization.\n- **[torchao](https:\u002F\u002Fgithub.com\u002Fpytorch\u002Fao)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpytorch\u002Fao?style=social) - PyTorch native quantization and sparsity for training and inference. Drop-in optimizations for production deployment.\n- **[SHAP](https:\u002F\u002Fgithub.com\u002Fshap\u002Fshap)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fshap\u002Fshap?style=social) - Game theoretic approach to explain the output of any machine learning model. Industry standard for model interpretability.\n- **[skorch](https:\u002F\u002Fgithub.com\u002Fskorch-dev\u002Fskorch)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fskorch-dev\u002Fskorch?style=social) - Scikit-learn compatible neural network library that wraps PyTorch. Seamlessly integrate PyTorch models with scikit-learn pipelines, grid search, and cross-validation.\n- **[Composer](https:\u002F\u002Fgithub.com\u002Fmosaicml\u002Fcomposer)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmosaicml\u002Fcomposer?style=social) - Supercharge your model training. MosaicML's PyTorch training library with built-in algorithms for efficient training (FSDP, gradient compression, progressive resizing) and seamless distributed training on large-scale clusters. Apache 2.0 licensed.\n- **[NVIDIA Apex](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fapex)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVIDIA\u002Fapex?style=social) - PyTorch extension for mixed precision training and distributed training optimizations. Powers many production deep learning workloads with tools for automatic mixed precision (AMP), distributed data parallel, and fused optimizers. BSD-3-Clause licensed.\n\n---\n\n### 🧠 2. 
Open Foundation Models\n\n> Pretrained language, multimodal, speech, and video models with publicly available weights.\n\n#### Large Language Models (Base + Chat)\n\n- **[RWKV-7 \"Goose\" (BlinkDL)](https:\u002F\u002Fgithub.com\u002FBlinkDL\u002FRWKV-LM)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FBlinkDL\u002FRWKV-LM?style=social) - Novel RNN architecture with transformer-level LLM performance. 100% attention-free, linear-time, constant-space (no KV cache), infinite ctx_len. Linux Foundation AI project with runtime already deployed in Windows & Office.\n- **[Qwen3 (Alibaba)](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQwenLM\u002FQwen3?style=social) - Flagship dense and MoE models with hybrid thinking modes (32B\u002F235B). Apache 2.0 licensed with 128K context and superior agentic capabilities.\n- **[Qwen3.6 (Alibaba)](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3.6)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQwenLM\u002FQwen3.6?style=social) - Latest flagship series released April 2026 with 1M context window, agentic coding performance competitive with Claude 4.5 Opus, and enhanced multimodal capabilities.\n- **[MiMo-V2-Flash (Xiaomi)](https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002FMiMo-V2-Flash)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FXiaomiMiMo\u002FMiMo-V2-Flash?style=social) - 309B MoE model (15B active) with hybrid attention and Multi-Token Prediction for efficient high-speed reasoning. Apache 2.0 licensed.\n- **[Nemotron (NVIDIA)](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FNemotron)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVIDIA-NeMo\u002FNemotron?style=social) - Open and efficient models for agentic AI with training recipes, deployment guides, and use-case examples. 
Apache 2.0 licensed.\n- **[Gemma 4 (Google)](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fgemma)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle-deepmind\u002Fgemma?style=social) - Released April 2026 in four sizes (E2B, E4B, 26B MoE, 31B dense). The first major update in a year, Apache 2.0 licensed and built for complex reasoning and agentic workflows.\n- **[Kimi K2 (Moonshot AI)](https:\u002F\u002Fgithub.com\u002FMoonshotAI\u002FKimi-K2)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FMoonshotAI\u002FKimi-K2?style=social) - State-of-the-art 1T-parameter MoE model with 32B activated parameters and 128K context. Trained with the Muon optimizer for strong reasoning and coding performance.\n- **[Kimi K2.5 (Moonshot AI)](https:\u002F\u002Fgithub.com\u002FMoonshotAI\u002FKimi-K2.5)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FMoonshotAI\u002FKimi-K2.5?style=social) - Frontier open-weight MoE model with 256K context, strong coding and reasoning performance, and native multimodal + tool-use support for agentic workflows.\n- **[Phi-4 (Microsoft)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FPhiCookBook)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002FPhiCookBook?style=social) - Small but highly capable models optimized for reasoning, edge devices, and on-device inference. 
Includes Phi-4-reasoning variants with thinking capabilities.\n- **[GLM-5 (Zhipu AI)](https:\u002F\u002Fgithub.com\u002Fzai-org\u002FGLM-5)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fzai-org\u002FGLM-5?style=social) - Strong open model line with solid coding, reasoning, and agentic-task performance.\n- **[OLMo 2 (Allen AI)](https:\u002F\u002Fgithub.com\u002Fallenai\u002FOLMo)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fallenai\u002FOLMo?style=social) - Fully open-source LLMs (1B–32B) with complete transparency: models, data, training code, and logs. Designed by scientists, for scientists.\n- **[Llama 4 (Meta)](https:\u002F\u002Fgithub.com\u002Fmeta-llama\u002Fllama-models)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmeta-llama\u002Fllama-models?style=social) - Meta's first natively multimodal MoE open-weight models (Scout: 10M context; Maverick: 400B total parameters, 17B active). Released April 2025 under the Llama 4 Community License.\n- **[GPT-OSS (OpenAI)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fgpt-oss)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenai\u002Fgpt-oss?style=social) - OpenAI's first open-weight models since GPT-2 (120B and 20B MoE). Apache 2.0 licensed with state-of-the-art performance for their size class. Released August 2025.\n- **[Mamba (State Space Models)](https:\u002F\u002Fgithub.com\u002Fstate-spaces\u002Fmamba)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fstate-spaces\u002Fmamba?style=social) - Novel State Space Model architecture with linear-time inference and transformer-level performance. 100% attention-free with constant memory usage, enabling efficient long-sequence modeling. Pretrained models from 130M to 2.8B parameters trained on 300B-600B tokens. 
Apache 2.0 licensed.\n- **[Pythia (EleutherAI)](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fpythia)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEleutherAI\u002Fpythia?style=social) - Suite of interpretability-focused LLMs (70M to 12B parameters) with fully open training data, intermediate checkpoints, and analysis tools. Designed for studying learning dynamics and interpretability, and trained on the publicly available Pile dataset (with deduplicated variants). Apache 2.0 licensed.\n- **[T5 (Google)](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Ftext-to-text-transfer-transformer)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle-research\u002Ftext-to-text-transfer-transformer?style=social) - Text-to-Text Transfer Transformer that unified NLP tasks under a single encoder-decoder architecture. The foundation for Flan-T5 and many downstream applications. One of the first OSI-validated fully open-source language models with training data and code. Apache 2.0 licensed.\n- **[GPT-NeoX-20B (EleutherAI)](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Fgpt-neox)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEleutherAI\u002Fgpt-neox?style=social) - 20B parameter autoregressive language model trained on the Pile dataset. One of the largest dense open-source models with publicly available weights at release. Complete training codebase with distributed training support. Apache 2.0 licensed.\n\n#### Coding & Reasoning Models\n\n- **[DeepSeek-Coder-V2 \u002F R1-Coder](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-Coder)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdeepseek-ai\u002FDeepSeek-Coder?style=social) - Best-in-class open coding model (236B MoE). 
Outperforms closed models on many code benchmarks.\n- **[Qwen3-Coder-Next (Alibaba)](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-Coder)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQwenLM\u002FQwen3-Coder?style=social) - Leading open coding model that sits on a strong cost-performance Pareto frontier for agent deployment.\n\n#### Multimodal Models (Vision + Language)\n\n- **[MMaDA (Gen-Verse)](https:\u002F\u002Fgithub.com\u002FGen-Verse\u002FMMaDA)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGen-Verse\u002FMMaDA?style=social) - Open-source multimodal large diffusion language model with a unified architecture for textual reasoning, multimodal understanding, and text-to-image generation. MIT licensed, NeurIPS 2025.\n- **[Qwen3-VL (Alibaba)](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-VL)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQwenLM\u002FQwen3-VL?style=social) - Latest flagship VLM with native 256K context (expandable to 1M), visual agent capabilities, 3D grounding, and superior multimodal reasoning. Major leap over Qwen2.5-VL.\n- **[GLM-4.5V \u002F GLM-4.1V-Thinking (Zhipu AI)](https:\u002F\u002Fgithub.com\u002Fzai-org\u002FGLM-V)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fzai-org\u002FGLM-V?style=social) - Strong multimodal reasoning with scalable reinforcement learning. Compares favorably with Gemini-2.5-Flash on benchmarks.\n- **[MiniCPM-o 2.6](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FMiniCPM-o)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOpenBMB\u002FMiniCPM-o?style=social) - Gemini 2.5 Flash level MLLM for vision, speech, and full-duplex multimodal live streaming on your phone. 
Apache 2.0 licensed.\n- **[Gemma 4 (Google)](https:\u002F\u002Fgithub.com\u002Fgoogle-deepmind\u002Fgemma)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle-deepmind\u002Fgemma?style=social) - Multimodal model supporting vision-language input, optimized for efficiency, complex reasoning, and on-device use.\n- **[Magma (Microsoft)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMagma)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002FMagma?style=social) - Foundation model for multimodal AI agents that perceives the world and takes goal-driven actions across digital and physical environments. CVPR 2025.\n- **[OpenCLIP](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_clip)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmlfoundations\u002Fopen_clip?style=social) - Open-source implementation of CLIP with trained models and training code. Includes state-of-the-art trained ViT-G\u002F14 models and a comprehensive zero-shot evaluation suite.\n- **[Show-o](https:\u002F\u002Fgithub.com\u002Fshowlab\u002FShow-o)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fshowlab\u002FShow-o?style=social) - Single transformer that unifies multimodal understanding and text-to-image generation by combining autoregressive and discrete diffusion modeling. Apache 2.0 licensed.\n- **[Moondream (m87-labs)](https:\u002F\u002Fgithub.com\u002Fm87-labs\u002Fmoondream)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fm87-labs\u002Fmoondream?style=social) - Tiny vision language model (0.5B and 2B parameters) that runs anywhere. Powerful image understanding with a remarkably small footprint for edge devices and real-time applications. 
Apache 2.0 licensed.\n- **[VILA (NVIDIA)](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FVILA)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVlabs\u002FVILA?style=social) - Family of state-of-the-art vision language models for diverse multimodal AI tasks across edge, data center, and cloud. Features NVILA 8B\u002F15B with efficient training and deployment. Apache 2.0 licensed.\n- **[OmniGen (VectorSpaceLab)](https:\u002F\u002Fgithub.com\u002FVectorSpaceLab\u002FOmniGen)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FVectorSpaceLab\u002FOmniGen?style=social) - Unified image generation model that handles text-to-image, subject-driven generation, identity-preserving generation, and image editing from multi-modal prompts without additional plugins. MIT licensed.\n- **[Skywork-R1V (Skywork AI)](https:\u002F\u002Fgithub.com\u002FSkyworkAI\u002FSkywork-R1V)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSkyworkAI\u002FSkywork-R1V?style=social) - Advanced multimodal reasoning model specializing in vision-language tasks with chain-of-thought capabilities. State-of-the-art open multimodal reasoning with a score of 76.0 on the MMMU benchmark. MIT licensed.\n- **[Depth Anything V2](https:\u002F\u002Fgithub.com\u002FDepthAnything\u002FDepth-Anything-V2)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FDepthAnything\u002FDepth-Anything-V2?style=social) - Foundation model for monocular depth estimation trained on 595K synthetic labeled images and 62M+ pseudo-labeled real images. Provides robust, fine-grained depth estimation for any image. Small model Apache 2.0 licensed; Base, Large, and Giant checkpoints CC BY-NC 4.0.\n\n#### Speech & Audio Models (TTS, STT, Music)\n\n- **[NVIDIA NeMo Speech](https:\u002F\u002Fgithub.com\u002FNVIDIA-NeMo\u002FNeMo)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVIDIA-NeMo\u002FNeMo?style=social) - Scalable generative AI framework for Speech AI including ASR, TTS, and speech LLMs. 
Includes state-of-the-art Canary and Parakeet models with support for 25+ European languages. Apache 2.0 licensed.\n- **[FunASR](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FFunASR)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmodelscope\u002FFunASR?style=social) - Fundamental end-to-end speech recognition toolkit with SOTA pretrained models. Supports ASR, VAD, speaker verification, diarization, and multi-talker ASR. Industrial-grade with 31-language support and real-time transcription services. MIT licensed.\n- **[Whisper (OpenAI → community forks)](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fwhisper)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenai\u002Fwhisper?style=social) - The gold-standard open speech-to-text model, with a large ecosystem of community fine-tunes.\n- **[faster-whisper (SYSTRAN)](https:\u002F\u002Fgithub.com\u002FSYSTRAN\u002Ffaster-whisper)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSYSTRAN\u002Ffaster-whisper?style=social) - Reimplementation of Whisper using CTranslate2 for up to 4x faster inference with the same accuracy. Supports batched processing and 8-bit quantization.\n- **[OuteTTS \u002F CosyVoice 2](https:\u002F\u002Fgithub.com\u002Fedwko\u002FOuteTTS)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fedwko\u002FOuteTTS?style=social) - High-quality open TTS with natural prosody and multilingual support.\n- **[Fish Speech \u002F StyleTTS 2](https:\u002F\u002Fgithub.com\u002Ffishaudio\u002Ffish-speech)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ffishaudio\u002Ffish-speech?style=social) - Zero-shot TTS with excellent voice cloning. 
Extremely popular in 2026.\n- **[MusicGen \u002F AudioCraft (Meta)](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Faudiocraft)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ffacebookresearch\u002Faudiocraft?style=social) - Open music and audio generation models.\n- **[VibeVoice (Microsoft)](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FVibeVoice)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmicrosoft\u002FVibeVoice?style=social) - Open-source frontier voice AI with expressive, longform conversational speech synthesis. 7B parameter TTS with streaming support.\n- **[Qwen3-TTS (Alibaba)](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-TTS)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQwenLM\u002FQwen3-TTS?style=social) - Open TTS series supporting stable, expressive, and streaming speech generation with free-form voice design and vivid voice cloning. Natural language instruction-driven control over timbre, emotion, and prosody. Apache 2.0 licensed.\n- **[Chatterbox (Resemble AI)](https:\u002F\u002Fgithub.com\u002Fresemble-ai\u002Fchatterbox)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fresemble-ai\u002Fchatterbox?style=social) - State-of-the-art open TTS family with 350M parameter Turbo variant. Single-step generation with native paralinguistic tags for realistic dialogue.\n- **[Dia (Nari Labs)](https:\u002F\u002Fgithub.com\u002Fnari-labs\u002Fdia)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fnari-labs\u002Fdia?style=social) - 1.6B parameter TTS generating ultra-realistic dialogue in one pass with nonverbal communications (laughter, coughing). 
Emotion and tone control via audio conditioning.\n- **[Voxtral TTS (Mistral)](https:\u002F\u002Fgithub.com\u002Fmistralai\u002Fmistral-inference)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmistralai\u002Fmistral-inference?style=social) - 4B parameter state-of-the-art TTS with zero-shot voice cloning, 9-language support, and ~90ms time-to-first-audio for voice agents.\n- **[Ultravox (Fixie AI)](https:\u002F\u002Fgithub.com\u002Ffixie-ai\u002Fultravox)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ffixie-ai\u002Fultravox?style=social) - Fast multimodal LLM for real-time voice. Understands speech input directly, without a separate ASR stage, and returns low-latency responses for conversational AI applications. MIT licensed.\n- **[WhisperSpeech](https:\u002F\u002Fgithub.com\u002FWhisperSpeech\u002FWhisperSpeech)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FWhisperSpeech\u002FWhisperSpeech?style=social) - Open-source text-to-speech system built by inverting Whisper. High-quality voice cloning with zero-shot capabilities. MIT licensed.\n- **[VoxCPM](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FVoxCPM)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOpenBMB\u002FVoxCPM?style=social) - Tokenizer-free diffusion autoregressive TTS with 2B parameters. Supports 30+ languages with automatic detection, creative voice design from text descriptions, and high-fidelity voice cloning. Apache 2.0 licensed.\n- **[F5-TTS](https:\u002F\u002Fgithub.com\u002FSWivid\u002FF5-TTS)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSWivid\u002FF5-TTS?style=social) - Flow-matching-based TTS with fluent and faithful speech synthesis. Zero-shot voice cloning with high naturalness and prosody accuracy. 
MIT licensed.\n- **[CosyVoice](https:\u002F\u002Fgithub.com\u002FFunAudioLLM\u002FCosyVoice)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FFunAudioLLM\u002FCosyVoice?style=social) - Multilingual large voice generation model with full-stack inference, training, and deployment support. Supports cross-lingual voice cloning and emotional expression control. Apache 2.0 licensed.\n- **[ChatTTS](https:\u002F\u002Fgithub.com\u002F2noise\u002FChatTTS)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002F2noise\u002FChatTTS?style=social) - Generative speech model optimized for everyday dialogue. Natural, expressive conversational speech synthesis with fine-grained prosody control. AGPL-3.0 licensed.\n- **[SpeechBrain](https:\u002F\u002Fgithub.com\u002Fspeechbrain\u002Fspeechbrain)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fspeechbrain\u002Fspeechbrain?style=social) - PyTorch-based speech toolkit for ASR, TTS, speaker recognition, and speech enhancement. Modular, extensible framework with state-of-the-art recipes. Apache 2.0 licensed.\n\n#### Video & Animation Models\n\n- **[Open-Sora (HPC-AI Tech)](https:\u002F\u002Fgithub.com\u002Fhpcaitech\u002FOpen-Sora)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhpcaitech\u002FOpen-Sora?style=social) - Democratizing efficient video production for all. Complete open-source video generation system with an 11B model achieving commercial-level quality. 
Apache 2.0 licensed.\n- **[CogVideoX (Zhipu AI \u002F community)](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogVideo)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHUDM\u002FCogVideo?style=social) - High-quality open text-to-video model (5B-12B).\n- **[Mochi 1 (Genmo)](https:\u002F\u002Fgithub.com\u002Fgenmoai\u002Fmochi)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgenmoai\u002Fmochi?style=social) - 10B open video model with impressive motion and consistency.\n\n#### Image Generation Models\n\n- **[Stable Diffusion XL](https:\u002F\u002Fgithub.com\u002FStability-AI\u002Fgenerative-models)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FStability-AI\u002Fgenerative-models?style=social) - Successor to Stable Diffusion 1.x\u002F2.x with significantly improved quality, 1024px native resolution, and better prompt adherence. The same repository also hosts Stability AI's Stable Video Diffusion models. CreativeML Open RAIL++-M licensed.\n- **[OmniGen (VectorSpaceLab)](https:\u002F\u002Fgithub.com\u002FVectorSpaceLab\u002FOmniGen)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FVectorSpaceLab\u002FOmniGen?style=social) - Unified image generation model handling text-to-image, subject-driven generation, identity-preserving generation, and image editing from multi-modal prompts in a single framework. MIT licensed.\n\n#### Additional Vision-Language Models\n\n- **[MiniCPM-V (OpenBMB)](https:\u002F\u002Fgithub.com\u002FOpenBMB\u002FMiniCPM-V)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOpenBMB\u002FMiniCPM-V?style=social) - GPT-4V-level multimodal LLM for single-image, multi-image, and high-FPS video understanding on edge devices. 8B parameters with strong OCR and reasoning capabilities. Apache 2.0 licensed.\n\n---\n\n### ⚡ 3. 
Inference Engines & Serving\n\n> Inference runtimes, serving systems, and optimization tools for running models locally or in production.\n\n#### Local \u002F On-device Inference\n\n- **[llama.cpp](https:\u002F\u002Fgithub.com\u002Fggml-org\u002Fllama.cpp)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fggml-org\u002Fllama.cpp?style=social) - Pure C\u002FC++ inference engine with GGUF format support. The gold standard for on-device inference on CPUs, GPUs, and Apple Silicon. Includes llama-server, an OpenAI-compatible API server. Now at 100K+ stars.\n- **[Ollama](https:\u002F\u002Fgithub.com\u002Follama\u002Follama)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Follama\u002Follama?style=social) - Dead-simple local LLM runner with a one-line install, model registry, and OpenAI-compatible API.\n- **[MLX (Apple)](https:\u002F\u002Fgithub.com\u002Fml-explore\u002Fmlx)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fml-explore\u002Fmlx?style=social) - High-performance array framework for machine learning on Apple Silicon, with LLM inference provided by the companion mlx-lm package.\n- **[MLC-LLM](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fmlc-llm)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmlc-ai\u002Fmlc-llm?style=social) - Deployment engine that compiles and runs LLMs across browsers, mobile devices, and local hardware.\n- **[WebLLM](https:\u002F\u002Fgithub.com\u002Fmlc-ai\u002Fweb-llm)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmlc-ai\u002Fweb-llm?style=social) - High-performance in-browser LLM inference engine. 
Runs models directly in the browser with WebGPU acceleration.\n- **[llama-cpp-python](https:\u002F\u002Fgithub.com\u002Fabetlen\u002Fllama-cpp-python)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fabetlen\u002Fllama-cpp-python?style=social) - Widely used community Python bindings for llama.cpp, including an OpenAI-compatible web server.\n- **[KoboldCpp](https:\u002F\u002Fgithub.com\u002FLostRuins\u002Fkoboldcpp)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FLostRuins\u002Fkoboldcpp?style=social) - User-friendly llama.cpp fork focused on role-playing and creative writing.\n- **[RamaLama](https:\u002F\u002Fgithub.com\u002Fcontainers\u002Framalama)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fcontainers\u002Framalama?style=social) - Container-centric tool for simplifying local AI model serving. Automatically detects GPUs, pulls optimized container images, and runs models in rootless containers for strong isolation.\n- **[LiteRT-LM](https:\u002F\u002Fgithub.com\u002Fgoogle-ai-edge\u002FLiteRT-LM)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgoogle-ai-edge\u002FLiteRT-LM?style=social) - Google's production-ready inference framework for deploying LLMs on edge devices. Cross-platform support for Android, iOS, Web, Desktop, and IoT with GPU\u002FNPU acceleration. Powers on-device GenAI in Chrome and Chromebook Plus. Apache 2.0 licensed.\n- **[exo](https:\u002F\u002Fgithub.com\u002Fexo-explore\u002Fexo)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fexo-explore\u002Fexo?style=social) - Run frontier AI locally by connecting all your devices into an AI cluster. Features automatic device discovery, RDMA over Thunderbolt (reported to cut inter-device latency by up to 99%), topology-aware automatic parallelism, and tensor parallelism. Uses an MLX backend for distributed inference across Apple Silicon devices. 
Apache 2.0 licensed.\n\n#### High-performance Serving & API Servers\n\n- **[llm-d](https:\u002F\u002Fgithub.com\u002Fllm-d\u002Fllm-d)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fllm-d\u002Fllm-d?style=social) - Kubernetes-native distributed LLM inference framework. Donated to CNCF by Red Hat, Google, and IBM. Intelligent scheduling, KV-cache optimization, and state-of-the-art performance across accelerators.\n- **[LMDeploy](https:\u002F\u002Fgithub.com\u002FInternLM\u002Flmdeploy)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FInternLM\u002Flmdeploy?style=social) - OpenMMLab's toolkit for compressing, deploying, and serving LLMs. 4-bit inference delivers 2.4x higher performance than FP16, and models can be served distributed across multiple machines.\n- **[vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvllm-project\u002Fvllm?style=social) - State-of-the-art serving engine with PagedAttention and continuous batching. One of the fastest and most widely deployed production-grade LLM servers.\n- **[LMCache](https:\u002F\u002Fgithub.com\u002FLMCache\u002FLMCache)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FLMCache\u002FLMCache?style=social) - KV cache layer that accelerates LLM inference by reusing cached KV states. Reports 3-10x latency savings and fewer GPU cycles for multi-round QA and RAG. Integrates with vLLM for distributed, high-throughput deployments. Apache 2.0 licensed.\n- **[vLLM Production Stack](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fproduction-stack)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvllm-project\u002Fproduction-stack?style=social) - Kubernetes-native production stack for vLLM inference. Automated deployment, autoscaling, and monitoring for enterprise-grade LLM serving. 
Built by the vLLM team for seamless integration.\n- **[nano-vLLM](https:\u002F\u002Fgithub.com\u002FGeeeekExplorer\u002Fnano-vllm)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGeeeekExplorer\u002Fnano-vllm?style=social) - Minimalist vLLM implementation in ~1,200 lines of Python. Educational yet performant, with prefix caching, tensor parallelism, and CUDA graph acceleration. Inference speed comparable to full vLLM. MIT licensed.\n- **[SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsgl-project\u002Fsglang?style=social) - High-performance serving framework with RadixAttention for automatic KV-cache reuse. Powers xAI's production workloads at a scale of 100K+ GPUs.\n- **[TensorRT-LLM](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT-LLM)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVIDIA\u002FTensorRT-LLM?style=social) - NVIDIA's official library for high-performance LLM inference on NVIDIA GPUs.\n- **[Aphrodite Engine](https:\u002F\u002Fgithub.com\u002Faphrodite-engine\u002Faphrodite-engine)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Faphrodite-engine\u002Faphrodite-engine?style=social) - vLLM fork optimized for role-play and creative writing. Supports extensive quantization methods (AQLM, AWQ, GPTQ, GGUF, FP8) and modern samplers. Actively developed, with multi-LoRA and speculative decoding support.\n- **[AIBrix](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Faibrix)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvllm-project\u002Faibrix?style=social) - Cost-efficient and pluggable infrastructure components for GenAI inference. Kubernetes-native control plane for vLLM with distributed KV cache, heterogeneous GPU serving, and intelligent routing. 
Apache 2.0 licensed.\n- **[Triton Inference Server](https:\u002F\u002Fgithub.com\u002Ftriton-inference-server\u002Fserver)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftriton-inference-server\u002Fserver?style=social) - NVIDIA's production-grade open-source inference serving software. Supports multiple frameworks (TensorRT, PyTorch, ONNX) with optimized cloud and edge deployment.\n- **[mistral.rs](https:\u002F\u002Fgithub.com\u002FEricLBuehler\u002Fmistral.rs)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEricLBuehler\u002Fmistral.rs?style=social) - Fast, flexible Rust-native LLM inference engine built on Candle. Supports text, vision, audio, image generation, and embeddings with hardware-aware auto-tuning.\n- **[KTransformers](https:\u002F\u002Fgithub.com\u002Fkvcache-ai\u002Fktransformers)** ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkvcache-ai\u002Fktransformers?style=social) - Flexible framework for heterogeneous CPU-GPU LLM inference and fine-tuning. Enables running large MoE models by offloading experts to CPU with BF1","awesome-opensource-ai is a carefully curated list of open-source AI projects, models, tools, and infrastructure. The project is written mainly in Python and covers everything from core frameworks to production deployment, including deep learning frameworks (such as PyTorch and TensorFlow), inference engines, multi-agent systems, and generative media tools. All listed projects are battle-tested, and only top-tier projects are included. awesome-opensource-ai suits researchers, developers, and enterprises looking for high-quality open-source AI resources, whether for academic research or practical application development.",2,"2026-06-11 03:24:59","top_topic"]