omlx

jundot

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

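AI Summary

oMLX is a lightweight large language model (LLM) inference server built specifically for Apple Silicon Macs. It supports continuous batching and a tiered KV cache (memory + SSD), and it is managed graphically from the macOS menu bar. It runs inference locally on the MLX framework and automatically moves KV cache entries between memory and SSD so that previously computed context can be reused. This can noticeably improve response times and resource utilization in multi-turn conversations and long-context workloads. It is suited to local macOS LLM use cases that need low latency and fine-grained control, such as everyday local debugging, integrating AI tools into desktop apps, and offline coding assistance.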
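The tiered KV cache idea can be illustrated with a generic two-tier LRU cache. Recently used entries stay in RAM, and the least recently used ones are written to disk instead of being discarded, then promoted back to memory on their next hit. This is a minimal sketch that assumes string keys and picklable values. `TieredKVCache`, `max_in_memory`, and `spill_dir` are hypothetical names, and nothing here reflects oMLX's actual code, eviction policy, or on-disk format.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TieredKVCache:
    """Hypothetical two-tier cache: hot entries in RAM, cold entries spilled to disk.

    Generic LRU illustration only; not oMLX's implementation.
    """

    def __init__(self, max_in_memory: int, spill_dir: str) -> None:
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir
        self.memory: "OrderedDict[str, object]" = OrderedDict()

    def _path(self, key: str) -> str:
        # One file per key; keys are assumed to be filesystem-safe strings.
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key: str, value: object) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        while len(self.memory) > self.max_in_memory:
            # Spill the least recently used entry to disk instead of dropping it.
            old_key, old_value = self.memory.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_value, f)

    def get(self, key: str):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):
            # Disk hit: load the entry and promote it back to the memory tier.
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
            return value
        # Miss in both tiers: the caller would have to recompute (e.g. re-run prefill).
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = TieredKVCache(max_in_memory=2, spill_dir=d)
        cache.put("a", [1, 2])
        cache.put("b", [3])
        cache.put("c", [4])  # "a" is spilled to disk
        print("a" in cache.memory)  # False
        print(cache.get("a"))  # [1, 2], reloaded from disk
```

An LRU policy is used here only because it is the simplest to show. A real inference server might key entries by token prefix and weigh recency against recomputation cost.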

Python

apple-silicon inference-server llm macos mlx openai-api


Stars: 17.7k

Forks: 1.5k

Issues: 543

Star growth: today 0, last 7 days 0, last 30 days +521

Overall score: 44.52

Default branch: main
