[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-10983":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":36,"readmeContent":37,"aiSummary":38,"trendingCount":16,"starSnapshotCount":16,"syncStatus":39,"lastSyncTime":40,"discoverSource":41},10983,"tokenspeed","lightseekorg\u002Ftokenspeed","lightseekorg","TokenSpeed is a speed-of-light LLM inference engine.","https:\u002F\u002Flightseek.org\u002Fblog\u002Flightseek-tokenspeed.html",null,"Python",1412,149,11,5,0,32,49,467,96,104.53,"MIT License",false,"main",true,[27,28,29,30,31,32,33,34,35,5],"blackwell","deepseek","gpt-oss","kimi","lightseek","llm","minimax","qwen","speed-of-light","2026-06-12 04:00:53","\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fbanner\u002Ftokenspeed-banner.png\" alt=\"TokenSpeed: Tokens at the speed of light\" width=\"100%\" \u002F>\n\u003C\u002Fp>\n\nTokenSpeed is a speed-of-light LLM inference engine designed for **agentic workloads**, with TensorRT-LLM-level performance and vLLM-level usability. Our goal is to be the most performant inference engine for production agentic workloads.\n\nCore components:\n\n- **Modeling layer**: local-SPMD design with a static compiler that generates\n  collective communication from module-boundary placement annotations, so users\n  do not hand-write parallelism logic.\n- **Scheduler**: C++ control plane and Python execution plane. 
Request\n  lifecycle, KV cache ownership, and overlap timing are encoded as a\n  finite-state machine, with safe KV resource reuse enforced by the type system at compile time.\n- **Kernels**: pluggable, layered kernel system with a portable public API and\n  a centralized registry including one of the fastest **MLA**\n  (Multi-head Latent Attention) implementations on Blackwell for agentic workloads.\n- **Entrypoint**: SMG-integrated AsyncLLM for low-overhead CPU-side request\n  handling.\n\n## Performance Comparison\n\n\u003Cimg src=\".\u002Fassets\u002Fperf\u002Ftokenspeed-kimi-k2.5-performance.png\" alt=\"TokenSpeed vs. TensorRT-LLM Pareto curves on an agentic workload (Kimi K2.5, B200)\" width=\"800\" margin=\"10px\">\u003C\u002Fimg>\n\n## Preview Status\n\nThis version is a preview release for reproducing the Kimi K2.5 on B200 and\nTokenSpeed MLA on B200 results from the [TokenSpeed blog](https:\u002F\u002Flightseek.org\u002Fblog\u002Flightseek-tokenspeed.html). Several major PRs are\nstill in progress and have not been merged yet.\n\nOngoing work includes:\n\n- Model coverage: Qwen 3.6, DeepSeek V4, and MiniMax M2.7.\n- Runtime features: PD, EPLB, KV store, Mamba cache, VLM, and metrics.\n- Platform optimization: Hopper optimization, MI350 optimization, and related\n  runtime improvements.\n\nThese features are still being cleaned up and will be merged into `main` over\nthe next few weeks. TokenSpeed is currently under heavy development and is\nintended to showcase the new runtime design and technical direction. 
Do not use\nthis preview release for production deployments.\n\n## Documentation\n\nStart here:\n\n- [Docs Index](https:\u002F\u002Flightseek.org\u002Ftokenspeed\u002F)\n- [Getting Started](https:\u002F\u002Flightseek.org\u002Ftokenspeed\u002Fguides\u002Fgetting-started)\n- [Launching a Server](https:\u002F\u002Flightseek.org\u002Ftokenspeed\u002Fguides\u002Flaunching)\n- [Model Recipes](https:\u002F\u002Flightseek.org\u002Ftokenspeed\u002Frecipes\u002Fmodels)\n- [Server Parameters](https:\u002F\u002Flightseek.org\u002Ftokenspeed\u002Fconfiguration\u002Fserver)\n- [Compatible Parameters](https:\u002F\u002Flightseek.org\u002Ftokenspeed\u002Fconfiguration\u002Fcompatible-parameters)\n- [Parallelism](https:\u002F\u002Flightseek.org\u002Ftokenspeed\u002Fserving\u002Fparallelism)\n","TokenSpeed is a high-performance large language model inference engine designed for agentic workloads. It combines TensorRT-LLM-level performance with vLLM-level usability. Core features include a local-SPMD design, a scheduler that pairs a C++ control plane with a Python execution plane, a pluggable, layered kernel system, and an SMG-integrated AsyncLLM entrypoint. These design choices provide efficient parallel processing and safe resource reuse. TokenSpeed is particularly suited to LLM inference scenarios that need fast responses and high throughput, such as online services or large-scale data processing tasks. The current version is a preview release, intended mainly to showcase the new runtime design and technical direction, and is not recommended for production deployments.",2,"2026-06-11 03:31:05","CREATED_QUERY"]