<div align="center">
  <img src="assets/orbit-logo.png" alt="Orbit" width="500"/>
</div>

<p align="center">
  A lightweight, ultra-scale RL post-training framework built around low-precision bases and BF16 adapters, so that frontier-scale RL fits on a single node.
</p>

<p align="center">
  <a href="https://spherelab.ai/orbit/"><img alt="Blog post" src="https://img.shields.io/badge/blog-spherelab.ai%2Forbit-8A2BE2.svg"></a>
  <a href="LICENSE"><img alt="License: Apache 2.0" src="https://img.shields.io/badge/license-Apache_2.0-green.svg"></a>
</p>

---

## Why Orbit

Today's leading LLMs exceed a trillion parameters, and the conventional RL post-training recipe demands high-precision, multi-node, full-parameter updates. Orbit takes a different route. It keeps the base model frozen at its **deployment precision** (INT4 / FP4 / FP8) and applies gradient updates only to a small **BF16 OFT or LoRA adapter**. The result is RL post-training of 1T-class models on a single 8×B200 node, with no precision gap between training and rollout.

We have used Orbit to run stable, end-to-end RL on **Kimi-K2.6 (~1T)**, **DeepSeek V4-Flash**, **DeepSeek V4-Pro (~1.6T)**, and the **Qwen3 MoE** family, each on a single node.

## Highlights

|   | Capability | What it means |
|---|---|---|
| 🪶 | **Adapter-first RL** | BF16 OFT/LoRA adapters on a frozen low-precision base. Training and serving use the same kernels and quantization scheme. |
| 🛰️ | **Single-node trillion-scale** | 1T-class models fit on a single 8×B200 node, with no cross-node orchestration and no precision drift. |
| ⚡ | **Low-precision native** | First-class support for INT4, NVFP4, FP8, and BF16, with parity preflight gates between Megatron and SGLang. |
| 🧩 | **PEFT-native** | LoRA and OFT adapters. PEFT KL launchers compute reference log-probs in-model (no separate reference workers) and double-buffer adapters asynchronously. |

## Installation

> **Supported runtime:** Python 3.12, CUDA 13.2, PyTorch 2.11. This is currently the only configuration that the public launchers and helper scripts target.

Orbit expects the backend repositories to be checked out as siblings of the `orbit` directory:

```text
<workspace>/orbit
<workspace>/Megatron-Bridge
<workspace>/Megatron-Bridge/3rdparty/Megatron-LM
<workspace>/sglang
```

Keep each checkout at the ref recorded in `pyproject.toml` under `tool.orbit.release.backend-pins`. `tool.uv.sources` currently points at these local paths. Once the backend repositories are public, replace those entries with Git URLs at the same `rev` values.

### Set up the environment

The whole CUDA stack builds from a single `uv sync`. `env.sh` supplies the settings that cannot live in `pyproject.toml` (site CUDA paths, build toggles, runtime loader paths), and it auto-detects all of them:

```bash
cd orbit
uv python pin 3.12
source env.sh                    # auto-detects CUDA_HOME / GPU arch / python
uv sync --extra allinone         # builds torch, TE, sglang, megatron, deep-ep, deep-gemm, sgl-kernel, flash-attn, ... from source
```

The first build compiles everything from source, so budget **around 1–2 hours on a CUDA 13.2 + B200 machine**. To override the auto-detected values, export any of `CUDA_HOME`, `TORCH_CUDA_ARCH_LIST`, `MAX_JOBS`, or `UV_CACHE_DIR` before running `source env.sh`.

> Alternatively, [CUDA-13-install.md](CUDA-13-install.md) describes how to install the CUDA stack from prebuilt wheels instead of building it from source.

> **Release maintainers:** after setting `PUBLIC_ORBIT_URL`, verify a public clean-room install with `scripts/release/clean_room_gate.sh`. This gate targets the future public Git-ref release. It is not expected to pass against the interim local-path backend sources.

## Quickstart

Run any recipe under `examples/`. Each launcher is a standalone bash entry point with all hyperparameters inlined.

```bash
# A high-precision BF16 OFT run on Qwen3-4B
bash examples/high_precision/run-qwen3-4b-instruct-2507-bf16-math-oft.sh

# A low-precision FP8 OFT run on Qwen3-4B
bash examples/low_precision/run-qwen3-4b-fp8-math-oft.sh
```

Site-specific paths are passed in through environment variables. The most common ones are:

| Variable | Required | Purpose |
|---|:---:|---|
| `ORBIT_VENV` | usually | Python environment with Orbit and its backends installed. |
| `CUDA_HOME` | usually | CUDA 13.2 toolkit root. |
| `TRAIN_JSONL` | yes | Training prompt JSONL. |
| `HF_CKPT` | yes | Hugging Face checkpoint directory (quantized for low-precision recipes). |
| `MEGATRON_LOAD` | yes | Megatron distributed checkpoint root. |
| `TEST_JSONL` | if eval is on | Evaluation JSONL. Not needed when `DISABLE_EVAL=1`. |
| `SAVE_DIR` | no | Output checkpoint directory. |
| `ENABLE_WANDB` | no | `auto` enables W&B logging if `$HOME/.wandb_key` exists. `0` disables it. |

See [`examples/README.md`](examples/README.md) for the full reference of environment knobs and for notes on async PEFT adapter double-buffering.

### One-step smoke test

To exercise the full command path with minimal compute:

```bash
NUM_ROLLOUT=1 TOTAL_EPOCHS=1 TRAIN_ROWS=1 \
ROLLOUT_BATCH_SIZE=1 N_SAMPLES_PER_PROMPT=1 GLOBAL_BATCH_SIZE=1 \
DISABLE_EVAL=1 ENABLE_WANDB=0 \
bash examples/high_precision/run-qwen3-4b-instruct-2507-bf16-math-oft.sh
```

To print the final Python argv without starting Ray or loading the model, prepend `ORBIT_DRY_RUN_ARGV=1` to the command.

## Roadmap

Orbit is under active development. Planned next:

- [ ] **More launcher recipes**: broader model coverage (additional Qwen, Llama, GLM, and DeepSeek variants), more datasets, and more precision combinations.
- [ ] **Docker / containerized environments**: reproducible images and a documented environment setup, so that getting a launcher running takes minutes instead of a hunt for CUDA 13.2 modules.
- [ ] **On-policy distillation**: recipes and reference runs for `ADVANTAGE_ESTIMATOR=on_policy_distillation`, including teacher/student preflight.
- [ ] **Public Git-ref backends**: switch `tool.uv.sources` from local paths to public Git URLs for Megatron-Bridge, Megatron-LM, and SGLang once those repositories are public.
- [ ] **Troubleshooting and ops docs**: guidance on common dependency-resolver, import, and launcher smoke-test failures, plus a multi-node setup for sites with capacity beyond a single 8×B200 node.

Have a request? Open an issue or PR.

## Citation

```bibtex
@article{spherelab2026orbit,
  author = {Qiu, Zeju and Chen, Le and Liu, Lixin and Xiao, Tim Z.
            and Feng, Yao and Huang, Yangyi and Liu, Zhen and Shi, Han
            and Wen, Yandong and Yu, Zhouliang and Sch{\"o}lkopf, Bernhard
            and Liu, Weiyang},
  title  = {Orbit: Stable and Efficient Reinforcement Learning for Trillion-Parameter LLMs},
  journal = {SphereLab Blog},
  year   = {2026},
  note   = {https://spherelab.ai/orbit}
}
```

## Acknowledgements

Orbit stands on the shoulders of these excellent projects:

- [MILES](https://github.com/radixark/miles)
- [verl](https://github.com/volcengine/verl)
- [slime](https://github.com/THUDM/slime)
- [SGLang](https://github.com/sgl-project/sglang)
- [Megatron-Bridge](https://github.com/NVIDIA/Megatron-Bridge)
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
- [DeepSeek-V4](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/tree/main)
- [PyTorch](https://github.com/pytorch/pytorch)

## License

Orbit is released under the [Apache License 2.0](LICENSE).