[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80899":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},80899,"Drift","ant-research\u002FDrift","ant-research","Drift: DLM Reinforcement Learning Training Framework","",null,"Python",142,24,8,1,0,7,43,108,37,4.19,"Apache License 2.0",false,"main",[],"2026-06-12 02:04:08","# Drift: \u003Cu>D\u003C\u002Fu>LM \u003Cu>R\u003C\u002Fu>e\u003Cu>i\u003C\u002Fu>n\u003Cu>f\u003C\u002Fu>orcement Learning \u003Cu>T\u003C\u002Fu>raining Framework \u003Cbr>\nDrift is an easy-to-use and extensible reinforcement learning framework for diffusion language models.\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\" alt=\"Drift Overview\" width=\"800\"\u002F>\n\u003C\u002Fp>\n\n## Features\n\n- **Multi-model support** — Compatible with [LLaDA](https:\u002F\u002Fgithub.com\u002FML-GSAI\u002FLLaDA) and [Dream](https:\u002F\u002Fgithub.com\u002FHKUNLP\u002FDream) series models, with more diffusion LMs coming soon.\n- **Flexible masking strategies** — Sequential masking, random masking, coupled random masking, and all masking, with configurable temperature-based sampling.\n- **Accelerated rollout** — Block-wise parallel decoding with dynamic confidence thresholds for faster generation.\n- **Diverse RLVR tasks** — Math, Code, Sudoku, and Countdown reward functions out of the box.\n\n## 
Installation\n\n```bash\nconda create --name drift python=3.10\nconda activate drift\n\npip install torch==2.6.0\npip install deepspeed==0.18.4\npip install --no-cache-dir \\\n  https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.7.4.post1\u002Fflash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl\npip install -r requirements.txt\n```\n\n## Data Format\n**To help you get started quickly, math and code datasets are already provided in the `data\u002F` directory.**\n\nTraining data should be placed in the `data\u002F` directory as JSON files. The framework supports **math** (`MATH500`, `GSM8K`), **code** (`MBPP`, `HumanEval`), and **planning** (`sudoku`, `countdown`) tasks, and is easily extensible to custom tasks.\n\nFor detailed field specifications and examples of each data type, see [Data Format Specification](data\u002FDATA_FORMAT.md).\n\n## Training\n\nTraining is launched via [Accelerate](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Faccelerate) with DeepSpeed ZeRO-3. 
Configuration is managed through YAML files in `configs\u002F`.\n\n**Single-node training:**\n\n```bash\naccelerate launch trainer\u002Fmain_rl.py config=configs\u002Fllada_code.yaml\n```\n\n**Config Examples:**\n\n| Config | Model | Task |\n|--------|-------|------|\n| `llada_math.yaml` | LLaDA-8B | Math |\n| `llada_code.yaml` | LLaDA-8B | Code |\n| `dream_math.yaml` | Dream-7B | Math |\n| `dream_code.yaml` | Dream-7B | Code |\n\n**Multi-node training:**\n\nFor distributed training across multiple machines, set the following environment variables on every node and pass them to Accelerate:\n\n```bash\n# Run on each node (num_processes below assumes 8 GPUs per node):\naccelerate launch \\\n  --num_machines=$WORLD_SIZE \\\n  --machine_rank=$RANK \\\n  --main_process_ip=$MASTER_ADDR \\\n  --main_process_port=$MASTER_PORT \\\n  --num_processes=$((WORLD_SIZE * 8)) \\\n  trainer\u002Fmain_rl.py config=configs\u002Fllada_code.yaml\n```\n\n| Variable | Description |\n|----------|-------------|\n| `WORLD_SIZE` | Total number of nodes |\n| `RANK` | Rank of the current node (0-indexed) |\n| `MASTER_ADDR` | IP address of the rank-0 node |\n| `MASTER_PORT` | Free port on the rank-0 node |\n\n**Key training parameters** (set in YAML):\n\n```yaml\ntraining:\n  mask_strategy: \"sequential_masking\"   # masking strategy\n  reward_funcs: [\"math\"]               # reward function(s)\n\nrollout:\n  num_generations: 4                   # samples per prompt\n  steps: 256                           # diffusion steps\n  remasking_strategy: [\"low_confidence_dynamic\"] # denoising strategy\n```\n\n## Evaluation\n\nRun evaluation on one or more checkpoints:\n\n```bash\naccelerate launch trainer\u002Feval.py config=configs\u002Feval\u002Feval_llada_code.yaml\n```\n\nEvaluation configs are located in `configs\u002Feval\u002F` and support passing multiple checkpoint paths for batch evaluation.\n\n## Acknowledgement\n\nThis framework builds upon [dLLM-RL](https:\u002F\u002Fgithub.com\u002FWuyxin\u002FdLLM-RL) and 
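[fastdllm](https:\u002F\u002Fgithub.com\u002Fzowaa\u002Ffastdllm), with its model foundations drawn from [Dream](https:\u002F\u002Fgithub.com\u002FHKUNLP\u002FDream) and [LLaDA](https:\u002F\u002Fgithub.com\u002FML-GSAI\u002FLLaDA). We gratefully acknowledge these teams for their valuable contributions to open-source research and development.\n","Drift is a reinforcement learning training framework for diffusion language models. It supports multiple models, including the LLaDA and Dream series, and offers flexible masking strategies such as sequential masking and random masking to suit different task requirements. The framework accelerates generation with block-wise parallel decoding and dynamic confidence thresholds, and includes built-in reward functions for several RLVR tasks, including math, code, Sudoku, and Countdown. It suits researchers and developers who need to train diffusion language models efficiently, and is especially useful for complex text generation tasks. The project is written in Python and released under the Apache License 2.0.",2,"2026-06-11 04:02:45","CREATED_QUERY"]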