[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72122":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},72122,"EasyR1","hiyouga\u002FEasyR1","hiyouga","EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL","https:\u002F\u002Fverl.readthedocs.io",null,"Python",4997,375,24,47,0,12,28,69,36,29.73,"Apache License 2.0",false,"main",[26,27,28,29,30,31,32,33],"ai","deepseek","gpt","llm","nlp","qwen","reinforcement-learning","rl","2026-06-12 02:02:58","# EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework\n\n[![GitHub Repo stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhiyouga\u002FEasyR1)](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FEasyR1\u002Fstargazers)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002Fllamafactory_ai)](https:\u002F\u002Ftwitter.com\u002Fllamafactory_ai)\n[![Docker Pulls](https:\u002F\u002Fimg.shields.io\u002Fdocker\u002Fpulls\u002Fhiyouga\u002Fverl)](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fhiyouga\u002Fverl\u002Ftags)\n\n### Used by [Amazon Web Services](https:\u002F\u002Faws.amazon.com\u002Fcn\u002Fblogs\u002Fchina\u002Fbuilding-llm-model-hub-based-on-llamafactory-and-easyr1\u002F)\n\nThis project is a clean fork of the original [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) project to support vision language 
models. We thank all the authors for providing such a high-performance RL training framework.\n\nEasyR1 is efficient and scalable due to the design of **[HybridEngine](https:\u002F\u002Farxiv.org\u002Fabs\u002F2409.19256)** and the latest release of **[vLLM](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm)**'s SPMD mode.\n\n## Features\n\n- Supported models\n  - Llama3\u002FQwen2\u002FQwen2.5\u002FQwen3 language models\n  - Qwen2-VL\u002FQwen2.5-VL\u002FQwen3-VL vision language models\n  - DeepSeek-R1 distill models\n\n- Supported algorithms\n  - GRPO\n  - DAPO ![new](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fnew-orange)\n  - Reinforce++\n  - ReMax\n  - RLOO\n  - GSPO ![new](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fnew-orange)\n  - CISPO ![new](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fnew-orange)\n\n- Supported datasets\n  - Any text or vision-text dataset in a [specific format](#custom-dataset)\n\n- Supported tricks\n  - Padding-free training\n  - LoRA training ![new](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fnew-orange)\n  - Resuming from the latest\u002Fbest checkpoint\n  - Wandb & SwanLab & Mlflow & Tensorboard tracking\n\n## Requirements\n\n### Software Requirements\n\n- Python 3.9+\n- transformers>=4.54.0\n- flash-attn>=2.4.3\n- vllm>=0.8.3\n\nWe provide a [Dockerfile](.\u002FDockerfile) to easily build environments.\n\nWe recommend using the [pre-built docker image](https:\u002F\u002Fhub.docker.com\u002Fr\u002Fhiyouga\u002Fverl) in EasyR1.\n\n```bash\ndocker pull hiyouga\u002Fverl:ngc-th2.8.0-cu12.9-vllm0.11.0\ndocker run -it --ipc=host --gpus=all hiyouga\u002Fverl:ngc-th2.8.0-cu12.9-vllm0.11.0\n```\n\nIf your environment does not support Docker, you can consider using **Apptainer**:\n\n```bash\napptainer pull easyr1.sif docker:\u002F\u002Fhiyouga\u002Fverl:ngc-th2.8.0-cu12.9-vllm0.11.0\napptainer shell --nv --cleanenv --bind \u002Fmnt\u002Fyour_dir:\u002Fmnt\u002Fyour_dir easyr1.sif\n```\n\nUse `USE_MODELSCOPE_HUB=1` to 
download models from the ModelScope hub.\n\n### Hardware Requirements\n\n\\* *estimated*\n\n| Method                   | Bits |  1.5B  |   3B   |   7B   |   32B   |   72B   |\n| ------------------------ | ---- | ------ | ------ | ------ | ------- | ------- |\n| GRPO Full Fine-Tuning    |  AMP | 2*24GB | 4*40GB | 8*40GB | 16*80GB | 32*80GB |\n| GRPO Full Fine-Tuning    | BF16 | 1*24GB | 1*40GB | 4*40GB |  8*80GB | 16*80GB |\n| GRPO LoRA Fine-Tuning    |  AMP | 1*12GB | 1*24GB | 2*32GB |  2*80GB |  4*80GB |\n\n> [!NOTE]\n> Use `worker.actor.fsdp.torch_dtype=bf16` and `worker.actor.optim.strategy=adamw_bf16` to enable bf16 training.\n\n## Tutorial: Run Qwen2.5-VL GRPO on [Geometry3K](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhiyouga\u002Fgeometry3k) Dataset in Just 3 Steps\n\n![image](assets\u002Fqwen2_5_vl_7b_geo.png)\n\n### Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FEasyR1.git\ncd EasyR1\npip install -e .\n```\n\n### GRPO Full Training\n\n```bash\nbash examples\u002Fqwen2_5_vl_7b_geo3k_grpo.sh\n```\n\n### GRPO LoRA Training\n\n```bash\nbash examples\u002Fqwen3_vl_4b_geo3k_grpo_lora.sh\n```\n\n### Merge Checkpoint in Hugging Face Format\n\n```bash\npython3 scripts\u002Fmodel_merger.py --local_dir checkpoints\u002Feasy_r1\u002Fexp_name\u002Fglobal_step_1\u002Factor\n```\n\n> [!TIP]\n> If you encounter issues with connecting to Hugging Face, consider using `export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com`.\n>\n> If you want to use SwanLab logger, consider using `bash examples\u002Fqwen2_5_vl_7b_geo3k_swanlab.sh`.\n\n## Custom Dataset\n\nPlease refer to the example datasets to prepare your own dataset.\n\n- Text dataset: https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhiyouga\u002Fmath12k\n- Image-text dataset: https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhiyouga\u002Fgeometry3k\n- Multi-image-text dataset: https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhiyouga\u002Fjourneybench-multi-image-vqa\n- 
Text-image mixed dataset: https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhiyouga\u002Frl-mixed-dataset\n\n## How to Understand GRPO in EasyR1\n\n![image](assets\u002Feasyr1_grpo.png)\n\n- To learn about the GRPO algorithm, you can refer to [Hugging Face's blog](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftrl\u002Fv0.16.1\u002Fen\u002Fgrpo_trainer).\n\n## How to Run 70B+ Model in Multi-node Environment\n\n1. Start the Ray head node.\n\n```bash\nray start --head --port=6379 --dashboard-host=0.0.0.0\n```\n\n2. Start the Ray worker node and connect to the head node.\n\n```bash\nray start --address=\u003Chead_node_ip>:6379\n```\n\n3. Check the Ray resource pool.\n\n```bash\nray status\n```\n\n4. Run the training script on the Ray head node only.\n\n```bash\nbash examples\u002Fqwen2_5_vl_7b_geo3k_grpo.sh\n```\n\nSee **[veRL's official doc](https:\u002F\u002Fverl.readthedocs.io\u002Fen\u002Flatest\u002Fstart\u002Fmultinode.html)** for more details about multi-node training and the Ray debugger.\n\n## Other Baselines\n\nWe also reproduced the following two baselines of the [R1-V](https:\u002F\u002Fgithub.com\u002Fdeep-agent\u002FR1-V) project.\n- [CLEVR-70k-Counting](examples\u002Fbaselines\u002Fqwen2_5_vl_3b_clevr.sh): Train the Qwen2.5-VL-3B-Instruct model on the counting problem.\n- [GeoQA-8k](examples\u002Fbaselines\u002Fqwen2_5_vl_3b_geoqa8k.sh): Train the Qwen2.5-VL-3B-Instruct model on the GeoQA problem.\n\n## Performance Baselines\n\nSee [baselines.md](assets\u002Fbaselines.md).\n\n## Awesome Work using EasyR1\n\n- **MMR1**: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FLengSicong\u002FMMR1)](https:\u002F\u002Fgithub.com\u002FLengSicong\u002FMMR1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2509.21268-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21268)\n- **Vision-R1**: Incentivizing Reasoning Capability in Multimodal Large Language Models. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOsilly\u002FVision-R1)](https:\u002F\u002Fgithub.com\u002FOsilly\u002FVision-R1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2503.06749-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.06749)\n- **Seg-Zero**: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdvlab-research\u002FSeg-Zero)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FSeg-Zero) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2503.06520-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.06520)\n- **MetaSpatial**: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FPzySeere\u002FMetaSpatial)](https:\u002F\u002Fgithub.com\u002FPzySeere\u002FMetaSpatial) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2503.18470-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.18470)\n- **Temporal-R1**: Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fappletea233\u002FTemporal-R1)](https:\u002F\u002Fgithub.com\u002Fappletea233\u002FTemporal-R1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2506.01908-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.01908)\n- **NoisyRollout**: Reinforcing Visual Reasoning with Data Augmentation. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FJohn-AI-Lab\u002FNoisyRollout)](https:\u002F\u002Fgithub.com\u002FJohn-AI-Lab\u002FNoisyRollout) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2504.13055-blue)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.13055)\n- **GUI-R1**: A Generalist R1-Style Vision-Language Action Model For GUI Agents. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fritzz-ai\u002FGUI-R1)](https:\u002F\u002Fgithub.com\u002Fritzz-ai\u002FGUI-R1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2504.10458-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.10458)\n- **FAST-GRPO**: Fast-Slow Thinking framework that dynamically adapts reasoning depth based on question characteristics. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FMr-Loevan\u002FFAST)](https:\u002F\u002Fgithub.com\u002FMr-Loevan\u002FFAST) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2504.18458-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.18458)\n- **R1-Track**: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FWangbiao2\u002FR1-Track)](https:\u002F\u002Fgithub.com\u002FWangbiao2\u002FR1-Track)\n- **VisionReasoner**: Unified Visual Perception and Reasoning via Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdvlab-research\u002FVisionReasoner)](https:\u002F\u002Fgithub.com\u002Fdvlab-research\u002FVisionReasoner) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2505.12081-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.12081)\n- **MM-UPT**: Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fwaltonfuture\u002FMM-UPT)](https:\u002F\u002Fgithub.com\u002Fwaltonfuture\u002FMM-UPT) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2505.22453-blue)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.22453)\n- **RL-with-Cold-Start**: Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fwaltonfuture\u002FRL-with-Cold-Start)](https:\u002F\u002Fgithub.com\u002Fwaltonfuture\u002FRL-with-Cold-Start) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2505.22334-blue)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.22334)\n- **ViGoRL**: Grounded Reinforcement Learning for Visual Reasoning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGabesarch\u002Fgrounded-rl)](https:\u002F\u002Fgithub.com\u002FGabesarch\u002Fgrounded-rl) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2505.23678-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.23678)\n- **Revisual-R1**: Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FCSfufu\u002FRevisual-R1)](https:\u002F\u002Fgithub.com\u002FCSfufu\u002FRevisual-R1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2506.04207-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04207)\n- **SophiaVL-R1**: Reinforcing MLLMs Reasoning with Thinking Reward. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkxfan2002\u002FSophiaVL-R1)](https:\u002F\u002Fgithub.com\u002Fkxfan2002\u002FSophiaVL-R1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2505.17018-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.17018)\n- **Vision-Matters**: Simple Visual Perturbations Can Boost Multimodal Math Reasoning. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FYutingLi0606\u002FVision-Matters)](https:\u002F\u002Fgithub.com\u002FYutingLi0606\u002FVision-Matters) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2506.09736-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09736)\n- **VTool-R1**: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FVTOOL-R1\u002Fvtool-r1)](https:\u002F\u002Fgithub.com\u002FVTOOL-R1\u002Fvtool-r1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2505.19255-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.19255)\n- **Long-RL**: Scaling RL to Long Sequences. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNVlabs\u002FLong-RL)](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FLong-RL) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2507.07966-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.07966)\n- **EditGRPO**: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftaokz\u002FEditGRPO)](https:\u002F\u002Fgithub.com\u002Ftaokz\u002FEditGRPO)\n- **ARES**: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fshawn0728\u002FARES)](https:\u002F\u002Fgithub.com\u002Fshawn0728\u002FARES) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2510.08457-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.08457)\n- **VPPO**: Spotlight on Token Perception for Multimodal Reinforcement Learning. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhuaixuheqing\u002FVPPO-RL)](https:\u002F\u002Fgithub.com\u002Fhuaixuheqing\u002FVPPO-RL) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2510.09285-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.09285)\n- **IE-Critic-R1**: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FCoobiw\u002FIE-Critic-R1)](https:\u002F\u002Fgithub.com\u002FCoobiw\u002FIE-Critic-R1) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2511.18055-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2511.18055)\n- **OneThinker**: All-in-one Reasoning Model for Image and Video. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftulerfeng\u002FOneThinker)](https:\u002F\u002Fgithub.com\u002Ftulerfeng\u002FOneThinker) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2512.03043-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.03043)\n- **MetaphorStar**: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FMING-ZCH\u002FMetaphorStar)](https:\u002F\u002Fgithub.com\u002FMING-ZCH\u002FMetaphorStar) [![[arxiv]](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Farxiv-2602.10575-blue)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.10575)\n\n## TODO\n\n- Support ulysses parallelism for VLMs (middle priority).\n- Support more VLM architectures.\n\n> [!NOTE]\n> We will not provide scripts for supervised fine-tuning and inference in this project. 
If you have such requirements, we recommend using [LlamaFactory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLlamaFactory).\n\n### Known bugs\n\nThese features are temporarily disabled; we plan to fix them one by one in future updates.\n\n- Vision language models are not compatible with ulysses parallelism yet.\n\n## Discussion Group\n\n👋 Join our [WeChat group](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002Fllamafactory-community\u002Fblob\u002Fmain\u002Fwechat\u002Feasyr1.jpg).\n\n## FAQs\n\n> ValueError: Image features and image tokens do not match: tokens: 8192, features 9800\n\nIncrease `data.max_prompt_length` or reduce `data.max_pixels`.\n\n> RuntimeError: CUDA Error: out of memory at \u002Fworkspace\u002Fcsrc\u002Fcumem_allocator.cpp:62\n\nReduce `worker.rollout.gpu_memory_utilization` and enable `worker.actor.offload.offload_params`.\n\n> RuntimeError: 0 active drivers ([]). There should only be one.\n\nUninstall `deepspeed` from the current Python environment.\n\n## Citation\n\nCore contributors: [Yaowei Zheng](https:\u002F\u002Fgithub.com\u002Fhiyouga), [Junting Lu](https:\u002F\u002Fgithub.com\u002FAL-377), [Shenzhi Wang](https:\u002F\u002Fgithub.com\u002FShenzhi-Wang), [Zhangchi Feng](https:\u002F\u002Fgithub.com\u002FBUAADreamer), [Dongdong Kuang](https:\u002F\u002Fgithub.com\u002FKuangdd01), Yuwen Xiong and Richong Zhang\n\nWe also thank Guangming Sheng and Chi Zhang for helpful discussions.\n\n```bibtex\n@misc{zheng2025easyr1,\n  title        = {EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework},\n  author       = {Yaowei Zheng and Junting Lu and Shenzhi Wang and Zhangchi Feng and Dongdong Kuang and Yuwen Xiong and Richong Zhang},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FEasyR1}},\n  year         = {2025}\n}\n```\n\nWe also recommend citing the original work.\n\n```bibtex\n@article{sheng2024hybridflow,\n  title   = {HybridFlow: A Flexible and Efficient RLHF Framework},\n  author  = 
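{Guangming Sheng and Chi Zhang and Zilingfeng Ye and Xibin Wu and Wang Zhang and Ru Zhang and Yanghua Peng and Haibin Lin and Chuan Wu},\n  year    = {2024},\n  journal = {arXiv preprint arXiv: 2409.19256}\n}\n```\n","EasyR1 is an efficient, scalable multi-modality reinforcement learning training framework based on veRL. It supports a range of language and vision-language models, including Llama3, the Qwen series, and DeepSeek-R1, and integrates several advanced reinforcement learning algorithms such as GRPO, DAPO, and Reinforce++. The project uses the HybridEngine design and vLLM's SPMD mode to improve training efficiency and performance, and supports practical techniques such as padding-free training and LoRA training. EasyR1 also supports datasets in a specific format and can track training with Wandb, SwanLab, Mlflow, or Tensorboard. The framework suits researchers and developers who need to iterate on and optimize large multi-modality models quickly, and is especially widely applied at the intersection of natural language processing and computer vision.",2,"2026-06-11 03:40:27","high_star"]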