[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72202":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":16,"starSnapshotCount":16,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},72202,"R1-V","StarsfieldAI\u002FR1-V","StarsfieldAI","Witness the aha moment of VLM with less than $3.","",null,"Python",4059,283,45,91,0,1,5,29.36,false,"main",[],"2026-06-12 02:03:00","# R1-V: Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3\n\nNews: We released new VLM-RL environments, training codebase and research paper [G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning](https:\u002F\u002Fgithub.com\u002Fchenllliang\u002FG1), check it out!\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc52a448f-d666-4ca6-958b-86267d56de0e) \n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDeep-Agent\u002FR1-V\u002Freleases\">\u003Cimg alt=\"GitHub release\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Frelease\u002FDeep-Agent\u002FR1-V.svg\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n> ### Roadmap for R1-V\n> We are building a general framework for RLVR in VLM. 
We believe in the power of **trenches** and **longtermism**.\n>\n> Our Interest: General Vision-Language Intelligence & Visual\u002FGUI Agent\n> \n> Our Goal: 🔄 Algorithm Enhancement ⚡ Efficiency Optimization 🎯 Task Diversity 🌲 Impactful Open Source Research. \n>\n> Ideas and contributions are welcome. Stay tuned!\n\n\n**Blogs:**\n\n\n[🎯 RLVR in Vision Language Models: Findings, Questions and Directions](https:\u002F\u002Fdeepagent.notion.site\u002Frlvr-in-vlms)\n\n**Resources:** \n\n[🤗 R1V Training Dataset: CLEVR-70k-Counting](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FleonardPKU\u002Fclevr_cogen_a_train)\n\n[🤗 R1V Training Dataset: CLEVR-70k-Complex](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMMInstruction\u002FClevr_CoGenT_TrainA_70K_Complex)\n\n[🤗 R1V Training Dataset: GEOQA-8k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FleonardPKU\u002FGEOQA_R1V_Train_8K)\n\n[🤗 R1-Distilled Visual Reasoning Dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMMInstruction\u002FClevr_CoGenT_TrainA_R1)\n\n**R1-V Team:** \n\n[Liang Chen](https:\u002F\u002Fgithub.com\u002Fchenllliang) · [Lei Li](https:\u002F\u002Flilei-nlp.github.io) · [Haozhe Zhao](https:\u002F\u002Fhaozhezhao.github.io\u002F) · [Yifan Song](https:\u002F\u002Fgithub.com\u002FYifan-Song793) · [Vinci](https:\u002F\u002Fgithub.com\u002F0xvincii) · [Zihao Yue](https:\u002F\u002Fyuezih.github.io\u002F) \n\n**Contributors**:\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDeep-Agent\u002FR1-V\u002Fgraphs\u002Fcontributors\">\n  \u003Cimg src=\"https:\u002F\u002Fcontrib.rocks\u002Fimage?repo=Deep-Agent\u002FR1-V&max=30\" \u002F>\n\u003C\u002Fa>\n\n\n\n---\n\n### Updates\n\n- 2025-02-27: The vLLM trainer now supports Qwen2.5-VL; see `.\u002Fsrc\u002Fscripts\u002Frun_grpo_vllm_qwen25vl.sh` for the script and environment updates.\n- 2025-02-21: We wrote a [blog post](https:\u002F\u002Fdeepagent.notion.site\u002Frlvr-in-vlms) summarizing the main findings and questions in our visual RLVR experiments, 
check it out!\n- 2025-02-12: We fixed the batched decoding error. The original RL training script is now 3x faster.\n- 2025-02-12: R1-V now supports vLLM to accelerate training (`pip install vllm==0.7.2` before use) and SFT.\n- 2025-02-11: R1-V now supports Qwen2.5-VL and the [GEOQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11370) task.\n- 2025-02-06: We uploaded the evaluation script and polished the README. We are writing a blog post summarizing the statistics, findings and underexplored questions.\n- 2025-02-03: We uploaded the training codebase.\n- 2025-02-03: We curated and uploaded some verified DeepSeek-R1 visual reasoning traces with some special tricks (see `R1-V\u002Fsrc\u002Fdistill_r1\u002F`). The current training code does not rely on them; feel free to explore.\n- 2025-02-03: We released the R1-V repo.\n\n\n### For contributors\n- Our top development priority is addressing the issues marked with `help wanted` labels, and we welcome ideas\u002FPRs from the community to help solve them.\n\n---\n\n![Image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe86a3ff2-a9c6-4548-8200-6c3c382d60e6)\n\n![Image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fb3512920-ef30-4d6d-9bfe-c64e4570a067)\n*Note: In a later experiment, we found that letting the 2B base model output the result directly instead of following `\u003Cthink>\u003C\u002Fthink>\u003Canswer>\u003C\u002Fanswer>` leads to a much higher score (86%) on SuperClevr. 
This suggests that enforcing Chain-of-Thought reasoning may be not only unnecessary but potentially detrimental to the 2B model's performance.*\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff5191b1e-dde2-42b7-9ec9-10f7f6213c12)\n\n\n## Setup\n\n```bash\nconda create -n r1-v python=3.11\nconda activate r1-v\n\nbash setup.sh\n```\n\n> [!NOTE] \n> If you hit a bug when running the script, first try aligning your environment with `.\u002Fsrc\u002Frequirements.txt`.\n\n\n### Supported Models\n\n1. Qwen2-VL\n2. Qwen2.5-VL \n\n### Supported Training Datasets\n\n1. [🤗 R1V Training Dataset: CLEVR-70k-Counting](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FleonardPKU\u002Fclevr_cogen_a_train): Item Counting Problems\n\n2. [🤗 R1V Training Dataset: CLEVR-70k-Complex](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FMMInstruction\u002FClevr_CoGenT_TrainA_70K_Complex): Number Related Reasoning \n\n3. [🤗 R1V Training Dataset: GEOQA-8k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FleonardPKU\u002FGEOQA_R1V_Train_8K): Geometry Reasoning\n\n\n### Supported Evaluations\n\n1. [SuperClevr-200](https:\u002F\u002Fgithub.com\u002FDeep-Agent\u002FR1-V?tab=readme-ov-file#superclevr): Item Counting Problems\n2. 
[GeoQA-Test-Direct-Answer-735](https:\u002F\u002Fgithub.com\u002FDeep-Agent\u002FR1-V?tab=readme-ov-file#geoqa): Geometry Reasoning\n\n## Training\n\n### GRPO\n\n```bash\ncd src\u002Fr1-v\n\nexport DEBUG_MODE=\"true\" # Enable debug mode to log the model's rollouts during RL\nexport LOG_PATH=\".\u002Fdebug_log_2b.txt\"\n\ntorchrun --nproc_per_node=\"8\" \\\n    --nnodes=\"1\" \\\n    --node_rank=\"0\" \\\n    --master_addr=\"127.0.0.1\" \\\n    --master_port=\"12345\" \\\n    src\u002Fopen_r1\u002Fgrpo.py \\\n    --output_dir \u003COUTPUT_DIR> \\\n    --model_name_or_path \u003CPATH-TO-Qwen2-VL-2B-Instruct> \\\n    --dataset_name leonardPKU\u002Fclevr_cogen_a_train \\\n    --deepspeed local_scripts\u002Fzero3.json \\\n    --max_prompt_length 512 \\\n    --max_completion_length 512 \\\n    --per_device_train_batch_size 1 \\\n    --gradient_accumulation_steps 2 \\\n    --logging_steps 1 \\\n    --bf16 \\\n    --report_to wandb \\\n    --gradient_checkpointing false \\\n    --attn_implementation flash_attention_2 \\\n    --max_pixels 401408 \\\n    --num_train_epochs 2 \\\n    --run_name Qwen2-VL-2B-GRPO-CLEVR-70k \\\n    --save_steps 100 \\\n    --save_only_model true \\\n    --num_generations 8   # number of outputs G in GRPO; reducing it speeds up training and lowers memory cost but increases variance\n\n```\n\n> [!NOTE] \n> 1. To reproduce the result, keep `per_device_train_batch_size` at 1 for now, as there is a known bug in batched training. See the [reproduction report](https:\u002F\u002Fgithub.com\u002FDeep-Agent\u002FR1-V\u002Fissues\u002F4#issuecomment-2633348354). We realize this matters for efficiency and are working with the community to fix it.\n> 2. If you hit an **OOM error**, try reducing `--num_generations`.\n> 3. 
To speed up training with vLLM, see this [script](https:\u002F\u002Fgithub.com\u002FDeep-Agent\u002FR1-V\u002Fblob\u002Fmain\u002Fsrc\u002Fscripts\u002Frun_grpo_vllm.sh).\n\n\n### SFT\n\nWe also provide SFT code. Follow the script below and edit the config to customize the SFT task.\n\n```bash\naccelerate launch --config_file src\u002Fr1-v\u002Fconfigs\u002Fzero2.yaml src\u002Fr1-v\u002Fsrc\u002Fopen_r1\u002Fsft.py --config src\u002Fr1-v\u002Fconfigs\u002Fqwen2vl_sft_config.yaml\n```\n\n## Evaluation\n\n\n### SuperCLEVR\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4f48233c-0546-432f-94e6-723f91fbd086)\n\nWe provide an example script that evaluates OOD counting performance on a subset of SuperCLEVR within 1 minute. You can also modify the script and dataset to test on your own data.\n\n\n\n```bash\ncd .\u002Fsrc\u002Feval\nwget https:\u002F\u002Fwww.cs.jhu.edu\u002F~zhuowan\u002Fzhuowan\u002FSuperCLEVR\u002Fto_be_released\u002Fimages.zip\nunzip images.zip\n\n# change the model path in the script\npython test_qwen2vl_counting_superclevr.py\n\n# tested scores: \n# Qwen2VL-2B-Instruct: 48.0%\n# Qwen2VL-2B-Instruct-GRPO-100step: 82.5%\n```\n\n### GEOQA\n\n\u003Cimg width=\"379\" alt=\"截屏2025-02-11 13 38 50\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F0282872d-bfe5-40fa-ac00-8986450a0b1e\" \u002F>\n\u003Cimg width=\"379\" alt=\"截屏2025-02-11 14 54 16\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F053ebb99-5f19-4599-be51-a7c335ab2b8b\" \u002F>\n\n\n\nWe provide an example script to evaluate on the test set (direct answer form) of [GEOQA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11370).\n\n\n```bash\n# prepare images for testing\ncd .\u002Fsrc\u002Feval\ngit lfs install\ngit clone https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FLuckyjhg\u002FGeo170K\ncd Geo170K\nunzip images.zip\n\n\n# Evaluation Script\npython test_qwen2vl_geoqa.py\n\n# tested scores: \n# 
Qwen2VL-7B-Instruct: 30.63%\n# Qwen2VL-7B-Instruct-GRPO-2epochs: 38.72%\n\n# Qwen2.5VL-3B-Instruct: 35.41%\n# Qwen2.5VL-3B-Instruct-GRPO-1epochs: 47.48%\n```\n\nFor faster inference on multiple GPUs, you can also use the script at `R1-V\u002Fsrc\u002Fscripts\u002Ftest_grpo_geoqa_multigpu.sh`:\n```bash\nbash src\u002Fscripts\u002Ftest_grpo_geoqa_multigpu.sh\n```\n\n\n\n## Acknowledgements\n\nWe sincerely thank [DeepSeek](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-R1), [Open-R1](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fopen-r1), [QwenVL](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen2.5-VL), [Open-R1-Multimodal](https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002Fopen-r1-multimodal) (our initial codebase), [CLEVR](https:\u002F\u002Fcs.stanford.edu\u002Fpeople\u002Fjcjohns\u002Fclevr\u002F), [SuperCLEVR](https:\u002F\u002Fgithub.com\u002FLizw14\u002FSuper-CLEVR), and [G-LLAVA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.11370) for providing the open-source resources used to build this project. 
Special thanks to [Kimi](https:\u002F\u002Fkimi.moonshot.cn\u002F) and [bAInance Labs](https:\u002F\u002Fbainancelabs.com\u002F) for providing compute resources, and to [Yuxin Wu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=mJQI-gUAAAAJ&hl=en), [Xinyu Zhou](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Jv4LCj8AAAAJ&hl=en), and [Baobao Chang](https:\u002F\u002Fscholar.google.com.au\u002Fcitations?user=LaKNyhQAAAAJ&hl=en) for their valuable advice.\n\n\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=Deep-Agent\u002FR1-V&type=Timeline)](https:\u002F\u002Fstar-history.com\u002F#Deep-Agent\u002FR1-V&Timeline)\n\n## Citation\n\n```bib\n@misc{chen2025r1v,\n  author       = {Chen, Liang and Li, Lei and Zhao, Haozhe and Song, Yifan and Vinci},\n  title        = {R1-V: Reinforcing Super Generalization Ability in Vision-Language Models with Less Than \\$3},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FDeep-Agent\u002FR1-V}},\n  note         = {Accessed: 2025-02-02},\n  year         = {2025}\n}\n```\n\n\n\n","R1-V is a project that aims to strengthen the generalization ability of vision-language models (VLMs) for less than $3. It uses reinforcement learning to improve model performance on perception and reasoning tasks, and provides several training datasets and environments in support of that goal. Its core features include efficient algorithm optimization, support for diverse tasks, and contributions to open-source research. R1-V suits researchers and developers who need low-cost, high-performance visual understanding and interaction, especially those exploring general vision-language intelligence and visual\u002FGUI agents.",2,"2026-06-11 03:40:49","high_star"]