[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72469":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72469,"Logic-RL","Unakar\u002FLogic-RL","Unakar","Reproduce R1 Zero on Logic Puzzle","",null,"Python",2451,164,12,13,0,1,4,3,61.55,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:05","\n# Logic-RL\n\n\u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14768'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2502.14768-b31b1b.svg'>\u003C\u002Fa> &nbsp;\n\nLogic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning \n---\n\n## News\n[2025\u002F03\u002F20] We release the [ADORA: A Scalable Paradigm for Steering Learning Trajectories ](https:\u002F\u002Fgithub.com\u002FShadeCloak\u002FADORA?tab=readme-ov-file).\n\n[2025\u002F03\u002F19] For stable length control, refer to https:\u002F\u002Fgithub.com\u002Flblankl\u002FShort-RL\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cimg src=\".\u002Fpics\u002Fteaser.png\" width=\"800\" alt=\"Teaser Image\">\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">Main results\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n## Benchmark\n\n| Model                                                             | 2ppl | 3ppl | 4ppl | 5ppl | 6ppl | 7ppl | 8ppl 
|\n|------------------------------------------------------------------------|------|------|------|------|------|------|------|\n| o3-mini-high                | 0.99 | 0.98 | 0.97 | 0.95 | 0.94 | 0.89 | 0.83 |\n| o1-2024-12-17               | 0.83 | 0.51 | 0.38 | 0.38 | 0.35 | 0.30 | 0.20 |\n| GPT-4o                      | 0.68 | 0.57 | 0.49 | 0.32 | 0.23 | 0.21 | 0.11 |\n| Deepseek-Math-7b            | 0.35 | 0.21 | 0.08 | 0.06 | 0.02 | 0.00 | 0.00 |\n| Qwen2.5-7B-Instruct-1M      | 0.49 | 0.40 | 0.25 | 0.11 | 0.02 | 0.06 | 0.01 |\n| Qwen2.5-7B-Logic-RL (ours)  | 0.99 | 0.99 | 0.94 | 0.92 | 0.91 | 0.80 | 0.67 |\n\n\n---\n\n## Installation\n\n```bash\nconda create -n logic python=3.9\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\npip3 install vllm==0.6.3 ray\npip3 install flash-attn --no-build-isolation\npip install -e .  # For verl integration\npip install wandb IPython matplotlib\n```\n\n---\n\n## Data Preparation\n\nYou can directly use \u002Fdata.\n\nFor your own data generation, here's a demo:\n\n### Base Model\n```bash\npython .\u002Fexamples\u002Fdata_preprocess\u002Fkk.py \\\n    --local_dir {processed_data_path} \\\n    --data_path {raw_data_path}\n```\n\n### Instruct Model\n```bash\npython .\u002Fexamples\u002Fdata_preprocess\u002Fkk.py \\\n    --template_type=qwen-instruct \\\n    --local_dir {processed_data_path} \\\n    --data_path {raw_data_path}\n```\n\n---\n\n## Training Execution\n```bash\nconda activate logic\nbash main_grpo.sh  # 4×A100 80G\n```\n\n---\n\n## ⚙️ Implementation Details\n\n| Component              | Location                          |\n|------------------------|-----------------------------------|\n| Reward Modeling     | `verl\u002Futils\u002Freward_score\u002Fkk.py`   |\n| Data Preprocessing   | `examples\u002Fdata_preprocess\u002Fkk.py`  |\n\n---\n\n\n## Citation\n```\n@misc{xie2025logicrlunleashingllmreasoning,\n      title={Logic-RL: Unleashing LLM Reasoning with Rule-Based 
Reinforcement Learning}, \n      author={Tian Xie and Zitian Gao and Qingnan Ren and Haoming Luo and Yuqian Hong and Bryan Dai and Joey Zhou and Kai Qiu and Zhirong Wu and Chong Luo},\n      year={2025},\n      eprint={2502.14768},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.14768}, \n}\n```\n\n---\n\n## Acknowledgements\n- [Verl](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) 🔗\n- [TinyZero](https:\u002F\u002Fgithub.com\u002FJiayi-Pan\u002FTinyZero) 🔗\n- [Knights and Knaves (K&K) puzzles dataset](https:\u002F\u002Fgithub.com\u002FAlphaPav\u002Fmem-kk-logic) 🔗\n\n---\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=Unakar\u002FLogic-RL&type=Date)](https:\u002F\u002Fstar-history.com\u002F#Unakar\u002FLogic-RL&Date)\n","Logic-RL is a rule-based reinforcement learning project that aims to improve the reasoning ability of large language models on logic puzzles. By introducing a new reinforcement learning method, it significantly improves model performance on logic problems of varying difficulty, with particularly strong results on multi-step reasoning tasks. Technically, it uses the PyTorch framework and the vLLM library to implement an efficient training pipeline, and it supports preprocessing and use of custom datasets. It suits research or development settings that need stronger logical reasoning in AI systems, such as educational software and intelligent decision-support systems.",2,"2026-06-11 03:42:11","high_star"]