[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71928":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},71928,"TinyZero","Jiayi-Pan\u002FTinyZero","Jiayi-Pan","Minimal reproduction of DeepSeek R1-Zero","",null,"Python",13150,1584,124,71,0,9,27,52,106.3,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:02","# TinyZero\n\n> **⚠️ Deprecation Notice:** This repo is no longer actively maintained. For running RL experiments, please directly use the latest [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) library.\n> For the archived original documentation, see [OLD_README.md](.\u002FOLD_README.md).\n\n![image](cover.png)\n\nTinyZero is a reproduction of [DeepSeek R1 Zero](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-R1) in countdown and multiplication tasks. 
We built upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl).\n\nThrough RL, the 3B base LM develops self-verification and search abilities all on its own.\n\nYou can experience the Aha moment yourself for \u003C $30.\n\nTwitter thread: https:\u002F\u002Fx.com\u002Fjiayi_pirate\u002Fstatus\u002F1882839370505621655\n\nFull experiment log: https:\u002F\u002Fwandb.ai\u002Fjiayipan\u002FTinyZero\n\n> 📢: We release [Adaptive Parallel Reasoning](https:\u002F\u002Fgithub.com\u002FParallel-Reasoning\u002FAPR), where we explore a new dimension in scaling reasoning models.\n\n## Installation\n\n```\nconda create -n zero python=3.9\n# install torch [or you can skip this step and let vllm install the correct version for you]\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n# install vllm\npip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1\npip3 install ray\n\n# verl\npip install -e .\n\n# flash attention 2\npip3 install flash-attn --no-build-isolation\n# quality of life\npip install wandb IPython matplotlib\n```\n\n## Countdown task\n\n**Data Preparation**\n```\nconda activate zero\npython .\u002Fexamples\u002Fdata_preprocess\u002Fcountdown.py --local_dir {path_to_your_dataset}\n```\n\n### Run Training\n```\nconda activate zero\n```\n\nFor the following commands, if you run out of VRAM, try adding `critic.model.enable_gradient_checkpointing=True` to the script, and check out the discussion [here](https:\u002F\u002Fgithub.com\u002FJiayi-Pan\u002FTinyZero\u002Fissues\u002F5#issuecomment-2624161643).\n\n**Single GPU**\n\nWorks for models \u003C= 1.5B. 
For the Qwen2.5-0.5B base model, we know it fails to learn reasoning.\n\n```\nexport N_GPUS=1\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=1\nexport EXPERIMENT_NAME=countdown-qwen2.5-0.5b\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash .\u002Fscripts\u002Ftrain_tiny_zero.sh\n```\n\n**3B+ model**\n\nIn this case, the base model is able to develop sophisticated reasoning skills.\n```\nexport N_GPUS=2\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=2\nexport EXPERIMENT_NAME=countdown-qwen2.5-3b\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash .\u002Fscripts\u002Ftrain_tiny_zero.sh\n```\n\n### Instruct Ablation\nWe also experiment with Qwen2.5-3B-Instruct.\n\n**Data Preparation**\n\nTo follow the chat template, we need to reprocess the data:\n```\nconda activate zero\npython examples\u002Fdata_preprocess\u002Fcountdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}\n```\n\n**Training**\n```\nexport N_GPUS=2\nexport BASE_MODEL={path_to_your_model}\nexport DATA_DIR={path_to_your_dataset}\nexport ROLLOUT_TP_SIZE=2\nexport EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct\nexport VLLM_ATTENTION_BACKEND=XFORMERS\n\nbash .\u002Fscripts\u002Ftrain_tiny_zero.sh\n```\n\n## Acknowledgements\n* We run our experiments based on [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl).\n* We use the [Qwen2.5](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen2.5) series of base models.\n\n## Citation\n```\n@misc{tinyzero,\nauthor       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},\ntitle        = {TinyZero},\nhowpublished = {https:\u002F\u002Fgithub.com\u002FJiayi-Pan\u002FTinyZero},\nnote         = {Accessed: 2025-01-24},\nyear         = {2025}\n}\n```\n","TinyZero is a minimal reproduction of DeepSeek R1-Zero, focused on the countdown and multiplication tasks. Through reinforcement learning (RL), a 3B-parameter base language model develops self-verification and search abilities on its own. The project is written in Python and built on the veRL library, and it supports training on one or more GPUs; models with 3B or more parameters can develop sophisticated reasoning skills. It suits researchers and developers who want to explore, on a low budget (under $30), how AI models acquire new abilities through reinforcement learning. Note that this repository is no longer actively maintained; for RL experiments, it is recommended to use the latest veRL library directly.",2,"2026-06-11 03:39:31","high_star"]