[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80715":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":12,"lastSyncTime":26,"discoverSource":27},80715,"EffOPD","caiyuchen-ustc\u002FEffOPD","caiyuchen-ustc","Repository for EffOPD. We are working on polishing the details.",null,"Python",65,2,44,0,12,21,3,46.53,false,"main",true,[],"2026-06-12 04:01:29","# EffOPD\r\n\r\nThis repository contains the implementation of **EffOPD**.\r\n\r\nThe analysis in the earlier part of our paper uses the code in the `analysis` folder.\r\n\r\n## Codebase\r\n\r\nEffOPD is implemented based on verl and GOPD. We mainly modify the following files:\r\n\r\n- `ppo_trainer.yaml`\r\n- `fsdp_workers.py`\r\n- `ray_trainer.py`\r\n\r\n## Training EffOPD\r\n\r\nThe training dataset can be downloaded from: https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FKeven16\u002FG-OPD-Training-Data\r\n\r\nTo launch EffOPD training, please start from the original bash script used for training OPD, and add the following arguments:\r\n\r\n```bash\r\ntrainer.enable_iterative_test=True \\\r\ntrainer.max_test_iterations=5 \\\r\ndata.iterative_test_files=xxx.parquet\r\n```\r\n\r\nHere:\r\n\r\n- `trainer.enable_iterative_test=True` enables the EffOPD extrapolation search.\r\n- `trainer.max_test_iterations=5` sets the maximum number of extrapolated candidate parameters to evaluate at each exponential checkpoint. 
In our experiments, this value is set to `5`.\r\n- `data.iterative_test_files=xxx.parquet` specifies the data file used to construct the lightweight validation set for immediate validation. Please replace `xxx.parquet` with the actual path to the validation parquet file.\r\n\r\nIf you find this project interesting, feel free to ⭐ star the repository or open an issue for discussion!\r\n\r\nIf you use this code in your research, please cite:\r\n```bibtex\r\n@article{cai2026learning,\r\n  title={Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation},\r\n  author={Cai, Yuchen and Cao, Ding and Lin, Liang and Luo, Chunxi and Xu, Xin and Yang, Kai and Liu, Weijie and Yang, Saiyong and Zhao, Tianxiang and Sun, Guangzhong and others},\r\n  journal={arXiv preprint arXiv:2605.11739},\r\n  year={2026}\r\n}\r\n\r\n@article{cai2025predictability,\r\n  title={On Predictability of Reinforcement Learning Dynamics for Large Language Models},\r\n  author={Cai, Yuchen and Cao, Ding and Xu, Xin and Yao, Zijun and Huang, Yuqing and Tan, Zhenyu and Zhang, Benyi and Liu, Guiquan and Fang, Junfeng},\r\n  journal={arXiv preprint arXiv:2510.00553},\r\n  year={2025}\r\n}\r\n```\r\n","EffOPD is a project for improving and optimizing the training process of reinforcement learning models. It is implemented on top of verl and GOPD, mainly by modifying `ppo_trainer.yaml`, `fsdp_workers.py`, and `ray_trainer.py`. EffOPD introduces an iterative test mechanism that lets users evaluate up to 5 extrapolated candidate parameters at each exponential checkpoint, with immediate validation on a lightweight validation set, to improve training efficiency. The project is suited to researchers and developers who need to train and optimize large language models efficiently.","2026-06-11 04:01:46","CREATED_QUERY"]