THU-BPM

RLCSD


Source code for the paper "RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation"

Python

MIT License

large-language-models llm on-policy-distillation opd opsd reinforcement-learning self-distillation


Stars: 51

Forks: 2

Watchers: 2

Issues: 2

Star growth

Today: 0

Last 7 days: 0

Last 30 days: 0

Overall score: 37.43

Default branch: main

No README content available

The project may not have finished syncing; please check back later.