[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80693":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":13,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":21,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},80693,"SEAL","yihaohu0118\u002FSEAL","yihaohu0118","The source code for SEAL",null,"Python",69,4,1,0,18,23,2.1,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:04:05","\u003Ch1 align=\"center\">\n  SEAL: Synergistic Co-Evolution of Agents and Learning Environments\n\u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n\n[![Homepage](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHomepage-SEAL-blue.svg)](https:\u002F\u002Fyihaohu0118.github.io\u002FSEAL\u002F)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-PDF-red.svg)](https:\u002F\u002Fyihaohu0118.github.io\u002FSEAL\u002Fstatic\u002Fpdfs\u002Fseal-paper.pdf)\n[![Poster](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPoster-PDF-orange.svg)](https:\u002F\u002Fyihaohu0118.github.io\u002FSEAL\u002Fstatic\u002Fpdfs\u002Fseal-poster.pdf)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache--2.0-green.svg)](LICENSE)\n\n\u003C\u002Fdiv>\n\n\n\u003Cp align=\"center\">\n  \u003Cb>Tool-Use Agents · Self-Evolution · Reinforcement Learning\u003C\u002Fb>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  Yihao Hu\u003Csup>*,1,2\u003C\u002Fsup>, Zhihao Wen\u003Csup>*,1\u003C\u002Fsup>, Xiujin Liu\u003Csup>3\u003C\u002Fsup>, Pan Wang\u003Csup>1,4\u003C\u002Fsup>, Xin 
Zhang\u003Csup>1\u003C\u002Fsup>, Wei Wu\u003Csup>1\u003C\u002Fsup>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Csup>*\u003C\u002Fsup>Equal Contribution · \u003Csup>1\u003C\u002Fsup>Ant Group · \u003Csup>2\u003C\u002Fsup>Westlake University · \u003Csup>3\u003C\u002Fsup>University of Michigan-Ann Arbor · \u003Csup>4\u003C\u002Fsup>University of Science and Technology of China\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fyihaohu0118.github.io\u002FSEAL\u002Fstatic\u002Fimages\u002Fseal-overview.png\" alt=\"SEAL overview\" width=\"86%\">\n\u003C\u002Fp>\n\n## Overview\n\nSEAL is a closed-loop co-evolution framework for interactive tool-use agents. It collects on-policy trajectories under executable verification, diagnoses failed rollouts into turn-level failure labels, and uses these diagnoses as a shared signal for both training-time interface evolution and model-side policy optimization.\n\nIn SEAL, the agent reveals its capability gaps, the learning interface adapts around these failures, and the policy internalizes the resulting feedback through GRPO. 
Evaluation remains strict: tool semantics, task labels, and the verifier are unchanged.\n\n## Highlights\n\n- **Verifier-grounded diagnosis:** executable traces are mapped to failure types such as invalid tool calls, argument mismatches, missing tool calls, recovery failures, and response mismatches.\n- **Training-time interface evolution:** BFCL observations expose schema affordances and recovery-oriented feedback without changing the test-time environment.\n- **Diagnosis-guided optimization:** diagnostic profiles reweight GRPO advantages while preserving the original verifier reward.\n\n## Quick Start\n\n```bash\ngit clone git@github.com:yihaohu0118\u002FSEAL.git\ncd SEAL\n\nconda create -n seal python=3.10 -y\nconda activate seal\npip install -r requirements.txt\n```\n\nPrepare and launch the BFCL environment:\n\n```bash\nbash env_service\u002Fenvironments\u002Fbfcl\u002Fsetup.sh\nconda activate bfcl\nbash env_service\u002Flaunch_script\u002Fbfcl.sh\n```\n\nRun the SEAL training recipe:\n\n```bash\nconda activate seal\npython launcher.py --conf exp\u002FSEAL.yaml\n```\n\n## Repository Layout\n\n```text\nexp\u002FSEAL.yaml                         Full SEAL training configuration\nenv_service\u002Fenvironments\u002Fbfcl\u002F        BFCL executable environment and interface adaptation\nagentevolver\u002Fmodule\u002Ftocf\u002F             Diagnostic state and advantage reweighting\nagentevolver\u002Fmodule\u002Ftask_manager\u002F     Rewards, task adapters, and BFCL grader\ndata\u002F                                 Released BFCL train\u002Fevaluation split files\n```\n\n## Citation\n\n```bibtex\n@article{hu2026seal,\n  title={SEAL: Synergistic Co-Evolution of Agents and Learning Environments},\n  author={Hu, Yihao and Wen, Zhihao and Liu, Xiujin and Wang, Pan and Zhang, Xin and Wu, Wei},\n  journal={Preprint},\n  year={2026},\n  url={https:\u002F\u002Fyihaohu0118.github.io\u002FSEAL\u002F}\n}\n```\n\n## License\n\nThis project is released under the [Apache License 
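2.0](LICENSE).\n","SEAL is a closed-loop co-evolution framework for interactive tool-use agents. It collects on-policy trajectories under executable verification, diagnoses failed rollouts into turn-level failure labels, and uses these diagnoses as a shared signal for both training-time interface evolution and model-side policy optimization. Its core technical features are verifier-grounded diagnosis, training-time interface evolution, and diagnosis-guided optimization. It is particularly suited to settings that need stronger tool-use agents while keeping evaluation strict, such as complex task automation and intelligent assistant development. SEAL is written in Python and released under the Apache License 2.0.",2,"2026-06-11 04:01:39","CREATED_QUERY"]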