[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2421":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":13,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":29,"discoverSource":30},2421,"continual-learning-bench","pgasawa\u002Fcontinual-learning-bench","pgasawa","Continual Learning Bench","https:\u002F\u002Fcontinual-learning-bench.com",null,"Python",145,18,81,2,0,26,51,54,81.94,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:00:14","# Continual Learning Bench\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\" alt=\"Continual Learning Bench\" width=\"200\"\u002F>\n\u003C\u002Fp>\n\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-join-5865F2?logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002F7bxjNdfbfH) [![Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-continual--learning--bench.com-2ea44f?logo=googlechrome&logoColor=white)](https:\u002F\u002Fcontinual-learning-bench.com\u002F) [![Docs](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocs-continual--learning--bench.com%2Fdocs-blue?logo=readthedocs&logoColor=white)](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs\u002F)\n\n**Continual Learning Bench** measures how well AI agents learn from past environment interactions, the defining challenge for agents that operate over extended horizons and are expected to improve online.\n\n## Quickstart\n\nContinual Learning Bench requires Python 
3.13 or later, [uv](https:\u002F\u002Fgithub.com\u002Fastral-sh\u002Fuv), and a local Docker installation for tasks and systems that run containerized workspaces.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fpgasawa\u002Fcontinual-learning-bench && cd continual-learning-bench\nuv sync --all-extras && source .venv\u002Fbin\u002Factivate\npre-commit install\nclbench setup --all\n```\n\nAdd any model provider keys you need to a `.env` file in the project root, then verify the CLI and run a small benchmark:\n\n```bash\nclbench list\nclbench run exploitable_poker --schedule quick_test --system icl\n```\n\nThe [Quickstart Guide](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs\u002Fquickstart\u002F) walks through the same flow in more detail, including how to inspect tasks, systems, schedules, and run outputs.\n\n## Further Documentation\n\n- [Installation](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs\u002Finstallation\u002F) — Full setup notes and prerequisites\n- [Task Gallery](https:\u002F\u002Fcontinual-learning-bench.com\u002Ftasks.html) — Browse all available tasks\n- [Leaderboard](https:\u002F\u002Fcontinual-learning-bench.com\u002Fleaderboard.html) — See how models compare\n- [Viewers](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs\u002Fviewers\u002F) — Viewing and comparing results\n- [Docs](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs) — Concepts, metrics, contribution guides, and more\n\n## Core Components\n\n### Dataset of Tasks\n\nEach task spans multiple episodes in a shared environment, so agents that remember feedback and adapt should outperform those that solve each instance from scratch. Tasks include:\n\n- a set of constructed task instances with a learnable structure,\n- an evaluation script and reward metric measuring improvement over episodes,\n- schedules and variants for repeatable comparisons across systems.\n\nTasks live in `src\u002Ftasks\u002F`. 
See the [Task Gallery](https:\u002F\u002Fcontinual-learning-bench.com\u002Ftasks.html) for an overview that's easy to browse.\n\n### Systems\n\nSystems are the agents evaluated by the benchmark. Built-in baselines and model-backed systems live in `src\u002Fsystems\u002F`, and custom systems can be added to test new memory, retrieval, prompting, or tool-use strategies.\n\n### Execution Harness\n\nThe harness connects systems to task environments, manages multi-episode rollouts, records traces, and writes viewer artifacts for analysis. After installing, explore available options with:\n\n```bash\nclbench run --help\nclbench run-all --help\n```\n\n## Contribution\n\n- [Roadmap](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs\u002Froadmap\u002F) — See what we're working on and where you can help\n- [Contributing a New Task](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs\u002Fcontributing-tasks\u002F) — Add a benchmark environment\n- [Contributing a New System](https:\u002F\u002Fcontinual-learning-bench.com\u002Fdocs\u002Fcontributing-systems\u002F) — Add an evaluated agent\n\nContributions are welcome, especially new tasks that stress-test long-horizon learning. Reach out on [Discord](https:\u002F\u002Fdiscord.gg\u002F7bxjNdfbfH) to discuss ideas before diving in.\n\n## Citing Us\n\nIf you found Continual Learning Bench useful, please cite us as:\n\n```bibtex\n@misc{clbench2026,\n      title={Continual Learning Bench},\n      author={Parth Asawa and Chris Glaze and Gabe Orlanski and Ramya Ramakrishnan and Benji Xu and Asim Biswal and Vincent Sunn Chen and Frederic Sala and Matei Zaharia and Joseph E. Gonzalez},\n      year={2026},\n}\n```\n","Continual Learning Bench is a tool for evaluating how AI agents perform in continual learning settings. Its core features include a set of repeatable tasks designed to test an agent's ability to learn from past environment interactions, plus an execution harness that connects systems to task environments. The project is written in Python and supports custom systems for testing different memory, retrieval, prompting, or tool-use strategies. It is particularly suited to research and development of AI agents that run over long horizons and are expected to improve online. With this platform, researchers can compare the performance of different models and explore more effective continual learning methods.","2026-06-11 02:49:52","CREATED_QUERY"]