[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81053":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},81053,"Repo2RLEnv","huggingface\u002FRepo2RLEnv","huggingface","Convert any Repo into an RL Environment ","",null,"Python",264,37,5,1,0,94,103,236,282,4.74,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:04:10","\n\n\u003Cp align=\"center\">\n  \u003Ch1 align=\"center\">Repo2RLEnv\u003C\u002Fh1>\n  \u003Cp align=\"center\">\u003Cb>Turn any GitHub repository into a verifiable RL environment for training and evaluation.\u003C\u002Fb>\u003C\u002Fp>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Frepo2rlenv\u002F\">\u003Cimg alt=\"PyPI\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Frepo2rlenv?color=blue\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpypi.org\u002Fproject\u002Frepo2rlenv\u002F\">\u003Cimg alt=\"Python versions\" src=\"https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fpyversions\u002Frepo2rlenv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002FRepo2RLEnv\u002Factions\u002Fworkflows\u002Fci.yml\">\u003Cimg alt=\"CI\" src=\"https:\u002F\u002Fgithub.com\u002Fhuggingface\u002FRepo2RLEnv\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\">\u003C\u002Fa>\n  \u003Ca href=\".\u002FLICENSE\">\u003Cimg 
alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-green\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fharbor\">\u003Cimg alt=\"Harbor\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fspec-Harbor-FFD21F\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#quickstart\">Quickstart\u003C\u002Fa> ·\n  \u003Ca href=\"#pipelines\">Pipelines\u003C\u002Fa> ·\n  \u003Ca href=\"#what-you-get-out\">Output\u003C\u002Fa> ·\n  \u003Ca href=\"#documentation\">Docs\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fbanner.png\" alt=\"Repo2RLEnv — turn any repo into verifiable RL environments\" width=\"100%\">\n\u003C\u002Fp>\n\nRepo2RLEnv synthesizes **verifiable** training and evaluation data from existing repositories, exports it into a uniform spec, and pushes it straight to the Hugging Face Hub. The output spec is [Harbor](https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fharbor)'s, so every dataset you produce drops directly into any Harbor-compatible runtime — no glue code.\n\n\n## Quickstart\n\n```bash\n# Install (pick one)\nuv add repo2rlenv                                 # add to a uv-managed project\nuvx repo2rlenv --help                             # one-shot, no install\npip install repo2rlenv                            # classic\n\n# Auth: nothing to set up if you've done `gh auth login` and `huggingface-cli login`.\n# Otherwise:  export GITHUB_TOKEN=... 
; export HF_TOKEN=...\n\n# Generate a dataset locally\nrepo2rlenv generate \\\n  --repo \u003Cowner>\u002F\u003Crepo> \\\n  --pipeline pr_runtime \\\n  --pipeline-opt limit=5 \\\n  --llm anthropic\u002Fclaude-sonnet-4-6 \\\n  --out .\u002Fdatasets\u002F\u003Cdataset-name>\n\n# Validate (fast structural check) and publish\nrepo2rlenv validate .\u002Fdatasets\u002F\u003Cdataset-name>\nrepo2rlenv push .\u002Fdatasets\u002F\u003Cdataset-name> \u003Cyour-org>\u002F\u003Cdataset-name>\n\n# Anyone can pull + run a published dataset on a fresh machine\nrepo2rlenv pull \u003Cyour-org>\u002F\u003Cdataset-name> .\u002Fdatasets\u002F\u003Cdataset-name>\nharbor run -p .\u002Fdatasets\u002F\u003Cdataset-name> -a oracle --env docker\n```\n\nFull walkthrough in [**`docs\u002Fquickstart.md`**](.\u002Fdocs\u002Fquickstart.md).\n\n## How it works\n\nRepo2RLEnv runs **synthesis pipelines** that read real repositories — source code, merged PRs, commits, CVEs — and use them as a *seed* to generate RL environments: tasks with a concrete, solvable objective and a programmatic reward (no human grading).\n\n**Input: any repo. Output: a runnable RL environment** you can point any LLM or coding agent at.\n\n```python\nfrom pathlib import Path\nfrom typing import ClassVar, Protocol\n\n# every pipeline shares one contract: read a repo, emit verifiable tasks\n# (PipelineName and PipelineResult are repo2rlenv's own types)\nclass Pipeline(Protocol):\n    name: ClassVar[PipelineName]\n    def run(self, out_dir: Path) -> PipelineResult: ...   # writes tasks\u002F\u003Cid>\u002F\n```\n\nGenerate from a repo, then run any agent against the result — the environment is scored automatically:\n\n```bash\n# 1. synthesize an environment from a repo\nrepo2rlenv generate --repo pallets\u002Fclick --pipeline pr_runtime \\\n  --pipeline-opt limit=10 --llm anthropic\u002Fclaude-sonnet-4-6 --out .\u002Fenv-click\n\n# 2. run an agent inside the sandbox (swap -a \u002F -m for any of 25+ harnesses)\nexport ANTHROPIC_API_KEY=...   
OPENAI_API_KEY=...\nharbor run -p .\u002Fenv-click -a claude-code -m anthropic\u002Fclaude-sonnet-4-6 --ae ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY --env docker\nharbor run -p .\u002Fenv-click -a openhands   -m openai\u002Fgpt-4o               --ae OPENAI_API_KEY=$OPENAI_API_KEY     --env docker\nharbor run -p .\u002Fenv-click -a codex       -m openai\u002Fo3                   --ae OPENAI_API_KEY=$OPENAI_API_KEY     --env docker\nharbor run -p .\u002Fenv-click -a hermes      -m anthropic\u002Fclaude-sonnet-4-6 --ae ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY --env docker\n```\n\nEach agent's per-task reward lands in `\u002Flogs\u002Fverifier\u002Freward.json`, ready for training or eval.\n\n## Pipelines\n\nA pipeline turns a repo into Harbor tasks. **Two are stable** and recommended for production; **six are experimental** — usable today (the CLI prints a warning before they run), with interfaces and output quality still evolving.\n\n### Stable\n\n**[`pr_diff`](.\u002Fdocs\u002Fpipelines\u002Fpr_diff.md)** mines merged pull-request diffs into lightweight, text-only tasks. The agent proposes an edit, and a verifier scores it against the real merged diff — on format, the files it touched, how much it changed, and (via an LLM judge) whether it's semantically right. No per-repo setup: every task ships a thin `python:3.12-slim` image.\n→ Reference dataset: [`AdithyaSK\u002Frepo2rlenv-pr-diff`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAdithyaSK\u002Frepo2rlenv-pr-diff) (100 oracle-verified tasks).\n\n**[`pr_runtime`](.\u002Fdocs\u002Fpipelines\u002Fpr_runtime.md)** is the SWE-bench-style flagship. It mines merged PRs and actually runs the repo's test suite inside a Docker sandbox: the tests the PR fixed must go from failing to passing under the gold patch, while the rest keep passing. 
That makes it the strongest, least-gameable signal of the set.\n→ Reference dataset: [`AdithyaSK\u002Frepo2rlenv-pr-runtime`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAdithyaSK\u002Frepo2rlenv-pr-runtime) (100 oracle-verified tasks).\n\n### Experimental\n\n> These run normally but emit a warning first — pin a release if you depend on them. Each links to its own page; the gist:\n\n- **[`commit_runtime`](.\u002Fdocs\u002Fpipelines\u002Fcommit_runtime.md)** — mines commit history directly, catching fixes that never went through a PR. Reference dataset: [`AdithyaSK\u002Frepo2rlenv-commit-runtime`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAdithyaSK\u002Frepo2rlenv-commit-runtime) (52 oracle-verified envs).\n- **[`cve_patches`](.\u002Fdocs\u002Fpipelines\u002Fcve_patches.md)** — security tasks from public CVEs, mapped to their fix commits.\n- **[`mutation_bugs`](.\u002Fdocs\u002Fpipelines\u002Fmutation_bugs.md)** — injects synthetic bugs into real code; the agent must restore the tests to green.\n- **[`code_instruct`](.\u002Fdocs\u002Fpipelines\u002Fcode_instruct.md)** — generates a problem + executable verifier from a real source file.\n- **[`equivalence_tests`](.\u002Fdocs\u002Fpipelines\u002Fequivalence_tests.md)** — the agent reimplements a real function; generated tests check it matches the original.\n- **[`refactor_synthesis`](.\u002Fdocs\u002Fpipelines\u002Frefactor_synthesis.md)** — mines refactor commits and verifies behavior is preserved.\n\n### At a glance\n\n| Pipeline | Stability | Sandbox | LLM use | Languages |\n|---|:-:|:-:|---|---|\n| `pr_diff` | stable | thin | at verify — judges the solution | any |\n| `pr_runtime` | stable | ✅ | at env build — one-time, cached | Py · Go · Node · Rust |\n| `commit_runtime` | experimental | ✅ | at env build — one-time, cached | Py · Go · Node · Rust |\n| `cve_patches` | experimental | ✅ | at env build — one-time, cached | Py · Go · Node · Rust |\n| `mutation_bugs` | experimental | ✅ | at synthesis 
— writes the task | Py |\n| `code_instruct` | experimental | ✅ | at synthesis — writes the task | Py |\n| `equivalence_tests` | experimental | ✅ | at synthesis — writes the task | Py |\n| `refactor_synthesis` | experimental | ✅ | at env build — one-time, cached | Py |\n\n**What the columns mean**\n- **Sandbox** — which Docker image the task runs in. `✅` = a per-repo image built once by the [bootstrap phase](#bootstrap) and cached; `thin` = no bootstrap, just a generic `python:3.12-slim` image.\n- **LLM use** — *when* a language model is invoked, which sets where your API cost goes:\n  - **at env build** — only during bootstrap (constructing the Docker image); cached, so generation itself is LLM-free.\n  - **at synthesis** — the model authors the task (problem + verifier) for every task generated.\n  - **at verify** — the model judges the agent's solution at scoring time (one reward component), and degrades gracefully when no key is set.\n- **Languages** — source languages the pipeline supports.\n\n→ **Full reference** — per-pipeline options, reward design, and dataset cards: [**`docs\u002Fpipelines\u002F`**](.\u002Fdocs\u002Fpipelines\u002FREADME.md).\n\n## Bootstrap\n\nSandbox pipelines need a working Docker environment for the target repo. Repo2RLEnv's **bootstrap phase** builds it automatically: an LLM agent runs shell commands iteratively inside a fresh container until the repo builds and test collection succeeds, then commits the image and content-addresses it. The expensive step runs **once per (repo, ref)**; every downstream task reuses the cache. 
`pr_diff` skips it entirely.\n\n```bash\nrepo2rlenv bootstrap --repo \u003Cowner>\u002F\u003Crepo> --llm anthropic\u002Fclaude-sonnet-4-6\n```\n\nDesign, cache layout, cost tracking: [`docs\u002Freference\u002FBOOTSTRAP.md`](.\u002Fdocs\u002Freference\u002FBOOTSTRAP.md).\n\n## What you get out\n\nA dataset that:\n\n- **Is verifiable** — every task carries an executable test (`test_execution`) or a stored oracle diff (`diff_similarity`); your trainer picks the reward type.\n- **Is content-addressed** — a `content_hash` over each task; identical artifacts ⇒ identical hash.\n- **Trains anywhere via Harbor** — TRL, SkyRL, Prime-RL, Tinker, Miles, Slime, harbor.rl.\n- **Evaluates with any agent harness** — Claude Code, OpenHands, Codex CLI, Gemini CLI, …\n- **Is language-agnostic by spec** — runtime pipelines emit a Dockerfile + shell verifier; `pr_diff` is pure text and works for any language.\n- **Publishes natively** to the Hub — `repo2rlenv push` writes a Harbor-compatible `registry.json` so consumers `harbor download` (or `repo2rlenv pull`) with zero glue.\n- **Supports private repos** end-to-end — `gh auth token` resolved automatically; build secrets declared by name; verifier-time secrets forbidden by spec.\n\n## Under the hood\n\nOur focus is **synthesis** — we don't reimplement sandboxes, agent harnesses, or a registry. Tasks are emitted in the [Harbor](https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fharbor) format (with a small `[metadata.repo2env]` block for provenance: pipeline, base commit, PR URL, content hash, reward kinds), so they run on Harbor's existing stack — Local Docker \u002F Modal \u002F Daytona \u002F E2B \u002F Runloop, 25+ agent harnesses, parallel execution, and the publishing CLI.\n\n## Contributing a pipeline\n\nPipelines are pluggable by design — adding a synthesis strategy is the main way to extend Repo2RLEnv:\n\n1. 
Implement the `Pipeline` protocol (`name` + `run() -> PipelineResult`) in `src\u002Frepo2rlenv\u002Fpipelines\u002F`.\n2. Register it in `PIPELINES` and add its options model; new pipelines start with `experimental = True`.\n3. `uv run pytest tests\u002Ftest_pipeline_contract.py` enforces the contract.\n\nFull cookbook (oracle invariant, reward design, QA gate): [**`docs\u002Fcontributing\u002FADDING_A_PIPELINE.md`**](.\u002Fdocs\u002Fcontributing\u002FADDING_A_PIPELINE.md). Issues and PRs welcome — see [`CONTRIBUTING.md`](.\u002FCONTRIBUTING.md).\n\n## Documentation\n\n- 🚀 [**`docs\u002Fquickstart.md`**](.\u002Fdocs\u002Fquickstart.md) — install → first dataset → push, in 10 minutes\n- 📖 [**`docs\u002Fpipelines\u002F`**](.\u002Fdocs\u002Fpipelines\u002FREADME.md) — one page per pipeline (when to use, oracle shape, options)\n- 📚 Reference contracts:\n  - [`SPEC.md`](.\u002Fdocs\u002Freference\u002FSPEC.md) — input\u002Foutput contract\n  - [`API.md`](.\u002Fdocs\u002Freference\u002FAPI.md) — Python API for `src\u002Frepo2rlenv\u002F`\n  - [`AUTH.md`](.\u002Fdocs\u002Freference\u002FAUTH.md) — GitHub \u002F HF \u002F LLM auth resolution\n  - [`ENV.md`](.\u002Fdocs\u002Freference\u002FENV.md) — every environment variable the tool reads, in one place\n  - [`BOOTSTRAP.md`](.\u002Fdocs\u002Freference\u002FBOOTSTRAP.md) — LLM-iterated per-repo Docker image\n  - [`AGENTS.md`](.\u002Fdocs\u002Freference\u002FAGENTS.md) — Harbor agent harnesses + RL trace plumbing\n- 🛠 [**`CONTRIBUTING.md`**](.\u002FCONTRIBUTING.md) — dev setup, PR conventions, release flow\n- 🧪 [**`ADDING_A_PIPELINE.md`**](.\u002Fdocs\u002Fcontributing\u002FADDING_A_PIPELINE.md) — cookbook for shipping a new pipeline\n\n## Adjacent projects\n\n- [**Harbor**](https:\u002F\u002Fgithub.com\u002Fharbor-framework\u002Fharbor) — the task format + runtime we **adopt** as our output spec\n- [**RepoLaunch**](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FRepoLaunch) (Microsoft) — LLM-agent env setup; our `bootstrap` 
is an independent reimplementation\n- [**OpenReward**](https:\u002F\u002Fdocs.openreward.ai) — ORS protocol + extra trainer integrations above Harbor\n- [**SWE-Gym**](https:\u002F\u002Fgithub.com\u002FSWE-Gym\u002FSWE-Gym) — RL-environment framing for SWE-bench-style tasks\n- [**verifiers**](https:\u002F\u002Fgithub.com\u002Fwillccbb\u002Fverifiers) (Prime Intellect), [**OpenEnv**](https:\u002F\u002Fgithub.com\u002Fmeta-pytorch\u002FOpenEnv) (Meta + HF) — adjacent standardization efforts\n\nEvery pipeline that draws from external work carries an Acknowledgment block in its `.py` file. No code is copied — implementations are independent and Apache-2.0 licensed.\n\n## License\n\n[Apache 2.0](.\u002FLICENSE). The original PR\u002Fcommit contents remain under their respective source-repo licenses; datasets redistribute public commits for ML research under fair use.\n","Repo2RLEnv is a tool that converts any GitHub repository into a verifiable reinforcement learning environment. At its core, it reads source code, merged PRs, commit history, and other information from real repositories and generates tasks with a clear objective and a programmatic reward. The project is written in Python and supports multiple synthesis pipelines for different kinds of input; its output is a dataset that conforms to the Harbor specification and can be used directly in any Harbor-compatible runtime. It suits researchers and developers who want to build custom reinforcement learning training and evaluation scenarios from existing codebases.",2,"2026-06-11 04:03:19","CREATED_QUERY"]