[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-4404":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},4404,"rlm-forge","Q00\u002Frlm-forge","Q00","Runtime-lifted Recursive Language Model primitive for Hermes Agent and Ouroboros, with TraceGuard evidence gating",null,"Python",108,10,47,0,1,14,3,3.12,"MIT License",false,"main",true,[],"2026-06-12 02:01:02","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Frlm-forge-hero.webp\" alt=\"RLM-FORGE — Frontier Recursion Lab\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg alt=\"RLM-FORGE sigil\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRLM--FORGE-Hermes%20Inner%20Runtime-7170ff?style=for-the-badge&labelColor=08090a\" \u002F>\n  \u003Cimg alt=\"TraceGuard\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTraceGuard-evidence--gated-10b981?style=for-the-badge&labelColor=08090a\" \u002F>\n  \u003Cimg alt=\"Python\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.12-0f1011?style=for-the-badge&labelColor=08090a\" \u002F>\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">RLM-FORGE\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>A tiny recursive-runtime forge for Hermes Agent.\u003C\u002Fstrong>\n  \u003Cbr \u002F>\n  Ouroboros owns the recursion. Hermes performs bounded inner calls. 
TraceGuard refuses unsupported synthesis.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#quickstart\">Quickstart\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#what-this-proves\">What this proves\u003C\u002Fa>\n  ·\n  \u003Ca href=\"#traceguard\">TraceGuard\u003C\u002Fa>\n  ·\n  \u003Ca href=\"docs\u002Farchitecture.md\">Architecture\u003C\u002Fa>\n  ·\n  \u003Ca href=\"paper\u002Fmain.pdf\">Paper\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Ch2 align=\"center\">\n  \u003Ca href=\"paper\u002Fmain.pdf\">Read the RLM-FORGE paper\u003C\u002Fa>\n\u003C\u002Fh2>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"paper\u002Fmain.pdf\">\n    \u003Cimg alt=\"Paper PDF\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-PDF%20artifact-f8fafc?style=for-the-badge&labelColor=111827\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"paper\u002Fmain.tex\">\n    \u003Cimg alt=\"LaTeX source\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSource-LaTeX-38bdf8?style=for-the-badge&labelColor=111827\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Runtime-Lifted Recursive Language Models for Agent Infrastructure.\u003C\u002Fstrong>\n  \u003Cbr \u002F>\n  The paper states the systems claim, corrected benchmark result, TraceGuard enforcement model, layered memory finding, and 24-cell live portability evidence.\n\u003C\u002Fp>\n\n---\n\n> **A Hermes-backed field instrument for runtime-lifted Recursive Language Models.**  \n> Inspired by Zhang\u002FKraska\u002FKhattab — *Recursive Language Models* (arXiv 2512.24601).\n\n```text\n╭────────────────────────────────────────────────────────────────────╮\n│                            RLM-FORGE                               │\n│                                                                    │\n│  user request                                                      │\n│      │                                                             │\n│      ▼                                                 
            │\n│  Ouroboros outer scaffold       recursion · state · trace replay   │\n│      │                                                             │\n│      ▼                                                             │\n│  Hermes Agent inner runtime      bounded JSON sub-calls            │\n│      │                                                             │\n│      ▼                                                             │\n│  TraceGuard                    parent claims must cite evidence    │\n╰────────────────────────────────────────────────────────────────────╯\n```\n\n```text\nHermes MEMORY.md + RLM-FORGE memory_priors  -> behavioral priors\nfresh child evidence manifest               -> admissible parent evidence\nTraceGuard                                  -> deterministic acceptance gate\n```\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Farchitecture.png\" alt=\"RLM-FORGE runtime structure diagram showing Ouroboros recursion control, Hermes bounded calls, provider execution, TraceGuard evidence gating, and commit or retry paths\" \u002F>\n\u003C\u002Fp>\n\nRLM-FORGE is not a new model architecture. It is a runtime-hosted realization of a Recursive Language Model style execution loop:\n\n- **Hermes Agent** acts as the inner LM runtime.\n- **Ouroboros** owns recursion, scheduling, state mutation, termination, and trace replay.\n- **TraceGuard** validates that parent synthesis only claims facts backed by accepted child evidence handles.\n\nThe result is a compact, replayable, evidence-gated RLM scaffold built on top of the existing Hermes tool\u002Fruntime interface.\n\n---\n\n## What this proves\n\nRLM-FORGE makes one careful claim:\n\n> Recursive execution is useful when it creates structured evidence handles that an outer scaffold can validate. 
Recursion alone is not trusted.\n\nThe current committed result supports the following bounded public claim:\n\n> RLM-FORGE completed a 24-cell live primary matrix across Hermes GLM, Claude Code Opus, and Codex GPT-5.5 in read-write memory mode. All cells passed the RLM-FORGE+TraceGuard contract, with fresh child evidence validation passing in every cell and zero unsupported parent claims accepted. Memory was used only as an operational prior for schema stability, not as admissible factual evidence.\n\nA set of deterministic memory-runtime benchmarks isolates the performance claim\nwe can safely make today: guarded operational memory can reject answer-memory\ncontamination, separate layered memory roles, and reduce validation\u002Frepair work\nfor known schema failure modes. It does **not** prove live model-quality,\nlatency, token, or cost improvement.\n\n| Benchmark | Strongest deterministic result |\n| --- | --- |\n| Memory contamination robustness | Unguarded adversarial memory accepts answer contamination at 1.0000; TraceGuard accepts it at 0.0000 |\n| Layered memory ablation | Hermes memory + RLM-FORGE memory reaches 1.0000 initial accept with 0.0000 repairs |\n| Adaptive repair memory | Mean repair calls fall from 1.0000 to 0.1250 across repeated related tasks |\n| Guarded memory runtime benefit | Initial TraceGuard accept rises from 0.2500 to 1.0000 versus no memory |\n\nArtifact: [`experiments\u002Fmemory-runtime-benefit-benchmark.md`](experiments\u002Fmemory-runtime-benefit-benchmark.md)\n\nWe also ran the paper itself through the dependency `ooo rlm` path. The run\ndecomposed the paper target into bounded chunks, executed child Hermes calls,\nand synthesized a parent answer from the child results. RLM-FORGE now provides\nproject-local `ouroboros`\u002F`ooo` console-script wrappers for `uv run ouroboros\nrlm` or `uv run ooo rlm` that install TraceGuard immediately after parent\nsynthesis, then delegate to the upstream Ouroboros Typer app. 
The earlier\npersisted post-run gate over that exact parent output accepted the\nevidence-backed parent and rejected an injected memory-answer claim:\n\n| Case | TraceGuard accepted | Unsupported rate | Rejection |\n| --- | ---: | ---: | --- |\n| exact `ooo rlm` parent | true | 0.0000 | none |\n| parent + unsafe memory answer | false | 0.0667 | `unsupported_fact_id` |\n\nArtifacts:\n[`experiments\u002Fpaper-key-sections-ooo-rlm-demo.md`](experiments\u002Fpaper-key-sections-ooo-rlm-demo.md),\n[`experiments\u002Fpaper-ooo-rlm-traceguard-gate.md`](experiments\u002Fpaper-ooo-rlm-traceguard-gate.md)\n\nOn the current live Hermes long-context truncation fixture, recursive RLM and vanilla single-call Hermes are an honest **tie**:\n\n| Metric | Vanilla single call | Recursive RLM |\n| --- | ---: | ---: |\n| Hermes sub-calls | 1 | 5 |\n| Quality score | 1.00 | 1.00 |\n| Score delta | — | +0.00 |\n| `omitted_fact_safety_score` | 1.00 | 1.00 |\n| `claimed_omitted_fact_ids` | `[]` | `[]` |\n| `cited_retained_fact_ids` | `LC-001..LC-004` | `LC-001..LC-004` |\n\nEarlier artifacts reported a `+0.20` RLM advantage because the scorer treated guarded residual-gap text such as “LC-005 and LC-006 cannot be claimed” as a positive omitted-fact claim. The claim-aware scorer fixes that. The contribution here is **not** a quality win on one fixture; it is the Hermes-backed recursive runtime path plus deterministic evidence enforcement.\n\nPersisted artifact: [`benchmarks\u002Frlm-long-context-truncation-v1.json`](benchmarks\u002Frlm-long-context-truncation-v1.json)\n\nLive primary portability:\n[`experiments\u002Flive-portability-primary.md`](experiments\u002Flive-portability-primary.md)\nruns 8 shared fixtures through three runtime\u002Fprovider families using the same\nRLM-FORGE+TraceGuard contract. This is a 24-cell primary matrix, not the full\n96-cell baseline sweep. Latest aggregate status is **pass**: Hermes+GLM,\nClaude Code, and Codex each complete and pass all 8 primary fixtures. 
The\nruntime has also seen a concrete missing-handle failure class in an earlier\nHermes+GLM run; that class is now covered by a deterministic TraceGuard\nrepair\u002Fretry loop. If it recurs, the runtime has a recovery path instead of\nonly a reject-and-log path. The latest live run passes before repair is needed,\nso the repair loop is evidence of the runtime control surface rather than a\nclaimed benchmark win in the latest matrix.\n\n| Family | Model alias | Primary cells | Live result | Mean latency |\n| --- | --- | ---: | --- | ---: |\n| `hermes_glm` | `glm-4.7` via Z.AI | 8\u002F8 pass | `pass` | 156s |\n| `claude_code_opus47` | `opus` via Claude Code | 8\u002F8 pass | `pass` | 69s |\n| `codex_gpt55` | `gpt-5.5` via Codex CLI | 8\u002F8 pass | `pass` | 53s |\n\nLive portability smoke:\n[`experiments\u002Flive-portability-smoke.md`](experiments\u002Flive-portability-smoke.md)\nremains the 1-fixture adapter\u002Fauth check that preceded the primary run.\n\nTraceGuard enforcement demo:\n[`experiments\u002Ftraceguard-demo.md`](experiments\u002Ftraceguard-demo.md)\nshows the new evidence gate in action. Safe parent synthesis is accepted,\nan omitted fact is rejected with `unsupported_fact_id`, and chunk-only\nevidence is rejected with `chunk_handle_without_fact`. This turns the main\nclaim from “we measured unsupported claims” into “we can enforce the evidence\ncontract at parent synthesis time.”\n\n---\n\n## Why Hermes\n\nHermes is unusually well-suited to this kind of runtime experiment:\n\n| Hermes property | Why it matters for RLM-FORGE |\n| --- | --- |\n| Provider-agnostic runtime | The inner LM can be swapped with `hermes model` without changing the recursion scaffold. |\n| Tool\u002FRPC-shaped execution | Hermes' structured “one bounded task in, one result out” style maps naturally to RLM sub-call envelopes. |\n| Quiet structured I\u002FO | Ouroboros can call Hermes like a function and validate the resulting JSON. 
|\n| Isolated subagent potential | Future RLM trees can expand horizontally through Hermes subagents instead of a single serial path. |\n\nRLM-FORGE treats Hermes as the only recursive inference boundary. Hermes proposes local decomposition, atomic execution, summary, or synthesis for one bounded node. Ouroboros alone decides recursion, mutation, retry, and termination.\n\n---\n\n## TraceGuard\n\nTraceGuard is the small deterministic layer that turns a trace into an enforceable contract.\n\n```python\nfrom rlm_forge import build_manifest_from_fixture, validate_parent_synthesis\n\n# `fixture` and `parent_json` are placeholders: a fixture you have already\n# loaded and the parent synthesis JSON to check against its evidence manifest.\nresult = validate_parent_synthesis(\n    evidence_manifest=build_manifest_from_fixture(fixture),\n    parent_synthesis=parent_json,\n)\n```\n\nRepresentative no-API output:\n\n```text\nsafe_parent_synthesis: ACCEPT (unsupported_claim_rate=0.0000)\nunsafe_omitted_fact: REJECT (unsupported_claim_rate=0.2000)\nchunk_only_no_fact: REJECT (unsupported_claim_rate=1.0000)\n```\n\nTraceGuard rejects two important failure modes:\n\n| Failure mode | Rejection reason |\n| --- | --- |\n| Parent claims an omitted fact not present in accepted child evidence | `unsupported_fact_id` |\n| Parent cites a chunk handle but no supported fact | `chunk_handle_without_fact` |\n\nDemo artifact: [`experiments\u002Ftraceguard-demo.md`](experiments\u002Ftraceguard-demo.md)\n\n---\n\n## Quickstart\n\nRLM-FORGE requires **Python 3.12+**.\n\n### 1. Install Hermes\n\n```bash\ncurl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002FNousResearch\u002Fhermes-agent\u002Fmain\u002Fscripts\u002Finstall.sh | bash\nsource ~\u002F.bashrc       # or ~\u002F.zshrc\nhermes setup           # configure provider + API key\nhermes --version       # confirm v0.11+\n```\n\nThe simplest provider for judges is OpenRouter: set `OPENROUTER_API_KEY` and select any model.\n\n### 2. 
Install RLM-FORGE\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FQ00\u002Frlm-forge.git\ncd rlm-forge\npython3.12 -m venv .venv\nsource .venv\u002Fbin\u002Factivate\npip install -e '.[dev]'\n```\n\nThe package pins a git-ref dependency on the Ouroboros commit that contains the RLM modules and claim-aware scorer until those APIs are released on PyPI.\n\n### 3. Verify without a Hermes API key\n\n```bash\npytest -q\npython3 -m rlm_forge.replay benchmarks\u002Frlm-long-context-truncation-v1.json\npython3 scripts\u002Frun-traceguard-demo.py\n```\n\nExpected replay signal:\n\n```text\nquality: vanilla=1.00, rlm=1.00, delta=+0.00, rlm_outperforms_vanilla=False\n```\n\n### 4. Run the live truncation benchmark\n\n```bash\nooo rlm --truncation-benchmark\n```\n\nThis command performs one vanilla Hermes call plus five recursive Hermes sub-calls. Use the replay command above for no-API evaluation.\n\nExpected live shape:\n\n```text\nShared truncation benchmark completed; vanilla Hermes and recursive RLM outputs were recorded.\nhermes_subcalls: vanilla=1, rlm=5\nchunks: selected=4, omitted=2\nquality: vanilla=1.00, rlm=1.00, delta=+0.00, rlm_outperforms_vanilla=False\n```\n\n---\n\n## Two integration paths\n\n| Path | Entry point | Where Hermes is called |\n| --- | --- | --- |\n| Recursive scaffold | `ooo rlm --truncation-benchmark` | `ouroboros.rlm.loop.RLMOuterScaffoldLoop` drives 1 root + 4 chunk sub-calls through `HermesCliRuntime`. |\n| AC decomposition pipeline | `decompose_ac(hermes_runtime=...)` | `ouroboros.execution.decomposition.decompose_ac` accepts an `AgentRuntime` and delegates child-AC generation to Hermes. |\n\nThe default `ooo run` and `ooo evolve` flows keep their original LLM-only behaviour. 
Passing `hermes_runtime=None` bypasses every RLM-specific branch.\n\n---\n\n## Evidence map for judges\n\n| Claim | Artifact |\n| --- | --- |\n| TraceGuard enforces parent synthesis evidence handles | [`experiments\u002Ftraceguard-demo.md`](experiments\u002Ftraceguard-demo.md) |\n| Evidence-gated recursion is the mechanism, not recursion alone | [`experiments\u002Funsupported-claim-rate-benchmark.md`](experiments\u002Funsupported-claim-rate-benchmark.md) |\n| Guarded memory reduces repair work in a controlled runtime benchmark | [`experiments\u002Fmemory-runtime-benefit-benchmark.md`](experiments\u002Fmemory-runtime-benefit-benchmark.md) |\n| TraceGuard prevents answer-memory contamination from becoming accepted state | [`experiments\u002Fmemory-contamination-robustness-benchmark.md`](experiments\u002Fmemory-contamination-robustness-benchmark.md) |\n| Hermes memory and RLM-FORGE memory have separable runtime roles | [`experiments\u002Flayered-memory-ablation-benchmark.md`](experiments\u002Flayered-memory-ablation-benchmark.md) |\n| Adaptive repair memory reduces repeated repair calls | [`experiments\u002Fadaptive-repair-memory-benchmark.md`](experiments\u002Fadaptive-repair-memory-benchmark.md) |\n| Hermes built-in memory creates an observed layered-memory prompt effect | [`docs\u002Fhermes-layered-memory-flow.md`](docs\u002Fhermes-layered-memory-flow.md) |\n| Paper runs through actual `ooo rlm` child\u002Fparent path | [`experiments\u002Fpaper-key-sections-ooo-rlm-demo.md`](experiments\u002Fpaper-key-sections-ooo-rlm-demo.md) |\n| `uv run ouroboros rlm` \u002F `uv run ooo rlm` get a repo-local in-process TraceGuard gate | [`src\u002Frlm_forge\u002Fooo_rlm_traceguard.py`](src\u002Frlm_forge\u002Fooo_rlm_traceguard.py) |\n| In-process gate catches a raw parent child-handle bug and rejects memory answers | [`experiments\u002Fooo-rlm-inprocess-traceguard-gate.md`](experiments\u002Fooo-rlm-inprocess-traceguard-gate.md) |\n| Live `ooo rlm` run accepts parent synthesis 
through the in-process gate | [`experiments\u002Fooo-rlm-live-inprocess-traceguard-run.md`](experiments\u002Fooo-rlm-live-inprocess-traceguard-run.md) |\n| Post-run TraceGuard gate rejects injected memory-answer claim | [`experiments\u002Fpaper-ooo-rlm-traceguard-gate.md`](experiments\u002Fpaper-ooo-rlm-traceguard-gate.md) |\n| Claim-aware scorer avoids the earlier false win | [`experiments\u002Fclaim-aware-omitted-fact-suite.md`](experiments\u002Fclaim-aware-omitted-fact-suite.md) |\n| Broad deterministic scorer coverage | [`experiments\u002Fsynthetic-omitted-fact-benchmark.md`](experiments\u002Fsynthetic-omitted-fact-benchmark.md) |\n| Live Hermes fixture remains an honest tie | [`benchmarks\u002Frlm-long-context-truncation-v1.json`](benchmarks\u002Frlm-long-context-truncation-v1.json) |\n| 24-cell live primary portability matrix | [`experiments\u002Flive-portability-primary.md`](experiments\u002Flive-portability-primary.md) |\n| Three-family runtime portability smoke | [`experiments\u002Flive-portability-smoke.md`](experiments\u002Flive-portability-smoke.md) |\n| Architecture boundary | [`docs\u002Farchitecture.md`](docs\u002Farchitecture.md) |\n| Hermes setup notes | [`docs\u002Fhermes-setup.md`](docs\u002Fhermes-setup.md) |\n| Technical note | [`paper\u002Fmain.pdf`](paper\u002Fmain.pdf) |\n\nOffline replay, scorer, TraceGuard, and deterministic ablation artifacts do\nnot require a Hermes API key. The live portability artifacts are persisted for\ninspection; rerunning them requires provider credentials. TraceGuard improves\nunsupported-claim enforcement; it does not change the live fixture quality\nscore, which remains a tie.\n\nAdditional scorer experiment:\n[`experiments\u002Fclaim-aware-omitted-fact-suite.md`](experiments\u002Fclaim-aware-omitted-fact-suite.md)\nruns seven controlled completion shapes without Hermes. 
It verifies that the\ncorrected scorer accepts guarded gap mentions but rejects positive omitted-fact\nclaims and omitted evidence references.\n\nBroader scorer stress test:\n[`experiments\u002Fsynthetic-omitted-fact-benchmark.md`](experiments\u002Fsynthetic-omitted-fact-benchmark.md)\ngenerates 108 truncation fixtures and scores seven deterministic completion\nstrategies, for 756 total scorer checks. It is not a live-model benchmark; it\nsupports the narrower claim that the evaluation harness separates guarded gap\nmentions, unsupported omitted-fact claims, chunk-only citations, and missing\nboundary reports across many fixture shapes.\n\nContract ablation:\n[`experiments\u002Funsupported-claim-rate-benchmark.md`](experiments\u002Funsupported-claim-rate-benchmark.md)\ncompares six execution contracts over 72 generated fixtures. The\nevidence-gated Hermes-RLM contract has a 0.0000 unsupported-claim rate, while\nthe same recursive shape without evidence gating has a 1.0000 unsupported-claim\nrate. This supports the precise systems claim: recursion is useful because it\ncreates evidence handles that Ouroboros can validate, not because recursion\nalone makes hallucination impossible.\n\nMemory runtime benefit:\n[`experiments\u002Fmemory-runtime-benefit-benchmark.md`](experiments\u002Fmemory-runtime-benefit-benchmark.md)\ncompares three memory policies over 20 fixtures and four provider-failure\nprofiles. The guarded operational memory prior improves initial TraceGuard\nacceptance from 0.2500 to 1.0000 versus the no-memory policy and reduces mean\nrepair calls from 0.2500 to 0.0000. 
This is a controlled runtime-performance\nresult: memory helps the scaffold avoid known schema\u002Fhandle repair work, while\nTraceGuard still rejects answer-memory contamination.\n\nMemory contribution benchmarks:\n[`experiments\u002Fmemory-contamination-robustness-benchmark.md`](experiments\u002Fmemory-contamination-robustness-benchmark.md),\n[`experiments\u002Flayered-memory-ablation-benchmark.md`](experiments\u002Flayered-memory-ablation-benchmark.md),\nand [`experiments\u002Fadaptive-repair-memory-benchmark.md`](experiments\u002Fadaptive-repair-memory-benchmark.md)\nseparate three stronger claims. TraceGuard turns adversarial answer memory from\naccepted state into a rejected unsupported claim. Hermes-style prompt memory\nand RLM-FORGE guarded memory fix different deterministic failure classes in a\n2x2 ablation. Adaptive operational memory learns from the first missing-handle\nrepair and reduces mean repair calls from 1.0000 to 0.1250 on later related\ntasks.\n\nPaper `ooo rlm` demo:\n[`experiments\u002Fpaper-key-sections-ooo-rlm-demo.md`](experiments\u002Fpaper-key-sections-ooo-rlm-demo.md)\nrecords an actual dependency `ouroboros rlm` run, the terminal counterpart to\n`ooo rlm`, over a mechanically extracted paper contribution target. The run\nuses four child Hermes calls plus one parent synthesis call. The parent\nconsumes all child results and recovers the contribution boundary: runtime\nlifting, Ouroboros\u002FHermes\u002FTraceGuard role separation, memory as policy, and no\nlive quality\u002Fcost\u002Ftoken\u002Flatency overclaim.\n\nPost-run TraceGuard gate:\n[`experiments\u002Fpaper-ooo-rlm-traceguard-gate.md`](experiments\u002Fpaper-ooo-rlm-traceguard-gate.md)\nnormalizes that exact `ooo rlm` parent output into TraceGuard\nfact\u002Fevidence-handle form. The evidence-backed parent is accepted with\nunsupported rate 0.0000. The same parent contaminated with an unsupported\n`MEMORY-ANSWER` fact is rejected with `unsupported_fact_id`. 
This is a post-run\ngate over the persisted `ooo rlm` output.\n\nIn-process `ooo rlm` gate:\n[`src\u002Frlm_forge\u002Fooo_rlm_traceguard.py`](src\u002Frlm_forge\u002Fooo_rlm_traceguard.py)\nadapts the dependency Ouroboros parent schema into TraceGuard's existing\nfact\u002Fevidence-handle validator. It turns fresh `child_result_id` plus child\nchunk handles into the accepted evidence manifest, normalizes parent\n`supported_by_child_result_ids` into claim references, and rejects parent facts\nthat do not cite the current run's child evidence. [`src\u002Frlm_forge\u002Fouroboros_cli.py`](src\u002Frlm_forge\u002Fouroboros_cli.py)\ninstalls this process-local gate for the project-local `uv run ouroboros` and\n`uv run ooo` entrypoints, so the dependency package in `.venv` is not mutated.\n[`experiments\u002Fooo-rlm-inprocess-traceguard-gate.md`](experiments\u002Fooo-rlm-inprocess-traceguard-gate.md)\nshows the gate catching a real raw-parent handle defect in the persisted paper\nrun: the parent cited `rlm_node_root:child_result:004`, but the run produced\nonly child results `000..003`. After repairing that handle to fresh child\nevidence, the parent accepts with unsupported rate 0.0000; adding a\n`MEMORY-ANSWER` fact is rejected.\n[`experiments\u002Fooo-rlm-live-inprocess-traceguard-run.md`](experiments\u002Fooo-rlm-live-inprocess-traceguard-run.md)\nrecords the live wrapper path after the integration: `uv run --extra dev ooo\nrlm ...` completes four child Hermes calls plus one parent synthesis call and\nprints `TraceGuard accepted parent synthesis (unsupported_rate=0.0000,\nclaims=4)` before reporting command success.\n\n---\n\n## What the live experiment proves\n\nThe 24-cell live run is a systems experiment, not a leaderboard. 
Its purpose\nis to test whether an RLM-style child\u002Fparent contract can run across real\nagent runtimes while a deterministic validator controls what may become parent\nstate.\n\nThe useful result is not \"all models got the answer right.\" The useful result\nis that RLM-FORGE exposes a runtime surface:\n\n1. child calls return structured facts with evidence handles;\n2. parent synthesis must cite those handles;\n3. TraceGuard rejects parent claims that lack accepted evidence;\n4. the same contract can be exercised through Hermes, Claude Code, and Codex.\n\nThe chunk-only citation trap is the useful stress case. In an earlier\nHermes+GLM run, GLM preserved the fact text but emitted one\n`evidence_chunk_id` as `null`; TraceGuard rejected that parent claim before it\ncould become accepted state. The latest full rerun passes this cell because\nGLM includes the required handle. The important point is that the runtime now\nhas a narrow response if that observed class recurs: when rejection is\nexclusively `missing_evidence_handle`, the repair loop fills only missing\u002Fnull\nhandle fields from the child evidence manifest and retries parent synthesis\nonce. This turns the observed failure from a terminal contract failure into a\nbounded, inspectable recovery step.\n\nTraceGuard is not an LLM judge. 
It does not ask another model whether the\nanswer \"seems correct.\" It checks the manifest deterministically:\n\n```text\nfresh child evidence + fact_id + evidence_chunk_id -> accepted parent claim\nmissing handle \u002F omitted fact \u002F chunk-only citation -> rejected parent claim\n```\n\n---\n\n## Repository layout\n\n```text\nrlm-forge\u002F\n├─ src\u002Frlm_forge\u002F\n│  ├─ traceguard.py       # evidence-gated parent synthesis validator\n│  ├─ memory.py           # operational memory priors, guards, and backends\n│  ├─ live_portability.py # live runtime matrix, repair loop, memory mode\n│  ├─ replay.py           # offline artifact replay CLI\n│  └─ __init__.py         # public API surface\n├─ tests\u002F                 # no-API CI tests\n├─ experiments\u002F           # deterministic scorer + TraceGuard artifacts\n├─ benchmarks\u002F            # persisted Hermes truncation benchmark\n├─ docs\u002F                  # architecture, setup, benchmark notes\n├─ examples\u002F              # small command wrappers\n└─ paper\u002F                 # hackathon technical note\n```\n\n---\n\n## Experimental memory-shaped recursion\n\nRLM-FORGE now includes an experimental memory mode for the live portability\nharness. It turns decomposition traces, provider-specific failures, schema\nrepairs, and TraceGuard rejections into persistent operational priors for later\nrecursive runs.\n\nThe important rule is:\n\n```text\nMemory is not evidence; memory is a prior over how to ask for evidence.\n```\n\nIn that framing, memory never stores \"this fixture's answer is X.\" The memory\nbackend is schema-first and guarded. 
It can store operational lessons such as:\n\n- long-context preservation tasks work better when child outputs preserve\n  `fact_id` and `evidence_chunk_id` together;\n- a provider may be slow but schema-stable, or fast but prone to handle\n  omissions;\n- TraceGuard rejection patterns can drive retry prompts or stricter child\n  schemas;\n- routing can specialize over time: decomposition, evidence extraction,\n  synthesis, and repair may prefer different providers.\n\nThe live path now builds TraceGuard's accepted evidence manifest from the\ncurrent run's child outputs, not from memory. Memory can only enter prompts as\nstructured `memory_priors` for schema or retry policy. Every new parent claim\nstill has to cite fresh child evidence and pass TraceGuard.\n\nThere is also an observed layered-memory effect when Hermes is the inner\nruntime. `HermesCliRuntime` calls `hermes chat -Q --source tool -q \u003Cprompt>`\nwithout `--ignore-rules`, so Hermes' own built-in `MEMORY.md` can be injected\nbefore the explicit RLM-FORGE prompt. In this environment, Hermes sessions\ncreated after an RLM-FORGE run contained RLM-FORGE operational prior entries in\ntheir system prompts. That layer is useful as a behavior prior, but it is not\npart of the TraceGuard evidence manifest. Details and session evidence are in\n[`docs\u002Fhermes-layered-memory-flow.md`](docs\u002Fhermes-layered-memory-flow.md).\n\nThe stronger contribution is evidence-admissible memory for agent runtimes:\nmemory may shape decomposition, schema discipline, and repair policy, but only\nfresh child evidence can support parent claims. 
The deterministic contribution\nbenchmarks can be regenerated with:\n\n```bash\npython scripts\u002Frun-memory-contribution-benchmarks.py\n```\n\n```bash\npython scripts\u002Frun-live-portability-matrix.py \\\n  --mode live-primary \\\n  --memory-mode read-write \\\n  --memory-store .rlm-forge-memory.jsonl \\\n  --output-prefix live-portability-primary-memory\n```\n\nThe default remains `--memory-mode off` for clean baseline runs. Memory runs\nshould be reported separately from no-memory baselines and should claim only a\nruntime feedback loop, not a model-quality improvement.\n\nMemory live experiment:\n[`experiments\u002Flive-portability-primary-memory-fixed.md`](experiments\u002Flive-portability-primary-memory-fixed.md)\nrecords the May 4, 2026 `read-write` memory run across the same 24-cell\nprimary matrix. The run passed all required cells with memory enabled:\nHermes+GLM, Claude Code, and Codex each completed and passed all 8 primary\nfixtures. TraceGuard accepted every final parent synthesis, the fresh child\nevidence gate had zero failures, and the memory backend stored 24 operational\nobservations with zero rejected memory candidates.\n\n| Memory mode | Family | Primary cells | Fresh evidence failures | TraceGuard failures |\n| --- | --- | ---: | ---: | ---: |\n| `read-write` | `hermes_glm` | 8\u002F8 pass | 0 | 0 |\n| `read-write` | `claude_code_opus47` | 8\u002F8 pass | 0 | 0 |\n| `read-write` | `codex_gpt55` | 8\u002F8 pass | 0 | 0 |\n\nThe durable memory log for that run is\n[`experiments\u002Flive-portability-memory-primary-fixed.jsonl`](experiments\u002Flive-portability-memory-primary-fixed.jsonl).\nIt contains operational priors such as `preserve_child_fact_identity`, not\nfixture answers or retained factual chunks.\n\n---\n\n## What RLM-FORGE is and is not\n\n| It is | It is not |\n| --- | --- |\n| A Hermes-backed RLM runtime MVP | A new model architecture |\n| A replayable trace and evidence-validation scaffold | A claim that recursion alone 
prevents hallucination |\n| A practical integration recipe for Hermes + Ouroboros | A production RLM service |\n| A deterministic TraceGuard enforcement demo | A benchmark suite proving model-quality superiority |\n\nThis is an MVP designed to demonstrate that Hermes can serve as the inner\nrecursive LM in an RLM-style scaffold with replayable traces and deterministic\nevaluation. It is not a production-ready RLM service, does not claim novelty\nover the Zhang et al. paper, and no longer claims a quality advantage from the\nsingle truncation fixture. The experimental memory mode is intended to affect only\ndecomposition, routing, and repair policy; it does not replace fresh trace\nevidence for parent claims.\n\n---\n\n## Development\n\n```bash\npython3.12 -m venv .venv\nsource .venv\u002Fbin\u002Factivate\npip install -e '.[dev]'\npytest -q\n```\n\nCurrent local verification:\n\n```text\n55 passed\n```\n\n---\n\n## Examples\n\n| Script | What it does | Hermes calls |\n| --- | --- | --- |\n| `examples\u002F01-dry-run.sh` | Validate the RLM path, no side effects | 0 |\n| `examples\u002F02-vanilla-baseline.sh` | One vanilla Hermes call on the truncation fixture | 1 |\n| `examples\u002F03-truncation-comparison.sh` | Side-by-side vanilla vs recursive RLM | 1 + 5 |\n\nEach script is a one-liner that wraps the Ouroboros CLI.\n\n---\n\n## Architecture\n\nSee [`docs\u002Farchitecture.md`](docs\u002Farchitecture.md) for the layer model,\norchestration boundaries, and the 6-step sub-call lifecycle. 
The full concept\ndesign is `docs\u002Fguides\u002Frecursive-language-model.md` in the upstream\nOuroboros repository (1,580 lines).\n\n```text\nUser\n  |\n  v\nooo rlm\n  |\n  v\nOuroboros outer scaffold\n  - validates ambiguity \u003C= 0.2\n  - owns ACTree recursion, max depth 5\n  - owns RLM tree state, scheduling, termination, trace persistence\n  - calls Hermes through HermesCliRuntime\n  |\n  v\nHermes inner LM layer\n  - receives one bounded recursive sub-call at a time\n  - proposes decomposition, atomic execution, summary, or synthesis\n  - returns structured JSON evidence to Ouroboros\n```\n\n---\n\n## License\n\nMIT. See [`LICENSE`](LICENSE).\n\n## Acknowledgements\n\n- **Hermes Agent** — Nous Research. The inner runtime that made the experiment practical.\n- **Ouroboros** — the outer scaffold that owns recursion, state, and traces.\n- **Zhang, Kraska, Khattab** — *Recursive Language Models*, the conceptual seed for this work.\n","RLM-FORGE is a runtime-lifted Recursive Language Model framework designed for Hermes Agent. Its core features are recursion owned by Ouroboros, bounded JSON sub-calls executed by Hermes Agent, and the TraceGuard mechanism, which ensures that parent synthesis is backed by evidence, improving the system's safety and reliability. The project is written in Python 3.12 and suits applications that need complex, multi-level, self-invoking language models, particularly agent systems with strict safety requirements.",2,"2026-06-11 02:59:52","CREATED_QUERY"]