[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-77281":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":13,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":19,"topics":20,"createdAt":8,"pushedAt":8,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},77281,"auto-arch-tournament","FeSens\u002Fauto-arch-tournament","FeSens",null,"VHDL",125,8,119,1,0,6,2.86,false,"main",true,[],"2026-06-12 02:03:42","# auto-arch-tournament\n\nAn autonomous research loop pointed at a SystemVerilog RV32IM CPU. Each round\nthe agent proposes a microarchitectural hypothesis, implements it in an\nisolated git worktree, then runs it through riscv-formal + Verilator cosim +\n3-seed FPGA place-and-route. Only hypotheses that beat the current champion\non CoreMark\u002FMHz get merged.\n\nThe repo is multi-core: every architecture lives under `cores\u002F\u003Cname>\u002F` with\nits own RTL, tests, log, and `core.yaml`. `make loop TARGET=\u003Cname>` runs the\nloop on one core at a time, on a dedicated `core-\u003Cname>` git branch inside\nits own worktree, so two cores can iterate in parallel without stomping each\nother.\n\n## What came out of it\n\n![CoreMark progress: green dots are accepted winners, orange are rejected, blue\u002Fpurple bands group tournament rounds.](cores\u002Fv1\u002Fexperiments\u002Fprogress.png)\n\n73 hypotheses, 9h 51m wall-clock, on the run that produced `cores\u002Fv1\u002F` from\n`cores\u002Fbaseline\u002F`. 
Locked baseline → champion went from\n**301 iter\u002Fs (2.23 CoreMark\u002FMHz)** to **577 iter\u002Fs (2.91 CoreMark\u002FMHz)** —\n+92% on fitness, +26% over VexRiscv's published 2.30 CoreMark\u002FMHz, with\n40% fewer LUTs.\n\nThe 10 accepted winners, in order of merge:\n\n| Δt   | iter\u002Fs | CM\u002FMHz | Fmax    | Hypothesis                                    |\n|------|--------|--------|---------|-----------------------------------------------|\n| 0.0h | 301.04 | 2.226  | 135 MHz | Baseline                                      |\n| 0.4h | 313.10 | 2.320  | 135 MHz | Backward-Branch Taken Predictor               |\n| 0.7h | 324.48 | 2.348  | 138 MHz | IF Direct-Jump Predictor                      |\n| 2.1h | 375.43 | 2.348  | 160 MHz | Cold Multi-Cycle DIV\u002FREM Unit                 |\n| 2.7h | 397.55 | 2.366  | 168 MHz | One-Deep Store Retirement Slot                |\n| 3.5h | 422.77 | 2.366  | 179 MHz | Segmented RVFI Order Counter                  |\n| 3.8h | 472.96 | 2.891  | 164 MHz | Registered Lookahead I-Fetch Replay Predictor |\n| 4.0h | 505.65 | 2.891  | 175 MHz | Compressed Resetless I-Fetch Replay Tags      |\n| 5.3h | 529.35 | 2.891  | 183 MHz | RTL-Only Hot\u002FCold ALU Opcode Split            |\n| 6.1h | 577.76 | 2.908  | 199 MHz | Banked Registered I-Fetch Replay Predictor    |\n\nFull writeup: [docs\u002Fauto-arch-tournament-blog-post.md](docs\u002Fauto-arch-tournament-blog-post.md).\n\n## Setup\n\nmacOS only for now. One-time toolchain install:\n\n```\nbash setup.sh\n```\n\nFetches Verilator, OSS CAD Suite (yosys, nextpnr-himbaechel, sby, bitwuzla),\nthe riscv-none-elf cross compiler, and a few Python deps into `.toolchain\u002F`.\nTools already on `$PATH` are detected and reused.\n\nYou'll also need a coding-agent CLI. Codex is the default; Claude Code works\ntoo — pass `AGENT=claude` to any orchestrator target.\n\n## Run the loop\n\nEvery core-touching command takes `TARGET=\u003Cname>`. 
The Makefile errors out\notherwise and lists the cores it found.\n\n```\nmake next TARGET=v1                        # one round, one slot — smoke test\nmake loop TARGET=v1 N=10                   # 10 rounds, sequential\nmake loop TARGET=v1 N=10 K=3               # 10 rounds, 3 parallel slots\u002Fround\nmake loop TARGET=v1 N=10 LUT=3000 COREMARK=400   # set per-target fitness goals\nmake loop TARGET=v1 N=10 AGENT=claude      # use Claude Code instead of Codex\nmake loop TARGET=mycore BASE=baseline N=10 # fork a new core from cores\u002Fbaseline\u002F\n\nmake report TARGET=v1                      # summary of cores\u002Fv1\u002Fexperiments\u002Flog.jsonl\n```\n\nWhat `make loop TARGET=\u003Cname>` does:\n\n1. Creates `.worktrees\u002F\u003Cname>\u002F` on branch `core-\u003Cname>` if it doesn't exist\n   (subsequent runs reuse it). Re-execs make inside the worktree so two\n   parallel `make loop`s on different TARGETs don't collide on the git index.\n2. For each round:\n   - **Hypothesis agent** writes `cores\u002F\u003Cname>\u002Fexperiments\u002Fhypotheses\u002F\u003Cid>.yaml`\n     against the JSON schema. The prompt sees: source RTL, recent log entries,\n     `LESSONS.md`, `CORE_PHILOSOPHY.md`, `core.yaml`.\n   - **Implementation agent** edits `cores\u002F\u003Cname>\u002Frtl\u002F` (and optionally\n     `cores\u002F\u003Cname>\u002Ftest\u002Ftest_*.py`) in a per-slot worktree.\n   - **Eval pipeline**: verilator lint → yosys synth → bench `make` → cosim\n     build → riscv-formal → Verilator cosim vs. Python ISS → 3-seed FPGA\n     P&R + CoreMark. Each step is gated; first failure short-circuits with\n     a `broken: \u003Cstep>: \u003Cstderr tail>` outcome.\n   - **Scribe** distills one bullet into `cores\u002F\u003Cname>\u002FLESSONS.md` so the\n     next round's hypothesis agent reads what failed and why.\n3. The highest-fitness slot above the current champion merges into\n   `core-\u003Cname>`. 
Regressions \u002F broken \u002F placement-failed → worktree\n   destroyed, log entry written, next round.\n\nThe orchestrator never merges to `main`. Each core's evolution is one PR\ndiff: `git push -u origin core-\u003Cname> && gh pr create --base main`.\n\nTo opt out of the worktree wrap (run directly on the current branch):\n`make loop TARGET=v1 WORKTREE=`.\n\nOther useful targets — all accept `TARGET=`:\n\n```\nmake lint TARGET=v1          # verilator lint over cores\u002Fv1\u002Frtl\u002F\nmake test TARGET=v1          # cocotb unit tests under cores\u002Fv1\u002Ftest\u002F\nmake cosim TARGET=v1         # cosim alone (no orchestrator)\nmake formal TARGET=v1        # riscv-formal fast suite (ALTOPS — see CLAUDE.md)\nmake formal-deep TARGET=v1   # full formal suite WITHOUT ALTOPS — slow, real bitvector arithmetic\nmake fpga TARGET=v1          # FPGA eval alone (3-seed P&R + CoreMark)\nmake bench                   # build selftest.elf \u002F coremark.elf (shared across cores)\nmake clean TARGET=v1         # nuke per-core build artifacts\nmake test-infra              # pytest under tools\u002F (no TARGET needed)\n```\n\nIf you SSH-sign your commits and run unattended, `tools\u002Forchestrator.py`\nsets `commit.gpgsign=false` for its own subprocess tree so the loop\ndoesn't hang on a 1Password biometric prompt. Manual commits from your\nshell are unaffected.\n\n## Working with multiple cores\n\nThe repo holds multiple cores under `cores\u002F\u003Cname>\u002F`. Each has its own RTL,\ntests, experiment log, `core.yaml`, and `LESSONS.md`. 
The orchestrator runs\nagainst one core at a time.\n\n**Available cores on `main`:**\n- `cores\u002Fbaseline\u002F` — the original simple RV32IM 5-stage in-order core.\n  Universal seed for new cores.\n- `cores\u002Fv1\u002F` — the evolved core from the historical run shown above\n  (branch prediction, banked I-fetch replay, hot\u002Fcold ALU split).\n\nOther cores live on their own `core-\u003Cname>` branches, not yet merged to\n`main` — that's the normal state for a core mid-evolution.\n\n**Forking a new core:**\n\n```\nmake loop TARGET=mycore BASE=baseline N=20 LUT=3000 COREMARK=400\n```\n\nThe orchestrator:\n1. Copies `cores\u002Fbaseline\u002F` → `cores\u002Fmycore\u002F`.\n2. Re-runs the eval gates against the freshly-copied RTL to get a measured\n   `current:` block in `cores\u002Fmycore\u002Fcore.yaml` — the first log entry.\n3. Iterates `N` rounds, with the LUT\u002FCoreMark targets as Pareto goals.\n\nThe work lands on branch `core-mycore` in `.worktrees\u002Fmycore\u002F`. To review\nor ship:\n\n```\ngit push -u origin core-mycore\ngh pr create --base main\n```\n\nThis is intentional: each core's evolution is one reviewable diff, and\nmultiple cores can iterate concurrently — `make loop TARGET=mini` and\n`make loop TARGET=maxperf` run side-by-side without colliding because each\nruns in its own worktree on its own branch.\n\n**Per-core artifacts** (under `cores\u002F\u003Cname>\u002F`):\n\n| File \u002F dir              | Purpose                                                                     |\n|-------------------------|-----------------------------------------------------------------------------|\n| `rtl\u002F*.sv`              | The design. The agent's playground.                                         |\n| `test\u002Ftest_*.py`        | cocotb tests. The agent may add tests for new modules.                      |\n| `core.yaml`             | Declared targets + auto-updated `current:` (running champion's measured numbers). 
|\n| `CORE_PHILOSOPHY.md`    | Optional architect intent — injected verbatim into hypothesis prompts.      |\n| `LESSONS.md`            | Append-only one-line lessons from prior iterations (the scribe agent).      |\n| `experiments\u002Flog.jsonl` | Per-iteration outcomes: proposal + fitness numbers + verdict + lesson.      |\n| `experiments\u002Fprogress.png` | Fitness chart over time.                                                 |\n\n## Philosophy\n\nThe orchestrator is hardcoded. The model never edits it. What the model\ncan touch is small and explicit:\n\n- `cores\u002F\u003CTARGET>\u002Frtl\u002F**` — any SystemVerilog file. Add modules, delete\n  modules, rename, restructure, rewrite from scratch. The only top-level\n  invariant is the I\u002FO contract on `core.sv` (clock\u002Freset, imem\u002Fdmem,\n  NRET=2 RVFI).\n- `cores\u002F\u003CTARGET>\u002Ftest\u002Ftest_*.py` — cocotb suites. Add tests for new modules.\n\nEverything else is off-limits. The path sandbox in `tools\u002Forchestrator.py`\nrejects the round *before* any eval runs if the agent touched `formal\u002F`,\n`tools\u002F`, `fpga\u002F`, `test\u002Fcosim\u002Fmain.cpp`, the CRC table, or any sibling\ncore's directory. The agent doesn't get to soften its own grader.\n\nThe verifier does the heavy lifting:\n\n- **riscv-formal** — symbolic BMC against RV32IM: decode, traps, ordering,\n  liveness, M-ext discipline. ~105 checks at NRET=2.\n- **Verilator cosim** — random ~22% bus stalls, RVFI byte-identical against\n  a Python ISS on `selftest.elf` and `coremark.elf`.\n- **3-seed P&R** — yosys + nextpnr on a Gowin GW2A-LV18 (Tang Nano 20K).\n  Median Fmax × CoreMark iter\u002Fcycle = fitness. 
One seed is a coin flip;\n  three is comparable across rounds.\n- **CoreMark CRC validation** — the four canonical 2K-config CRCs.\n  CoreMark prints \"Correct operation validated.\" even when it isn't, so\n  the bench re-checks them itself.\n- **Path sandbox** — the agent cannot edit anything outside its target\n  core's `rtl\u002F` and `test\u002Ftest_*.py`.\n\nOf 73 hypotheses in the run shown above, 63 were rejected by the verifier.\nThat's the point.\n\nThe full contract — invariants, don't-touch list, what may change — is in\n[`CLAUDE.md`](CLAUDE.md). The I\u002FO contract is in [`ARCHITECTURE.md`](ARCHITECTURE.md).\n\n## Why this exists\n\nKarpathy's [autoresearch](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch) showed\nan agent loop finding 20 training-time wins on a nanochat in two days. That\nworked because Python and gradient descent are the agent's home turf. This\nrepo asks whether the recipe generalizes when you point it somewhere it has\nno business being good at — SystemVerilog, formal verification, FPGA timing.\n\nIt does. But the loop isn't the moat — the loop is commodity. The artifact\nthat survived 10 wins past 63 rejections wasn't the agent; it was the\nverifier. 
That's the part that encodes what *correct* means in this domain.\nThe full argument is in the [blog post](docs\u002Fauto-arch-tournament-blog-post.md).\n\n## Layout\n\n```\ncores\u002F                  # per-target architectures\n  \u003Cname>\u002F\n    rtl\u002F                # SystemVerilog sources (the design — agent-editable)\n    test\u002F               # cocotb unit tests (agent-editable)\n    experiments\u002F        # log.jsonl + progress.png + transient hypotheses\u002F\u003Cid>.yaml\n    core.yaml           # declared targets + auto-updated current:\n    CORE_PHILOSOPHY.md  # optional architect intent (injected into prompts)\n    LESSONS.md          # append-only scribe output\n\nbench\u002Fprograms\u002F         # selftest.S, crt0.S, link.ld, EEMBC CoreMark — shared, off-limits\nformal\u002F                 # riscv-formal wrapper, checks.cfg, run_all.sh — correctness contract\nfpga\u002F                   # core_bench.sv, synth.tcl, nextpnr scripts, constraints — fitness contract\ntest\u002Fcosim\u002F             # Verilator cosim harness (main.cpp + reference Python ISS)\ntools\u002F                  # orchestrator, worktree manager, eval gates, scribe, plotting\nschemas\u002F                # hypothesis + eval-result JSON schemas\ndocs\u002F                   # design notes, blog post\n.worktrees\u002F             # auto-created by `make loop`, one git worktree per TARGET\n```\n\n## Tech stack\n\n| Concern        | Tool                                                            |\n|----------------|-----------------------------------------------------------------|\n| RTL            | SystemVerilog (IEEE 1800-2017 synthesizable subset)             |\n| Sim            | Verilator ≥ 5.0                                                 |\n| Unit tests     | cocotb ≥ 1.8 (Python harness over Verilator)                    |\n| Formal         | YosysHQ riscv-formal (vendored submodule); sby + bitwuzla       |\n| Synth          | Yosys + `synth_gowin`      
                                     |\n| Place & route  | nextpnr-himbaechel (Gowin GW2A-LV18QN88C8\u002FI7 = Tang Nano 20K)   |\n| Cross-compiler | xPack riscv-none-elf-gcc 15.x (symlinked to riscv32-unknown-elf)|\n| Orchestrator   | Python 3.11+, jsonschema, pyyaml, matplotlib                    |\n| Coding agent   | Codex CLI (default) or Claude Code (`AGENT=claude`)             |\n\n## Citation\n\nIf you use HWE Bench in research, please cite the repository (an arXiv\npreprint is in preparation). GitHub renders a \"Cite this repository\"\nbutton from `CITATION.cff` on the repo's main page.\n\nBibTeX:\n\n```bibtex\n@software{bonetto_hwebench_2026,\n  author       = {Bonetto, Felipe Sens},\n  title        = {{HWE Bench: An Unbounded Benchmark for LLM Hardware Development on RISC-V}},\n  year         = {2026},\n  url          = {https:\u002F\u002Fhwebench.com},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002FFeSens\u002Fauto-arch-tournament}}\n}\n```\n\nPublic site: \u003Chttps:\u002F\u002Fhwebench.com>.\n","auto-arch-tournament is an automated research loop for a SystemVerilog RV32IM CPU that optimizes CPU performance by automating microarchitectural hypothesis generation, implementation, and verification. The design is written in SystemVerilog. In each round, the agent proposes a new microarchitectural hypothesis, implements it in an isolated git worktree, and verifies it with riscv-formal, Verilator cosimulation, and FPGA place-and-route. Only hypotheses that beat the current champion on CoreMark\u002FMHz are merged. Each core is managed independently, so multiple cores can be worked on in parallel without interfering with one another. It suits research or practical development work that needs to optimize RISC-V processor performance efficiently.",2,"2026-06-11 03:55:15","CREATED_QUERY"]