[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80777":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":11,"stars7d":14,"stars30d":12,"stars90d":13,"forks30d":13,"starsTrendScore":15,"compositeScore":16,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":8,"pushedAt":8,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":13,"starSnapshotCount":13,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},80777,"mlsys2026-flashinfer-contest","mit-han-lab\u002Fmlsys2026-flashinfer-contest","mit-han-lab",null,"Python",80,3,40,0,14,18,52.81,false,"main",true,[],"2026-06-12 04:01:30","# HAN Lab Kernel Mafia MLSys 2026 FlashInfer Contest Release\n\nThis repository releases the prompts, workflow documentation, and a minimal verification example for our MLSys 2026 FlashInfer Full-Agent track effort. The submitted kernels were produced by a fully agent-driven optimization workflow with [KDA (Kernel Develop Agents)](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fkernel-design-agents). 
The core methods are [Humanize](https:\u002F\u002Fgithub.com\u002FPolyArch\u002Fhumanize) (a plan-execute-verify harness framework), [KernelWiki](https:\u002F\u002Fgithub.com\u002FDongyunZou\u002FKernelWiki\u002Ftree\u002Fmaster) (our collected kernel knowledge base), and [Nsight Compute Profile Skills](https:\u002F\u002Fgithub.com\u002FDongyunZou\u002Fncu-report-skill).\n\n* Team: HAN Lab Kernel Mafia\n* Technical Report: [docs\u002FHAN_Lab_Kernel_Mafia_Technical_Report](docs\u002FHAN_Lab_Kernel_Mafia_Technical_Report.pdf)\n* Generated Kernels \u002F Final solution: [mit-han-lab\u002Fmlsys2026-flashinfer-contest-solution](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fmlsys2026-flashinfer-contest-solution)\n\n![HAN Lab Kernel Mafia teaser](docs\u002Fassets\u002Fhanlab-kernel-mafia-teaser.png)\n\nThe repository is intentionally small. It contains documentation, agent prompts, and a lightweight `flashinfer-bench` verification example for an externally packed `solution.json`. Final kernel source snapshots and the submission verification harness live in the separate submissions repository linked above. 
The reusable skills remain in their own repositories and are linked below.\n\n## Competition Results\n\nOur agent workflow placed in the top three in each of the three Full-Agent Approach tracks of the MLSys 2026 Competition NVIDIA Track:\n\n| Track | Result |\n|---|---:|\n| MoE Track | 1st place |\n| DSA Track | 2nd place |\n| GDN Track | 3rd place |\n\nThe released prompts follow a three-stage optimization workflow:\n\n![Three-stage kernel optimization pipeline](docs\u002Fassets\u002F3_stage_pipeline.png)\n\nEach stage uses the Humanize planning and RLCR loop to turn a phase prompt into an executable optimization plan:\n\n![Humanize kernel agent loop](docs\u002Fassets\u002Fhumanize_kernel_agent_loop.png)\n\n## Skill Ablation\n\nThis ablation was run after the competition, separately from the official contest submissions, so its numbers are meant to explain skill contributions rather than exactly match the competition results above. The ablation shows that Humanize is the dominant contributor: it gives the agent a much stronger plan-execute-verify structure, turning each optimization attempt into a disciplined loop instead of a loose sequence of trials. KernelWiki broadens the kernel knowledge the agent can consult, and `ncu-report-skill` lets the agent read finer-grained profiler evidence instead of treating benchmark scores as a black box. Both skills help, but Humanize accounts for the largest gain.\n\n![Ablation on skills](docs\u002Fassets\u002Fablation_on_skills.png)\n\n## Contents\n\n| Path | Purpose |\n|---|---|\n| `verify.py` | Minimal example that evaluates one packed FlashInfer `solution.json` with `flashinfer-bench`. |\n| `prompts\u002F` | Prompt template and task-specific prompts used for the agent workflow. |\n| `skills\u002F` | Git submodule links to the required Claude skills. |\n| `docs\u002FHAN_Lab_Kernel_Mafia_Technical_Report.pdf` | Technical report. 
|\n| `docs\u002Freproduction.md` | Environment, dataset, and benchmark reproduction notes. |\n\n## Agent Workflow Dependencies\n\nThe workflow depends on Claude Code and Codex. Install `humanize` as a Claude Code plugin, and install `KernelWiki` and `ncu-report-skill` as Claude skills under `~\u002F.claude\u002Fskills\u002F`.\n\nThis repository links the two required skills as git submodules under `skills\u002F` so they are visible in the release tree. If you did not clone with `--recurse-submodules`, initialize them from the repository root:\n\n```bash\n# Option 1: link the submodule skills\ngit submodule update --init --recursive\nmkdir -p ~\u002F.claude\u002Fskills\nln -sfn \"$PWD\u002Fskills\u002FKernelWiki\" ~\u002F.claude\u002Fskills\u002FKernelWiki\nln -sfn \"$PWD\u002Fskills\u002Fncu-report-skill\" ~\u002F.claude\u002Fskills\u002Fncu-report-skill\n\n# Option 2: clone the skills directly\nmkdir -p ~\u002F.claude\u002Fskills && cd ~\u002F.claude\u002Fskills\ngit clone https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fncu-report-skill.git\ngit clone https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002FKernelWiki.git\n```\n\nInstall `humanize` separately from the Claude Code plugin marketplace by running the following commands inside a Claude Code session:\n\n```bash\n# Add the PolyArch marketplace\n\u002Fplugin marketplace add PolyArch\u002Fhumanize\n# Then install the humanize plugin\n\u002Fplugin install humanize@PolyArch\n```\n\n## Fresh Workflow Setup\n\nClone this repository, install the benchmark environment, download the FlashInfer contest workloads, and prepare the agent workflow dependencies:\n\n```bash\ngit clone --recurse-submodules https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fmlsys2026-flashinfer-contest.git\ncd mlsys2026-flashinfer-contest\n\ngit clone https:\u002F\u002Fgithub.com\u002Fflashinfer-ai\u002Fflashinfer-bench.git \u002Ftmp\u002Fflashinfer-bench-main\nuv sync --python 3.12\n\n# uv.lock pins the contest-tested stack:\n# flashinfer-python==0.6.8.post1, torch==2.12.0+cu132, triton==3.6.0.\n# Use Python 3.12 or 3.13; Python 3.14 is not supported by 
all CUDA wheels.\n\n# Required by some baselines and generated solutions that use DeepGEMM\u002FCUTLASS\u002FCuTe headers.\ngit clone https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepGEMM.git \u002Ftmp\u002FDeepGEMM\nuv pip install -e \u002Ftmp\u002FDeepGEMM --no-build-isolation\n\nuv run .\u002Fscripts\u002Fdownload_data.sh\n```\n\nConfirm that the workload dataset is visible:\n\n```bash\nuv run python -c \"from flashinfer_bench import TraceSet; ts = TraceSet.from_path('data\u002Fflashinfer-trace'); print(sorted(ts.definitions)); print(sum(len(v) for v in ts.workloads.values()), 'workloads')\"\n```\n\nCreate a separate task implementation workspace from the official FlashInfer starter kit, then start the agent from there. This repository is the prompt\u002Fworkflow release; do not implement kernels directly in this repository.\n\n```bash\nmkdir -p workspaces\ngit clone https:\u002F\u002Fgithub.com\u002Fflashinfer-ai\u002Fflashinfer-bench-starter-kit.git workspaces\u002F\u003Ctask-name>\ncd workspaces\u002F\u003Ctask-name>\nexport FIB_DATASET_PATH=\"$OLDPWD\u002Fdata\u002Fflashinfer-trace\"\n```\n\nThen choose a task prompt under `prompts\u002F`, start a fresh agent session in the task implementation workspace, and paste the selected phase prompt. The released final kernels are not part of this workflow and must not be used as implementation input.\n\nSee `docs\u002Freproduction.md` for full environment notes and packed-solution verification commands.\n\nBy default, the dataset is stored under `data\u002Fflashinfer-trace` inside this repository. Override it with:\n\n```bash\nexport FIB_DATASET_PATH=\u002Fpath\u002Fto\u002Fflashinfer-trace\n```\n\n## Release Boundary\n\nFinal kernels are stored only in [mit-han-lab\u002Fmlsys2026-flashinfer-contest-solution.git](https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fmlsys2026-flashinfer-contest-solution.git) as result snapshots. 
This link is for release provenance and final-result verification; it is not an input to the prompt-driven agent workflow. Agents MUST NOT clone or inspect the solution repository while solving the tasks. Intermediate candidates, benchmark histories, and search DAGs are not part of this release. The prompts in `prompts\u002F` are meant to be run from a separate task implementation workspace created from the official FlashInfer starter kit. Final kernels are never placed in an agent's starting workspace.\n\nRunning the full agent workflow is not deterministic: search order, profiling noise, GPU scheduling, and model behavior can vary between runs. The external submissions repository is the source of truth for the released final kernel snapshots.\n","This project was built for the MLSys 2026 FlashInfer competition and aims to generate efficient kernel code through a fully agent-driven optimization workflow. Its core components are the Humanize framework, which runs a plan-execute-verify loop; KernelWiki, which provides a broad kernel knowledge base; and Nsight Compute Profile Skills, which enable finer-grained performance analysis. Together they drive a three-stage optimization pipeline that substantially improves kernel development efficiency and quality, placing first, second, and third in the MoE, DSA, and GDN tracks, respectively. It suits scenarios that require high-performance compute kernel optimization, especially when the goal is to accelerate development with automated tools.",2,"2026-06-11 04:02:18","CREATED_QUERY"]