[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-78044":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":12,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":8,"rankLanguage":8,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":8,"pushedAt":8,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},78044,"MetaFine","Hiangx-robotics\u002FMetaFine","Hiangx-robotics",null,"Python",300,20,1,6,0,239,3.97,"MIT License",false,"main",true,[],"2026-06-12 02:03:45","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"docs\u002Flogo.png\" alt=\"MetaFine\" height=\"120\" \u002F>\n\n# MetaFine\n\n**A diagnostic evaluation framework for fine-grained robotic 
manipulation.**\n\n[![Homepage](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🌐_Homepage-MetaFine-2563eb?style=for-the-badge)](https:\u002F\u002Fmetafine.github.io)\n[![ModelScope](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Dataset-624aff?style=for-the-badge&logo=alibabacloud&logoColor=white)](https:\u002F\u002Fwww.modelscope.cn\u002Fdatasets\u002Fhiangx\u002FMetaFine)\n[![HuggingFace](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗_HuggingFace-Dataset-ffc107?style=for-the-badge)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhiangx\u002FMetaFine)\n\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-3776ab?style=flat-square&logo=python&logoColor=white)](https:\u002F\u002Fwww.python.org\u002F)\n[![SAPIEN](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSAPIEN-3.0+-26c0e0?style=flat-square)](https:\u002F\u002Fsapien.ucsd.edu\u002F)\n[![ManiSkill](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FManiSkill-3.0+-4caf50?style=flat-square)](https:\u002F\u002Fgithub.com\u002Fhaosulab\u002FManiSkill)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green?style=flat-square)](#license)\n[![Status](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FStatus-Alpha-orange?style=flat-square)](#timeline)\n\n\u003C\u002Fdiv>\n\n---\n\nMetaFine treats evaluation as a tool for **scientific diagnosis** rather than a leaderboard. 
Instead of collapsing a manipulation policy into a single binary success rate, MetaFine disentangles capability into three fundamental dimensions — **understanding**, **perception**, and **behavior** — and surfaces the hidden failure modes that conventional benchmarks miss.\n\nThe platform is built on a **compositional task graph** and an **extensible asset library** so it can generate diverse fine-grained tasks, absorb heterogeneous benchmarks, and support both pure simulation and hybrid real–sim evaluation.\n\n> 📖 **Full documentation**, tutorials, and the supported policies \u002F tasks catalogue live on the [project homepage](https:\u002F\u002Fmetafine.github.io). This README covers what the *codebase* is, how to install it, and how it fits together.\n\n## ✨ Features\n\n| | | |\n|:---|:---|:---|\n| **🔬 Three-dimensional diagnosis** | **🧩 Atomic compositional skills** | **🧱 Fine-grained part-aware assets** |\n| Evaluate \u003Cem>understanding · perception · behavior\u003C\u002Fem> separately to expose hidden failure modes — not a single binary success rate. | Compose arbitrary fine-grained tasks from 21 reusable atomic primitives + an 11-element affordance closed set. | 40+ part-annotated articulated objects with auto-derived `capabilities.json` and a CLI for rapid asset expansion. |\n| **🌐 Real-sim hybrid (PPI)** | **📦 Drop-in install** | **🤖 Built-in agent + skill library** |\n| Bridge simulation and reality with phone-scan → process → import → reproduce; same diagnostic protocol on both sides. | One command: `pip install -e .`. Per-policy VLA stacks install separately so dependency conflicts stay contained. | Two Claude Code skills (`metafine_help`, `metafine_add`) automate platform Q&A and new skill\u002Ftask authoring. 
|\n\n## ⚡ TL;DR\n\n- **Diagnostic, not binary.** Every eval produces a `results.json` with three orthogonal scores (per-stage success \u002F DR-AUSC \u002F action-smoothness) — not just `success_rate=0.42`.\n- **Compositional skills.** 21 affordance-typed atomic skills (grasp, rotate, slide, insert, …) compose into multi-step task graphs via YAML or Python. Adding a long-horizon task is a 30-line YAML, not a new env class.\n- **Plays well with VLAs.** A shared data pipeline (record → merge → replay → convert) feeds LeRobot and RLDS exports. Seven backbones are vendored (ACT \u002F DP3 \u002F OpenVLA \u002F OpenVLA-OFT \u002F π0 \u002F π0.5 \u002F StarVLA); training is verified via the **LeRobot** and **StarVLA** paths, and **π0.5** closed-loop inference is verified.\n\n---\n\n## 🤖 Built-in skills (Claude Code)\n\nMetaFine ships two [Claude Code](https:\u002F\u002Fclaude.com\u002Fclaude-code) skills that drop into `~\u002F.claude\u002Fskills\u002F` and accelerate everyday work on the platform:\n\n| Skill | Invoke | What it does |\n|---|---|---|\n| **`metafine_help`** | `\u002Fmetafine_help \u003Cquestion>` | Routes a natural-language question to the relevant user-guide section, optionally consults the live codebase, and returns a tight 5–15 line answer with a `→ See:` source citation. Strictly read-only. |\n| **`metafine_add`** | `\u002Fmetafine_add \u003Cdescription>` | Designs a new MetaFine artifact — either a new atomic skill (`@register_skill` stub) or a new compositional task graph YAML. Walks phase classification, affordance contract, predicate composition, validation, and writes the file *only on confirmation*. |\n\nInstall: drop the skill directories into `~\u002F.claude\u002Fskills\u002Fmetafine_help\u002F` and `~\u002F.claude\u002Fskills\u002Fmetafine_add\u002F`. Both skills work with the upstream MetaFine user guide as their primary knowledge base. 
See [`docs\u002Fagents.md`](docs\u002Fagents.md) for the full design.\n\n---\n\n## Why MetaFine?\n\nConventional benchmarks ask one question: *did the policy succeed?* A yes\u002Fno answer hides which part of the system failed. MetaFine's premise is that any meaningful evaluation has to answer three questions simultaneously:\n\n| Dimension | The question it answers | How MetaFine measures it |\n|---|---|---|\n| **Understanding** | Did the policy know *what to do, in the right order*? | **Per-stage success rates** over a multi-step task graph — surfaces *where* the chain breaks (engagement → manipulation → release). |\n| **Perception** | Did the policy correctly process its sensory inputs *under variation*? | **Domain-randomisation sweeps** with **AUSC** (area-under-success-curve) for lighting, camera pose, and camera rotation — a normalised 0-to-1 score per axis. |\n| **Behavior** | Did the policy execute its plan *smoothly*? | **Action-trajectory smoothness** (jerk RMS, velocity variance, path length) — exposes jerky, hesitant, or chunk-of-N-artefact policies that still happen to \"succeed\". |\n\nTwo policies with the same headline success rate can have totally different `results.json` profiles. 
MetaFine is designed to make that difference visible.\n\n---\n\n## 📢 What's New\n\n\u003Csub>Latest at top.\u003C\u002Fsub>\n- **2026-05-15** &nbsp; ![NEW](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNEW-red?style=flat-square)\n  &nbsp; 🤖 **Built-in Claude Code skills shipped.**\n  Two slash commands land alongside the platform: [`\u002Fmetafine_help`](https:\u002F\u002Fmetafine.github.io\u002Fuser_guide\u002Fagents\u002Fhelp.html) routes natural-language\n  questions to the user guide; [`\u002Fmetafine_add`](https:\u002F\u002Fmetafine.github.io\u002Fuser_guide\u002Fagents\u002Fadd.html) walks you through designing a new atomic skill\n  or task graph with phase \u002F affordance \u002F predicate validation.\n  \n- **2026-05-14** &nbsp; ![RELEASE](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRELEASE-v0.1.0-2ea44f?style=flat-square)\n  &nbsp; 🚀 **MetaFine v0.1 — public open-source release.**\n  19 envs · 21 atomic skills · 11-affordance closed set · 40+ part-aware assets · three-dimension\n  diagnostic eval (U \u002F P \u002F B) · 7 vendored VLA backbones (training verified via LeRobot + StarVLA) ·\n  LeRobot + RLDS exports · editable install via `pip install -e .`. See the [user guide](https:\u002F\u002Fmetafine.github.io\u002Fuser_guide\u002F) for the full tour.\n  \n## 🧭 How it works\n\nMetaFine sits on a three-layer pipeline. **Composition** brings together atomic skills + part-aware assets via a closed-set affordance match. **Generation** turns that algebra into compositional task graphs that drive recording and rollout. 
**Diagnostic** scores every rollout along the three orthogonal axes.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Farchitecture.svg\" alt=\"MetaFine architecture: Composition → Generation → Diagnostic, with a real-sim hybrid (PPI) loop.\" width=\"100%\" \u002F>\n\u003C\u002Fp>\n\nEvery concept maps onto something concrete in the source tree:\n\n- **Atomic skills** — `core\u002Fskill.py` (21 motion-planning primitives, `@register_skill` decorator).\n- **Part-aware assets** — `assets\u002F\u003Cid>\u002F{urdf.xml, capabilities.json, model_data.json}` (40+ articulated objects with declared affordances).\n- **Task graphs** — `configs\u002F*.yaml`, executed by `utils\u002Ftask_graph.py`; predicates compile via `core\u002Fpredicates.py`.\n- **Rollout** — `record.py` for expert demonstrations (MP solver); `core\u002Fpolicies\u002F*` for VLA backbones.\n- **Diagnostic** — `utils\u002Feval_metrics.py` (smoothness), `utils\u002Feval_sweep.py` (DR + AUSC), `utils\u002Feval_setup.py` (env dispatch).\n\n---\n\n## 📦 Installation\n\n### System requirements\n\n| Component | Required |\n|---|---|\n| OS | Linux (Ubuntu 20.04 \u002F 22.04 tested) |\n| GPU | NVIDIA, ≥ 8 GB VRAM (CUDA 11.8 or 12.x) |\n| Python | 3.10 or 3.11 |\n| Disk | ~3 GB for code + assets; per-policy checkpoints separate |\n\n### Quick install\n\n```bash\n# 1. (Recommended) fresh conda env\nconda create -n metafine python=3.10 -y\nconda activate metafine\n\n# 2. Clone + editable install (git names the checkout directory MetaFine)\ngit clone https:\u002F\u002Fgithub.com\u002FHiangx-robotics\u002FMetaFine.git\ncd MetaFine\npip install -e .\n```\n\nThat's it for the simulation core. 
Verify the install:\n\n```bash\npython -c \"import core.env, core.skill; import gymnasium as gym; \\\n           env = gym.make('grasp_part'); \\\n           print('Ready:', type(env.unwrapped).__name__); env.close()\"\n# → Ready: GraspPartEnv\n```\n\n### Optional extras\n\n```bash\npip install -e \".[ai]\"     # + openai client (used by the AI-planner path)\npip install -e \".[dev]\"    # + pytest for running the test suite\n```\n\n### Assets\n\nThe 40+ part-annotated articulated objects (PartNet-Mobility subset + custom URDFs) and example task-graph configs are distributed as a separate dataset to keep the source repo small. Download from either mirror:\n\n- 🤖 **ModelScope** — `modelscope download --dataset hiangx\u002FMetaFine`\n- 🤗 **Hugging Face** — `huggingface-cli download hiangx\u002FMetaFine --repo-type dataset`\n\nPlace the unpacked `assets\u002F` and `configs\u002F` next to the repo root. Detailed asset onboarding (capability auto-derivation, the review CLI, the schema for `capabilities.json` \u002F `model_data.json`) is in the user guide.\n\n### Per-policy installs (VLA stacks)\n\nEach VLA backbone is a separately-installable subdirectory with its own dependency set — π0 and OpenVLA pin conflicting `torch` \u002F `transformers` versions, so MetaFine deliberately does *not* roll them into the core install. Pick the one you need:\n\n```bash\npip install -e core\u002Fpolicies\u002Fpi05        # π0.5\npip install -e core\u002Fpolicies\u002Fopenvla     # OpenVLA\npip install -e core\u002Fpolicies\u002Fopenvla-oft # OpenVLA-OFT\n# ... see core\u002Fpolicies\u002F\u003Cname>\u002FREADME.md for the rest\n```\n\n---\n\n## 🚀 Quickstart\n\nThe end-to-end pipeline: **record → merge → replay → convert → train → evaluate**. Full tutorials on the [project homepage](https:\u002F\u002Fmetafine.github.io\u002Fuser_guide\u002F).\n\n```bash\n# 1. 
Record expert demos — single skill, or --task-graph for a multi-stage env.\n#    Output: demos\u002F\u003Cenv>\u002Ftrial_NNNN\u002F{trajectory.h5,trajectory.json}\npython record.py -e grasp_part --object-name 100221 --part-name cap -n 5 --only-count-success\npython record.py --task-graph configs\u002Fexample_grasp_cap.yaml -n 5 --only-count-success\n\n# 2. Merge the per-trial shards (point -i at the env dir; it recurses trial_*)\npython utils\u002Fmerge_trajectory.py -i demos\u002Fgrasp_part \\\n    -o demos\u002Fgrasp_part\u002Fmerged.h5 -p trajectory.h5\n\n# 3. Replay to render observations. Use the recording's own control mode\n#    (see trajectory.json env_kwargs.control_mode; grasp_part = pd_joint_pos)\n#    + --use-env-states for a faithful, deterministic replay.\npython utils\u002Freplay_trajectory.py --traj-path demos\u002Fgrasp_part\u002Fmerged.h5 \\\n    -o rgb -c pd_joint_pos -b physx_cpu --use-env-states --save-traj --save-video\n# → demos\u002Fgrasp_part\u002Fmerged.rgb.pd_joint_pos.physx_cpu.h5\n# For task-graph data add --allow-failure (success is decided at record time;\n# replay can't re-evaluate the goal predicate and must not re-filter).\n\n# 4. Convert for training — LeRobot, or convert_to_rlds for OpenVLA\npython utils\u002Fconvert_to_lerobot.py \\\n    --traj-path demos\u002Fgrasp_part\u002Fmerged.rgb.pd_joint_pos.physx_cpu.h5 \\\n    --output-dir demos\u002Fgrasp_part\u002Flerobot_grasp_part \\\n    --task-name \"Grasp the cap of the bottle.\" --fps 30 --robot-type panda\n\n# 5. Train via the LeRobot or StarVLA pipeline (see user guide)\n\n# 6. 
Evaluate the trained checkpoint closed-loop in the simulator (π0.5 example)\npython core\u002Fpolicies\u002Fpi05\u002Fevaluate.py \\\n    --policy-path \u002Fpath\u002Fto\u002Fpretrained_model --env-id grasp_part \\\n    --object-name 100221 --part-name cap --obs-mode rgb \\\n    --control-mode pd_joint_delta_pos --n-episodes 50 \\\n    --device cuda --task \"Grasp the cap of the bottle.\" --save-video\n```\n\nEach backbone's exact flags are in its own `core\u002Fpolicies\u002F\u003Cname>\u002FREADME.md`. There is no universal `--task-graph` eval adapter; per-policy `evaluate.py` scripts are standalone.\n\n---\n\n\n\n## 🗂️ Project layout\n\n```\nmetafine\u002F\n├── core\u002F\n│   ├── env.py                 # 19 Gym envs (single-skill + bundle)\n│   ├── skill.py               # 21 motion-planning skill solvers\n│   ├── scene.py               # SceneBuilders (data-driven, no per-asset branches)\n│   ├── skill_registry.py      # @register_skill + affordance metadata\n│   ├── predicates.py          # success-DSL compiler\n│   ├── env_mixins.py          # EvalDREnvMixin (camera\u002Flight jitter)\n│   ├── motion.py              # MP solver helpers\n│   └── policies\u002F              # vendored VLA stacks — installed separately\n│       ├── act\u002F  dp3\u002F  pi0\u002F  pi05\u002F  openvla\u002F  openvla-oft\u002F  starvla\u002F\n├── utils\u002F\n│   ├── task_graph.py          # TaskGraph dataclass + YAML loader + runner\n│   ├── eval_setup.py          # make_eval_env (single-skill ↔ task-graph dispatch)\n│   ├── eval_metrics.py        # EpisodeResult \u002F EvalSummary \u002F smoothness\n│   ├── eval_sweep.py          # dr_sweep + standard_dr_sweeps with AUSC\n│   ├── derive_capabilities.py # URDF → capabilities.json auto-derivation\n│   └── review_capabilities.py # interactive CLI for capabilities QA\n├── assets\u002F                    # distributed separately — see Installation\n├── configs\u002F                   # example task-graph YAMLs\n├── docs\u002F       
               # logo + architecture diagram + agent design notes\n├── robots\u002F                    # Franka URDF + robot.py\n├── record.py                  # demo recorder (single-skill + --task-graph mode)\n├── pyproject.toml\n└── README.md\n```\n\n---\n\n\n\n## 🗓️ On the roadmap\n\n- 🧠 **AI planner.** Natural-language → task-graph YAML; LLM proposes stages, the validator gates them, you review.\n- 🏆 **Hosted PPI evaluation platform + public leaderboard.** Upload phone scan + policy checkpoint; get a unified `results.json` against the public board. Sim \u002F real \u002F hybrid scores side by side.\n- 👋 **Multi-modal observations.** Tactile · force\u002Ftorque · audio. Real-robot parity adapter for drop-in Franka \u002F xArm deployment.\n\n## 🤝 Contributing\n\nThis project is in **active alpha** development; APIs may break between releases. The general workflow:\n\n- Branch from `main` as `feature\u002F\u003Cshort-name>` or `refactor\u002F\u003Cshort-name>`.\n- Run `python smoke_envs.py` before opening a PR — every commit should leave the 19 envs loadable.\n- Keep commits scoped (one phase per commit), and follow the existing imperative-mood commit-message style.\n- For larger changes (new skills, new affordances, new policies), open a discussion on the project homepage first so the affordance vocabulary and registry stay coherent.\n\nFull contributor guide and code-style conventions: see the user guide.\n\n---\n\n## 📑 Citation\n\nIf MetaFine is useful for your work, please cite the [arXiv paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.19986):\n\n```bibtex\n@article{xu2026metafine,\n  title   = {Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation},\n  author  = {Xu, He-Yang and Zhang, Pengyuan and Ge, Zongyuan and Hao, Xiaoshuai and Belongie, Serge and Geng, Xin and Peng, Yuxin and Wei, Xiu-Shen},\n  journal = {arXiv preprint arXiv:2605.19986},\n  year    = {2026}\n}\n```\n\n---\n\n## 🙏 
Acknowledgments\n\nMetaFine stands on the shoulders of several superb open-source projects:\n\n- [**SAPIEN**](https:\u002F\u002Fsapien.ucsd.edu\u002F) and [**ManiSkill**](https:\u002F\u002Fgithub.com\u002Fhaosulab\u002FManiSkill) — physics simulator and benchmark backbone.\n- [**PartNet-Mobility**](https:\u002F\u002Fsapien.ucsd.edu\u002Fbrowse) — the articulated-object corpus most of our assets are drawn from.\n- [**LeRobot**](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Flerobot) — episode-format and policy-training tooling.\n- The authors and maintainers of **ACT**, **Diffusion Policy \u002F DP3**, **OpenVLA \u002F OpenVLA-OFT**, **π0 \u002F π0.5**, and **StarVLA** for releasing reproducible policy code.\n\n---\n\n## 📄 License\n\nReleased under the **MIT License**. See the [LICENSE](LICENSE) file for details. Note that vendored VLA stacks under `core\u002Fpolicies\u002F*` retain their own upstream licenses — consult each subdirectory before redistribution.\n","MetaFine is a diagnostic evaluation framework for fine-grained robotic manipulation. By decomposing capability into three fundamental dimensions (understanding, perception, and behavior), the project provides a scientific diagnostic tool that reveals hidden failure modes conventional benchmarks may miss. It is built on a compositional task graph and an extensible asset library, and supports diverse fine-grained task generation, integration of heterogeneous benchmarks, and both pure-simulation and hybrid real-sim evaluation. It suits researchers and developers who need in-depth analysis of robotic manipulation performance. The project is written in Python, relies on the SAPIEN and ManiSkill stack, and is released under the MIT License.",2,"2026-06-11 03:56:23","CREATED_QUERY"]