[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80101":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":13,"lastSyncTime":28,"discoverSource":29},80101,"Grid-Sampler","Fediory\u002FGrid-Sampler","Fediory","[ICML 2026] Official Implementation of \"See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model\"","https:\u002F\u002Ffediory.github.io\u002FGrid-Sampler\u002F",null,"Python",64,2,57,1,0,3,8,9,46.73,false,"main",true,[],"2026-06-12 04:01:26","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\" alt=\"Grid Sampler logo\" width=\"140\" \u002F>\n\u003C\u002Fp>\n\n# Grid Sampler [ICML 2026]\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FFediory\u002FGrid-Sampler\u002Ftree\u002Fmain\u002Fcore_idea\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCore%20idea-363636?style=for-the-badge\" alt=\"Core idea\" \u002F>\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.11817\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArXiv%20Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=white\" alt=\"ArXiv paper\" \u002F>\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Ffediory.github.io\u002FGrid-Sampler\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-6366f1?style=for-the-badge&logo=googlechrome&logoColor=white\" alt=\"Project website\" \u002F>\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FFediory\u002Fgrid-sampler\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModels-FF9D00?style=for-the-badge&logo=huggingface&logoColor=white\" alt=\"Hugging Face models\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n[简体中文](README-ZH.md) | [知乎](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2038021530454586577)\n\n#### See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=WljJ2HUAAAAJ\">Yixu Feng\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Zinan_Zhao1\">Zinan Zhao\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=mBHSbeIAAAAJ\">Yanxiang Ma\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Chenghao_Xia1\">Chenghao Xia\u003C\u002Fa>\u003Csup>3\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=guY3iCsAAAAJ\">Chengbin Du\u003C\u002Fa>\u003Csup>3\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=m4wbcOsAAAAJ\">Yunke Wang\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=N4F_3eoAAAAJ\">Chang Xu\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n  \u003Csup>1\u003C\u002Fsup> University of Sydney &nbsp;·&nbsp;\n  \u003Csup>2\u003C\u002Fsup> City University of Hong Kong &nbsp;·&nbsp;\n  \u003Csup>3\u003C\u002Fsup> StellarEdge Robotics\n\u003C\u002Fp>\n\n## Demo 🎞\n\n\n\n**In-domain**\n\n| 
Pen | Pick | Stack |\n|:---:|:---:|:---:|\n| [▶ Watch `GridS_demo_pen.mp4`](demo\u002FGridS_demo_pen.mp4) | [▶ Watch `GridS_demo_pick.mp4`](demo\u002FGridS_demo_pick.mp4) | [▶ Watch `GridS_demo_stack.mp4`](demo\u002FGridS_demo_stack.mp4) |\n\n**OOD**\n\n| Pen (OOD) | Pick (OOD) | Stack (OOD) |\n|:---:|:---:|:---:|\n| [▶ Watch `GridS_demo_pen_ood.mp4`](demo\u002FGridS_demo_pen_ood.mp4) | [▶ Watch `GridS_demo_pick_ood.mp4`](demo\u002FGridS_demo_pick_ood.mp4) | [▶ Watch `GridS_demo_stack_ood.mp4`](demo\u002FGridS_demo_stack_ood.mp4) |\n\n\n\n## News 🆕\n\n- **2026.05.10** Code released on openpi. 💎\n- **2026.05.01** Congratulations! Our paper \"See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model\" has been accepted to ICML 2026. We look forward to follow-up work built on Grid Sampler! 🔥\n\n\n## To-Do List ✅\n- ✅ Release openpi code with Grid Sampler.\n- ✅ Release LeRobot code with Grid Sampler on a real-world dataset.\n- ⬜ Release X-VLA code with Grid Sampler on Robotwin.\n\n\n## Results 📊\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>LIBERO:\u003C\u002Fb>\u003C\u002Fsummary>\n\n![Result 1](assets\u002Fmain_result1.png)\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Real-world:\u003C\u002Fb>\u003C\u002Fsummary>\n\n![Result 2](assets\u002Fmain_result2.png)\n\n\u003C\u002Fdetails>\n\n\u003Ca id=\"core-idea\">\u003C\u002Fa>\n\n## 1. 
Core idea 🌑\n\n![Main idea](assets\u002Fcore_idea.png)\n\n*Overview of the GridS Token Pruning framework.*\n\n**(a) Standard Dense Representation:** An input image (*H*\u003Csub>R\u003C\u002Fsub> and *W*\u003Csub>R\u003C\u002Fsub> denote the original image resolution) is processed by a visual encoder with ViT-style embeddings (Dosovitskiy et al., 2020) to generate dense visual tokens (*H* × *W* × *C*), capturing full spatial details.\n\n**(b) GridS Token Pruning Module:** This module identifies salient regions to sample a sparse set of visual tokens (*K* × *C*), which includes two stages: (1) Global Coordinate Prediction, and (2) Grid Sampling with Geometry Injection. By ensuring the token count is significantly smaller than the dense spatial resolution (*K* ≪ *H* × *W*), it achieves efficient representation for the downstream Transformer.\n\n## 2. Testing 🌒\n\n### openpi\n\nAll runnable checks for this stack live in **openpi**. After environment setup in [`openpi\u002FREADME.md`](openpi\u002FREADME.md) (**Installation**), use the main doc as the index:\n\n- **Quick policy smoke test (no robot):** the **“Test inference without a robot”** paragraph under **Running Inference for a Pre-Trained Model** — it points to [`openpi\u002Fexamples\u002Fsimple_client\u002FREADME.md`](openpi\u002Fexamples\u002Fsimple_client\u002FREADME.md).\n- **Notebook \u002F general inference flow:** same **Running Inference for a Pre-Trained Model** section (and the linked `examples\u002Finference.ipynb` mentioned there).\n- **LIBERO simulation rollout \u002F eval:** [`openpi\u002Fexamples\u002Flibero\u002FREADME.md`](openpi\u002Fexamples\u002Flibero\u002FREADME.md) (see also the LIBERO pointer in the fine-tuned checkpoint table on the main README).\n- **ALOHA:** simulator workflow in [`openpi\u002Fexamples\u002Faloha_sim\u002FREADME.md`](openpi\u002Fexamples\u002Faloha_sim\u002FREADME.md); real-robot setup in 
[`openpi\u002Fexamples\u002Faloha_real\u002FREADME.md`](openpi\u002Fexamples\u002Faloha_real\u002FREADME.md) (also summarized under **More Examples** in [`openpi\u002FREADME.md`](openpi\u002FREADME.md)).\n\n### LeRobot (PyTorch)\n\nThis repo vendors a **LeRobot** fork under [`lerobot\u002F`](lerobot\u002F). Install and CLI usage follow upstream **[`lerobot\u002FREADME.md`](lerobot\u002FREADME.md)** (e.g. `pip install -e \".[...]\"` from that directory, then `lerobot-train` \u002F `lerobot-eval` as documented there).\n\n**What Grid Sampler changes inside this LeRobot tree**\n\n| Area | Change |\n|------|--------|\n| **New module** | [`lerobot\u002Fsrc\u002Flerobot\u002Fpolicies\u002Factive_token_sampler.py`](lerobot\u002Fsrc\u002Flerobot\u002Fpolicies\u002Factive_token_sampler.py) — PyTorch **ActiveTokenSampler**: global-pooled visual context predicts `K` normalized 2D locations, **`F.grid_sample`** bilinearly reads features, optional **coordinate MLP** adds geometry to sampled tokens (non-square feature maps are squared with adaptive pooling first). |\n| **SmolVLA** | [`configuration_smolvla.py`](lerobot\u002Fsrc\u002Flerobot\u002Fpolicies\u002Fsmolvla\u002Fconfiguration_smolvla.py), [`modeling_smolvla.py`](lerobot\u002Fsrc\u002Flerobot\u002Fpolicies\u002Fsmolvla\u002Fmodeling_smolvla.py) — flags **`use_grid_token_sampler`** (default `True`) and **`grid_token_sampler_num_tokens`** (default `16`). When on, SigLIP patch tokens are reshaped to a feature map, pruned by `ActiveTokenSampler`, then fed to the VLM as **K tokens** instead of the full patch grid. `forward` \u002F `predict_action_chunk` \u002F `select_action` accept optional **`use_grid_token_sampler`** to override per call (see module docstring at top of `modeling_smolvla.py`). 
|\n| **ACT \u002F Diffusion \u002F VQ-BeT \u002F TDMPC** | Each policy’s **configuration** adds **`use_vision_grid_token_prune`** and **`vision_grid_token_prune_num_tokens`**; the matching **modeling** file wires **`ActiveTokenSampler`** after the CNN \u002F ResNet vision tower (before the usual flatten + transformer or spatial-softmax path), with learnable pos-embeddings sized to `K` when pruning is enabled. |\n| **π0 \u002F π0-FAST \u002F SAC \u002F reward classifier** | These files **import** `ActiveTokenSampler` for consistency with the tree, but **do not call it in the forward pass** in this fork — the full **JAX π0 + Grid** path stays in **openpi** (configs with `grid=True`). |\n\n## 3. Finetuning 🌓\n\n### openpi\n\nFine-tuning follows the **openpi** workflow; the authoritative walkthrough is **[Fine-Tuning Base Models on Your Own Data](openpi\u002FREADME.md#fine-tuning-base-models-on-your-own-data)** in [`openpi\u002FREADME.md`](openpi\u002FREADME.md). In short:\n\n1. **Data:** convert to a LeRobot dataset (LIBERO example: [`openpi\u002Fexamples\u002Flibero\u002Fconvert_libero_data_to_lerobot.py`](openpi\u002Fexamples\u002Flibero\u002Fconvert_libero_data_to_lerobot.py)); for LIBERO-only runs you can often skip conversion if you use the bundled dataset as in the upstream configs.\n2. **Configs:** data transforms and `TrainConfig` live in [`openpi\u002Fsrc\u002Fopenpi\u002Ftraining\u002Fconfig.py`](openpi\u002Fsrc\u002Fopenpi\u002Ftraining\u002Fconfig.py) (e.g. LIBERO policies in [`openpi\u002Fsrc\u002Fopenpi\u002Fpolicies\u002Flibero_policy.py`](openpi\u002Fsrc\u002Fopenpi\u002Fpolicies\u002Flibero_policy.py)). For **Grid Sampler** runs, choose a training config whose model sets `grid=True` (e.g. names like `pi0_libero_grid*`, `pi05_libero_grid*` in that file).\n3. **Train (JAX):** compute norm stats then launch training as documented there, e.g. 
`uv run scripts\u002Fcompute_norm_stats.py --config-name \u003Cyour_config>` then `uv run scripts\u002Ftrain.py \u003Cyour_config> --exp-name=...` (see the same README section for flags and `XLA_PYTHON_CLIENT_MEM_FRACTION`).\n4. **PyTorch path:** if you use the PyTorch stack, follow **[PyTorch Support](openpi\u002FREADME.md#pytorch-support)** in the same README (setup, `train_pytorch.py`, and JAX→PyTorch conversion notes).\n\nServing a fine-tuned checkpoint is covered under **Spinning up a policy server and running inference** in that chapter, and in [`openpi\u002Fscripts\u002Fserve_policy.py`](openpi\u002Fscripts\u002Fserve_policy.py) \u002F [`openpi\u002Fexamples\u002Flibero\u002FREADME.md`](openpi\u002Fexamples\u002Flibero\u002FREADME.md) for LIBERO.\n\n### LeRobot\n\nUse the vendored [`lerobot\u002F`](lerobot\u002F) tree: install extras as in **[`lerobot\u002FREADME.md`](lerobot\u002FREADME.md)**, then train with the usual LeRobot CLI, for example:\n\n```bash\ncd lerobot\npip install -e \".[smolvla]\"   # or another policy extra you need\nlerobot-train --policy.type=smolvla --dataset.repo_id=...\n```\n\nEnable or tune Grid-style pruning via policy kwargs, e.g. **`--policy.use_grid_token_sampler=true|false`** and **`--policy.grid_token_sampler_num_tokens=N`** for **SmolVLA**; for **ACT \u002F Diffusion \u002F VQ-BeT \u002F TDMPC**, use **`--policy.use_vision_grid_token_prune=true`** and **`--policy.vision_grid_token_prune_num_tokens=N`** (must match how the checkpoint was trained when loading weights). See the **“What Grid Sampler changes”** table under **LeRobot (PyTorch)** in **§2 Testing** above for file-level detail.\n\n## 4. Contacts 🌔\nIf you have any questions, please contact us or submit an issue to the repository! 
We sincerely welcome your feedback and contributions.\n\nYixu Feng ([yfen0429@sydney.edu.au](mailto:yfen0429@sydney.edu.au) or [fedioryf@gmail.com](mailto:fedioryf@gmail.com))\n\nZinan Zhao ([zhao48zinan@gmail.com](mailto:zhao48zinan@gmail.com))\n\nYou can also scan the QR code below to contact us:\n\n\u003Cimg src=\"assets\u002Fwechat.jpg\" alt=\"WeChat QR code\" width=\"220\">\n\n## 5. Citation 🌕\nIf you find our work useful for your research, please cite our paper:\n\n```\n@inproceedings{feng2026gridsampler,\n  title     = {See What Matters: Differentiable Grid Sample Pruning for Generalizable Vision-Language-Action Model},\n  author    = {Feng, Yixu and Zhao, Zinan and Ma, Yanxiang and Xia, Chenghao and Du, Chengbin and Wang, Yunke and Xu, Chang},\n  booktitle = {Forty-Third International Conference on Machine Learning (ICML)},\n  year      = {2026}\n}\n  ```","Grid Sampler is the official implementation of a differentiable grid-sample pruning method for vision-language-action models, aimed at improving generalization across scenarios. Its core function is a novel grid sampling technique that identifies and retains the visual information critical to the task while discarding redundant data, thereby optimizing model performance. It is written in Python and implements the algorithm on top of deep learning frameworks. The project is particularly suited to research and development that needs more robust and adaptable machine vision systems, such as robot manipulation and autonomous driving.","2026-06-11 03:59:14","CREATED_QUERY"]