[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80083":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":13,"stars30d":14,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":15,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":16,"hasPages":16,"topics":18,"createdAt":8,"pushedAt":8,"updatedAt":19,"readmeContent":20,"aiSummary":21,"trendingCount":13,"starSnapshotCount":13,"syncStatus":22,"lastSyncTime":23,"discoverSource":24},80083,"AtlasVA","wangpan-ustc\u002FAtlasVA","wangpan-ustc",null,"Python",71,3,60,0,9,1.81,false,"main",[],"2026-06-12 02:03:57","# \u003Cimg src=\"logo\u002Flogo.png\" alt=\"AtlasVA logo\" width=\"36\" align=\"center\"> AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.17933-red?style=flat&logo=arxiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.17933) [![Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Homepage-blue?style=flat&logo=github)](https:\u002F\u002Fwangpan-ustc.github.io\u002FAtlasvaWeb)\n\nWelcome to the official repository for **AtlasVA**! 🚀\n\n**AtlasVA** is a teacher-free visual skill memory framework designed for Vision-Language Model (VLM) agents. Unlike traditional methods that compress spatial knowledge into lossy text and rely on proprietary LLMs for supervision, AtlasVA keeps experience visually grounded. It organizes memory into three complementary layers: spatial heatmaps, visual exemplars, and symbolic text skills. By evolving danger and affinity atlases directly from trajectory statistics, AtlasVA provides dense, coordinate-aware guidance for reinforcement learning. 
This unifies perception, memory, and optimization without external LLM supervision, achieving strong performance on spatially intensive tasks like Sokoban, FrozenLake, 3D embodied navigation, and 3D robotic manipulation! 🏆\n\n![AtlasVA Architecture](https:\u002F\u002Ftuchuang-apen.oss-cn-beijing.aliyuncs.com\u002Farch.png)\n*The overall architecture of AtlasVA. (a) The Three-Layer Visual Skill Memory (VSM) stores spatial heatmaps, visual exemplars, and symbolic text skills. (b) Teacher-Free Visual Atlas Evolution bootstraps memory from raw interaction history. (c) Atlas-Grounded Dense Visual Reward Shaping provides coordinate-aware guidance for RL. (d) Policy Optimization unifies perception, memory, and optimization.*\n\nThe code is organized to reproduce the reinforcement-learning runs and evaluation scripts used in the paper. The main package is `atlasva`, and the training backend is built on the vendored `verl` directory.\n\n## 📦 What Is Included\n\n- `atlasva\u002F`: AtlasVA environments, agent loop, visual skill memory, reward shaping, and evaluation utilities.\n- `atlasva\u002Fconfigs\u002F`: Hydra entry configs. The main training config is `atlasva_multiturn.yaml`.\n- `scripts\u002F`: training scripts and training\u002Fvalidation configs for each environment.\n- `scripts\u002FGiants\u002F`: API-based zero-shot large-model evaluation scripts.\n- `verl\u002F`: the RL training backend used by the training scripts.\n\n> ⚠️ **Note:** Large model checkpoints and experiment outputs are not included in the source artifact. Training scripts write outputs to `exps\u002F`.\n\n---\n\n## 💻 Hardware and Software Assumptions\n\nThe full training scripts are configured for one node with 8 GPUs:\n\n- 🐧 CUDA-capable Linux machine (CUDA 12.8).\n- 🎮 8 GPUs for the default scripts.\n- 🐍 Python 3.12. We used a Conda environment (PyTorch: 2.8.0+cu128, PyTorch CUDA build: 12.8).\n- 🧠 Qwen2.5-VL-3B-Instruct as the default policy model.\n- 📊 Optional Weights & Biases logging. 
Disable or change `trainer.logger` in the scripts if W&B is not available.\n\nFor quick code checks, a CPU-only machine is enough. For full reproduction, reviewers should run on a CUDA machine with enough GPU memory for vLLM and FSDP.\n\n---\n\n## 🛠️ Environment Setup\n\nCreate and activate a fresh environment:\n\n```bash\nconda create -n atlasva python=3.12 -y\nconda activate atlasva\n```\n\nInstall the vendored `verl` backend:\n\n```bash\ncd verl\nUSE_MEGATRON=0 bash scripts\u002Finstall_vllm_sglang_mcore.sh\npip install --no-deps -e .\ncd ..\n```\n\nInstall AtlasVA and runtime dependencies:\n\n```bash\npip install -e .\npip install \"trl==0.26.2\"\npip install \"vllm==0.11.0\"\npip install \"transformers==4.56.2\"\npip install \"setuptools==79.0.1\"\npip install matplotlib einops pyzmq hydra-core fire ray wandb Pillow\n```\n\nIf your CUDA\u002FPyTorch setup requires FlashAttention explicitly, install it after PyTorch is available:\n\n```bash\npip install \"flash-attn==2.8.3\" --no-build-isolation\n```\n\nPrimitiveSkill and Swap use ManiSkill\u002FSAPIEN rendering. Install ManiSkill and download the assets before running those environments:\n\n```bash\npip install \"mani_skill==3.0.0b20\"\npython -m mani_skill.utils.download_asset PickSingleYCB-v1 -y\npython -m mani_skill.utils.download_asset partnet_mobility_cabinet -y\n```\n\nOn headless Linux machines, ManiSkill\u002FSAPIEN also needs working EGL\u002FVulkan drivers and GLVND vendor files. 
If imports fail with errors about a missing `\u002Fusr\u002Fshare\u002Fglvnd\u002Fegl_vendor.d`, install the host NVIDIA driver and the Vulkan\u002FGLVND runtime packages, then verify that the container or job exposes `\u002Fusr\u002Fshare\u002Fglvnd\u002Fegl_vendor.d` and the NVIDIA ICD files.\n\n---\n\n## 🧠 Model Checkpoints\n\nThe pre-trained **AtlasVA** model weights are publicly available on Hugging Face:\n🤗 [wangpan-ustc\u002FAtlasVA](https:\u002F\u002Fhuggingface.co\u002Fwangpan-ustc\u002FAtlasVA)\n\nFor training, the scripts use `Qwen\u002FQwen2.5-VL-3B-Instruct` as the base policy model by default. To avoid repeated downloads, point the scripts to a local checkpoint:\n\n```bash\nexport QWEN25_VL_3B_LOCAL_PATH=\u002Fpath\u002Fto\u002FQwen2.5-VL-3B-Instruct\n```\n\nEach script checks this variable first. If `config.json` exists under that directory, the script switches Hugging Face and Transformers to offline mode. Otherwise, it falls back to the Hugging Face model ID.\n\n---\n\n## 🧪 Smoke Tests\n\nRun these commands before launching expensive training:\n\n```bash\npython -m compileall -q atlasva scripts setup.py\npython -c \"import atlasva; import atlasva.envs.registry; import atlasva.envs_remote; print('imports ok')\"\npython setup.py --name\n```\n\nAfter installing Hydra\u002FRay\u002FvLLM dependencies, check that the training entry point loads:\n\n```bash\npython -m atlasva.main_ppo \\\n  --config-path=\"$(pwd)\u002Fatlasva\u002Fconfigs\" \\\n  --config-name=atlasva_multiturn \\\n  --help\n```\n\n---\n\n## 📂 Repository Layout\n\n| Path | Purpose |\n| --- | --- |\n| `atlasva\u002Fmain_ppo.py` | PPO training entry point. |\n| `atlasva\u002Fray_trainer.py` | AtlasVA training loop extensions. |\n| `atlasva\u002Fagent_loop\u002F` | Multi-turn environment-agent interaction loop. |\n| `atlasva\u002Fskills\u002F` | Text skills, visual skill memory, heatmaps, exemplars, and visual rewards. 
|\n| `atlasva\u002Fenvs\u002F` | Sokoban, FrozenLake, Navigation, and PrimitiveSkill environments. |\n| `atlasva\u002Fenvs_remote\u002F` | HTTP client\u002Fserver wrapper for remote rendered environments. |\n| `scripts\u002F*\u002F*.yaml` | Dataset and environment specifications. |\n| `scripts\u002F*\u002F*.sh` | Reproduction scripts for main runs and baselines. |\n\n---\n\n## 🚀 Main Training Commands\n\nRun all commands from the repository root. The scripts create experiment directories under `exps\u002F\u003Cproject>\u002F\u003Cexperiment>\u002F`.\n\n### 🧩 Sokoban\n\n```bash\nbash scripts\u002Fsokoban\u002Ftrain_ppo_qwen25vl3b_Base.sh\nbash scripts\u002Fsokoban\u002Ftrain_ppo_qwen25vl3b_Skill.sh\n```\n\n### 🧊 FrozenLake\n\n```bash\nbash scripts\u002Ffrozenlake\u002Ftrain_ppo_qwen25vl3b_Skill.sh\nbash scripts\u002Ffrozenlake\u002Ftrain_grpo_qwen25vl3b_Skill.sh\n```\n\n### 🗺️ Navigation\n\nNavigation uses a remote environment server. Start the server first:\n\n```bash\npython -m atlasva.envs.navigation.serve \\\n  --port=8036 \\\n  --devices='[0,1,2,3,4,5,6,7]' \\\n  --max_envs=512\n```\n\nFor a local single-server run, use the `_Local` script and local YAML files:\n\n```bash\nbash scripts\u002Fnavigation\u002Ftrain_ppo_qwen25vl3b_SkillCommon_Local.sh\n```\n\nFor multi-server runs, edit `base_urls` in `scripts\u002Fnavigation\u002Ftrain_navigation_base_common.yaml` and `scripts\u002Fnavigation\u002Fval_navigation_base_common.yaml`, then run:\n\n```bash\nbash scripts\u002Fnavigation\u002Ftrain_ppo_qwen25vl3b_BaseCommon.sh\nbash scripts\u002Fnavigation\u002Ftrain_ppo_qwen25vl3b_SkillCommon.sh\n```\n\n### 🤖 PrimitiveSkill\n\nPrimitiveSkill also supports remote rendering. 
Start one or more servers:\n\n```bash\npython -m atlasva.envs.primitive_skill.serve --port=8037 --max_envs=512\npython -m atlasva.envs.primitive_skill.serve --port=8038 --max_envs=512\npython -m atlasva.envs.primitive_skill.serve --port=8039 --max_envs=512\n```\n\nThen edit the `base_urls` fields in `scripts\u002Fprimitive_skill\u002Ftrain_primitive_skill_vision_remote.yaml` and `scripts\u002Fprimitive_skill\u002Fval_primitive_skill_vision_remote.yaml` if your server addresses differ from the defaults. Launch:\n\n```bash\nbash scripts\u002Fprimitive_skill\u002Ftrain_ppo_qwen25vl3b_Base.sh\nbash scripts\u002Fprimitive_skill\u002Ftrain_ppo_qwen25vl3b_Skill.sh\n```\n\n---\n\n## 📊 Outputs\n\nTraining scripts write:\n\n- 💾 checkpoints to `exps\u002F\u003Cproject>\u002F\u003Cexperiment>\u002Fverl_checkpoints\u002F`;\n- 📈 validation traces to `exps\u002F\u003Cproject>\u002F\u003Cexperiment>\u002Fvalidation\u002F`;\n- 🎥 rollout dumps to `exps\u002F\u003Cproject>\u002F\u003Cexperiment>\u002Frollout_data\u002F`;\n- 📝 logs to both the experiment directory and the repository root.\n\nThe default scripts use W&B plus console logging:\n\n```bash\nwandb login\n```\n\nTo avoid W&B, change `trainer.logger=['console','wandb']` to `trainer.logger=['console']` in the target script.\n\n---\n\n## 🤖 API-Based Large-Model Evaluation\n\nThe `scripts\u002FGiants\u002F` directory evaluates closed-source or hosted models through OpenRouter-compatible APIs.\n\n```bash\npip install openai\nexport OPENROUTER_API_KEY=\"sk-or-...\"\n\nbash scripts\u002FGiants\u002Feval_gpt4o.sh\nbash scripts\u002FGiants\u002Feval_gpt5.sh\nbash scripts\u002FGiants\u002Frun_all_sokoban.sh\n```\n\nThe API evaluation writes summaries and episode dumps under `exps\u002Fgiants\u002F`.\n\n---\n\n## 🌐 Remote Environment Notes\n\nRemote YAML files contain concrete `base_urls` used in our internal cluster, for example `http:\u002F\u002Flocalhost:8036`. 
Reviewers should replace these addresses with their own server addresses.\n\nHealth check:\n\n```bash\ncurl -s http:\u002F\u002Flocalhost:8036\u002Fhealth\n```\n\nIf a rendering server runs on a separate machine, expose the server port with SSH tunneling or the cluster networking tool available in your environment.\n\n---\n\n## ❓ Common Issues\n\n- **`ModuleNotFoundError: hydra`, `fire`, or `ray`**: install `hydra-core fire ray`.\n- **`python -m atlasva.main_ppo` cannot find the config**: pass `--config-path=\"$(pwd)\u002Fatlasva\u002Fconfigs\" --config-name=atlasva_multiturn`.\n- **`ModuleNotFoundError: PIL`**: install `Pillow`.\n- **PrimitiveSkill or Swap fails during `mani_skill`\u002FSAPIEN import**: check that ManiSkill assets are downloaded and EGL\u002FVulkan\u002FGLVND files are available on the host or inside the container.\n- **Remote training hangs at environment creation**: check `base_urls`, firewall rules, and the response of `curl http:\u002F\u002F\u003Chost>:\u003Cport>\u002Fhealth`.\n- **Hugging Face downloads are slow or unavailable**: set `QWEN25_VL_3B_LOCAL_PATH` to a local model directory.\n- **W&B is unavailable**: change `trainer.logger` to console-only in the script.\n\n---\n\n## ✅ Minimal Reviewer Workflow\n\nFor a lightweight artifact check:\n\n1. Install the environment.\n2. Run the smoke tests.\n3. Start one Navigation or PrimitiveSkill server.\n4. Point the relevant YAML `base_urls` at `localhost`.\n5. 
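Launch the corresponding `_Local` or remote training script with reduced `trainer.total_training_steps` and smaller `n_envs`.\n\nFor full reproduction, use the scripts listed above without reducing the training steps.\n\n## Acknowledgments\n\nOur work builds upon [VAGEN](https:\u002F\u002Fgithub.com\u002Fmll-lab-nu\u002FVAGEN), and we sincerely thank its authors for their outstanding contributions.\n","AtlasVA is a teacher-free visual skill memory framework designed for vision-language model (VLM) agents. Its core features are a three-layer, complementary visual memory of spatial heatmaps, visual exemplars, and symbolic text skills, together with danger and affinity atlases evolved directly from trajectory statistics that provide dense, coordinate-aware guidance for reinforcement learning. This lets AtlasVA unify perception, memory, and optimization without supervision from an external large language model, and it performs well on spatially intensive tasks such as Sokoban and FrozenLake. The project is written in Python and suits researchers and developers who need efficient autonomous navigation or manipulation in complex environments.",2,"2026-06-11 03:59:12","CREATED_QUERY"]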