[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80812":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":13,"forks30d":13,"starsTrendScore":19,"compositeScore":13,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":13,"starSnapshotCount":13,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},80812,"AwareVLN","GWxuan\u002FAwareVLN","GWxuan","[CVPR 2026] AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation","https:\u002F\u002Fgwxuan.github.io\u002FAwareVLN\u002F",null,"Python",48,0,1,3,6,9,10,18,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:04:07","\u003Cdiv align=\"center\">\n\n# AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation\n\n\u003Cp align=\"center\" style=\"margin:1.4em 0 0.8em;\">\n  \u003Ca href=\"https:\u002F\u002Fgwxuan.github.io\u002FAwareVLN\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-AwareVLN-2EA44F?style=flat&labelColor=555555\" alt=\"Project Page\">\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22816\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.22816-B31B1B?style=flat&labelColor=555555&logo=arxiv&logoColor=white\" alt=\"Paper\">\u003C\u002Fa>\n  &nbsp;\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgwx22\u002FAwareVLN\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-AwareVLN-FFD63A?style=flat&labelColor=555555\" alt=\"Dataset\">\u003C\u002Fa>\n  &nbsp;\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fgwx22\u002FAwareVLN-ck\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCheckpoint-AwareVLN-FFD63A?style=flat&labelColor=555555\" alt=\"Checkpoint\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp style=\"font-size:1.5em;font-weight:600;letter-spacing:0.03em;color:#555;margin:0.75em 0 0;\">\u003Cstrong>CVPR 2026\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.png\" width=\"800\">\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n## 💡 Introduction\n\nAwareVLN equips VLN agents with sparse **self-aware reasoning** at key navigation nodes. A unified VLM switches between `[REASON]` and `[ACT]`, while an automatic data engine provides scalable supervision.\n\n\n## 🚀 Training\n### Installation\nTo build the training environment, run:\n```bash\n.\u002Fenvironment_setup.sh awarevln\nconda activate awarevln\n```\n\n### Dataset\nThe reasoning annotations used for training are produced by our **automatic data engine**, which labels sparse **self-aware reasoning** at key nodes. Download the data from [Dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgwx22\u002FAwareVLN) and extract `videos.tar.gz` in each subfolder.\n\n* **r2r \u002F rxr:** Trajectories from rollouts of an existing policy, with corrections when needed; reasoning annotations from our data engine.\n* **r2rfollow \u002F rxrfollow:** Trajectories that follow expert paths; reasoning annotations from our data engine.\n\n* **Human:** Not included. 
Follow [NaVILA-Dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fa8cheng\u002FNaVILA-Dataset): take the **[video IDs](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fa8cheng\u002FNaVILA-Dataset\u002Fblob\u002Fmain\u002FHuman\u002Fvideo_ids.txt)**, download the videos with `yt-dlp`, and extract frames with `scripts\u002Fextract_rawframes.py` from the [NaVILA repo](https:\u002F\u002Fgithub.com\u002Fa8cheng\u002FNaVILA).\n\nThe data should be organized as follows:\n```graphql\nAwareVLN-Dataset\n├─ reason\n|   ├─ r2r\n|   |    ├─ _anno_cot\n|   |    |    ├─ annotations_shuffle_uni.json\n|   |    |    ├─ cot_new.json\n|   |    ├─ videos\n|   ├─ rxr\n|   |    ├─ ...\n|   ├─ r2rfollow\n|   |    ├─ ...\n|   ├─ rxrfollow\n|   |    ├─ ...\n├─ Human\n|   ├─ raw_frames\n|   |    ├─ \u003Cvideo_id>\n|   |    |    ├─ 0001.jpg\n|   |    |    ├─ ...\n|   ├─ annotations_shuffled.json\n```\n\n### Training\nWe start from **NaVILA-style VILA** (Llama-3 8B + SigLIP + mm_projector, 8 frames) and fine-tune it on our reasoning data to learn **self-aware reasoning**. Both the pretrained model and our trained **AwareVLN weights** are available [here](https:\u002F\u002Fhuggingface.co\u002Fgwx22\u002FAwareVLN-ck).\n\n```bash\nexport AWAREVLN_DATA_ROOT=\u002Fpath\u002Fto\u002Fdata\nbash scripts\u002Ftrain\u002Fsft_8frames.sh\n```\n\n\n## 📊 Evaluation\n\n### Installation\n\nThis repository builds on [VLN-CE](https:\u002F\u002Fgithub.com\u002Fjacobkrantz\u002FVLN-CE), which relies on older versions (v0.1.7) of [Habitat-Lab](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-lab\u002Ftree\u002Fv0.1.7) and [Habitat-Sim](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-sim\u002Ftree\u002Fv0.1.7).\n\n1. Create the conda environment `awarevln-eval` (Python 3.10)\n\n```bash\nconda create -n awarevln-eval python=3.10\nconda activate awarevln-eval\n```\n\n2. 
Build Habitat-Sim and Habitat-Lab (v0.1.7) from **source**\n\nFollow the [VLN-CE setup guide](https:\u002F\u002Fgithub.com\u002Fjacobkrantz\u002FVLN-CE?tab=readme-ov-file#setup).\nTo resolve NumPy compatibility issues, apply the following hotfix:\n```bash\npython evaluation\u002Fscripts\u002Fhabitat_sim_autofix.py # replaces habitat_sim\u002Futils\u002Fcommon.py\n```\n\n3. Install VLN-CE dependencies\n```bash\npip install -r evaluation\u002Frequirements.txt\n```\n\n4. Install VILA dependencies\n```bash\npip install https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.5.8\u002Fflash_attn-2.5.8+cu122torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl\n\npip install -e .\npip install -e \".[train]\"\npip install -e \".[eval]\"\n\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers@v4.37.2\nsite_pkg_path=$(python -c 'import site; print(site.getsitepackages()[0])')\ncp -rv .\u002Fllava\u002Ftrain\u002Ftransformers_replace\u002F* $site_pkg_path\u002Ftransformers\u002F\ncp -rv .\u002Fllava\u002Ftrain\u002Fdeepspeed_replace\u002F* $site_pkg_path\u002Fdeepspeed\u002F\n```\n\n5. Pin the WebDataset version\n```bash\npip install webdataset==0.1.103\n```\n\n### Data\nFollowing [VLN-CE](https:\u002F\u002Fgithub.com\u002Fjacobkrantz\u002FVLN-CE), download the R2R \u002F RxR annotations and MP3D scenes into `evaluation\u002Fdata\u002F` (Val-Unseen split, monocular RGB):\n```graphql\nevaluation\u002Fdata\u002Fdatasets\n├─ RxR_VLNCE_v0\n|   ├─ val_unseen\n|   |    ├─ val_unseen_guide.json.gz\n|   |    ├─ ...\n├─ R2R_VLNCE_v1-3_preprocessed\n|   ├─ val_unseen\n|   |    ├─ val_unseen.json.gz\n|   |    ├─ ...\nevaluation\u002Fdata\u002Fscene_datasets\n├─ mp3d\n|   ├─ 17DRP5sb8fy\n|   |    ├─ 17DRP5sb8fy.glb\n|   |    ├─ ...\n```\n\n### Running Evaluation\n1. Download the trained **AwareVLN weights** from [here](https:\u002F\u002Fhuggingface.co\u002Fgwx22\u002FAwareVLN-ck), or use your own checkpoint from `outputs\u002F`.\n2. 
Run evaluation on R2R-CE using:\n```bash\ncd evaluation\nbash scripts\u002Feval\u002Fr2r.sh\n```\nExamples:\n* Single GPU:\n    ```bash\n    MODEL_PATH=..\u002Fck\u002Fawarevln TOTAL_CHUNKS=1 GPU_LIST=\"0\" bash scripts\u002Feval\u002Fr2r.sh\n    ```\n* Multiple GPUs (e.g., 8 GPUs):\n    ```bash\n    MODEL_PATH=..\u002Fck\u002Fawarevln TOTAL_CHUNKS=8 GPU_LIST=\"0,1,2,3,4,5,6,7\" bash scripts\u002Feval\u002Fr2r.sh\n    ```\n3. Run evaluation on RxR-CE using:\n```bash\nMODEL_PATH=..\u002Fck\u002Fawarevln bash scripts\u002Feval\u002Frxr.sh\n```\n4. Results are saved under `evaluation\u002Feval_awarevln\u002F\u003CCKPT_NAME>\u002F`. Metrics are aggregated automatically; to re-run the aggregation manually from the `evaluation` directory:\n```bash\npython scripts\u002Feval_jsons.py eval_awarevln\u002Fawarevln\u002FVLN-CE-v1\u002Fval_unseen NUM_CHUNKS\npython scripts\u002Feval_jsons.py eval_awarevln\u002Fawarevln\u002FRxR-VLN-CE-v1\u002Fval_unseen NUM_CHUNKS\n```\n\n## 🎬 Demo\n\nAwareVLN performs structured reasoning during navigation, for example detecting a misinterpreted turn and issuing a corrective plan, or recognizing a completed subtask and planning the next phase in line with the instruction.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fdemo.gif\" width=\"600\">\n\u003C\u002Fp>\n\n\n_______________________________________________________________\n\n## 📜 Citation\n\n```bibtex\n@article{guo2026awarevln,\n      title={AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation}, \n      author={Wenxuan Guo and Xiuwei Xu and Yichen Liu and Xiangyu Li and Hang Yin and Huangxing Chen and Wenzhao Zheng and Jianjiang Feng and Jie Zhou and Jiwen Lu},\n      journal={arXiv preprint arXiv:2605.22816},\n      year={2026},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22816}, \n}\n```\n","AwareVLN is a vision-language navigation (VLN) project that strengthens navigation by introducing sparse self-aware reasoning at key navigation nodes. Its core is a unified vision-language model (VLM) that switches between reasoning and acting, supported by an automatic data engine that provides scalable supervision. Technically, it fine-tunes a NaVILA-style VILA model (Llama-3 8B + SigLIP + mm_projector, 8 frames) to learn self-aware reasoning. It is suited to scenarios that require robots or virtual agents to navigate complex environments more efficiently and accurately, such as smart homes and autonomous driving.",2,"2026-06-11 04:02:25","CREATED_QUERY"]