[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11720":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":14,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},11720,"habitat-gs","zju3dv\u002Fhabitat-gs","zju3dv","Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting","https:\u002F\u002Fzju3dv.github.io\u002Fhabitat-gs\u002F",null,"C++",152,9,5,1,0,8,28,15,3,"MIT License",false,"main",[25,26,27,28,29,30,31,32,33],"computer-vision","embodied-ai","embodied-navigation","gaussian-avatar","gaussian-splatting","robotics","simulator","vln","vln-ce","2026-06-12 02:02:33","\u003Cdiv align=\"center\">\n\n\u003Ch1 align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs\u002Fgs_assets\u002Flogo_black.png\">\n    \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"docs\u002Fgs_assets\u002Flogo_white.png\">\n    \u003Cimg alt=\"Habitat-GS\" src=\"docs\u002Fgs_assets\u002Flogo_black.png\" width=\"50%\">\n  \u003C\u002Fpicture>\u003Cbr>\n  A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting\n\u003C\u002Fh1>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.12626\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Habitat--GS-red' alt='Paper PDF'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fzju3dv.github.io\u002Fhabitat-gs\u002F'>\u003Cimg 
src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-Habitat--GS-orange' alt='Project Page'>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRukawaY\u002Fgs_scenes\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging_Face-GS_Dataset-blue' alt='Hugging Face'>\u003C\u002Fa>\n  \u003Ca href=\"docs\u002Fgs_assets\u002Fwechat.png\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-Group-green?logo=wechat&logoColor=green\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n  \n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fziyuan-xia.com\">Ziyuan Xia\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fecho636\">Jingyi Xu\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FKinchite17\">Chong Cui\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fyuanhongyu.xyz\">Yuanhong Yu\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fjzhzhang.github.io\">Jiazhao Zhang\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fyanqswhu.top\">Qingsong Yan\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Forcid.org\u002F0000-0002-8676-6546\">Tao Ni\u003C\u002Fa> \u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=4YOIYGwAAAAJ&hl=en\">Junbo Chen\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fxzhou.me\">Xiaowei Zhou\u003C\u002Fa> •\n  \u003Ca href=\"http:\u002F\u002Fwww.cad.zju.edu.cn\u002Fhome\u002Fbao\u002F\">Hujun Bao\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fcsse.szu.edu.cn\u002Fstaff\u002Fruizhenhu\u002F\">Ruizhen Hu\u003C\u002Fa> •\n  \u003Ca href=\"https:\u002F\u002Fpengsida.net\u002F\">Sida Peng\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Cvideo src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1d971739-3507-4667-b9c1-f9debd64ea76\" controls width=\"600\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n\n## 📢 News\n\n> 
**[2026-05]** 🎉 [GS dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRukawaY\u002Fgs_scenes) has been expanded with 64 [InteriorGS](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fspatialverse\u002FInteriorGS) scenes! Now 129 scenes in total!\n\n> **[2026-04]** 🎉 [GS dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRukawaY\u002Fgs_scenes) has been updated! We provide 65 high-quality GS scenes, as well as episodes and trajectories for training and evaluation!\n\n> **[2026-04]** 🎉 Paper, project page, code and dataset of Habitat-GS are released! Check it out!\n\n## 🧭 What Is Habitat-GS and What Can Habitat-GS Do?\n\n`Habitat-GS` is a non-intrusive extension of [Habitat-Sim](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-sim) for embodied navigation tasks in Gaussian Splatting scenes. It keeps [Habitat](https:\u002F\u002Faihabitat.org)'s standard scene dataset abstraction, NavMesh\u002Fpathfinding, agent control, and [Habitat-Lab](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-lab) integration, while extending the rendering backbone to support [3D Gaussian Splatting](https:\u002F\u002Fgithub.com\u002Fgraphdeco-inria\u002Fgaussian-splatting) and adding a dynamic Gaussian avatar module that drives humanoid Gaussian avatars.\n\nIn practice, Habitat-GS can:\n\n- render photo-realistic GS scenes with Habitat RGB and depth sensors;\n- drive dynamic Gaussian avatars from [GaussianAvatar](https:\u002F\u002Fgithub.com\u002Faipixel\u002FGaussianAvatar) and [AnimatableGaussians](https:\u002F\u002Fgithub.com\u002Flizhe00\u002FAnimatableGaussians) in simulation environments;\n- plug into Habitat-Lab for training and evaluation with the same scene dataset format.\n\nCompared with traditional mesh-based simulators, Habitat-GS achieves photo-realistic rendering and renders high-fidelity Gaussian avatars efficiently. 
By introducing Gaussian Splatting to embodied simulators, we hope our work can facilitate future embodied AI research.\n\n## 📖 Table of Contents\n\n- 🛠️ [Install Habitat-GS](#%EF%B8%8F-install-habitat-gs)\n  - 🧪 [Create the environment](#-create-the-environment)\n  - 📦 [Install Habitat-GS](#-install-habitat-gs)\n  - 🤖 [Install Habitat-Lab](#-install-habitat-lab-optional-but-recommended)\n- 📦 [Download GS Asset](#-download-gs-asset)\n- 🚀 [Run Habitat-GS](#-run-habitat-gs)\n  - 🧱 [Prepare GS Assets](#-prepare-gs-assets)\n  - 🗂️ [Organize Scene Dataset](#%EF%B8%8F-organize-scene-dataset)\n  - 🖥️ [Run Interactive Viewer](#%EF%B8%8F-run-interactive-viewer)\n- 🦞 [HabitatAgent](#-habitatagent)\n- 🏋️ [Train\u002FEval Navigation Agents on Habitat-GS](#%EF%B8%8F-traineval-navigation-agents-on-habitat-gs)\n  - 🗺️ [Point\u002FImage\u002FObject Goal Navigation on Habitat-Lab](#%EF%B8%8F-pointimageobject-goal-navigation-on-habitat-lab)\n  - 🗣️ [Vision-and-Language Navigation with StreamVLN](#%EF%B8%8F-vision-and-language-navigation-with-streamvln)\n  - ✈️ [Vision-and-Language Navigation with Uni-NaVid](#%EF%B8%8F-vision-and-language-navigation-with-uni-navid)\n- 📚 [Citation](#-citation)\n\n## 🛠️ Install Habitat-GS\n\n### 🧪 Create the environment\n\n```bash\nconda create -n habitat-gs python=3.12 cmake=3.27\nconda activate habitat-gs\n\n# IMPORTANT: Install CUDA-compatible torch first\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n```\n\n### 📦 Install Habitat-GS\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fzju3dv\u002Fhabitat-gs.git\ncd habitat-gs\ngit submodule update --init --recursive\n\n# Recommended: CUDA on, Bullet off\nHABITAT_WITH_CUDA=ON HABITAT_WITH_BULLET=OFF pip install .\n```\n\nIf you also need Bullet physics (e.g. 
manipulate mesh objects in a 3DGS scene), install with:\n\n```bash\nHABITAT_WITH_CUDA=ON HABITAT_WITH_BULLET=ON pip install .\n```\n\n### 🤖 Install Habitat-Lab (optional but recommended)\n\nHabitat-GS can be used standalone for rendering and scene inspection, but Habitat-Lab is typically used along with Habitat-GS for navigation task definition, training, and evaluation.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-lab.git\n```\n\n**IMPORTANT**: Before installing Habitat-Lab into the same environment, update its NumPy pin to avoid conflicts with Habitat-GS:\n\n- edit `habitat-lab\u002Fhabitat-lab\u002Frequirements.txt`\n- change `numpy==1.26.4` to `numpy>=2.0.0,\u003C2.4`\n\nThen install:\n\n```bash\ncd habitat-lab\npip install -e habitat-lab\npip install -e habitat-baselines\n```\n\n## 📦 Download GS Asset\n\nPlease refer to our 🤗 [huggingface dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRukawaY\u002Fgs_scenes) for more details. We provide five categories of assets:\n\n|   | Category | Size | Required For |\n|---|----------|------|-------------|\n| 1 | **GS Scenes** | ~27 GB | Everything — core scene assets |\n| 2 | **Gaussian Avatars** | ~3.1 GB | Dynamic avatar simulation |\n| 3 | **Habitat-Lab Nav Data** | ~30 MB | PointNav \u002F ImageNav \u002F ObjectNav training & evaluation |\n| 4 | **StreamVLN Data** | ~40 GB | VLN training & evaluation ([StreamVLN](https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FStreamVLN)) |\n| 5 | **Uni-NaVid Data** | ~25 GB | VLN training & evaluation ([Uni-NaVid](https:\u002F\u002Fgithub.com\u002Fjzhzhang\u002FUni-NaVid)) |\n\n## 🚀 Run Habitat-GS\n\n### 🧱 Prepare GS Assets\n\nHabitat-GS requires two categories of assets: GS scenes and GS avatars.\n\n\u003Cdetails>\n\u003Csummary>Click to expand: GS scene assets\u003C\u002Fsummary>\n\nFor a static GS scene without avatars, you only need:\n\n- a 3DGS render asset;\n- a Habitat-format `.navmesh` file for navigation. 
It defines the walkable area for agents.\n\n**IMPORTANT**: Habitat-GS recognizes GS stage assets by suffix. This means your scene file MUST end with `.gs.ply` or `.3dgs.ply`.\n\n**NOTE**: In our [provided dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRukawaY\u002Fgs_scenes), an outdoor scene may also have:\n\n- a background 3DGS asset (`background.gs.ply`) for far-field content like sky or distant geometry that was separated from the foreground during reconstruction. To render foreground and background together as a single stage, merge the two `.gs.ply` files into one with `tools_gs\u002Fmerge_background_gs.py`.\n\nIf you already have a high-quality NavMesh, you can use it directly. If not, we recommend the following pipeline to generate one from your GS scene:\n\n1. convert the GS scene to a collision mesh with [3DGS-to-PC](https:\u002F\u002Fgithub.com\u002FLewis-Stuart-11\u002F3DGS-to-PC) or another method;\n2. generate a Habitat NavMesh from that collision mesh with `tools_gs\u002Fgenerate_navmesh.py`.\n\nMinimal example:\n\n```bash\nconda activate habitat-gs\npython tools_gs\u002Fgenerate_navmesh.py \\\n  --input \u002Fpath\u002Fto\u002Fscene_collision_mesh.ply \\\n  --output \u002Fpath\u002Fto\u002Fscene.navmesh\n```\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Click to expand: GS avatar assets\u003C\u002Fsummary>\n\nEvery Gaussian avatar needs two assets:\n\n- `canonical_gs.npz`: canonical Gaussians exported from either [GaussianAvatar](https:\u002F\u002Fgithub.com\u002Faipixel\u002FGaussianAvatar) or [AnimatableGaussians](https:\u002F\u002Fgithub.com\u002Flizhe00\u002FAnimatableGaussians);\n- `driver.pkl`: a scene-specific motion driver generated on the NavMesh using the [GAMMA](https:\u002F\u002Fgithub.com\u002Fyz-cnsdqz\u002FGAMMA-release) method.\n\n1.1. 
Export canonical gaussians from GaussianAvatar\n\nRun this in the `GaussianAvatar` conda environment after installing the upstream repo:\n\n```bash\npython tools_gs\u002Fexport_gaussian_avatar_to_canonical.py \\\n  --posmap \u002Fpath\u002Fto\u002Fquery_posemap_.npz \\\n  --lbs-map \u002Fpath\u002Fto\u002Flbs_map_.npy \\\n  --joint-mat \u002Fpath\u002Fto\u002Fsmpl_cano_joint_mat.pth \\\n  --net-ckpt \u002Fpath\u002Fto\u002Fnet.pth \\\n  --out \u002Fpath\u002Fto\u002Fcanonical_gs.npz \\\n  --ga-root \u002Fpath\u002Fto\u002FGaussianAvatar\n```\n\n1.2. Export canonical gaussians from AnimatableGaussians\n\nSimilarly, run this in the `AnimatableGaussians` conda environment after installing the upstream repo:\n\n```bash\npython tools_gs\u002Fexport_animatable_to_canonical.py \\\n  --config \u002Fpath\u002Fto\u002Fconfig.yaml \\\n  --ckpt \u002Fpath\u002Fto\u002Fcheckpoints \\\n  --out \u002Fpath\u002Fto\u002Fcanonical_gs.npz \\\n  --anim-root \u002Fpath\u002Fto\u002FAnimatableGaussians \\\n  --smpl-model-path \u002Fpath\u002Fto\u002FAnimatableGaussians\u002Fsmpl_files\u002Fsmplx\n```\n\n2. Generate the motion driver on a scene NavMesh\n\n`driver.pkl` depends on the target scene because it is generated on the scene NavMesh. Run the following command in the `GAMMA` environment after installing GAMMA and make sure `habitat_sim` is importable in that environment. 
We provide two modes for generating the driver trajectory:\n\nAuto-sample a path by target length:\n\n```bash\npython tools_gs\u002Fgenerate_trajectory.py \\\n  --navmesh \u002Fpath\u002Fto\u002Fscene.navmesh \\\n  --output \u002Fpath\u002Fto\u002Fdriver.pkl \\\n  --path-length 6.0 \\\n  --smpl-model-path \u002Fpath\u002Fto\u002Fsmpl_files\u002Fsmplx \\\n  --gamma-root \u002Fpath\u002Fto\u002FGAMMA-release\n```\n\nSpecify the start point, the end point, and any optional via points explicitly:\n\n```bash\npython tools_gs\u002Fgenerate_trajectory.py \\\n  --navmesh \u002Fpath\u002Fto\u002Fscene.navmesh \\\n  --output \u002Fpath\u002Fto\u002Fdriver.pkl \\\n  --start 0.0 0.0 0.0 \\\n  --via 1.0 0.0 -1.0 \\\n  --end 2.0 0.0 -2.0 \\\n  --smpl-model-path \u002Fpath\u002Fto\u002Fsmpl_files\u002Fsmplx \\\n  --gamma-root \u002Fpath\u002Fto\u002FGAMMA-release\n```\n\nThe generated `driver.pkl` contains precomputed `joint_mats` for rendering. Avatar rendering uses explicit Gaussians and CUDA linear blend skinning (LBS), with no neural forward pass at runtime. 
The `.pkl` also contains precomputed `proxy_capsules` used for NavMesh-level dynamic obstacle handling, which ensures that agents cannot pass through Gaussian avatars.\n\n\u003C\u002Fdetails>\n\n### 🗂️ Organize Scene Dataset\n\nHabitat-GS follows Habitat's standard dataset hierarchy:\n\n`scene_dataset_config.json` → `scene_instance.json` → `stage_config.json`\n\n\u003Cdetails>\n\u003Csummary>Click to expand: recommended dataset layout\u003C\u002Fsummary>\n\n```unicode\nplayroom\u002F\n├── playroom.scene_dataset_config.json\n├── configs\u002F\n│   ├── scenes\u002F\n│   │   └── playroom.scene_instance.json\n│   └── stages\u002F\n│       └── playroom_stage.stage_config.json\n├── stages\u002F\n│   └── playroom.gs.ply\n├── navmeshes\u002F\n│   └── playroom.navmesh\n└── avatars\u002F\n    └── actor01\u002F\n        ├── canonical_gs.npz\n        ├── driver.pkl\n        └── smplx\u002F\n```\n\n`playroom.scene_dataset_config.json`\n\n```json\n{\n  \"stages\": {\n    \"paths\": {\n      \".json\": [\"configs\u002Fstages\"]\n    }\n  },\n  \"scene_instances\": {\n    \"paths\": {\n      \".json\": [\"configs\u002Fscenes\"]\n    }\n  },\n  \"navmesh_instances\": {\n    \"playroom_navmesh\": \"navmeshes\u002Fplayroom.navmesh\"\n  }\n}\n```\n\n`configs\u002Fstages\u002Fplayroom_stage.stage_config.json`\n\n```json\n{\n  \"render_asset\": \"..\u002F..\u002Fstages\u002Fplayroom.gs.ply\"\n}\n```\n\n`configs\u002Fscenes\u002Fplayroom.scene_instance.json`\n\n```json\n{\n  \"stage_instance\": {\n    \"template_name\": \"playroom_stage\"\n  },\n  \"navmesh_instance\": \"playroom_navmesh\",\n  \"time_max\": 20.0,\n  \"time_loop\": true,\n  \"gaussian_avatars\": [\n    {\n      \"name\": \"actor01\",\n      \"canonical_gaussians\": \"..\u002F..\u002Favatars\u002Factor01\u002Fcanonical_gs.npz\",\n      \"driver\": \"..\u002F..\u002Favatars\u002Factor01\u002Fdriver.pkl\",\n      \"smpl_model_path\": \"..\u002F..\u002Favatars\u002Factor01\u002Fsmplx\",\n      \"smpl_type\": \"smplx\",\n      
\"scale\": 1.0,\n      \"offset_y\": 1.0,\n      \"time_begin\": 0.0,\n      \"time_end\": 20.0\n    }\n  ]\n}\n```\n\nNotes:\n\n- If you only want a static GS scene, simply omit `gaussian_avatars`.\n- If your GS exporter already stores normalized quaternions and you observe spikes or blur in rendering results, add `\"norm_quaternion\": false` to the stage config.\n- Time related fields explanation:\n  - `time_max`: maximum simulation time in seconds. max(time_end) of all avatars by default. 0 if no avatars in scene.\n  - `time_loop`: whether to loop the simulation time. True by default.\n\n  For each avatar, you can also specify:\n  - `time_begin`: the simulation time when the avatar appears. 0 by default.\n  - `time_end`: the simulation time when the avatar disappears. 0.025 * num_frames by default. If time_end > time_begin + 0.025 * num_frames, the avatar will be static at the destination in remaining time.\n\n\u003C\u002Fdetails>\n\n### 🖥️ Run Interactive Viewer\n\n`examples\u002Fgaussian_viewer.py` is an interactive RGB\u002Fdepth viewer for Habitat-GS scenes. A display is required to run the viewer (for example, a local desktop session, X11 session, or VNC session). We provide two modes to run it:\n\nQuickly preview a GS scene:\n\n```bash\npython examples\u002Fgaussian_viewer.py --input \u002Fpath\u002Fto\u002Fplayroom.gs.ply\n```\n\nRun a full Habitat scene dataset with GS stage + NavMesh + gaussian avatars:\n\n```bash\npython examples\u002Fgaussian_viewer.py \\\n  --dataset \u002Fpath\u002Fto\u002Fplayroom.scene_dataset_config.json \\\n  --scene playroom\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: useful flags\u003C\u002Fsummary>\n\n- `--width` \u002F `--height` to change window size.\n- `--time` to start from a specific Gaussian time;\n- `--time-rate` to change playback speed;\n- `--enable-physics` to enable physics simulation if you need physics interaction with mesh objects in a GS scene. 
Requires building with Bullet.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>Click to expand: useful viewer controls\u003C\u002Fsummary>\n\n- `W\u002FS`: Move forward\u002Fbackward\n- `A\u002FD`: Move left\u002Fright\n- `Z\u002FX`: Move up\u002Fdown\n- `Arrow keys`: Rotate view\n- `TAB`: switch between RGB and depth\n- `SPACE`: play\u002Fpause Gaussian time\n- `H`: print help\n- `[` \u002F `]`: scrub backward\u002Fforward\n- `N`: toggle NavMesh visualization\n- `ESC`: exit viewer\n\n\u003C\u002Fdetails>\n\n## 🦞 HabitatAgent\n\n`HabitatAgent` is an LLM-powered agent system built on top of Habitat-GS, enabling\nnatural-language navigation, MCP tool integration, and interactive scene exploration\nvia a terminal chat interface.\n\nKey features: TUI chat, 16 MCP bridge tools, autonomous nav loops, scene-graph\nquery, SPL evaluation, rerun live visualization, third-person camera with\noptional visual robot mesh, multi-client support (Claude Code, Codex, OpenClaw).\n\n👉 **[Video Demo → Project Page](https:\u002F\u002Fzju3dv.github.io\u002Fhabitat-gs\u002F#agent)**\n\n👉 **[Full documentation → docs\u002Fhabitatagent.md](docs\u002Fhabitatagent.md)**\n\n```bash\n# Quick start (TUI + bridge)\npip install -r requirements-agent.txt\npython tools\u002Fhabitat_agent.py\n\n# With MCP server (for Claude Code \u002F Codex integration)\npython tools\u002Fhabitat_agent.py --mcp\n```\n\n## 🏋️ Train\u002FEval Navigation Agents on Habitat-GS\n\n### 🗺️ Point\u002FImage\u002FObject Goal Navigation on Habitat-Lab\n\nWe provide **one-click** training and evaluation pipelines for three navigation tasks on GS scenes using [Habitat-Lab](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-lab) with DDPPO:\n\n| Task | Goal | Sensors | Actions |\n|------|------|---------|---------|\n| **PointNav** | GPS coordinates | RGB, Depth, GPS, Compass | move_forward, turn_left, turn_right, stop |\n| **ImageNav** | Goal image | RGB, Depth, ImageGoal | move_forward, turn_left, 
turn_right, stop |\n| **ObjectNav** | Object category (e.g. \"bench\") | RGB, Depth, GPS, Compass, ObjectGoal | move_forward, turn_left, turn_right, look_up, look_down, stop |\n\n#### Prerequisites\n\n- Habitat-GS and Habitat-Lab installed in the `habitat-gs` conda environment (see [Install](#-install-habitat-gs))\n- GS data downloaded and placed under `data\u002Fscene_datasets\u002Fgs_scenes\u002F` (see [Data Layout](#data-layout))\n\n#### Data Layout\n\n\u003Cdetails>\n\u003Csummary>Click to expand: structure of our provided scene data and generated episodes:\u003C\u002Fsummary>\n\n```\ndata\u002Fscene_datasets\u002Fgs_scenes\u002F\n├── train.scene_dataset_config.json\n├── val.scene_dataset_config.json\n├── train\u002F                           # 110 training scenes\n│   ├── scene01\u002F                     #   self-reconstructed (55 scenes): full assets\n│   │   ├── scene01.gs.ply           #     foreground GS render asset\n│   │   ├── background.gs.ply        #     background GS asset (sky \u002F distant geometry; optional)\n│   │   ├── scene01.mesh.ply         #     collision mesh (will not be used unless physics is enabled)\n│   │   └── scene01.navmesh          #     navigation mesh\n│   ├── scene02\u002F ... scene55\u002F\n│   ├── interior_0007_840137\u002F        #   InteriorGS (55 scenes): 3DGS and navmesh only\n│   │   ├── interior_0007_840137.gs.ply\n│   │   └── interior_0007_840137.navmesh\n│   └── interior_0022_840117\u002F ... ×55\n├── val\u002F                             # 19 evaluation scenes\n│   ├── scene56\u002F ... scene65\u002F\n│   └── interior_0516_840045\u002F ... 
×9\n├── configs\u002F                         # Hydra YAML configs (provided)\n│   ├── ddppo_pointnav_gs_train.yaml\n│   ├── ddppo_pointnav_gs_eval.yaml\n│   ├── ddppo_imagenav_gs_train.yaml\n│   ├── ddppo_imagenav_gs_eval.yaml\n│   ├── ddppo_objectnav_gs_train.yaml\n│   └── ddppo_objectnav_gs_eval.yaml\n└── episodes\u002F                        # generated by scripts below\n    ├── pointnav\u002F{train,val}\u002F\n    ├── imagenav\u002F{train,val}\u002F\n    └── objectnav\u002F{train,val}\u002F\n```\n\n\u003C\u002Fdetails>\n\n#### Step 1: Generate Episodes\n\nEpisode data must be generated before training and evaluating. We have provided 1000 episodes for each training scene and 100 episodes for each evaluation scene in our released dataset, but you can also generate your own episodes with the following commands:\n\n```bash\nconda activate habitat-gs\n\n# PointNav episodes\npython scripts_gs\u002Fgenerate_pointnav_episodes.py\n\n# ImageNav episodes\npython scripts_gs\u002Fgenerate_imagenav_episodes.py\n\n# ObjectNav episodes - outdoor categories on outdoor scenes (uses SAM + CLIP)\npython scripts_gs\u002Fgenerate_objectnav_episodes.py\n\n# ObjectNav episodes - indoor categories on interiorGS scenes (--indoor switch)\npython scripts_gs\u002Fgenerate_objectnav_episodes.py --indoor\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: ObjectNav episode generation details\u003C\u002Fsummary>\n\nObjectNav uses **SAM (Segment Anything)** + **CLIP (zero-shot classification)** to automatically detect and classify objects in GS scenes, then generates navigation episodes to those objects.\n\n**Required model checkpoints:**\n\n| Model | Path | Download |\n|-------|------|----------|\n| SAM ViT-B | `~\u002F.cache\u002Fsam_checkpoints\u002Fsam_vit_b_01ec64.pth` | [GitHub](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything#model-checkpoints) |\n| CLIP ViT-B-32 | `~\u002F.cache\u002Fclip_models\u002Fvit_b_32_laion400m.pt` | 
[GitHub](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_clip\u002Freleases) |\n\n**Object categories:**\n\n| Range | Mode | Categories |\n|---|---|---|\n| ID 0–11 | outdoor (default) | car, bench, tree, street lamp, traffic sign, fire hydrant, trash can, bicycle, **potted plant**, barrier, statue, **chair** |\n| ID 12–21 | indoor (`--indoor`) | sofa, bed, dining table, toilet, sink, tv, refrigerator, bookshelf, cabinet, lamp |\n\n\u003C\u002Fdetails>\n\n#### Step 2: Train\n\n```bash\n# PointNav (default 5e8 steps)\nbash scripts_gs\u002Ftrain_pointnav.sh --output output\u002Fpointnav\n\n# ImageNav (default 2.5e9 steps)\nbash scripts_gs\u002Ftrain_imagenav.sh --output output\u002Fimagenav\n\n# ObjectNav (default 2.5e9 steps)\nbash scripts_gs\u002Ftrain_objectnav.sh --output output\u002Fobjectnav\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: training options\u003C\u002Fsummary>\n\nAll training scripts accept the same options:\n\n```\n--output DIR             Output directory for checkpoints and tensorboard (required)\n--num-envs N             Number of parallel environments per GPU (default: 4)\n--num-gpus N             Number of GPUs for DDPPO (default: 1)\n--total-steps N          Total training steps\n--num-ckpts N            Number of checkpoints to save (default: 100)\n--pretrained-ckpt PATH   Fine-tune from an existing .pth checkpoint\n                         (this sets ddppo.pretrained=True; critic is re-initialised)\n```\n\nExtra arguments are forwarded as Hydra overrides. 
Example with multi-GPU:\n\n```bash\nbash scripts_gs\u002Ftrain_objectnav.sh \\\n    --output output\u002Fobjectnav \\\n    --num-envs 8 \\\n    --num-gpus 4\n```\n\nFine-tuning from a previously trained checkpoint:\n\n```bash\nbash scripts_gs\u002Ftrain_pointnav.sh \\\n    --output output\u002Fpointnav_ft \\\n    --pretrained-ckpt output\u002Fpointnav\u002Fcheckpoints\u002Fckpt.99.pth\n```\n\nBy default this loads the **whole** policy (encoder + RNN + actor head) and re-initialises the critic head — appropriate when continuing on the same task or transferring to a closely related one. To customise the load behaviour, append Hydra overrides, e.g.:\n\n```bash\n# keep the trained critic\nbash scripts_gs\u002Ftrain_pointnav.sh \\\n    --output output\u002Fpointnav_ft \\\n    --pretrained-ckpt output\u002Fpointnav\u002Fcheckpoints\u002Fckpt.99.pth \\\n    habitat_baselines.rl.ddppo.reset_critic=False\n\n# load only the visual encoder backbone (e.g. transfer from PointNav to ImageNav)\nbash scripts_gs\u002Ftrain_imagenav.sh \\\n    --output output\u002Fimagenav_ft \\\n    --pretrained-ckpt output\u002Fpointnav\u002Fcheckpoints\u002Fckpt.99.pth \\\n    habitat_baselines.rl.ddppo.pretrained=False \\\n    habitat_baselines.rl.ddppo.pretrained_encoder=True \\\n    habitat_baselines.rl.ddppo.train_encoder=False    # freeze the encoder\n```\n\n> Note: optimizer state, step counter and seeds are **reset** — this is fine-tuning, not resume. 
To resume an interrupted run, just re-launch with the same `--output` directory; habitat-baselines auto-detects `.resume_state.pth` and continues seamlessly.\n\nOutput structure:\n\n```\noutput\u002Fobjectnav\u002F\n├── checkpoints\u002F    # .pth checkpoint files\n├── tb\u002F             # TensorBoard logs\n└── train.log       # training log\n```\n\n\u003C\u002Fdetails>\n\n#### Step 3: Evaluate\n\n```bash\n# PointNav\nbash scripts_gs\u002Feval_pointnav.sh --ckpt output\u002Fpointnav\u002Fcheckpoints\u002Fckpt.0.pth\n\n# ImageNav\nbash scripts_gs\u002Feval_imagenav.sh --ckpt output\u002Fimagenav\u002Fcheckpoints\u002Fckpt.0.pth\n\n# ObjectNav\nbash scripts_gs\u002Feval_objectnav.sh --ckpt output\u002Fobjectnav\u002Fcheckpoints\u002Fckpt.0.pth\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: evaluation options\u003C\u002Fsummary>\n\n```\n--ckpt PATH         Path to a .pth checkpoint file or a directory of checkpoints (required)\n--num-envs N        Number of parallel environments (default: 1)\n--video-dir DIR     Directory to save evaluation rollout videos (optional)\n```\n\nPass `--ckpt` a directory to evaluate all checkpoints in it sequentially.\n\n\u003C\u002Fdetails>\n\n### 🗣️ Vision-and-Language Navigation with StreamVLN\n\nWe also provide **one-click** training and evaluation pipelines for [StreamVLN](https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FStreamVLN) (a SOTA VLM-based VLN agent built on LLaVA-Video-7B-Qwen2 + SigLIP) on GS scenes. Unlike PointNav\u002FImageNav\u002FObjectNav which use Habitat-Lab + DDPPO, StreamVLN is trained via supervised fine-tuning of a vision-language model on demonstration trajectories.\n\n| Task | Goal | Sensors | Actions | Backbone |\n|------|------|---------|---------|----------|\n| **VLN-R2R** | Natural-language instruction (e.g. 
\"walk past the table and stop near the window\") | RGB, GPS, Compass | move_forward, turn_left, turn_right, stop | LLaVA-Video-7B-Qwen2 + SigLIP |\n\n#### Prerequisites\n\n- A **separate** `habitat-gs-streamvln` conda environment (see `Step 1` below). StreamVLN pins specific package versions (`transformers==4.45.1`, `accelerate==0.28.0`, etc.) that conflict with the main `habitat-gs` env.\n\n- StreamVLN cloned as a sibling of `habitat-gs\u002F`:\n\n  ```bash\n  cd \u002Fpath\u002Fto\u002Fparent\n  git clone https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FStreamVLN.git\n  ```\n\n- GS data downloaded and placed under `data\u002Fscene_datasets\u002Fgs_scenes\u002F` (same layout as the section above)\n\n#### Data Layout\n\n\u003Cdetails>\n\u003Csummary>Click to expand: VLN-specific files added on top of the base dataset layout:\u003C\u002Fsummary>\n\n```\ndata\u002Fscene_datasets\u002Fgs_scenes\u002F\n├── configs\u002F\n│   └── vln_gs_eval.yaml             # habitat config for VLN evaluation (provided)\n├── episodes\u002F\n│   └── vln\u002F                         # generated by generate_vln_episodes.py\n│       ├── train\u002Ftrain.json.gz      # 110 scenes × 200 episodes = 22,000 train\n│       └── val\u002Fval.json.gz          # 19 scenes × 50 episodes = 950 val\n└── trajectory_data\u002F\n    └── vln\u002F                         # generated by generate_vln_trajectories.py\n        ├── annotations.json         # StreamVLN-format action sequences\n        └── images\u002F{scene}_gs_{ep_id}\u002Frgb\u002F*.jpg   # rendered RGB frames\n```\n\n\u003C\u002Fdetails>\n\n#### Step 1: One-Time Setup\n\n`setup_vln.sh` creates the `habitat-gs-streamvln` conda environment cloned from `habitat-gs`, patches the StreamVLN repo for compatibility with Habitat-GS, installs StreamVLN Python dependencies, and downloads the LLaVA-Video-7B-Qwen2 (~15GB) and SigLIP (~3.3GB) checkpoints into `StreamVLN\u002Fcheckpoints\u002F`.\n\n```bash\nbash 
scripts_gs\u002Fsetup_vln.sh\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: what setup_vln.sh actually does\u003C\u002Fsummary>\n\nThe script applies `scripts_gs\u002Fstreamvln_compat.patch` to the StreamVLN clone (4 files, +42\u002F−13 lines) so that:\n\n- `streamvln\u002Fhabitat_extensions\u002Fmeasures.py` works with habitat-lab 0.3.3 (which removed `try_cv2_import`)\n- `streamvln\u002Fstreamvln_train.py` honors a `--vision_tower` CLI override for local model paths, fixes `low_cpu_mem_usage` for quantized loading, removes duplicate quantization kwargs, and adds a tokenizer-loading fallback for merged LoRA checkpoints\n- `streamvln\u002Fstreamvln_eval.py` explicitly loads the SigLIP vision tower (fixes `delay_load` issue) and auto-selects `flash_attention_2` with `eager` fallback\n- `llava\u002Fmodel\u002Fmultimodal_encoder\u002Fsiglip_encoder.py` passes `low_cpu_mem_usage=True` for `device_map`-based loading\n\nThe script is idempotent — re-running `setup_vln.sh` is safe. Available flags:\n\n```\n--skip-env          Skip creating the conda environment\n--skip-patch        Skip applying the compat patch\n--skip-download     Skip downloading model checkpoints\n--skip-deps         Skip installing Python dependencies\n--hf-token TOKEN    HuggingFace token for gated models\n```\n\n\u003C\u002Fdetails>\n\n#### Step 2: Generate Episodes and Trajectories\n\nVLN needs both **episodes** (start\u002Fgoal + natural-language instruction) and **trajectory data** (rendered RGB frames + ground-truth action sequences for SFT). We provide both in the released dataset, but you can also re-generate them:\n\n```bash\nconda activate habitat-gs-streamvln\n\n# 1. Generate VLN episodes (samples paths on the navmesh, renders waypoints with GS,\n#    and queries a VLM to produce the instruction text). Outputs R2RVLN-v1 format.\npython scripts_gs\u002Fgenerate_vln_episodes.py\n\n# 2. 
Generate StreamVLN trajectory data by replaying each episode with a greedy\n#    path follower that records (RGB frame, action) pairs.\npython scripts_gs\u002Fgenerate_vln_trajectories.py\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: episode\u002Ftrajectory generation details\u003C\u002Fsummary>\n\n`generate_vln_episodes.py` produces 200 episodes per training scene and 50 per evaluation scene by default, in the standard R2RVLN-v1 format consumed by habitat-lab's `vln_r2r` task. Instructions are generated by querying an OpenAI-compatible VLM endpoint with multi-view renderings along the path. Configure the endpoint via the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables, or pass `--api-config \u002Fpath\u002Fto\u002Fconfig.json`.\n\n`generate_vln_trajectories.py` runs a greedy heading-based path follower (`forward_step=0.25m`, `turn_angle=15°`, `success_distance=0.25m` for the final waypoint) on each episode and records:\n\n- **annotations.json**: one entry per episode with the instruction, the action sequence (`-1`=initial, `0`=stop, `1`=forward, `2`=turn-left, `3`=turn-right), and per-step poses\n- **images\u002F{scene}\\_gs\\_{ep_id}\u002Frgb\u002F*.jpg**: RGB frame at each step, rendered through the GS pipeline\n\nThe script supports `--resume` to skip already-completed scenes, which is useful if generation is interrupted.\n\n\u003C\u002Fdetails>\n\n#### Step 3: Train\n\nBy default, the training script performs a **standard full fine-tune** (vision tower + projector + LLM, matching the official StreamVLN config). This requires **≥80 GB VRAM per GPU** (A100 80GB recommended) with the default DeepSpeed ZeRO-2 config. 
For consumer GPUs (RTX 3090\u002F4090), add the `--lora` flag to enable memory-efficient LoRA training.\n\n```bash\n# ── Standard full fine-tune ──\n\n# Stage-1: SFT on demonstration trajectories\nbash scripts_gs\u002Ftrain_vln.sh --output output\u002Fvln_stage1 --stage stage-one\n\n# DAgger: retrain with DAgger-collected data\nbash scripts_gs\u002Ftrain_vln.sh --output output\u002Fvln_dagger --stage dagger \\\n    --ckpt output\u002Fvln_stage1\u002Fcheckpoint-XXX\n\n# Stage-2: co-training with auxiliary QA data\nbash scripts_gs\u002Ftrain_vln.sh --output output\u002Fvln_stage2 --stage stage-two \\\n    --ckpt output\u002Fvln_dagger\u002Fcheckpoint-XXX\n\n\n# ── LoRA mode ──\nbash scripts_gs\u002Ftrain_vln.sh --output output\u002Fvln_stage1 --stage stage-one --lora\nbash scripts_gs\u002Ftrain_vln.sh --output output\u002Fvln_dagger --stage dagger \\\n    --ckpt output\u002Fvln_stage1 --lora\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: training options\u003C\u002Fsummary>\n\n```\n--output DIR        Output directory for checkpoints (required)\n--stage STAGE       Training stage: stage-one | dagger | stage-two (default: stage-one)\n--num-gpus N        Number of GPUs (default: 1)\n--ckpt PATH         Base checkpoint (default: local LLaVA-Video-7B-Qwen2 for stage-one)\n--epochs N          Number of epochs (default: 1)\n--batch-size N      Per-device batch size (default: 2)\n--grad-accum N      Gradient accumulation steps (default: 2)\n--lr RATE           Learning rate (default: 2e-5)\n--num-frames N      Frames per sample (default: 32)\n--lora              Enable LoRA mode (see below)\n```\n\n**Standard mode (default):** full fine-tune of the entire model (~8 GB trainable parameters) with `anyres_max_9` image tiling, 32 frames, 32K context, and `torch.compile`. Multi-GPU training uses DeepSpeed ZeRO-2. 
Requires **≥80 GB VRAM per GPU** (A100 80GB recommended).\n\n**LoRA mode (`--lora`):** freezes the LLM backbone, trains only the MM projector + LoRA adapters (`r=64, alpha=128`, ~17 MB trainable), reduces frames to 4 and context to 2K. Fits on a **single 24GB RTX 4090**. When chaining stages with `--lora` (e.g. stage-one → dagger), the script auto-merges the previous LoRA checkpoint before applying new adapters.\n\n\u003C\u002Fdetails>\n\n#### Step 4: Evaluate\n\n```bash\nbash scripts_gs\u002Feval_vln.sh --ckpt output\u002Fvln_stage1\u002Fcheckpoint-XXX\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: evaluation options\u003C\u002Fsummary>\n\n```\n--ckpt PATH         Path to a trained StreamVLN checkpoint (required)\n--output DIR        Output directory for results (default: results\u002Fvln\u002F\u003Cckpt>_\u003Csplit>)\n--num-gpus N        Number of GPUs for parallel rollout (default: 1)\n--split SPLIT       Evaluation split: train | val (default: val)\n--num-frames N      Frames per sample (default: 32)\n--save-video        Save visualization videos\n```\n\nThe evaluator uses `data\u002Fscene_datasets\u002Fgs_scenes\u002Fconfigs\u002Fvln_gs_eval.yaml` (RGB+Depth at 640x480, hfov=79°, `forward_step=0.25m`, `turn_angle=15°`, `success_distance=3.0m`, `max_episode_steps=500`) and reports the standard VLN metrics: Success, SPL, Oracle Success, Distance-to-Goal, and Oracle Navigation Error.\n\n\u003C\u002Fdetails>\n\n### ✈️ Vision-and-Language Navigation with Uni-NaVid\n\nWe also support a **one-click** training and evaluation pipeline for [Uni-NaVid](https:\u002F\u002Fgithub.com\u002Fjzhzhang\u002FUni-NaVid) (RSS 2025), a unified video-based vision-language-action model that handles multiple embodied navigation tasks (VLN, ObjectNav, EQA, etc.) with a single model. 
Uni-NaVid is built on Vicuna-7B + EVA-ViT-G with online token merging for efficient streaming inference.\n\n| Task | Goal | Sensors | Actions | Backbone |\n|------|------|---------|---------|----------|\n| **VLN-R2R** | Natural-language instruction | RGB (120° HFOV) | move_forward (0.25m), turn_left\u002Fright (30°), stop | Vicuna-7B + EVA-ViT-G |\n\n#### Prerequisites\n\n- A **separate** `habitat-gs-uni-navid` conda environment (see `Step 1` below).\n\n- Uni-NaVid repo cloned as a sibling of `habitat-gs\u002F`:\n\n  ```bash\n  cd \u002Fpath\u002Fto\u002Fparent\n  git clone https:\u002F\u002Fgithub.com\u002Fjzhzhang\u002FUni-NaVid.git\n  ```\n\n- GS data downloaded and placed under `data\u002Fscene_datasets\u002Fgs_scenes\u002F` (same layout as other tasks)\n\n#### Data Layout\n\n\u003Cdetails>\n\u003Csummary>Click to expand: Uni-NaVid-specific files added on top of the base dataset layout:\u003C\u002Fsummary>\n\n```\ndata\u002Fscene_datasets\u002Fgs_scenes\u002F\n├── configs\u002F\n│   └── vln_uninavid_gs_eval.yaml    # habitat config for Uni-NaVid eval (provided)\n├── episodes\u002F\n│   └── vln\u002F                         # shared with StreamVLN\n│       ├── train\u002Ftrain.json.gz      # 110 scenes × 200 episodes = 22,000 train\n│       └── val\u002Fval.json.gz          # 19 scenes × 50 episodes = 950 val\n└── trajectory_data\u002F\n    └── uninavid\u002F                    # generated by generate_uninavid_trajectories.py\n        ├── nav_gs_train.json        # Uni-NaVid conversation-format annotations\n        ├── nav_gs_val.json\n        └── nav_videos\u002F              # .mp4 trajectory videos\n            ├── scene01_gs_000000.mp4\n            ├── interior_0007_840137_gs_000000.mp4\n            └── ...\n```\n\n\u003C\u002Fdetails>\n\n#### Step 1: One-Time Setup\n\n`setup_uninavid.sh` creates the `habitat-gs-uni-navid` conda environment cloned from `habitat-gs`, installs Uni-NaVid's Python dependencies, applies `scripts_gs\u002Funinavid_compat.patch` to 
guarantee compatibility, and downloads model checkpoints (EVA-ViT-G ~3.5GB, Vicuna-7B ~13GB, Uni-NaVid pretrained ~14GB).\n\n```bash\nbash scripts_gs\u002Fsetup_uninavid.sh\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: setup options\u003C\u002Fsummary>\n\n```\n--skip-env          Skip creating the conda environment\n--skip-deps         Skip installing Python dependencies\n--skip-patch        Skip applying uninavid_compat.patch\n--skip-download     Skip downloading model checkpoints\n--proxy URL         HTTP proxy for downloads\n```\n\n\u003C\u002Fdetails>\n\n#### Step 2: Generate Trajectory Data\n\nUni-NaVid is trained on video trajectories in a conversation format. This script replays each VLN episode with a greedy path follower and records RGB frames as `.mp4` videos plus action annotations in Uni-NaVid's conversation JSON format. Both outputs are included in our released dataset.\n\n```bash\nconda activate habitat-gs-uni-navid\npython scripts_gs\u002Fgenerate_uninavid_trajectories.py\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: trajectory generation details\u003C\u002Fsummary>\n\nThe greedy controller uses `forward_step=0.25m`, `turn_angle=30°`, `HFOV=120°`. Each trajectory video is encoded at 10 fps. The output JSON uses Uni-NaVid's conversation format with the `NAV_ID` prefix and the `NAVIGATION_IDENTIFIER` string to trigger navigation-specific token processing during training.\n\n\u003C\u002Fdetails>\n\n#### Step 3: Train\n\nTwo-stage training is supported, as in standard Uni-NaVid. 
Requires **≥80 GB VRAM per GPU** (A100 80GB recommended).\n- **stage-1**: Fine-tune from Vicuna-7B (training from scratch, requires large dataset)\n- **stage-2**: Fine-tune from pre-trained Uni-NaVid checkpoint (recommended)\n\n```bash\nconda activate habitat-gs-uni-navid\n\n# Recommended: fine-tune from pre-trained Uni-NaVid\nbash scripts_gs\u002Ftrain_uninavid.sh --output output\u002Funinavid_gs --stage stage-2\n\n# Or train from scratch with Vicuna-7B\nbash scripts_gs\u002Ftrain_uninavid.sh --output output\u002Funinavid_gs --stage stage-1\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: training options\u003C\u002Fsummary>\n\n```\n--output DIR         Output directory for checkpoints (required)\n--stage STAGE        Training stage: stage-1|stage-2 (default: stage-2)\n--num-gpus N         Number of GPUs (default: 1)\n--ckpt PATH          Base checkpoint path (auto-selected per stage)\n--epochs N           Number of epochs (default: 1)\n--batch-size N       Per-device batch size (default: 8)\n--grad-accum N       Gradient accumulation steps (default: 2)\n--lr RATE            Learning rate (default: 1e-5)\n```\n\n\u003C\u002Fdetails>\n\n#### Step 4: Evaluate\n\nEvaluation runs online in a `habitat.Env` with the VLN-v0 task, matching the [NaVid-VLN-CE](https:\u002F\u002Fgithub.com\u002Fjzhzhang\u002FNaVid-VLN-CE) evaluation pattern. 
\n\n```bash\nconda activate habitat-gs-uni-navid\nbash scripts_gs\u002Feval_uninavid.sh --ckpt output\u002Funinavid_gs\u002F\u003Ccheckpoint>\n```\n\n\u003Cdetails>\n\u003Csummary>Click to expand: evaluation options\u003C\u002Fsummary>\n\n```\n--ckpt PATH         Path to trained Uni-NaVid checkpoint (required)\n--output DIR        Output directory for results (default: results\u002Funinavid\u002F\u003Cckpt>_\u003Csplit>)\n--num-gpus N        Number of GPUs for parallel evaluation (default: 1)\n--split SPLIT       Evaluation split: train|val (default: val)\n--save-video        Save evaluation rollout videos\n```\n\nThe evaluator reports: Success Rate (SR), SPL, Oracle Success (OSR), and Distance-to-Goal (DTG).\n\n\u003C\u002Fdetails>\n\n## 📚 Citation\n\nIf you find Habitat-GS useful in your research, please consider citing:\n\n```bibtex\n@misc{xia2026habitatgs,\n    title={Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting}, \n    author={Ziyuan Xia and Jingyi Xu and Chong Cui and Yuanhong Yu and Jiazhao Zhang and Qingsong Yan and Tao Ni and Junbo Chen and Xiaowei Zhou and Hujun Bao and Ruizhen Hu and Sida Peng},\n    year={2026},\n    eprint={2604.12626},\n    archivePrefix={arXiv},\n    primaryClass={cs.RO},\n    url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.12626}, \n}\n```\n","Habitat-GS 是一个基于动态高斯点云技术的高保真导航模拟器，专为具身智能导航任务设计。其核心功能包括支持3D高斯点云渲染、保持Habitat-Sim的标准场景数据抽象、NavMesh路径查找及与Habitat-Lab的集成等特性，这使得它能够提供更加逼真的视觉效果和更灵活的交互方式。特别适合于需要在复杂室内环境中进行高效导航算法测试与开发的应用场景，如机器人学研究、虚拟现实环境下的导航挑战等。项目采用C++开发，并以MIT许可证形式开源，便于学术界和工业界的进一步利用与扩展。",2,"2026-06-11 03:32:25","CREATED_QUERY"]