[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79891":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":26,"discoverSource":27},79891,"FreeOcc","the-masses\u002FFreeOcc","the-masses","[RSS 2026] FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction",null,"Python",100,12,4,0,2,3,44.64,"Other",false,"main",true,[],"2026-06-12 04:01:25","\u003Ch1 align=\"center\">FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Frss2026.svg\" alt=\"RSS 2026\" width=\"150\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cstrong>\n    \u003Ca href=\"https:\u002F\u002Fthe-masses.github.io\u002F\">Zeyu Jiang\u003C\u002Fa>\u003Csup>* 1\u003C\u002Fsup>,\n    \u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=FZ3jPs4AAAAJ&hl=zh-CN\">Changqing Zhou\u003C\u002Fa>\u003Csup>* 1\u003C\u002Fsup>,\n    \u003Ca href=\"https:\u002F\u002Fxingxingzuo.github.io\u002F\">Xingxing Zuo\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>,\n    \u003Ca href=\"https:\u002F\u002Fchanghao-chen.github.io\u002F\">Changhao Chen\u003C\u002Fa>\u003Csup>1 ✉\u003C\u002Fsup>\n\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cstrong>\n    \u003Csup>1\u003C\u002Fsup>The Hong Kong University of Science and Technology (Guangzhou)\u003Cbr>\n    \u003Csup>2\u003C\u002Fsup>Mohamed bin Zayed 
University of Artificial Intelligence\n\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Csub>*Equal contribution. ✉Corresponding author.\u003C\u002Fsub>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cstrong>\n    \u003Ca href=\"https:\u002F\u002Fthe-masses.github.io\u002Ffreeocc-web\u002F\">Project Site\u003C\u002Fa> |\n    \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.28115\">Arxiv\u003C\u002Fa> |\n    Paper |\n    \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fthe-masses\u002FReplicaOcc\">Benchmark\u003C\u002Fa>\n\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fpipelinefinal.png\" width=\"85%\" \u002F>\n\u003C\u002Fp>\n\n**FreeOcc** is a training-free framework for embodied open-vocabulary occupancy prediction from monocular or RGB-D image sequences. Instead of relying on voxel-level occupancy annotations, semantic labels, or ground-truth camera poses, FreeOcc incrementally builds a globally consistent 3D occupancy map by coupling SLAM geometry, 3D Gaussian mapping, vision-language semantics, and probabilistic Gaussian-to-occupancy projection.\n\nThe pipeline maintains four scene representations in a streaming manner:\n\n- **SLAM backbone** estimates camera poses and sparse\u002Fsemi-dense geometry from monocular or RGB-D observations.\n- **Geometrically consistent 3D Gaussian mapping** constructs dense Gaussian maps with geometry-aware initialization and anchored Gaussian updates.\n- **Open-vocabulary semantic association** injects language-aligned features from off-the-shelf vision-language models into Gaussian primitives.\n- **Gaussian-to-occupancy projection** converts language-embedded Gaussians into dense voxel occupancy, enabling text-driven 3D semantic querying.\n\nFreeOcc is designed for annotation-free, pose-agnostic occupancy reasoning and supports open-vocabulary queries over the reconstructed 3D occupancy map.\n\n## News\n\n- [2026.05.07] We release 
the code and [benchmark](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fthe-masses\u002FReplicaOcc) for FreeOcc.\n- [2026.04.27] The paper FreeOcc was accepted to RSS 2026. Code will be released soon.\n\n## Environment\n\nThe main `freeocc` environment runs SLAM reconstruction, Gaussian mapping, occupancy evaluation, and `occ.npz` export. Mayavi visualization uses a separate environment; see [Visualization](#visualization).\n\nClone the repository with submodules, or initialize submodules after cloning:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fthe-masses\u002FFreeOcc.git\ncd FreeOcc\ngit submodule update --init --recursive\n```\n\nSystem packages:\n\n```bash\nsudo apt-get update\nsudo apt-get install -y build-essential git curl wget libopenexr-dev\n```\n\nCUDA extensions require a working CUDA toolkit with `nvcc`. The verified setup uses CUDA 12.8 and PyTorch `2.9.0+cu128`.\n\n```bash\nnvcc --version\n```\n\nIf CUDA is not installed at the default location, set `CUDA_HOME` before building:\n\n```bash\nexport CUDA_HOME=\u002Fpath\u002Fto\u002Fcuda\n```\n\nCreate the main environment:\n\n```bash\nconda env create -f environment.yaml\nconda activate freeocc\n```\n\nInstall PyTorch with CUDA 12.8:\n\n```bash\npip install --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128 \\\n  torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0\n```\n\nInstall PyTorch3D and Torch Scatter. 
With recent PyTorch\u002FCUDA versions, install PyTorch3D from source without its optional CUDA extensions:\n\n```bash\npip install fvcore iopath\nPYTORCH3D_NO_EXTENSION=1 \\\npip install --no-build-isolation --no-deps \\\n  \"git+https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fpytorch3d.git@stable\"\n\npip install torch-scatter -f https:\u002F\u002Fdata.pyg.org\u002Fwhl\u002Ftorch-2.9.0+cu128.html\n```\n\nInstall the Python runtime dependencies:\n\n```bash\npip install \\\n  hydra-core omegaconf tqdm termcolor ipdb \\\n  kornia faiss-cpu einops plyfile pyliblzfse \\\n  open3d opencv-python==4.9.0 opencv-python-headless==4.9.0 \\\n  glfw imgviz PyGLM PyOpenGL PyOpenGL-accelerate \\\n  plotly kaleido evo torchmetrics \\\n  ftfy==6.2.0 regex==2023.8.8 fsspec transformers==4.37.2 \\\n  openpyxl==3.1.2 huggingface_hub==0.23.0 safetensors==0.4.3 \\\n  timm==0.6.7 pycocotools easydict torchtyping\n```\n\nInstall OpenMMLab packages used by Trident:\n\n```bash\npip install -U openmim\npip install -U mmengine==0.10.7\nmim install \"mmcv==2.1.0\"\npip install mmsegmentation==1.2.2\n```\n\nIf `mim install \"mmcv==2.1.0\"` reports version\u002Fbuild incompatibilities or falls back to downloading a source tarball instead of a prebuilt wheel, please build the full `mmcv` package from source by following the official [MMCV installation guide](https:\u002F\u002Fmmcv.readthedocs.io\u002Fen\u002F2.x\u002Fget_started\u002Finstallation.html).\n\nBuild the required CUDA extensions. Set `TORCH_CUDA_ARCH_LIST` to the compute capability of your GPU (the example uses `12.0`):\n\n```bash\nexport TORCH_CUDA_ARCH_LIST=\"12.0\"\n\nPKG=droid_backends python setup.py install\nPKG=lietorch python setup.py install\nPKG=simple_knn python setup.py install\nPKG=diff_gaussian_rasterization python setup.py install\n\npushd src\u002Fgs2occ\u002Flocalagg_prob\npython setup.py build_ext --inplace\npopd\n```\n\nRun import checks:\n\n```bash\npython - \u003C\u003C'PY'\nimport torch\nprint(\"torch\", torch.__version__, \"cuda\", torch.version.cuda, \"available\", 
torch.cuda.is_available())\nimport droid_backends\nimport lietorch\nfrom simple_knn import _C as simple_knn_ext\nimport diff_gaussian_rasterization\nfrom src.gs2occ.localagg_prob.local_aggregate_prob import LocalAggregator\nfrom pytorch3d.transforms import quaternion_to_matrix\nimport mmcv, mmengine, mmseg\nprint(\"environment ok\")\nPY\n```\n\nCheck that Trident imports from the bundled `thirdparty\u002FTrident` path:\n\n```bash\nPYTHONPATH=thirdparty\u002FTrident:. python - \u003C\u003C'PY'\nfrom trident import Trident\nprint(\"trident ok\")\nPY\n```\n\n## Data\n\nFreeOcc expects RGB-D sequence folders and occupancy ground-truth folders. Ground-truth occupancy and poses are loaded only for evaluation-time Sim(3) alignment and metric computation, not for training or map construction. Preparation steps for Replica, ScanNet, and RealSense are described below.\n\nThe evaluation scripts expect each input scene to be addressable as:\n\n```text\n${DATA_ROOT}\u002F${SCENE}\u002F\n```\n\nDownload the ReplicaOcc benchmark from [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fthe-masses\u002FReplicaOcc). Replica should be organized as:\n\n```text\nReplica_OCC\u002F\n├── preprocessed\u002F\n│   ├── office0.npy\n│   ├── office1.npy\n│   └── ...\n├── global_occ_package\u002F\n│   ├── office0.pkl\n│   ├── office1.pkl\n│   └── ...\n└── sequences\u002F\n    ├── cam_params.json\n    ├── office0\u002F\n    │   ├── color\u002F\n    │   │   ├── 0.jpg\n    │   │   └── ...\n    │   ├── depth\u002F\n    │   │   ├── 0.png\n    │   │   └── ...\n    │   ├── pose\u002F\n    │   │   ├── 0.txt\n    │   │   └── ...\n    │   └── intrinsic\u002F\n    │       └── intrinsic_color.txt\n    ├── office1\u002F\n    ├── room0\u002F\n    └── ...\n```\n\nFor Replica, use:\n\n```text\nDATA_ROOT=\u002Fpath\u002Fto\u002FReplica_OCC\u002Fsequences\nSCENE_OCC_ROOT=\u002Fpath\u002Fto\u002FReplica_OCC\n```\n\nPrepare ScanNet as follows:\n\n1. 
Prepare `posed_images` and `gathered_data` following the [Occ-ScanNet dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fhongxiaoy\u002FOccScanNet), then place them under `scannet\u002Foccscannet\u002F`.\n2. Download `global_occ_package` and `streme_occ_new_package` from [EmbodiedOcc-ScanNet](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FYkiWu\u002FEmbodiedOcc-ScanNet), unzip them, and place them under `scannet\u002Fscene_occ\u002F`.\n3. Download the original ScanNet sequences from the official [ScanNet repository](https:\u002F\u002Fgithub.com\u002FScanNet\u002FScanNet.git). The extracted RGB-D sequences should be converted into the ScanNet-style `color\u002F`, `depth\u002F`, `pose\u002F`, and `intrinsic\u002F` layout shown below.\n\n```text\nscannet\u002F\n├── occscannet\u002F\n│   ├── gathered_data\u002F\n│   ├── posed_images\u002F\n│   ├── train_final.txt\n│   ├── train_mini_final.txt\n│   ├── test_final.txt\n│   └── test_mini_final.txt\n├── scene_occ\u002F\n│   ├── global_occ_package\u002F\n│   │   ├── scene0005_01.pkl\n│   │   └── ...\n│   ├── streme_occ_new_package\u002F\n│   │   ├── train\u002F\n│   │   └── test\u002F\n│   ├── train_online.txt\n│   ├── train_mini_online.txt\n│   ├── test_online.txt\n│   └── test_mini_online.txt\n└── sequences\u002F\n    ├── scans\u002F\n    │   ├── scene0005_01\u002F\n    │   │   └── scene0005_01.sens\n    │   └── ...\n    ├── test_online\u002F\n    │   ├── scene0005_01\u002F\n    │   │   ├── color\u002F\n    │   │   │   ├── 0.jpg\n    │   │   │   └── ...\n    │   │   ├── depth\u002F\n    │   │   │   ├── 0.png\n    │   │   │   └── ...\n    │   │   ├── pose\u002F\n    │   │   │   ├── 0.txt\n    │   │   │   └── ...\n    │   │   └── intrinsic\u002F\n    │   │       └── intrinsic_color.txt\n    │   └── ...\n    └── test_online_mini\u002F\n```\n\nFor ScanNet, 
use:\n\n```text\nDATA_ROOT=\u002Fpath\u002Fto\u002Fscannet200\u002Fsequences\u002Ftest_online\nSCENE_OCC_ROOT=\u002Fpath\u002Fto\u002Fscannet200\u002Fscene_occ\n```\n\nRealSense uses the same ScanNet-style RGB-D sequence layout, but `pose\u002F` is optional:\n\n```text\nrealsense\u002F\n└── datasets\u002F\n    ├── scene_name\u002F\n    │   ├── color\u002F\n    │   │   ├── 0.jpg or 0.png\n    │   │   └── ...\n    │   ├── depth\u002F\n    │   │   ├── 0.png\n    │   │   └── ...\n    │   ├── intrinsic\u002F\n    │   │   └── intrinsic_color.txt\n    │   ├── pose\u002F              # optional\n    │   │   ├── 0.txt\n    │   │   └── ...\n    │   └── meta.json          # optional\n    └── ...\n```\n\nFor RealSense, use:\n\n```text\nDATA_ROOT=\u002Fpath\u002Fto\u002Frealsense\u002Fdatasets\n```\n\nOutputs are written under:\n\n```text\n${OUTPUT_ROOT}\u002F${EXPNAME}\u002F\n```\n\nFor each evaluated scene, reconstruction outputs are saved to:\n\n```text\n${OUTPUT_ROOT}\u002F${EXPNAME}\u002F${SCENE}_${MODE}\u002F\n  mesh\u002Ffinal_${MODE}.ply       # final 3D Gaussian map\n  ${EXPNAME}.log               # per-scene SLAM\u002Fmapping log\n  config.yaml                  # resolved run config\n  .hydra\u002F                      # Hydra config metadata\n```\n\nOccupancy evaluation with `--dump_npz` writes Mayavi-ready files to:\n\n```text\n${OUTPUT_ROOT}\u002F${EXPNAME}\u002Focc_vis\u002F${SCENE}_${MODE}\u002F\n  occ.npz\n```\n\nReplica writes its summary log to:\n\n```text\n${OUTPUT_ROOT}\u002F${EXPNAME}\u002Feval_occ_replica_${MODE}.log\n```\n\nScanNet multi-GPU evaluation also writes retry and scene status logs to:\n\n```text\nlogs\u002F${EXPNAME}_\u003Ctimestamp>_${MODE}\u002F\n  summary.csv\n  success_scenes.txt\n  eval_occ_scannet.log\n  ${SCENE}.gpu${GPU}.attempt_${N}.log\n```\n\n## Evaluation\n\nThe main dataset entry points are:\n\n```text\nscripts\u002Feval\u002Freplica.sh\nscripts\u002Feval\u002Fscannet_multigpu.sh\nscripts\u002Feval\u002Frealsense.sh\n```\n\nScene 
lists live in:\n\n```text\nscripts\u002Feval\u002Fscenes\u002F\n```\n\nAvailable lists:\n\n```text\nreplica_all.txt\nscannet_16.txt\nscannet_all.txt\nrealsense_example.txt\n```\n\nOverride paths with environment variables. Use sample paths below as placeholders for your dataset layout.\n\nReplica reconstruction, occupancy evaluation, and `occ.npz` export:\n\n```bash\nconda activate freeocc\n\nDATA_ROOT=\u002Fpath\u002Fto\u002FReplica_OCC\u002Fsequences \\\nSCENE_OCC_ROOT=\u002Fpath\u002Fto\u002FReplica_OCC \\\nOUTPUT_ROOT=\u002Fpath\u002Fto\u002Foutputs \\\nMODE=rgbd EXPNAME=replica \\\nbash scripts\u002Feval\u002Freplica.sh\n```\n\nScanNet reconstruction, retry handling, occupancy evaluation, and `occ.npz` export:\n\n```bash\nconda activate freeocc\n\nDATA_ROOT=\u002Fpath\u002Fto\u002Fscannet200\u002Fsequences\u002Ftest_online \\\nSCENE_OCC_ROOT=\u002Fpath\u002Fto\u002Fscannet200\u002Fscene_occ \\\nOUTPUT_ROOT=\u002Fpath\u002Fto\u002Foutputs \\\nMODE=rgbd EXPNAME=scannet \\\nbash scripts\u002Feval\u002Fscannet_multigpu.sh 0,1,2,3\n```\n\nUse one GPU by passing a single device id:\n\n```bash\nbash scripts\u002Feval\u002Fscannet_multigpu.sh 0\n```\n\nTo run a subset of scenes:\n\n```bash\nSCENES=\"scene0005_01 scene0006_02\" \\\nMODE=rgbd EXPNAME=debug_scannet \\\nbash scripts\u002Feval\u002Fscannet_multigpu.sh 0\n```\n\nor:\n\n```bash\nSCENES_FILE=scripts\u002Feval\u002Fscenes\u002Fscannet_16.txt \\\nMODE=rgbd EXPNAME=scannet_16 \\\nbash scripts\u002Feval\u002Fscannet_multigpu.sh 0\n```\n\nSee [Data](#data) for the expected output directory layout.\n\nRealSense example:\n\n```bash\nconda activate freeocc\n\nDATA_ROOT=\u002Fpath\u002Fto\u002Frealsense\u002Fdatasets \\\nOUTPUT_ROOT=\u002Fpath\u002Fto\u002Foutputs \\\nMODE=rgbd EXPNAME=realsense_visualization \\\nbash scripts\u002Feval\u002Frealsense.sh\n```\n\nThe RealSense script reads intrinsics from `intrinsic\u002Fintrinsic_color.txt` and infers image size from the first color frame.\n\n## Visualization\n\nMayavi 
visualization runs in a separate conda environment because Mayavi\u002FVTK\u002FQt can conflict with the main SLAM environment.\n\nCreate the visualization environment:\n\n```bash\nconda create -n mayavi -c conda-forge python=3.10 mayavi vtk pyqt numpy pillow\nconda activate mayavi\n```\n\nSmoke check:\n\n```bash\npython - \u003C\u003C'PY'\nfrom mayavi import mlab\nimport numpy as np\nfrom PIL import Image\nprint(\"mayavi environment ok\")\nPY\n```\n\nIf Mayavi cannot choose a GUI backend on your machine, try:\n\n```bash\nexport ETS_TOOLKIT=qt\nexport QT_API=pyqt5\n```\n\nReplica:\n\n```bash\nconda activate mayavi\n\npython scripts\u002Fvis\u002Fvis_occ_replica.py \\\n  --npz \u002Fpath\u002Fto\u002Foutputs\u002Fours_visualization\u002Focc_vis\u002Foffice0_rgbd\u002Focc.npz \\\n  --which both \\\n  --names_txt src\u002Fscannet_utils\u002Freplica_name.txt \\\n  --save_legend\n```\n\nScanNet:\n\n```bash\nconda activate mayavi\n\npython scripts\u002Fvis\u002Fvis_occ_scannet.py \\\n  --npz \u002Fpath\u002Fto\u002Foutputs\u002Fembodied_scannet_all\u002Focc_vis\u002Fscene0005_01_rgbd\u002Focc.npz \\\n  --which both \\\n  --save_legend\n```\n\nOptions:\n\n```bash\n# Only visualize prediction.\n--which pred\n\n# Render occupied geometry without semantic colors.\n--geometry_only\n```\n\n## Citation\n\nIf you find this work useful, please consider citing:\n\n```bibtex\n@article{jiang2026freeocc,\n  title={FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction},\n  author={Jiang, Zeyu and Zhou, Changqing and Zuo, Xingxing and Chen, Changhao},\n  journal={arXiv preprint arXiv:2604.28115},\n  year={2026}\n}\n```\n\n## Acknowledgements\n\nWe gratefully acknowledge the excellent open-source repositories of [EmbodiedOcc](https:\u002F\u002Fgithub.com\u002Fykiwu\u002Fembodiedocc), [LegoOcc](https:\u002F\u002Fgithub.com\u002FJuIvyy\u002FLegoOcc), [NICE-SLAM](https:\u002F\u002Fgithub.com\u002Fcvg\u002Fnice-slam), 
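[DROID-SLAM](https:\u002F\u002Fgithub.com\u002Fprinceton-vl\u002Fdroid-slam), [DROID-Splat](https:\u002F\u002Fgithub.com\u002Fchenhoy\u002Fdroid-splat), [Trident](https:\u002F\u002Fgithub.com\u002FYuHengsss\u002FTrident) and many other inspiring contributions from the community.\n","FreeOcc is a training-free framework for embodied open-vocabulary occupancy prediction from monocular or RGB-D image sequences. Without relying on voxel-level occupancy annotations, semantic labels, or ground-truth camera poses, it incrementally builds a globally consistent 3D occupancy map by combining SLAM geometry, 3D Gaussian mapping, vision-language semantics, and probabilistic Gaussian-to-occupancy projection. FreeOcc is suited to 3D scene understanding and reconstruction when annotated data or camera pose information is unavailable. It supports text-driven 3D semantic queries and applies to real-time environment perception and interaction in settings such as robot navigation, virtual reality, and augmented reality.","2026-06-11 03:58:24","CREATED_QUERY"]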