[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80741":3},{"id":4,"name":5,"fullName":6,"owner":5,"repo":5,"description":7,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":12,"stars30d":14,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":15,"rankGlobal":8,"rankLanguage":8,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":17,"hasPages":17,"topics":19,"createdAt":8,"pushedAt":8,"updatedAt":20,"readmeContent":21,"aiSummary":22,"trendingCount":13,"starSnapshotCount":13,"syncStatus":11,"lastSyncTime":23,"discoverSource":24},80741,"recuriosity","recuriosity\u002Frecuriosity","Code for the paper \"Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration\"",null,"Python",46,2,1,0,5,1.43,"Apache License 2.0",false,"main",[],"2026-06-12 02:04:06","# Remember to be Curious\n\nOfficial code for:\n\n> **Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration**  \n> Lily Goli, Justin Kerr, Daniele Reda, Alec Jacobson, Andrea Tagliasacchi, Angjoo Kanazawa  \n> *University of Toronto · UC Berkeley · Wayve · Vector Institute · Simon Fraser University*  \n> [[Project page]](https:\u002F\u002Frecuriosity.github.io) &nbsp;|&nbsp; [[arXiv]](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22814)\n\nA reinforcement learning agent that learns to explore 3D indoor scenes using only an RGB camera. 
The agent uses a sliding-window transformer policy (DINO visual encoder + KV-cached transformer backbone) and is trained with PPO, using real-time Gaussian Splatting (GSplat) reconstruction as its intrinsic reward signal.\n\n---\n\n## Overview\n\n- **Task**: Camera exploration in Habitat-Matterport 3D (HM3D) and Gibson scenes\n- **Policy**: DINO ViT-B\u002F8 + sliding-window transformer with KV cache\n- **Reward**: GSplat reconstruction quality (MSE between predicted and ground-truth novel views)\n- **Downstream tasks**: Apple picking, image-goal navigation (finetuned from the explore checkpoint)\n\n### Repository layout\n\n```\nmodules\u002F\n  agent\u002F          Policy and transformer architecture\n  environment\u002F    Habitat environment wrappers, GSplat, rendering\n  eval\u002F           Checkpoint evaluation scripts\n  ppo\u002F            PPO training scripts\n    ablations\u002F    Ablation variants (RNN, ICM, context-window ablations, etc.)\nscripts\u002F          Data download and split generation utilities\ndata\u002F\n  splits\u002F         Validation episode splits (HM3D, Gibson)\nmain.py           Dispatcher entrypoint\nenvironment.yml   Conda environment\nrequirements.txt  Python dependencies\n```\n\n---\n\n## Installation\n\n### Conda\n\n#### 1. Create the environment\n\n```bash\nconda env create -f environment.yml\nconda activate recuriosity\n```\n\n#### 2. Install PyTorch\n\nInstall the wheel matching your CUDA driver. Example for CUDA 12.4:\n\n```bash\npip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 \\\n    --extra-index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n```\n\nSee https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F for other driver versions.\n\n#### 3. 
Install fused-ssim (required before gsplat)\n\n```bash\nCUDA_HOME=$CONDA_PREFIX CUDA_ARCHITECTURES=\"75;80;89;90\" \\\npip install --no-build-isolation \\\n    \"git+https:\u002F\u002Fgithub.com\u002Frahul-goel\u002Ffused-ssim\"\n```\n\n#### 4. Install remaining dependencies (includes gsplat build)\n\n```bash\nCUDA_HOME=$CONDA_PREFIX TORCH_CUDA_ARCH_LIST=\"7.5;8.0;8.9;9.0\" MAX_JOBS=$(nproc) \\\npip install --no-build-isolation -r requirements.txt\n```\n\n#### 5. Install habitat-sim from source\n\nhabitat-sim must be built from source with CUDA and headless rendering support.\n\n```bash\npip install scikit-build-core pybind11\n\nCUDA_HOME=$CONDA_PREFIX \\\nHABITAT_BUILD_GUI_VIEWERS=OFF \\\nHABITAT_WITH_BULLET=ON \\\nHABITAT_WITH_CUDA=ON \\\nMAX_JOBS=$(nproc) \\\npip install --no-build-isolation -v \\\n    \"habitat-sim @ git+https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-sim.git\"\n```\n\nThis takes 10–60 minutes. See https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fhabitat-sim\u002Fblob\u002Fmain\u002FBUILD_FROM_SOURCE.md for details.\n\nVerify:\n```bash\npython -c \"import habitat_sim; print('habitat_sim ok')\"\n```\n\n---\n\n## Data\n\n### HM3D (Habitat-Matterport 3D)\n\n1. Register at https:\u002F\u002Faihabitat.org\u002Fdatasets\u002Fhm3d\u002F\n2. Download the HM3D train and validation splits (GLB + navmesh)\n3. Extract to a local directory.\n\nPass the path via `--data_root` in all training and eval commands. Expected layout:\n\n```\n\u003Cdata_root>\u002Fhm3d\u002Fhm3d_glb\u002F\u003Cscene_id>\u002F\u003Cscene>.glb\n\u003Cdata_root>\u002Fhm3d\u002Fhm3d_nav\u002F\u003Cscene_id>\u002F\u003Cscene>.navmesh\n\u003Cdata_root>\u002Fhm3d-val\u002Fhm3d_glb\u002F\u003Cscene_id>\u002F\u003Cscene>.glb\n\u003Cdata_root>\u002Fhm3d-val\u002Fhm3d_nav\u002F\u003Cscene_id>\u002F\u003Cscene>.navmesh\n```\n\n### Gibson (optional — for cross-dataset generalization eval)\n\n1. 
Request access and download the Gibson dataset (Habitat-compatible GLBs) via the [Gibson Dataset request form](https:\u002F\u002Fdocs.google.com\u002Fforms\u002Fd\u002Fe\u002F1FAIpQLScWlx5Z1DM1M-wTSXaa6zV8lTFkPmTHW1LqMsoCBDWsTDjBkQ\u002Fviewform).\n2. Extract to a local directory and pass the path via `--gibson_root` in eval commands.\n\nGibson scenes follow the same GLB format as HM3D. The expected directory layout (matching the episode split paths) is:\n\n```\n$GIBSON_SCENE_ROOT\u002F\n  Adrian.glb\n  Annawan.glb\n  ...\n```\n\n### Validation episode splits\n\nPre-generated splits are included under `data\u002Fsplits\u002F`:\n\n| File | Task | Episodes |\n|------|------|----------|\n| `data\u002Fsplits\u002Fhm3d\u002Fval\u002Fval.json.gz` | Exploration | 200 (100 scenes × 2 starts) |\n| `data\u002Fsplits\u002Fhm3d\u002Fval\u002Fval_apples.json` | Apple picking | 200 |\n| `data\u002Fsplits\u002Fhm3d\u002Fval\u002Fval_image_goal.json` | Image-goal navigation | 200 |\n| `data\u002Fsplits\u002Fgibson\u002Fval\u002Fval_gibson_bigisland.json.gz` | Exploration (Gibson) | 86 (86 scenes × 1 start) |\n\n---\n\n## Pretrained Checkpoints\n\nDownload the pretrained checkpoints from [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Flily-goli\u002Frecuriosity):\n\n| File | Description |\n|------|-------------|\n| [`explorer.pt`](https:\u002F\u002Fhuggingface.co\u002Flily-goli\u002Frecuriosity\u002Fresolve\u002Fmain\u002Fexplorer.pt) | Main exploration policy, trained on HM3D |\n| [`apple_finetuned.pt`](https:\u002F\u002Fhuggingface.co\u002Flily-goli\u002Frecuriosity\u002Fresolve\u002Fmain\u002Fapple_finetuned.pt) | Apple-picking fine-tune |\n| [`image_goal_finetuned.pt`](https:\u002F\u002Fhuggingface.co\u002Flily-goli\u002Frecuriosity\u002Fresolve\u002Fmain\u002Fimage_goal_finetuned.pt) | Image-goal navigation fine-tune |\n\nPass the checkpoint path via `--checkpoint-path` (eval) or `--base_ckpt` \u002F `--weights_only_ckpt` (fine-tuning). 
The commands below use `checkpoints\u002Fexplore.pt` as an example path; substitute the path to the checkpoint file you downloaded.\n\n---\n\n## Training\n\nAll training scripts are invoked via `main.py`. Multi-GPU training uses `torchrun`.\n\nPass the HM3D data directory via `--data_root` (the directory containing `hm3d\u002F` and `hm3d-val\u002F`). When using Docker via `make`, this is handled automatically by the Makefile.\n\n### W&B logging\n\n```bash\nexport WANDB_API_KEY=your_key_here\n# To disable: export WANDB_MODE=disabled\n```\n\n### Exploration (main model)\n\nTo train, run one of the following commands:\n\n```bash\n# Single GPU\npython main.py --script explore_no_pose \\\n    --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --num_envs 4 \\\n    --logdir runs \\\n    --checkpoint_path checkpoints\u002Fexplore.pt\n\n# Multi-GPU (8× H100)\ntorchrun --standalone --nproc_per_node=8 main.py --script explore_no_pose \\\n    --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --num_envs 72 \\\n    --logdir runs \\\n    --checkpoint_path checkpoints\u002Fexplore.pt\n```\n\nTraining runs on 8× H100s; with `--num_envs 72` it takes ~3 days on 80 GB GPUs. 
To reduce GPU memory usage, set `--num_envs 32` (~6 days; this is the configuration used for the released checkpoint).\n\nTo resume from a checkpoint, pass `--base_ckpt checkpoints\u002Fexplore.pt`.\n\nKey hyperparameters (all have sensible defaults):\n\n| Flag | Default | Description |\n|------|---------|-------------|\n| `--num_envs` | 4 | Number of parallel environments (use 32 or 72 for multi-GPU) |\n| `--roll_length` | 1024 | Steps per rollout |\n| `--learning_rate` | 1e-5 | Adam learning rate |\n| `--attn_window` | 64 | Sliding attention window size (frames) |\n| `--nerf_iters` | 10 | GSplat optimization steps per rollout step |\n| `--ent_coef_start` | 0.1 | Initial entropy coefficient |\n\n### Ablations\n\n```bash\n# RNN backbone\ntorchrun --standalone --nproc_per_node=8 main.py --script ablation_rnn ...\n\n# RNN + ICM\ntorchrun --standalone --nproc_per_node=8 main.py --script ablation_rnn_icm ...\n\n# Transformer + ICM\ntorchrun --standalone --nproc_per_node=8 main.py --script ablation_icm ...\n\n# Context length ablation: --context_window caps how many past frame tokens the agent receives as input (1, 4, or 16)\ntorchrun --standalone --nproc_per_node=8 main.py --script ablation_ctx16 --context_window 1 ...\n\n# Forgetful GSplat (sliding scene window)\ntorchrun --standalone --nproc_per_node=8 main.py --script ablation_forgetful ...\n```\n\n### Apple-picking finetune\n\nFinetune from the pretrained exploration checkpoint:\n\n```bash\ntorchrun --standalone --nproc_per_node=8 main.py --script apples_no_pose \\\n    --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --weights_only_ckpt checkpoints\u002Fexplore.pt \\\n    --num_envs 72 \\\n    --logdir runs \\\n    --checkpoint_path checkpoints\u002Fapples.pt\n```\n\n### Image-goal navigation finetune\n\n```bash\ntorchrun --standalone --nproc_per_node=8 main.py --script image_goal_no_pose \\\n    --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --weights_only_ckpt checkpoints\u002Fexplore.pt \\\n    --learning_rate 1e-6 
\\\n    --num_envs 72 \\\n    --logdir runs \\\n    --checkpoint_path checkpoints\u002Fimage_goal.pt\n```\n\n---\n\n## Evaluation\n\nEvaluation runs the policy on fixed pre-generated episodes and computes surface coverage completeness at 0.05 m threshold. Results are written to `eval_outputs\u002F` and logged to W&B.\n\nThe `--eval-hole-fix` and `--eval-hole-fix-force-white` flags patch scene meshes by adding a white material to backfaces in areas where the mesh has holes. Without this, Habitat disables backface rendering in those regions, leading to mismatched RGB rendering and collision detection in hole areas during evaluation.\n\n### Exploration on HM3D\n\n```bash\npython main.py --script eval --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --ppo-module modules.ppo.train_ppo_explore_no_pose \\\n    --checkpoint-path checkpoints\u002Fexplore.pt \\\n    --episodes-json data\u002Fsplits\u002Fhm3d\u002Fval\u002Fval.json.gz \\\n    --eval-hole-fix --eval-hole-fix-force-white \\\n    --output-dir eval_outputs\u002F\n```\n\nMulti-GPU (episodes are distributed across GPUs):\n\n```bash\ntorchrun --standalone --nproc_per_node=8 main.py --script eval --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --ppo-module modules.ppo.train_ppo_explore_no_pose \\\n    --checkpoint-path checkpoints\u002Fexplore.pt \\\n    --episodes-json data\u002Fsplits\u002Fhm3d\u002Fval\u002Fval.json.gz \\\n    --eval-hole-fix --eval-hole-fix-force-white \\\n    --output-dir eval_outputs\u002F\n```\n\n### Exploration on Gibson\n\n```bash\npython main.py --script eval \\\n    --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --gibson_root \u002Fpath\u002Fto\u002Fgibson \\\n    --ppo-module modules.ppo.train_ppo_explore_no_pose \\\n    --checkpoint-path checkpoints\u002Fexplore.pt \\\n    --episodes-json data\u002Fsplits\u002Fgibson\u002Fval\u002Fval_gibson_bigisland.json.gz \\\n    --eval-hole-fix --eval-hole-fix-force-white \\\n    --output-dir eval_outputs\u002F\n```\n\n### Apple 
picking\n\n```bash\npython main.py --script eval_apples --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --ppo-module modules.ppo.train_ppo_apples_no_pose \\\n    --checkpoint-path checkpoints\u002Fapples.pt \\\n    --episodes-json data\u002Fsplits\u002Fhm3d\u002Fval\u002Fval_apples.json \\\n    --eval-hole-fix --eval-hole-fix-force-white \\\n    --output-dir eval_outputs\u002F\n```\n\n### Image-goal navigation\n\n```bash\npython main.py --script eval_image_goal --data_root \u002Fpath\u002Fto\u002Fhm3d \\\n    --ppo-module modules.ppo.train_ppo_image_goal_no_pose \\\n    --checkpoint-path checkpoints\u002Fimage_goal.pt \\\n    --episodes-json data\u002Fsplits\u002Fhm3d\u002Fval\u002Fval_image_goal.json \\\n    --eval-hole-fix --eval-hole-fix-force-white \\\n    --output-dir eval_outputs\u002F\n```\n\n---\n\n## Active Mapping Baselines\n\nCode and instructions for running the active mapping baselines (ANS-RGB, ANS-Depth, OccAnt-RGB, OccAnt-RGBD) will be released in a future update.\n\n---\n\n## Fix for Common Issues\n\n| Issue | Cause \u002F Fix |\n|-------|-------------|\n| `FileNotFoundError: No GLBs found` | HM3D data missing or `--data_root` path incorrect |\n| `ModuleNotFoundError: habitat_sim` | habitat-sim not installed; build from source (see above) |\n| `Unable to create windowless context` \u002F `unable to find CUDA device 0 among EGL devices` | Container needs `NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics`. The Makefile sets this automatically. Also requires `--gpus all` and the nvidia-container-toolkit EGL setup on the host. 
|\n| `torch.hub` download failure (DINO) | DINOv2 weights are downloaded on first run; ensure internet access or pre-cache with `torch.hub.load('facebookresearch\u002Fdinov2', 'dinov2_vitb14')` |\n| gsplat build fails | Ensure `TORCH_CUDA_ARCH_LIST` is set and matches your GPU |\n| habitat-sim build fails | Ensure `cmake >= 3.14`, `libegl1-mesa-dev`, `libgl1-mesa-dev` are installed |\n| OOM with many environments | Reduce `--num_envs` (32 is safe for 4× H100 80GB) |\n\n---\n\n## Citation\n\n```bibtex\n@article{goli2026recuriosity,\n  title   = {Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration},\n  author  = {Goli, Lily and Kerr, Justin and Reda, Daniele and Jacobson, Alec and Tagliasacchi, Andrea and Kanazawa, Angjoo},\n  journal = {arXiv preprint arXiv:2605.22814},\n  year    = {2026},\n  url     = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22814},\n}\n```\n","This project aims to develop a reinforcement learning agent that explores 3D indoor scenes using only an RGB camera. Its core components are a sliding-window transformer policy built on a DINO visual encoder with a KV cache, trained with PPO and using real-time Gaussian Splatting reconstruction as the intrinsic reward signal. Its technical distinction is the combination of advanced visual encoding with efficient reinforcement learning to improve exploration efficiency and accuracy. It is suited to tasks in complex 3D environments, such as the downstream apple-picking and image-goal navigation tasks, where it shows strong generalization and practical utility.","2026-06-11 04:01:51","CREATED_QUERY"]