[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80750":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":14,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":27,"discoverSource":28},80750,"GenWildSplat","Vinayak-VG\u002FGenWildSplat","Vinayak-VG","[CVPR 2026] GenWildSplat: Generalizable Sparse-View 3D Reconstruction from Unconstrained Images",null,"Python",47,3,1,2,0,4,6,1.81,"MIT License",false,"main",true,[],"2026-06-12 02:04:06","# GenWildSplat: Generalizable Sparse-View 3D Reconstruction from Unconstrained Images\n\n[![Project Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGenWildSplat-Website-4CAF50?logo=googlechrome&logoColor=white)](https:\u002F\u002Fgenwildsplat.github.io\u002F)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-b31b1b?logo=arxiv&logoColor=b31b1b)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.28193)\n[![Code](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Code-181717?logo=github&logoColor=white)](https:\u002F\u002Fgithub.com\u002FVinayak-VG\u002FGenWildSplat)\n\n[Vinayak Gupta](https:\u002F\u002Fvinayak-vg.github.io\u002F), [Chih-Hao Lin](https:\u002F\u002Fchih-hao-lin.github.io\u002F), [Shenlong Wang](https:\u002F\u002Fshenlong.web.illinois.edu\u002F), [Anand Bhattad](https:\u002F\u002Fanandbhattad.github.io\u002F), [Jia-Bin Huang](https:\u002F\u002Fjbhuang0604.github.io\u002F)\n\nCVPR 2026\n\n## Overview\n\nGenWildSplat is a feed-forward method for sparse-view 
3D reconstruction from unconstrained \"in-the-wild\" images. From 2–6 unposed images with varying appearance and transient occluders, it produces a 3D Gaussian splat in roughly 3 seconds on a single A6000 GPU. The model jointly predicts camera poses, depth, and per-Gaussian parameters; a lightweight latent appearance encoder modulates view-dependent appearance while keeping geometry consistent, and a segmentation pathway suppresses transient objects. Renderings are then refined with [SyncFix](https:\u002F\u002Fgithub.com\u002FDeming77\u002FSyncFix) for multi-view consistency. The method is built on top of [AnySplat](https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FAnySplat) with a VGGT backbone.\n\n## Installation\n\nTested on Python 3.10, PyTorch 2.4.0, CUDA 12.4. Other PyTorch \u002F CUDA versions may work, but you'll need to swap the `gsplat` and `torch_scatter` wheels accordingly.\n\n### 1. Create the environment\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FVinayak-VG\u002FGenWildSplat.git\ncd GenWildSplat\n\nconda create -y -n genwildsplat python=3.10\nconda activate genwildsplat\n```\n\n### 2. Install GCC ≥ 9 (skip if your system already has it)\n\n`gsplat` falls back to JIT compilation, which needs GCC 9+. The easiest fix is a self-contained conda gcc:\n\n```bash\nconda install -c conda-forge -y gxx_linux-64=11 gcc_linux-64=11\n```\n\nThen point PyTorch's extension builder at it (add to your shell rc or [eval_script.sh](eval_script.sh)):\n\n```bash\nexport CC=$CONDA_PREFIX\u002Fbin\u002Fx86_64-conda-linux-gnu-gcc\nexport CXX=$CONDA_PREFIX\u002Fbin\u002Fx86_64-conda-linux-gnu-g++\n```\n\n### 3. 
Install PyTorch + Python deps\n\n```bash\npip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\npip install -r requirements.txt --find-links https:\u002F\u002Fdata.pyg.org\u002Fwhl\u002Ftorch-2.4.0+cu124.html\n```\n\nThe `--find-links` flag is required so pip pulls the prebuilt `torch_scatter+pt24cu124` wheel from PyG instead of trying to compile from source.\n\n### 4. Install SyncFix as a local package (no extra deps)\n\n```bash\npip install -e SyncFix --no-deps\n```\n\nThe `--no-deps` flag is critical — it makes `syncfix` importable without touching any of the carefully pinned versions from `requirements.txt`.\n\n## Checkpoints\n\n### GenWildSplat checkpoint\n\nDownload from these Google Drive links:\n- **[Main checkpoint](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1LNK4_NqZOjGw4OKt3cLrsRW0tH-pfnov\u002Fview?usp=sharing)** — contains `model.safetensors` (GenWildSplat weights) and `yolov8x-seg.pt` (YOLOv8 segmentation weights for transient-object masking)\n- **[Latent model](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1bb4Up7SNZ9lBTku4LGAJe49wE4bEVlBo\u002Fview)** — `latent_model.pth.tar` (lighting\u002Fappearance latent UNet init weights, used at training time)\n\nPlace all three files under [checkpoint\u002F](checkpoint\u002F) (alongside the existing `config.json`):\n\n```\ncheckpoint\u002F\n├── config.json\n├── model.safetensors\n├── yolov8x-seg.pt\n└── latent_model.pth.tar\n```\n\n> `latent_model.pth.tar` is only strictly required for training — at inference time its weights are overwritten by `model.safetensors`. If you skip it the encoder will print a one-line notice and continue.\n\n### SyncFix checkpoint\n\nThe diffusion-based refinement step needs the SyncFix model weights. 
Download them from SyncFix's [Google Drive folder](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1RWGIQig_rUqH8xJXpdIjiUNOKAX1A4qE?usp=sharing) and place the contents under [SyncFix\u002Fcheckpoint\u002F](SyncFix\u002Fcheckpoint\u002F) (create the folder if it doesn't exist):\n\n```\nSyncFix\u002Fcheckpoint\u002F\n├── config.yaml\n└── model.safetensors\n```\n\n> The eval scripts default to `--syncfix_ckpt SyncFix\u002Fcheckpoint`. Without these weights the refinement step will fail; pass `--no_refine` to skip it (raw AnySplat output only).\n\nAn example scene is provided at [examples\u002FOidor_Chapel](examples\u002FOidor_Chapel\u002F).\n\n## Running on Your Own Images\n\n### 1. Organize the scene\n\n```\n\u003Cscene_folder>\u002F\n└── images\u002F\n    ├── img_001.jpg\n    ├── img_002.jpg\n    └── ...\n```\n\n### 2. Generate transient-object masks\n\n```bash\npython3 create_masks.py\n```\n\nThis writes a `\u003Cscene_folder>\u002Fmasks\u002F` directory next to `images\u002F`. Tune the `--conf` flag (default `0.1`) inside `create_masks.py` to control how aggressively transient objects are suppressed: a lower confidence threshold masks more, a higher one masks less.\n\n### 3. Run inference\n\nUse either of the eval scripts below. Both expect the `images\u002F` + `masks\u002F` layout produced by steps 1 and 2.\n\n## Evaluation\n\nThe two evaluation scripts in [eval_script.sh](eval_script.sh) cover the two main use cases. Both load the same checkpoint; pick the one that matches what you want to produce.\n\nBy default, both scripts pipe their renderings through **SyncFix** for diffusion-based refinement, using the input images as references (best multi-view consistency).\n\n- `--no_ref` *(video script only)*: reference-free refinement. Use this if you want to **preserve the lighting\u002Fappearance** of the rendered video instead of pulling it back toward the reference images.\n- `--no_refine`: skip SyncFix entirely and keep the raw AnySplat rendering.\n\n### 1. 
Render an interpolated novel-view video\n\nWalks the predicted camera trajectory and writes `\u003Coutput_path>\u002Fvideo.mp4` (refined RGB) and `\u003Coutput_path>\u002Fdepth.mp4` (raw depth — never refined). The first 6 images (sorted naturally) are used as context views. All intermediate files are deleted after refinement.\n\n```bash\n# Default: refinement with reference images (best consistency)\npython src\u002Feval_nvs_video.py \\\n  --data_dir     examples\u002FOidor_Chapel \\\n  --output_path  eval_outputs\u002FOidor_Chapel \\\n  --ckpt_path    checkpoint\u002Fmodel.safetensors \\\n  --syncfix_ckpt SyncFix\u002Fcheckpoint\n```\n\n```bash\n# Reference-free refinement (preserves rendered lighting)\npython src\u002Feval_nvs_video.py \\\n  --data_dir     examples\u002FOidor_Chapel \\\n  --output_path  eval_outputs\u002FOidor_Chapel \\\n  --ckpt_path    checkpoint\u002Fmodel.safetensors \\\n  --syncfix_ckpt SyncFix\u002Fcheckpoint \\\n  --no_ref\n```\n\n```bash\n# No refinement — raw AnySplat output only\npython src\u002Feval_nvs_video.py \\\n  --data_dir     examples\u002FOidor_Chapel \\\n  --output_path  eval_outputs\u002FOidor_Chapel \\\n  --ckpt_path    checkpoint\u002Fmodel.safetensors \\\n  --no_refine\n```\n\n### 2. Render & score target views\n\nSplits the input into 6 context views + the rest as target views, renders the targets from the predicted Gaussian splat, and reports PSNR \u002F SSIM \u002F LPIPS to stdout. Refined views are written to `\u003Coutput_path>\u002Frefined\u002F`. 
All intermediate `gt\u002F` and `pred\u002F` folders are deleted after refinement.\n\n```bash\n# Default: refinement with reference images\npython src\u002Feval_nvs_tgt.py \\\n  --data_dir     examples\u002FOidor_Chapel \\\n  --output_path  eval_outputs\u002FOidor_Chapel \\\n  --ckpt_path    checkpoint\u002Fmodel.safetensors \\\n  --syncfix_ckpt SyncFix\u002Fcheckpoint\n```\n\n```bash\n# No refinement\npython src\u002Feval_nvs_tgt.py \\\n  --data_dir     examples\u002FOidor_Chapel \\\n  --output_path  eval_outputs\u002FOidor_Chapel \\\n  --ckpt_path    checkpoint\u002Fmodel.safetensors \\\n  --no_refine\n```\n\nYou can also run both via:\n\n```bash\nbash eval_script.sh\n```\n\n\u003C!--\n## Training\n\nSingle-node training on DL3DV (see [train.sh](train.sh) for the full command line):\n\n```bash\npython src\u002Fmain.py +experiment=dl3dv \\\n  trainer.num_nodes=1 trainer.max_steps=50000 \\\n  model.encoder.use_depth_cam_from_distill=true \\\n  model.encoder.train_latent_model=true \\\n  model.encoder.data_appearance=true \\\n  model.encoder.use_occlusions_masks=true \\\n  train.use_occlusions_masks=true \\\n  hydra.run.dir=outputs\u002Fgenwildsplat\u002F \\\n  dataset.dl3dv.roots=[DL3DV-10K\u002FDL3DV-10K-DiffRendAll]\n```\n\nFor multi-node, launch the same entry point under `torchrun` with your usual `--nnodes` \u002F `--nproc_per_node` \u002F `--rdzv-*` flags. 
-->\n\n## TODO\n\n- [x] Code release for inference & evaluation\n- [x] Interpolation video rendering script (with SyncFix refinement)\n- [x] Target-view rendering with PSNR \u002F SSIM \u002F LPIPS\n- [x] Pretrained checkpoint\n- [x] Codebase cleanup\n- [x] SyncFix integration (reference-based and reference-free)\n- [ ] Release of training code with reproducible recipes & dataset prep scripts\n- [ ] Hugging Face Spaces demo\n\n## Citation\n\n```\n@article{gupta2026genwildsplat,\n  title={Generalizable Sparse-View 3D Reconstruction from Unconstrained Images},\n  author={Gupta, Vinayak and Lin, Chih-Hao and Wang, Shenlong and Bhattad, Anand and Huang, Jia-Bin},\n  journal={CVPR},\n  year={2026}\n}\n```\n\n## Acknowledgements\n\nBuilt on top of [AnySplat](https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FAnySplat), [VGGT](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvggt), and [SyncFix](https:\u002F\u002Fgithub.com\u002FDeming77\u002FSyncFix). Thanks also to [NoPoSplat](https:\u002F\u002Fgithub.com\u002Fcvg\u002FNoPoSplat), [CUT3R](https:\u002F\u002Fgithub.com\u002FCUT3R\u002FCUT3R), and [gsplat](https:\u002F\u002Fgithub.com\u002Fnerfstudio-project\u002Fgsplat).\n","GenWildSplat is a feed-forward method for sparse-view 3D reconstruction from unconstrained \"in-the-wild\" images. Its core functionality is generating a 3D Gaussian splat from 2 to 6 unposed images in roughly 3 seconds on a single A6000 GPU, while jointly predicting camera poses, depth, and per-Gaussian parameters. In addition, a lightweight latent appearance encoder modulates view-dependent appearance while keeping geometry consistent, and a segmentation pathway suppresses transient objects. Final renderings are refined with SyncFix for multi-view consistency. GenWildSplat is built on AnySplat and uses VGGT as its backbone. The project suits applications that need fast, high-quality 3D reconstruction, such as virtual reality, augmented reality, and computer vision research.","2026-06-11 04:01:53","CREATED_QUERY"]