[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82722":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":12,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},82722,"dvlt","nv-tlabs\u002Fdvlt","nv-tlabs","Official implementation of Déjà View: Looping Transformers for Multi-View 3D Reconstruction",null,"Python",304,13,1,3,0,111,191,61,3.44,"Other",false,"main",[],"2026-06-12 02:04:27","\u003Cdiv align=\"center\">\n\u003Ch1>Déjà View: Looping Transformers for Multi-View 3D Reconstruction\u003C\u002Fh1>\n\n\u003Ca href=\"https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdvl\u002Fprojects\u002Fdvlt\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-1f72b1.svg\" alt=\"Project Page\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.30215\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.30215-b31b1b.svg\" alt=\"arXiv\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002Fdvlt\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-yellow\" alt=\"Hugging Face\">\u003C\u002Fa>\n\n**[NVIDIA](https:\u002F\u002Fwww.nvidia.com\u002F)** &nbsp;&nbsp;&nbsp; **[University of Modena and Reggio Emilia](https:\u002F\u002Fwww.unimore.it\u002Fit)** &nbsp;&nbsp;&nbsp; **[University of Toronto](https:\u002F\u002Fwww.utoronto.ca\u002F)** &nbsp;&nbsp;&nbsp; **[ETH 
Zurich](https:\u002F\u002Fethz.ch\u002F)**\n\n[Alessandro Burzio*](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdvl\u002Fauthor\u002Falessandro-burzio\u002F), [Tobias Fischer*](https:\u002F\u002Ftobiasfshr.github.io\u002F), [Sven Elflein](https:\u002F\u002Fselflein.github.io\u002F), [Qunjie Zhou](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdvl\u002Fauthor\u002Fqunjie-zhou\u002F), [Riccardo de Lutio](https:\u002F\u002Friccardodelutio.github.io\u002F), [Jiawei Ren](https:\u002F\u002Fjiawei-ren.github.io\u002F), [Jiahui Huang](https:\u002F\u002Fhuangjh-pub.github.io\u002F), [Shengyu Huang](https:\u002F\u002Fshengyuh.github.io\u002F), [Marc Pollefeys](https:\u002F\u002Fpeople.inf.ethz.ch\u002Fmarc.pollefeys\u002F), [Laura Leal-Taixé](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdvl\u002Fauthor\u002Flaura-leal-taixe\u002F), [Zan Gojcic+](https:\u002F\u002Fzgojcic.github.io\u002F), [Haithem Turki+](https:\u002F\u002Fhaithemturki.com\u002F)\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fnvidia-hq-dvlt.gif\" alt=\"Déjà View demo\" width=\"100%\">\n\u003C\u002Fp>\n\n## Overview\n\nDéjàView (DVLT) is a recurrent transformer for multi-view 3D reconstruction. It\nloops a shared block of frame\u002Fglobal attention with discrete depth indexing,\nproducing per-pixel rays, depth, confidence, and camera poses from an unordered\nset of images. Trained once, the number of refinement steps `K` becomes an\ninference-time compute knob, matching or outperforming substantially larger\nfeed-forward baselines at a fraction of their parameters.\n\nThis repository contains:\n\n- The DVLT model + four configurable ablations (vanilla, decoupled blocks,\n  no `s_out` token, no depth-scaling).\n- Evaluation wrappers for five baselines: VGGT, VGGT-Omega, Depth-Anything-3,\n  MapAnything, and Pi3. 
Each wrapper imports the upstream package (installed\n  separately — see [INSTALL.md](docs\u002FINSTALL.md)).\n- A training stack built on `accelerate` + Hydra, with optional W&B logging.\n- A Stage-2 fine-tune recipe for the depth-conv head.\n- Rerun-based visualization tools.\n\n## Release status\n\n- [x] Inference code\n- [x] Model weights\n- [x] Evaluation code\n  - [x] eval datasets preprocess and loaders\n- [x] Training code\n  - [x] ScanNet++ training dataset loader\n  - [ ] other training dataset loaders\n\n## Quickstart\n\n### Install\n\nSee [docs\u002FINSTALL.md](docs\u002FINSTALL.md). The short version:\n\n```bash\nconda create -n dvlt python=3.12 && conda activate dvlt\nconda install pytorch=2.5.1 torchvision pytorch-cuda=12.4 -c pytorch -c nvidia -c conda-forge\npip install -e .[all]\n```\n\n### Quick setup\n\nQuick example script:\n\n```python\nimport torch\nfrom accelerate import Accelerator\n\nfrom dvlt.model.dvlt.model import DVLT\nfrom dvlt.util.preprocess import load_sequence, preprocess_images\n\ncheckpoint_path = \"nvidia\u002Fdvlt\"  # local dir, HTTPS URL, or HF Hub repo id\n# load_sequence accepts a directory, a single video, or an explicit list of files.\ninput_path = \"path\u002Fto\u002Fscene_dir\"\n# Or: input_path = \"path\u002Fto\u002Fclip.mp4\"\n# Or: from glob import glob; input_path = sorted(glob(\"path\u002Fto\u002Fscene_dir\u002F*.png\"))\n\naccelerator = Accelerator(mixed_precision=\"bf16\")\n\nmodel = DVLT(img_size=504)\nmodel.load_pretrained(checkpoint_path, strict=True)\nmodel.setup_test(accelerator)\n\n_, frames = load_sequence(input_path)\nbatch = preprocess_images(frames, img_size=504, patch_size=14, device=accelerator.device)\n\nwith torch.no_grad(), accelerator.autocast():\n    predictions = model.predict(batch, accelerator)\n\ncameras = predictions[\"cameras\"][0]            # Cameras object with shape [S]\nextrinsics_c2w = cameras.camera_to_worlds       # (S, 3, 4) — OpenCV convention [R | t]\nintrinsics = 
cameras.get_intrinsics_matrices()  # (S, 3, 3)\n\ndepths = predictions[\"depths\"][0]              # (S, H, W)\nworld_points = predictions[\"world_points\"][0]  # (S, H, W, 3)\n```\n\n### Train\n\n```bash\n# Single-GPU\npython -m dvlt.scripts.train --config-name dvlt-large data=scannetpp\n\n# Multi-GPU (4 GPUs)\naccelerate launch --num-processes 4 -m dvlt.scripts.train --config-name dvlt-large data=scannetpp\n\n# Resume\npython -m dvlt.scripts.train \\\n    --config-dir=outputs\u002F\u003Crun> \\\n    --config-name=config.yaml \\\n    trainer.resume_from_checkpoint=latest\n```\n\n### Evaluate\n\n`benchmark_lite` (DTU, ETH3D, 7Scenes) is a convenience benchmark over the\ndatasets that don't require heavy preprocessing; the full `benchmark` adds\n[ScanNet++](src\u002Fdvlt\u002Fscripts\u002Fpreprocess\u002Fpreprocess_scannetpp.md) and\n[NuScenes](src\u002Fdvlt\u002Fscripts\u002Fpreprocess\u002Fpreprocess_nuscenes.md).\n\n```bash\npython -m dvlt.scripts.test --config-name dvlt data=benchmark\n# multi-GPU: accelerate launch --num-processes \u003CN> -m dvlt.scripts.test --config-name dvlt data=benchmark\npython -m dvlt.scripts.test --config-name dvlt data=benchmark_lite\n# multi-GPU: accelerate launch --num-processes \u003CN> -m dvlt.scripts.test --config-name dvlt data=benchmark_lite\n```\n\nDVLT reference results on the full `benchmark`:\n\n| Dataset | Pose AUC@3 | Pose AUC@30 | Depth inlier@3% | Depth AbsRel |\n|---|---|---|---|---|\n| DTU | 0.8319 | 0.9880 | 0.9706 | 0.0093 |\n| ETH3D | 0.6604 | 0.9536 | 0.7717 | 0.0267 |\n| 7Scenes | 0.1393 | 0.8172 | 0.7437 | 0.0349 |\n| ScanNet++ | 0.7941 | 0.9803 | 0.9239 | 0.0167 |\n| NuScenes | 0.4340 | 0.8534 | 0.5853 | 0.0673 |\n\n\n### Interactive demo\n\nBrowser UI for uploading images \u002F video and exploring the predicted 3D point\ncloud, depth maps and camera trajectory. 
The dropdown switches between DVLT\nand the baseline wrappers (VGGT, VGGT-Omega, DA3, Pi3, MapAnything);\neach baseline requires its upstream package installed (see\n[docs\u002FINSTALL.md](docs\u002FINSTALL.md)).\n\n```bash\n# Launch on http:\u002F\u002Flocalhost:7860 (DVLT preselected)\npython -m dvlt.scripts.gradio_demo\n```\n\nThe same script also has a headless **offline mode** that skips Gradio\nand writes a `.glb` + `.rrd` per (sequence, model) under\n`demo_outputs\u002F\u003Csequence_name>\u002F`. `--input` accepts a directory of images, a\nsingle image, or a video file (mp4\u002Fmov\u002Fgif\u002F...), and may be repeated to\nprocess multiple sequences in one go; `--models` is a comma-separated list of\nconfig names from the curated registry (or `all`).\n\n```bash\n# Run one model (dvlt) on two sequences (one image dir, one video)\npython -m dvlt.scripts.gradio_demo --offline \\\n    --input \u002Fpath\u002Fto\u002Fscene_dir \\\n    --input \u002Fpath\u002Fto\u002Fclip.mp4 \\\n    --models dvlt\n\n# Run every registered model on one sequence\npython -m dvlt.scripts.gradio_demo --offline --input \u002Fpath\u002Fto\u002Fscene_dir --models all\n```\n\n## Configuration\n\nDVLT uses [Hydra](https:\u002F\u002Fhydra.cc) for configuration. Top-level experiment\nconfigs live in `src\u002Fdvlt\u002Fconfig\u002Fexperiments\u002F`:\n\n| Config | Description |\n|---|---|\n| `dvlt-large` | Stage-1 recipe (large model, full training schedule, linear depth head). |\n| `dvlt-large-ablation` | Vanilla ablation parent — toggle decoupled blocks, no-`s_out`, no-depthscale via overrides. |\n| `dvlt-large-ablation-decoupled` | Fully decoupled blocks (`recurrence_mode=none`, no looping): a distinct block per step, fixed 16 steps. |\n| `dvlt-large-depthconv-stage2` | Stage-2 depth-conv head fine-tune (matches the released checkpoint and the model's default `depth_head_type=\"conv\"`). |\n| `dvlt` | Inference-only alias for the released stage-2 checkpoint. 
|\n| `vggt`, `vggt_omega`, `da3-{base,large,giant}`, `pi3`, `pi3x`, `mapanything` | Eval-only baseline wrappers. Require the upstream package installed (see [INSTALL.md](docs\u002FINSTALL.md)). |\n\n### User configuration (data paths)\n\nPer-user settings (most importantly, the dataset root) live in\n`src\u002Fdvlt\u002Fconfig\u002Fexperiments\u002Fuser\u002F`. Copy `default.yaml` to `local.yaml`,\nedit `data_root`, and select it via `user=local`:\n\n```bash\npython -m dvlt.scripts.train --config-name dvlt-large data=scannetpp user=local\n```\n\n`user.data_root` can also be overridden inline or via the `DVLT_DATA_ROOT`\nenvironment variable.\n\n### Selecting datasets\n\nPick a single curated dataset config:\n\n```bash\npython -m dvlt.scripts.train --config-name dvlt-large data=scannetpp\npython -m dvlt.scripts.train --config-name dvlt-large data=mixed_all\n```\n\n## Tab completion\n\nFor scripts using the `@cli` decorator (train, test, visualize):\n\n```bash\neval \"$(python -m dvlt.scripts.train -sc install)\"\n# later, to remove:\neval \"$(python -m dvlt.scripts.train -sc uninstall)\"\n```\n\n## Documentation\n\n- [docs\u002FINSTALL.md](docs\u002FINSTALL.md) — environment setup + baseline installs\n- [docs\u002Fdata\u002FDATA.md](docs\u002Fdata\u002FDATA.md) — data pipeline overview + how to\n  add a new dataset parser\n- [docs\u002FCONTRIB.md](docs\u002FCONTRIB.md) — dev setup, code style, tests\n- [docs\u002FTESTING.md](docs\u002FTESTING.md) — full test-runner documentation\n\n## Acknowledgments\n\nWe are also grateful to several other open-source repositories that we drew inspiration from or built upon during the development of our pipeline:\n- [VGGT](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvggt)\n- [Pi3](https:\u002F\u002Fgithub.com\u002Fyyfz\u002FPi3)\n- [CUT3R](https:\u002F\u002Fgithub.com\u002FCUT3R\u002FCUT3R)\n- [MapAnything](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fmap-anything)\n- 
[Depth-Anything-3](https:\u002F\u002Fgithub.com\u002Fbytedance-seed\u002Fdepth-anything-3)\n\n## Citation\n\nIf you find this work useful, please cite:\n\n```bibtex\n@article{burzio2026dejaview,\n  title   = {D\\'ej\\`a View: Looping Transformers for Multi-View 3D Reconstruction},\n  author  = {Burzio, Alessandro and Fischer, Tobias and Elflein, Sven and Zhou, Qunjie and de Lutio, Riccardo and Ren, Jiawei and Huang, Jiahui and Huang, Shengyu and Pollefeys, Marc and Leal-Taix{\\'e}, Laura and Gojcic, Zan and Turki, Haithem},\n  journal = {arXiv preprint arXiv:2605.30215},\n  year    = {2026}\n}\n```\n\n## License + attribution\n\nThe DVLT **code** is released mostly under the **Apache License, Version 2.0** — see\n[LICENSE](LICENSE). The **model weights** (the `nvidia\u002Fdvlt` checkpoint) are\nreleased under the **NVIDIA License** — non-commercial, research-and-evaluation\nuse only; see [LICENSES\u002FNVIDIA-LICENSE.txt](LICENSES\u002FNVIDIA-LICENSE.txt).\n\nPortions of the codebase are adapted from third-party open-source projects\n(DINOv2, PyTorch3D, MoGe, AnyCalib, MultiNeRF, Depth-Anything-3, VGGT). Each\nadapted file carries the upstream copyright + license notice in its header;\nsee [THIRD_PARTY_LICENSES.md](THIRD_PARTY_LICENSES.md) for the full attribution\nmap and full upstream license texts. 
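The VGGT-derived files are distributed\nunder the VGGT License; see [LICENSES\u002FVGGT-LICENSE.txt](LICENSES\u002FVGGT-LICENSE.txt).\n\nThe baseline evaluation wrappers in `src\u002Fdvlt\u002Fmodel\u002F{vggt,vggt_omega,da3,mapanything,pi3}\u002F`\nimport (do not vendor) their respective upstream packages, each of which is\ngoverned by its own license — see\n[THIRD_PARTY_LICENSES.md](THIRD_PARTY_LICENSES.md) §\"Upstream packages used\nfor evaluation\".\n","Déjà View (DVLT) is a recurrent transformer for multi-view 3D reconstruction that produces per-pixel rays, depth, confidence, and camera poses from an unordered set of images. Its core mechanism loops a shared block of frame\u002Fglobal attention combined with discrete depth indexing to achieve efficient, high-quality 3D reconstruction. Once trained, the number of refinement steps K can be adjusted to flexibly control inference-time compute, matching or even outperforming large feed-forward models with fewer parameters. DVLT suits scenarios that need high-accuracy 3D reconstruction under resource constraints, such as augmented reality on mobile devices or robot vision tasks with strict real-time requirements.",2,"2026-06-11 04:09:00","CREATED_QUERY"]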