[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83797":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":14,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":16,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":9,"trendingCount":15,"starSnapshotCount":15,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},83797,"MDA","biansy000\u002FMDA","biansy000","Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation",null,"Python",57,4,51,1,0,5,2.1,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:04:35","\u003Cdiv align=\"center\">\n\u003Ch1 style=\"border-bottom: none; margin-bottom: 0px\">Modeling Depth Ambiguity:\u003Cbr>A Mixture-Density Representation for Flying-Point-Free Depth Estimation\u003C\u002Fh1>\n\n**Siyuan Bian\\*, Congrong Xu\\*, Jun Gao**\n\n\u003Ca href=\"https:\u002F\u002Fbiansy000.github.io\u002Fmda-site\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-MDA-green\" alt=\"Project Page\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fsy000\u002FMDA\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Checkpoints-yellow\" alt=\"Hugging Face Checkpoints\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.02552\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2606.02552-b31b1b\" alt=\"arXiv\">\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\nThis repository is the official code for **MDA**, from the paper *\"Modeling Depth Ambiguity: A Mixture-Density 
Representation for Flying-Point-Free Depth Estimation\"* ([arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.02552) · [project page](https:\u002F\u002Fbiansy000.github.io\u002Fmda-site\u002F)).\n\nCommon feed-forward depth models predict one depth value per pixel. At object edges this fails: the pixel covers both foreground and background, so its depth is *ambiguous*, and a single value falls between the two surfaces — a *flying point* that corrupts the reconstruction.\n\n**MDA** replaces the single value with a *mixture density*: each pixel predicts a few depth hypotheses with probabilities, then picks one instead of averaging. This **largely eliminates flying points**, stays **robust to input blur**, adds **negligible overhead**, and works across backbones — both **DA3** and **VGGT**.\n\n## 📰 News\n\n- **2026-06-02:** Public release. Training code, evaluation scripts, and the `mda_mog_sky_l2` and `vggt_mog_l2` checkpoints are now available.\n\n\n\n## 🚀 Quick Start\n\n### 📦 Installation\n\nMDA installs in two passes. The core package covers inference and the mixture-density head. A few extras are needed only for training, evaluation, or Gaussian-splatting export.\n\n```bash\nconda create -n mda python=3.10 -y\nconda activate mda\n\n# Install PyTorch for your CUDA version. This is one example.\npip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n\n# Core package (mixture-density head + inference).\npip install -e .\n\n# Optional: all inference extras (point-cloud viewers, format converters).\npip install -e \".[all]\"\n```\n\n**Training and evaluation extras.** The Hydra\u002FLightning training launcher and the benchmark eval scripts use a few libraries that `pyproject.toml` does not declare. 
Install them when you need to train or evaluate:\n\n```bash\n# Training stack.\npip install hydra-core lightning lightning-bolts torchmetrics rootutils \\\n            accelerate peft pyyaml tqdm\n\n# Eval and visualization utilities (src\u002Ftesting\u002Feval_cut3r\u002F*, src\u002Ftraining\u002Fda3_wrapper.py).\npip install matplotlib scipy sympy\n```\n\n**ffmpeg.** The `bash src\u002Ftesting\u002Frun_demo.sh \u003Cvideo>` flow extracts frames from a video, so `ffmpeg` must be on your `PATH`:\n\n- Debian \u002F Ubuntu: `sudo apt-get install ffmpeg`\n- macOS (Homebrew): `brew install ffmpeg`\n\n### 🧱 Download the checkpoints\n\nThe pretrained MDA checkpoints are on the Hugging Face Hub at [`sy000\u002FMDA`](https:\u002F\u002Fhuggingface.co\u002Fsy000\u002FMDA). Download them into `checkpoints\u002FMDA\u002F`, the path `src\u002Ftesting\u002Futils\u002Fmodel_choice.py` expects:\n\n```bash\nhf download sy000\u002FMDA --local-dir checkpoints\u002FMDA\n```\n\nThis places two checkpoints:\n\n| `--model_name` | Backbone | Checkpoint file | Notes |\n|---|---|---|---|\n| `mda_mog_sky_l2` **(default)** | DA3 Giant + Gaussian mixture + sky | `checkpoints\u002FMDA\u002FDA3_MOG_Sky_LogL2.ckpt` | Default model; main results in the paper. |\n| `vggt_mog_l2` | VGGT-1B + MDA head | `checkpoints\u002FMDA\u002FVGGT_MOG_LogL2.ckpt` | Same head on a VGGT-1B backbone. |\n\n(`hf` ships with `huggingface_hub`. Run `pip install -U huggingface_hub` if the command is missing.)\n\n### 💻 Run the demo\n\n`demo.py` takes a folder of images, a single video file (frames extracted with `ffmpeg`), or a single image (monocular inference). All settings live in the `DemoConfig` at the top of the file; every field is also a CLI flag.\n\n```bash\n# 1. Bundled multi-view examples (video frames or unordered indoor stills).\npython demo.py assets\u002Fexamples\u002Fdolomiti\npython demo.py assets\u002Fexamples\u002Fdiode_indoor\n\n# 2. 
Single-image (monocular) example.\npython demo.py assets\u002Fexamples\u002Fmono\u002Fpainting\u002Fpainting.jpeg\n\n# 3. Your own data.\npython demo.py path\u002Fto\u002Fvideo.mp4 --fps 5\npython demo.py path\u002Fto\u002Fimage_folder --image_stride 10   # keep every 10th image\npython demo.py path\u002Fto\u002Fimage_folder --model_name vggt_mog_l2\n```\n\nThe default model is `mda_mog_sky_l2`. Override it with `--model_name` (see the table above, or `src\u002Ftesting\u002Futils\u002Fmodel_choice.py` for all names). Outputs go to `--output_dir` (default `eval_results\u002Fdemo\u002F\u003Cinput_basename>\u002F\u003Cmodel_name>\u002F`).\n\nAfter inference, an interactive viser point-cloud viewer launches automatically (disable with `--no-viewer`). To browse several finished runs in one viewer through a dropdown, run:\n\n```bash\npython view.py --data_dir eval_results\u002Fdemo --method mda_mog_sky_l2\n```\n\nThe original shell pipeline (`src\u002Ftesting\u002Frun_demo.sh` wrapping `src\u002Ftesting\u002Frun_inference_video.py`) is still available for the `.ply`-export flow.\n\n## 🏋️ Training\n\nTraining uses Hydra to compose an experiment config under `configs\u002Fexperiment\u002Fmda\u002F` (the `.yaml` extension is implicit). 
Each config finetunes a pretrained DA3 or VGGT checkpoint with **K = 4** mixture components for **10k steps** on **4 × RTX Pro 6000**, learning rate **1e-4**, batch size **48** (paper §5.1.1).\n\n```bash\n# Default: DA3 + Gaussian mixture + sky component.\npython src\u002Ftraining\u002Ftrain.py experiment=mda\u002Fda3_mog_sky_full\n```\n\nOther recipes under `configs\u002Fexperiment\u002Fmda\u002F`:\n\n| Config | Description |\n|---|---|\n| `da3_mog_sky_full` | DA3 + Gaussian mixture + sky component **(default)** |\n| `da3_mog_sky_full_l1` | DA3 + Laplacian mixture (paper Table 1, \"LMM\" row) |\n| `vggt_mog_full` | VGGT backbone + MDA head |\n\nOverride any Hydra field on the command line:\n\n```bash\npython src\u002Ftraining\u002Ftrain.py experiment=mda\u002Fda3_mog_sky_full \\\n    trainer.devices=4 data.num_views=8 logger=wandb\n```\n\n**Training data.** The synthetic training mix follows the DA3 recipe: AriaSyntheticENV, HyperSim, MvsSynth, OmniWorld, PointOdyssey, TartanAir, vKitti2, DynamicReplica, UnrealStereo4K (paper §5.1.1).\n\n## 📊 Evaluation\n\nTwo launcher scripts cover the two benchmark tracks in the paper. They use the same checkpoints as the demo. Each script selects the model by name through `src\u002Ftesting\u002Futils\u002Fmodel_choice.py`, and both default to `mda_mog_sky_l2`. To evaluate a different model, edit the `model_names` array at the top of the script (for example, set it to `vggt_mog_l2`).\n\n```bash\n# Boundary-quality benchmark (NRGBD, 7Scenes, HiRoom) — paper Table 1.\nbash src\u002Ftesting\u002Feval_cut3r\u002Fmv_recon\u002Frun_mv_recon.sh\n\n# Video-depth benchmark (Sintel, Bonn, KITTI, DIODE) — paper Table 2.\nbash src\u002Ftesting\u002Feval_cut3r\u002Fvideo_depth\u002Frun_video_depth.sh\n```\n\nBoth scripts write per-dataset and per-model outputs under `eval_results\u002F`. 
\n## 📝 Citation\n\nIf you build on **MDA**, please cite:\n\n```bibtex\n@misc{bian2026modeling,\n  title         = {Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation},\n  author        = {Siyuan Bian and Congrong Xu and Jun Gao},\n  year          = {2026},\n  eprint        = {2606.02552},\n  archivePrefix = {arXiv},\n  primaryClass  = {cs.CV},\n  url           = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.02552}\n}\n```\n\n## 🙏 Acknowledgements\n\nThis codebase builds on these open-source releases:\n\n- [**Depth Anything 3**](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FDepth-Anything-3) — one of the two backbones, and the source of the DINOv2-based encoder, DPT head, and inference code.\n- [**Stream3R**](https:\u002F\u002Fgithub.com\u002FNIRVANALAN\u002FSTream3R) — the Hydra\u002FLightning training launcher, multi-view DUSt3R data modules, and streaming VGGT-style sequence wrapper.\n\nWe thank the authors for their work.\n",2,"2026-06-11 04:11:29","CREATED_QUERY"]