[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82859":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":27,"discoverSource":28},82859,"PaGeR","prs-eth\u002FPaGeR","prs-eth","PaGeR — Unified Panoramic Geometry Estimation via Multi-View Foundation Models",null,"Python",122,9,2,0,4,47,64,27,77.4,"Other",false,"main",[],"2026-06-12 04:01:39","\u003Cdiv align=\"center\">\n\n\u003Ch1>📟 PaGeR — Unified Panoramic Geometry Estimation via Multi-View Foundation Models\u003C\u002Fh1>\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.26368\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-b31b1b\" alt=\"Paper\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fpager360.github.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-online-1f6feb\" alt=\"Project Page\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fprs-eth\u002FPaGeR\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97-Demo-yellow\" alt=\"Demo\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fprs-eth\u002Fpager-697241d06b3733a6f18e4d39\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97-Collection-FFD21E\" alt=\"HF Collection\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fprs-eth\u002FZuriPano\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-ZuriPano-7e57c2\" alt=\"ZuriPano dataset\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fprs-eth\u002FPanoInfinigen\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-PanoInfinigen-7e57c2\" alt=\"PanoInfinigen dataset\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode-Apache_2.0-green\" alt=\"Code license\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE-MODEL\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeights-CC_BY--NC_4.0-orange\" alt=\"Weights license\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cimg src=\"assets\u002Fteaser.png\" alt=\"PaGeR teaser: panoramic depth, normals, and sky from a single ERP input\" width=\"100%\" \u002F>\n\n\u003C\u002Fdiv>\n\nPaGeR (**Pa**noramic **Ge**ometry **R**econstruction) lifts a perspective 3D\nfoundation model to the 360° panoramic domain. 
From a single equirectangular\nimage, a single forward pass returns:\n\n- **Scale-invariant depth** at full panoramic resolution.\n- **Metric depth** in metres — recovered by multiplying the SI depth by a\n  single per-panorama scale emitted by a separate scale head; PaGeR\n  ships two such heads (indoor \u002F outdoor) and the inference pipeline picks\n  one per panorama via a CLIP router.\n- **Surface normals** as unit vectors in the world frame of the panorama.\n- **Sky segmentation** for masking unbounded depth regions.\n\nThis repository contains the **inference and evaluation code** that produced\nthe numbers in our paper, plus a Gradio demo and helpers to export the\npredicted geometry as a coloured point cloud.\n\n## Release status\n\n- **2026-05-27** — arXiv preprint and project website go live.\n- **2026-05-26** — code, three pretrained checkpoints (`prs-eth\u002FPaGeR`,\n  `prs-eth\u002FPaGeR-metric-depth`, `prs-eth\u002FPaGeR-normals`) and both datasets\n  (`prs-eth\u002FZuriPano`, `prs-eth\u002FPanoInfinigen`) are live on the Hub.\n\n---\n\n## Table of contents\n\n- [Release status](#release-status)\n- [Installation](#installation)\n- [Pretrained models](#pretrained-models)\n- [Gradio demo](#gradio-demo)\n- [Batch inference](#batch-inference)\n- [Evaluation](#evaluation)\n- [Point-cloud export](#point-cloud-export)\n- [Citation](#citation)\n- [License](#license)\n\n---\n\n## Installation\n\nPaGeR is tested with **Python 3.10**, **PyTorch ≥ 2.0**, and **CUDA ≥ 12.1**\non Linux. Other configurations likely work; these are the ones we run in CI.\nPaGeR always projects the input panorama into a fixed **6 × 504 × 504**\ncubemap before the forward pass, regardless of the input ERP resolution, so\npeak VRAM and runtime do not scale with the input size (only the final\ncubemap-to-ERP stitch does). 
At that fixed resolution the unified checkpoint\n(backbone + depth\u002Fnormals\u002Fsky\u002Fscale heads + CLIP router) needs\n**≈ 11.5 GiB of VRAM** at `fp16` on an RTX 4090, so any 3090 \u002F 4090 \u002F\nA4000-class GPU with ≥ 12 GB is enough; the depth-only and normals-only\ncheckpoints fit in ≈ 9.8 GiB.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fprs-eth\u002FPaGeR.git\ncd PaGeR\n\n# Editable install — exposes `depth_anything_3` as an importable package so\n# the backbone's internal absolute imports resolve. The rest of the code\n# (Pager, dataloaders, eval scripts) is run directly from the repo root.\npip install -e .\n\n# Gradio demo (optional)\npip install -e \".[app]\"\n```\n\nIf you hit XFormers wheel issues on older GPUs, see\n[the upstream FAQ](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fxformers) for build\ninstructions.\n\n## Pretrained models\n\nThree checkpoints on the Hub, same ViT-Giant backbone, different heads:\n\n| Checkpoint | `--checkpoint` alias | SI depth | Metric depth | Normals | Sky |\n|---|---|---|---|---|---|\n| **PaGeR** *(recommended)* — [`prs-eth\u002FPaGeR`](https:\u002F\u002Fhuggingface.co\u002Fprs-eth\u002FPaGeR) | `pager` | ✅ | ✅ (SI × CLIP-routed indoor \u002F outdoor scale head) | ✅ | ✅ |\n| PaGeR-Metric-Depth — [`prs-eth\u002FPaGeR-metric-depth`](https:\u002F\u002Fhuggingface.co\u002Fprs-eth\u002FPaGeR-metric-depth) | `pager-metric-depth` | — | ✅ (single direct head) | — | — |\n| PaGeR-Normals — [`prs-eth\u002FPaGeR-normals`](https:\u002F\u002Fhuggingface.co\u002Fprs-eth\u002FPaGeR-normals) | `pager-normals` | — | — | ✅ | — |\n\n`inference.py` \u002F `app.py` accept the alias, a Hub repo id, or a local\ndirectory with `model.safetensors` + `config.yaml`; Hub weights are streamed\ninto the HF cache on first use.\n\n> **Heads up — indoor \u002F outdoor scale routing.** On the unified checkpoint,\n> `--scene_mode auto` (default) runs a small CLIP ViT-B\u002F32 classifier on the\n> cubemap to route each 
panorama through the matching indoor \u002F outdoor scale head;\n> pass `--scene_mode indoor` or `outdoor` to force one head and reproduce\n> the per-domain paper numbers. Note that the automatic CLIP-based indoor \u002F\n> outdoor routing was added **after** the paper submission — the paper\n> numbers were produced with `--scene_mode indoor` \u002F `outdoor` forced per\n> dataset, so use those flags to exactly reproduce the reported results.\n\n## Gradio demo\n\n```bash\npython app.py --checkpoint pager     # or prs-eth\u002FPaGeR \u002F a local dir\n```\n\nOpen `http:\u002F\u002F127.0.0.1:7860` and drop a panorama into the file picker. The\ndemo includes example panoramas, switches between map and point-cloud\noutput, and exposes the same auto \u002F indoor \u002F outdoor scale-head routing as\nthe CLI. A hosted version is also available on the\n[PaGeR HuggingFace Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fprs-eth\u002FPaGeR).\n\n## Batch inference\n\n`inference.py` runs the model on every panorama in a chosen evaluation\ndataset and writes raw predictions plus side-by-side previews:\n\n```bash\npython inference.py \\\n    --config configs\u002Finference.yaml \\\n    --checkpoint \u003Cmodel_name> \\\n    --data_path \u002Fpath\u002Fto\u002Fdatasets \\\n    --dataset \u003Cdataset_name> \\\n    --results_path results\u002F \\\n    --scene_mode auto \\\n    --generate_eval\n```\n\nKey flags:\n\n- `--checkpoint` — which model to run. 
Accepts a short alias for one of the\n  released checkpoints — `pager` (default,\n  [`prs-eth\u002FPaGeR`](https:\u002F\u002Fhuggingface.co\u002Fprs-eth\u002FPaGeR)), `pager-metric-depth`\n  ([`prs-eth\u002FPaGeR-metric-depth`](https:\u002F\u002Fhuggingface.co\u002Fprs-eth\u002FPaGeR-metric-depth)),\n  `pager-normals` ([`prs-eth\u002FPaGeR-normals`](https:\u002F\u002Fhuggingface.co\u002Fprs-eth\u002FPaGeR-normals))\n  — *or* a HuggingFace Hub repo id (`\u003Cuser>\u002F\u003Crepo>`) *or* a local directory\n  containing `model.safetensors` + `config.yaml`.\n- `--scene_mode {auto,indoor,outdoor}` — controls the scale-head routing. On\n  the unified checkpoint `auto` is the default; on single-domain checkpoints\n  this flag has no effect.\n- `--generate_eval` — also write per-sample `.npz` arrays under\n  `\u003Cresults>\u002F\u003Cmodality>\u002Fpreds\u002F` so the evaluation scripts can pick them up.\n- `--sky_mask_threshold`, `--sky_mask_softness`, `--sky_mask_open_kernel` —\n  tune the soft sky-fill applied to the depth \u002F normals outputs.\n\nExpected dataset layout under `--data_path`:\n\n```\n\u003Cdata_path>\u002F\n├── Matterport3D360\u002F\n├── Stanford2D3DS\u002F\n├── Structured3D\u002F\n└── Replica360_4K\u002F\n```\n\nDownload \u002F access pages for each dataset:\n\n| Dataset | Source | Use in PaGeR |\n|---|---|---|\n| [Matterport3D360](https:\u002F\u002Fresearchdata.bath.ac.uk\u002F1126\u002F) | re-projected from [Matterport3D](https:\u002F\u002Fniessner.github.io\u002FMatterport\u002F) | eval only |\n| [Stanford 2D-3D-S](https:\u002F\u002Fsdss.redivis.com\u002Fdatasets\u002Ff304-a3vhsvcaf) | Armeni et al., Stanford | eval only |\n| [Replica360_4K](https:\u002F\u002Fopendatalab.com\u002FOpenDataLab\u002FReplica_360_2k_4k_RGBD) | 4K equirectangular renders of [Replica](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Freplica-dataset) (Facebook Research) | eval only |\n| [Structured3D](https:\u002F\u002Fstructured3d-dataset.org\u002F) | Zheng et 
al. | training + eval (normals) |\n| [ZüriPano](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fprs-eth\u002FZuriPano) | released with PaGeR | eval |\n| [PanoInfinigen](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fprs-eth\u002FPanoInfinigen) | released with PaGeR (Infinigen \u002F iCity renders) | training |\n\nThe three eval-only datasets (Matterport3D360, Stanford2D3DS, Replica360_4K)\nare gated behind the upstream EULAs; please obtain them from the linked\nsource pages. See [`NOTICE`](NOTICE) for the per-dataset licenses and\nobligations, and each dataloader in [`dataloaders\u002F`](dataloaders\u002F) for the\nexact on-disk layout it expects (image \u002F depth \u002F mask \u002F normals naming).\n\n## Evaluation\n\n`evaluation\u002Fdepth_evaluation.py` scores the cached depth predictions against\nground truth, on **Matterport3D360**, **Stanford2D3DS** or **ZuriPano**. Run\nit once per dataset:\n\n```bash\n# Metric depth (in-metres, no alignment).\npython evaluation\u002Fdepth_evaluation.py \\\n    --data_path \u002Fpath\u002Fto\u002Fdatasets \\\n    --dataset \u003Cdataset-name> \\\n    --pred_path results \\\n    --alignment_type metric\n\n# Scale-invariant depth (least-squares scale alignment).\npython evaluation\u002Fdepth_evaluation.py \\\n    --data_path \u002Fpath\u002Fto\u002Fdatasets \\\n    --dataset \u003Cdataset-name> \\\n    --pred_path results \\\n    --alignment_type scale\n```\n\nReported metrics: **AbsRel**, **RMSE (linear)**, **δ₁**, averaged uniformly\nover the valid ERP pixels. 
Results land in\n`\u003Cpred_path>\u002Fdepth\u002Fevaluation_metrics_\u003Calignment>.txt`.\n\nFor surface normals on Structured3D:\n\n```bash\npython evaluation\u002Fnormals_evaluation.py \\\n    --data_path \u002Fpath\u002Fto\u002Fdatasets \\\n    --dataset Structured3D \\\n    --pred_path results\n```\n\nTo quantify cubemap-stitching artefacts in the depth predictions (Table 4a in\nthe paper) on **Replica360_4K**:\n\n```bash\npython evaluation\u002Fseams_evaluation.py \\\n    --data_path \u002Fpath\u002Fto\u002Fdatasets \\\n    --dataset Replica360_4K \\\n    --pred_path results\n```\n\nThe script reports three metrics, `seam_defect_density`, `seam_prevalence`, and\n`seam_severity`; see the appendix of our paper for the exact definitions.\n\n## Point-cloud export\n\n```bash\npython generate_point_cloud.py \\\n    --data_path \u002Fpath\u002Fto\u002Fdatasets \\\n    --dataset \u003Cdataset-name> \\\n    --depth_path results \\\n    --color_modality rgb \\\n    --max_points 1000000\n```\n\nGLBs land under `\u003Cdepth_path>\u002Fdepth\u002Fpoint_clouds\u002F`. Pass\n`--color_modality normals` to colour by predicted normals instead; those GLBs\nland under `\u003Cdepth_path>\u002Fnormals\u002Fpoint_clouds\u002F`.\n\n## Citation\n\nIf you use PaGeR, the released checkpoints, or the ZüriPano \u002F PanoInfinigen\ndatasets in your work, please cite:\n\n```bibtex\n@article{bozic2026pager,\n  title   = {Unified Panoramic Geometry Estimation via Multi-View Foundation Models},\n  author  = {Bozic, Vukasin and Slavkovic, Isidora and Narnhofer, Dominik and\n             Metzger, Nando and Rozumny, Denis and Schindler, Konrad and\n             Kalischek, Nikolai},\n  journal = {arXiv preprint arXiv:2605.26368},\n  year    = {2026}\n}\n```\n\n## License\n\nPaGeR is licensed **separately per artifact** (code, weights, and each\ndataset split), because the training and modeling stack is not uniform.\n\n| Artifact | License | File |\n|---|---|---|\n| **Source code** (this repository) | 
[Apache License 2.0](LICENSE) | `LICENSE` |\n| **Pretrained model weights** (HuggingFace `prs-eth\u002FPaGeR*`) | [CC BY-NC 4.0](https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002F) — academic \u002F non-commercial only | `LICENSE-MODEL` |\n| **ZüriPano dataset** | [CC BY 4.0](https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002F) | (on the dataset HF repo) |\n| **PanoInfinigen — nature split** | [BSD 3-Clause](https:\u002F\u002Fopensource.org\u002Flicense\u002Fbsd-3-clause) (inherited from Infinigen) | (on the dataset HF repo) |\n| **PanoInfinigen — indoor split** | [BSD 3-Clause](https:\u002F\u002Fopensource.org\u002Flicense\u002Fbsd-3-clause) (inherited from Infinigen) | (on the dataset HF repo) |\n| **PanoInfinigen — urban split** | [CC BY-NC 4.0](https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby-nc\u002F4.0\u002F) (iCity-encumbered) | (on the dataset HF repo) |\n\nThe non-commercial restriction on the weights is inherited from the\n[Depth Anything 3](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FDepth-Anything-3)\nViT-Giant backbone (CC BY-NC 4.0) and from the non-commercial \u002F research-only\nterms of PanoInfinigen — urban subset. See [`NOTICE`](NOTICE) for the full\nthird-party attribution list and per-dataset breakdown.\n\n## Acknowledgements\n\nPaGeR builds on top of\n[Depth Anything 3](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FDepth-Anything-3) —\nthe backbone and the multi-view inference code under\n[`src\u002Fdepth_anything_3\u002F`](src\u002Fdepth_anything_3\u002F) are derived from that\nproject. 
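The training script is adapted from\n[Marigold-E2E-FT](https:\u002F\u002Fgithub.com\u002FVisualComputingInstitute\u002Fdiffusion-e2e-ft).\nThe indoor\u002Foutdoor classifier uses OpenAI CLIP via\n[`open_clip`](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002Fopen_clip).\n","PaGeR is a unified framework for panoramic geometry estimation built on multi-view foundation models. From a single equirectangular image, one forward pass produces scale-invariant depth at full panoramic resolution, metric depth (in metres), surface normals, and sky segmentation. The project is written in Python, selects between indoor and outdoor scene handling automatically, and provides a Gradio demo and a point-cloud export tool. It suits applications that need depth estimation, normal estimation, or sky-region detection on 360° panoramic images, such as virtual reality, augmented reality, and 3D reconstruction.","2026-06-11 04:09:27","CREATED_QUERY"]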