[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75873":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},75873,"vggt-omega","facebookresearch\u002Fvggt-omega","facebookresearch","[CVPR 2026 Oral] VGGT Omega","",null,"Python",2913,115,31,24,0,29,449,1979,222,28.19,"Other",false,"main",[],"2026-06-12 02:03:36","\u003Cdiv align=\"center\">\n\u003Ch1>VGGT-&Omega;\u003C\u002Fh1>\n\n\u003Ca href=\"http:\u002F\u002Fvggt-omega.github.io\u002F\" target=\"_blank\" rel=\"noopener noreferrer\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-green\" alt=\"Project Page\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15195\" target=\"_blank\" rel=\"noopener noreferrer\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.15195-b31b1b\" alt=\"arXiv\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fvggt-omega\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Demo-blue'>\u003C\u002Fa>\n\n\u003Cp>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fjytime.github.io\u002F\">Jianyuan Wang\u003C\u002Fa>\u003Csup>1,2\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fsilent-chen.github.io\u002F\">Minghao 
Chen\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=FUDsZkEAAAAJ&amp;hl=zh-CN\">Shangzhan Zhang\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fnikitakaraevv.github.io\u002F\">Nikita Karaev\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cbr>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fdemuc.de\u002F\">Johannes Schönberger\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=IJidh-UAAAAJ&amp;hl=fr\">Patrick Labatut\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=lJ_oh2EAAAAJ&amp;hl=en\">Piotr Bojanowski\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fd-novotny.github.io\u002F\">David Novotny\u003C\u002Fa>\u003C\u002Fspan>\n  \u003Cbr>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vedaldi\u002F\">Andrea Vedaldi\u003C\u002Fa>\u003Csup>1,2\u003C\u002Fsup>\u003C\u002Fspan>\n  \u003Cspan class=\"author\">\u003Ca href=\"https:\u002F\u002Fchrirupp.github.io\u002F\">Christian Rupprecht\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\u003C\u002Fspan>\n\u003C\u002Fp>\n\n**\u003Csup>1\u003C\u002Fsup>[Visual Geometry Group, University of Oxford](https:\u002F\u002Fwww.robots.ox.ac.uk\u002F~vgg\u002F)**; **\u003Csup>2\u003C\u002Fsup>[Meta AI](https:\u002F\u002Fai.facebook.com\u002Fresearch\u002F)**\n\u003C\u002Fdiv>\n\n## Pretrained models\n\nBefore using the models, please request access to the checkpoints [here](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002FVGGT-Omega). 
Once your request is approved, you can download the checkpoints. Please note that access requests are reviewed by an automated process based on the information provided in the request.\n\n| Model | Resolution | Text alignment | Download |\n| :--- | :--- | :--- | :--- |\n| `VGGT-Omega-1B-512` | 512 | No | [Link](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002FVGGT-Omega\u002Fblob\u002Fmain\u002Fvggt_omega_1b_512.pt) |\n| `VGGT-Omega-1B-256-Text-Alignment` | 256 | Yes | [Link](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002FVGGT-Omega\u002Fblob\u002Fmain\u002Fvggt_omega_1b_256_text.pt) |\n\nThe authors are not involved in the review process and cannot approve or reject individual applications. However, the [🤗 Hugging Face demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Ffacebook\u002Fvggt-omega) is available to everyone.\n\n\n## Quick Start\n\nFirst, clone this repository and install the dependencies:\n\n```bash\ngit clone git@github.com:facebookresearch\u002Fvggt-omega.git\ncd vggt-omega\npip install -r requirements.txt\npip install -e .\n```\n\n\nNow, try the model with a few lines of code:\n\n```python\nimport torch\n\nfrom vggt_omega.models import VGGTOmega\nfrom vggt_omega.utils.load_fn import load_and_preprocess_images\nfrom vggt_omega.utils.pose_enc import encoding_to_camera\n\ncheckpoint_path = \"path\u002Fto\u002Fvggt_omega_1b_512.pt\"\nimage_names = [\"path\u002Fto\u002FimageA.png\", \"path\u002Fto\u002FimageB.png\", \"path\u002Fto\u002FimageC.png\"]\n\nmodel = VGGTOmega().to(\"cuda\").eval()\nmodel.load_state_dict(torch.load(checkpoint_path, map_location=\"cpu\"))\n\nimages = load_and_preprocess_images(image_names, image_resolution=512).to(\"cuda\")\n\nwith torch.inference_mode():\n    predictions = model(images)\n\nextrinsics, intrinsics = encoding_to_camera(\n    predictions[\"pose_enc\"],\n    predictions[\"images\"].shape[-2:],\n)\n\ndepth = predictions[\"depth\"]\ndepth_conf = predictions[\"depth_conf\"]\ncamera_and_register_tokens = 
predictions[\"camera_and_register_tokens\"]\ncamera_tokens = camera_and_register_tokens[:, :, :1]\nregisters = camera_and_register_tokens[:, :, 1:]\n```\n\nFor the text-aligned checkpoint, use `VGGTOmega(enable_alignment=True)` with `image_resolution=256` and read `predictions[\"text_alignment_embedding\"]`.\n\n\n## Interactive Demo\n\nInstall the demo dependencies:\n\n```bash\npip install -r requirements_demo.txt\n```\n\nLaunch the Gradio demo with a local checkpoint path:\n\n```bash\npython demo_gradio.py \\\n  --checkpoint checkpoints\u002FVGGT-Omega-1B-512\u002Fmodel.pt \\\n  --image-resolution 512\n```\n\nThe demo accepts uploaded images or a video, runs camera and depth inference,\nand visualizes the depth-unprojected point cloud and predicted cameras as a GLB\nscene.\n\n## Runtime and GPU Memory\n\nWe benchmark the end-to-end peak GPU memory usage of `VGGT-Omega-1B-512` on a\nsingle NVIDIA A100 GPU with 624x416 input images. The measurement covers the full\ninference program, from loading the model weights onto the GPU through the\nforward pass, so it includes both the memory needed to store the model itself\nand the memory used by inference activations and buffers. In other words, a GPU\nwith at least the listed available memory is able to run the corresponding\nnumber of input frames under this setup.\n\n| **Input Frames** | 1 | 10 | 25 | 50 | 100 | 200 | 300 | 400 | 500 |\n|:----------------:|:-:|:--:|:--:|:--:|:---:|:---:|:---:|:---:|:---:|\n| **Peak Memory (GB)** | 6.02 | 6.67 | 7.80 | 9.66 | 13.37 | 20.82 | 28.26 | 35.71 | 43.15 |\n\nThe benchmark uses [`load_and_preprocess_images`](.\u002Fvggt_omega\u002Futils\u002Fload_fn.py)\nwith the default `mode=\"balanced\"` and `image_resolution=512`. For these roughly\n3:2 landscape images, this produces 624x416 inputs. 
You can set\n`mode=\"max_size\"` to resize the longest side to 512 instead; for the same aspect\nratio, this gives about 512x336 inputs and uses less GPU memory.\n\n## License\n\nSee the [LICENSE](.\u002FLICENSE) file for details about the license under which\nthis code is made available.\n\n[^release]: This Release is intended to support the open source research community.\n\n```bibtex\n@misc{wang2026vggtomega,\n      title={VGGT-$\\Omega$}, \n      author={Jianyuan Wang and Minghao Chen and Shangzhan Zhang and Nikita Karaev and Johannes Schönberger and Patrick Labatut and Piotr Bojanowski and David Novotny and Andrea Vedaldi and Christian Rupprecht},\n      year={2026},\n      eprint={2605.15195},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15195}, \n}\n```\n","VGGT Omega is a computer vision model from Oxford's Visual Geometry Group and Meta AI for 3D geometry inference: given one or more input images, it predicts camera parameters and depth maps, from which a point cloud can be unprojected. The project provides two pretrained models: one runs at 512 resolution without text alignment, and the other runs at 256 resolution with text alignment. It is written in Python, and checkpoint access and an interactive demo are provided through Hugging Face. It suits applications that need camera and depth estimation from multiple views, such as 3D scene reconstruction from images or video.",2,"2026-06-11 03:53:33","CREATED_QUERY"]