[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82167":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":13,"subscribersCount":13,"size":13,"stars1d":13,"stars7d":16,"stars30d":17,"stars90d":13,"forks30d":13,"starsTrendScore":13,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":13,"starSnapshotCount":13,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},82167,"g3t","g3t-paper\u002Fg3t","g3t-paper","Code for G3T and G3T-Long","https:\u002F\u002Fg3t-paper.github.io\u002F",null,"Python",41,0,29,1,9,12,42.7,false,"main",[],"2026-06-12 04:01:37","# G3T Up! Gravity Aligned Coordinate Frames Simplify Pointmap Processing \n\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-4CAF50?logo=googlechrome&logoColor=green)](https:\u002F\u002Fg3t-paper.github.io\u002F)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-b31b1b?logo=arxiv&logoColor=red)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.27372)\n[![Weights](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Weights-FFD21E)](https:\u002F\u002Fhuggingface.co\u002Fthatbrguy\u002Fg3t)\n\n[Bharath Raj Nagoor Kani](https:\u002F\u002Fbharathrajn.com\u002F), [Noah Snavely](https:\u002F\u002Fwww.cs.cornell.edu\u002F~snavely\u002F) \u003Cbr\u002F>\nCornell University\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"media\u002Fteaser.png\" alt=\"teaser_img\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\"> We introduce \u003Cstrong>G3T\u003C\u002Fstrong>, a transformer that predicts upright, gravity-aligned pointmaps regardless of input image orientation, and 
\u003Cstrong>G3T-Long\u003C\u002Fstrong>, a pipeline that leverages this uprightness to enable robust long-sequence 3D reconstruction. Check out our \u003Ca href=\"https:\u002F\u002Fg3t-paper.github.io\u002F\">project page\u003C\u002Fa> for interactive visualizations.\u003C\u002Fp>\n\n## Setup\n\nTo begin, create a conda environment:\n```\nconda create --name g3t python=3.10\nconda activate g3t\n```\n\nThen, execute the following commands within the conda environment to install all dependencies:\n```\npip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\npip install -r requirements.txt\n```\n\nPre-trained G3T model weights are available on [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fthatbrguy\u002Fg3t). By default, the inference script will automatically download the weights from this repository, but you can also manually download them if you prefer.\n\n## Feed-forward inference using G3T\n\nIn this section, we demonstrate how to obtain an upright, gravity-aligned 3D reconstruction of a scene from a collection of images using G3T.\n\n### Step 1: Setup data\n\nTo begin, place your images in a folder. For demo purposes, we have included three self-captured scenes (the ones visualized on our [project page](https:\u002F\u002Fg3t-paper.github.io\u002F)) in `examples\u002Fg3t`.\n\n### Step 2: Run inference\n\nYou can run inference by executing the command below:\n\n```\npython run_inference.py \\\n    --input \".\u002Fexamples\u002Fg3t\u002Fbench\" \\\n    --output_dir \".\u002Foutput\u002Fbench\" \\\n    --backend \"feed_forward\"\n```\n\nThe script will automatically download pre-trained G3T weights from [HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fthatbrguy\u002Fg3t). If you'd rather use a local checkpoint, you can provide the path using `--ckpt_path`. 
For a full list of supported options, check out `vggt\u002Futils\u002Finference_utils.py`.\n\n### Step 3: Visualize results\n\nUse our viser-based visualizer to explore the reconstruction:\n\n```\npython visualize_results.py --scene_root \".\u002Foutput\u002Fbench\"\n```\n\nThis opens the visualizer at http:\u002F\u002Flocalhost:27272 by default. See `visualize_results.py` for additional options.\n\n## Long-sequence reconstruction using G3T-Long\n\nIn this section, we demonstrate how to perform gravity-aligned submap-based reconstruction to process long video sequences robustly using G3T-Long.\n\n> **NOTE:** The VGGT-Long codebase implements loop closure correction modules in both Python and C++. Currently, we only support the Python-based loop closure correction module for G3T-Long. We plan to extend support for the C++ solver in a future release.\n\n### Step 1: Setup data\n\nTo begin, provide either a video or a folder of contiguous frames. If you pass a video, the script will automatically extract frames to a cache folder before processing.\n\nFor demo purposes, we provide a self-captured scene (the same one used in the alignment demo on the [project page](https:\u002F\u002Fg3t-paper.github.io\u002F)): [lounge.tar.gz](https:\u002F\u002Fhuggingface.co\u002Fthatbrguy\u002Fg3t\u002Fresolve\u002Fmain\u002Fexamples\u002Fg3t_long\u002Flounge.tar.gz). You can download this file, extract it using `tar -xzvf lounge.tar.gz`, and place the contents in `examples\u002Fg3t_long\u002Flounge`.\n\n### Step 2: Setup weights\n\nFollow the instructions in [download_weights.sh](https:\u002F\u002Fgithub.com\u002FDengKaiCQ\u002FVGGT-Long\u002Fblob\u002Fmain\u002Fscripts\u002Fdownload_weights.sh) in the VGGT-Long repository to download weights for SALAD, DINO and DBoW. Place the downloaded weights into `vggt_long\u002Fweights`. 
If you put them somewhere else, update the paths in `vggt_long\u002Fconfigs\u002Fg3t_long.yaml`.\n\n### Step 3: Run inference\n\nYou can run inference by executing the command below:\n\n```\npython run_inference.py \\\n    --input \".\u002Fexamples\u002Fg3t_long\u002Flounge\" \\\n    --output_dir \".\u002Foutput\u002Flounge\" \\\n    --backend \"g3t_long\" \\\n    --loop_enable\n```\n\nAs before, G3T weights are downloaded from HuggingFace automatically (override with `--ckpt_path`).\n\nA few useful hyperparameters to experiment with: `--chunk_size`, `--overlap`, `--loop_chunk_size`, and `--loop_enable`. The `--loop_enable` flag activates the loop closure mechanism, though the pipeline may not always detect loop closure events even when it is enabled.\n\nIf you pass a video, frames are extracted to `cache\u002Fframes` by default (you can modify this using `--cache_dir`). By default, the script will extract every 5th frame (you can modify this using `--nth_frame`). See `vggt\u002Futils\u002Finference_utils.py` for the full list of options.\n\n### Step 4: Visualize results\n\nUse our viser-based visualizer to explore the reconstruction:\n\n```\npython visualize_results.py --scene_root \".\u002Foutput\u002Flounge\"\n```\n\nThis opens the visualizer at http:\u002F\u002Flocalhost:27272 by default. See `visualize_results.py` for additional options.\n\n## Training G3T\n\n> **NOTE:** This section is not complete yet!\n\nG3T was trained on gravity-aligned data from five large-scale datasets (MegaDepth, Hypersim, ARKitScenes, DL3DV and TartanAir). Our model was fine-tuned from the VGGT-1B checkpoint.\n\nWe plan to release the data preprocessing code and training code soon. 
In the meantime, if you would like to dig into some implementation details behind the training process, the files `g3t_trainer.py` and `train_utils\u002Floss.py` should have some useful information.\n\n## TODOs\n- [ ] Complete the code release for the training section.\n- [ ] Extend support for the C++-based loop closure mechanism for G3T-Long.\n\n## Acknowledgements\n\nWe thank the authors of [DUSt3R](https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdust3r), [VGGT](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fvggt), [CUT3R](https:\u002F\u002Fgithub.com\u002FCUT3R\u002FCUT3R), and [VGGT-Long](https:\u002F\u002Fgithub.com\u002FDengKaiCQ\u002FVGGT-Long) for open-sourcing their projects, which our work builds upon. Additionally, we would like to thank Aditya Chetan, Haian Jin and Jay Karhade for their feedback on initial drafts of the paper.\n\n## Citation\n\nIf you find our work useful, please consider citing our paper:\n\n```\n@article{kani2026g3t,\n  author    = {Nagoor Kani, Bharath Raj and Snavely, Noah},\n  title     = {G3T Up! Gravity Aligned Coordinate Frames Simplify Pointmap Processing},\n  journal   = {arXiv preprint},\n  year      = {2026},\n}\n```\n","G3T is a transformer model that predicts upright, gravity-aligned pointmaps regardless of the orientation of the input images, and its extension G3T-Long supports long-sequence 3D reconstruction. The project is implemented in Python; its core idea is to simplify pointmap processing by predicting gravity-aligned coordinate frames, thereby improving the robustness and accuracy of 3D reconstruction. Users can complete environment setup, data preparation, and model inference through simple command-line operations, which makes it well suited to applications that recover 3D scene structure from multi-view images, such as virtual reality, augmented reality, and robot navigation.",2,"2026-06-11 04:07:55","CREATED_QUERY"]