[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72468":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":34,"lastSyncTime":35,"discoverSource":36},72468,"MoGe","microsoft\u002FMoGe","microsoft","[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision","https:\u002F\u002Fwangrc.site\u002FMoGePage\u002F",null,"Python",2517,188,56,74,0,17,29,67,51,100.03,"Other",false,"main",true,[27,28,29,30],"3d-reconstruction","3d-vision","monocular-depth-estimation","monocular-geometry-estimation","2026-06-12 04:01:05","# MoGe: Accurate Monocular Geometry Estimation\n\nMoGe is a powerful model for recovering 3D geometry from monocular open-domain images, including metric point maps, metric depth maps, normal maps and camera FOV. 
***Check our websites ([MoGe-1](https:\u002F\u002Fwangrc.site\u002FMoGePage), [MoGe-2](https:\u002F\u002Fwangrc.site\u002FMoGe2Page)) for videos and interactive results!***\n\n## 📖 Publications\n\n### MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.02546\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fwangrc.site\u002FMoGe2Page\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FRuicheng\u002FMoGe-2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Demo_(MoGe_v2)-blue'>\u003C\u002Fa>\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8f9ae680-659d-4f7f-82e2-b9ed9d6b988a\n\n\u003C\u002Fdiv>\n\n### MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.19115\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fwangrc.site\u002FMoGePage\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FRuicheng\u002FMoGe'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Demo_(MoGe_v1)-blue'>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cimg src=\".\u002Fassets\u002Foverview_simplified.png\" width=\"100%\" 
alt=\"Method overview\" align=\"center\">\n\n\n## 🌟 Features\n\n* **Accurate 3D geometry estimation**: Estimate point maps & depth maps & [normal maps](docs\u002Fnormal.md) from open-domain single images with high precision -- all capabilities in one model, one forward pass.\n* **Optional ground-truth FOV input**: Enhance model accuracy further by providing the true field of view.\n* **Flexible resolution support**: Works seamlessly with various resolutions and aspect ratios, from 2:1 to 1:2.\n* **Optimized for speed**: Achieves 60ms latency per image (A100 or RTX3090, FP16, ViT-L). Adjustable inference resolution for even faster speed.\n\n## ✨ News\n\n***(2025-10-16)***\n* Updated training code for MoGe-2.\n\n***(2025-06-10)***\n\n* ❗**Released MoGe-2**, a state-of-the-art model for monocular geometry, with these new capabilities in one unified model:\n  * point map prediction in **metric scale**;\n  * comparable and even better performance over MoGe-1;\n  * significant improvement of **visual sharpness**;\n  * high-quality [**normal map** estimation](docs\u002Fnormal.md);\n  * lower inference latency.\n\n## 📦 Installation\n\n### Install via pip\n  \n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMoGe.git\n```\n\n### Or clone this repository\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FMoGe.git\ncd MoGe\npip install -r requirements.txt   # install the requirements\n```\n\nNote: MoGe should be compatible with most requirements versions. 
Please check the `requirements.txt` for more details if you encounter any dependency issues.\n\n## 🤗 Pretrained Models\n\nOur pretrained models are available on the Hugging Face Hub:\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Version\u003C\u002Fth>\n      \u003Cth>Hugging Face Model\u003C\u002Fth>\n      \u003Cth>Metric scale\u003C\u002Fth>\n      \u003Cth>Normal\u003C\u002Fth>\n      \u003Cth>#Params\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>MoGe-1\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FRuicheng\u002Fmoge-vitl\" target=\"_blank\">\u003Ccode>Ruicheng\u002Fmoge-vitl\u003C\u002Fcode>\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>-\u003C\u002Ftd>\n      \u003Ctd>-\u003C\u002Ftd>\n      \u003Ctd>314M\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd rowspan=\"4\">MoGe-2\u003C\u002Ftd>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FRuicheng\u002Fmoge-2-vitl\" target=\"_blank\">\u003Ccode>Ruicheng\u002Fmoge-2-vitl\u003C\u002Fcode>\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>✅\u003C\u002Ftd>\n      \u003Ctd>-\u003C\u002Ftd>\n      \u003Ctd>326M\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FRuicheng\u002Fmoge-2-vitl-normal\" target=\"_blank\">\u003Ccode>Ruicheng\u002Fmoge-2-vitl-normal\u003C\u002Fcode>\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>✅\u003C\u002Ftd>\n      \u003Ctd>✅\u003C\u002Ftd>\n      \u003Ctd>331M\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FRuicheng\u002Fmoge-2-vitb-normal\" target=\"_blank\">\u003Ccode>Ruicheng\u002Fmoge-2-vitb-normal\u003C\u002Fcode>\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>✅\u003C\u002Ftd>\n      \u003Ctd>✅\u003C\u002Ftd>\n      \u003Ctd>104M\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002FRuicheng\u002Fmoge-2-vits-normal\" target=\"_blank\">\u003Ccode>Ruicheng\u002Fmoge-2-vits-normal\u003C\u002Fcode>\u003C\u002Fa>\u003C\u002Ftd>\n      \u003Ctd>✅\u003C\u002Ftd>\n      \u003Ctd>✅\u003C\u002Ftd>\n      \u003Ctd>35M\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n\n> NOTE: `moge-2-vitl-normal` has full capabilities, with almost the same level of performance as `moge-2-vitl` plus extra normal map estimation.\n\nYou may import the `MoGeModel` class of the matched version, then load the pretrained weights via `MoGeModel.from_pretrained(\"HUGGING_FACE_MODEL_REPO_NAME\")` with automatic downloading.\nIf loading a local checkpoint, replace the model name with the local path.\n\nFor ONNX support, please refer to [docs\u002Fonnx.md](docs\u002Fonnx.md).\n\n## 💡 Minimal Code Example \n\nHere is a minimal example for loading the model and inferring on a single image. \n\n```python\nimport cv2\nimport torch\n# from moge.model.v1 import MoGeModel\nfrom moge.model.v2 import MoGeModel # Let's try MoGe-2\n\ndevice = torch.device(\"cuda\")\n\n# Load the model from huggingface hub (or load from local).\nmodel = MoGeModel.from_pretrained(\"Ruicheng\u002Fmoge-2-vitl-normal\").to(device)                             \n\n# Read the input image and convert to tensor (3, H, W) with RGB values normalized to [0, 1]\ninput_image = cv2.cvtColor(cv2.imread(\"PATH_TO_IMAGE.jpg\"), cv2.COLOR_BGR2RGB)                       \ninput_image = torch.tensor(input_image \u002F 255, dtype=torch.float32, device=device).permute(2, 0, 1)    \n\n# Infer \noutput = model.infer(input_image)\n\"\"\"\n`output` has keys \"points\", \"depth\", \"mask\", \"normal\" (optional) and \"intrinsics\",\nThe maps are in the same size as the input image. \n{\n    \"points\": (H, W, 3),    # point map in OpenCV camera coordinate system (x right, y down, z forward). 
For MoGe-2, the point map is in metric scale.\n    \"depth\": (H, W),        # depth map\n    \"normal\": (H, W, 3),    # normal map in OpenCV camera coordinate system. (available for MoGe-2-normal)\n    \"mask\": (H, W),         # a binary mask for valid pixels. \n    \"intrinsics\": (3, 3),   # normalized camera intrinsics\n}\n\"\"\"\n```\nFor more usage details, see the `MoGeModel.infer()` docstring.\n\n## 💡 Usage\n\n### Gradio demo | `moge app`\n\n> The demo for MoGe-1 is also available at our [Hugging Face Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FRuicheng\u002FMoGe).\n\n```bash\n# Using the command line tool\nmoge app        # will run MoGe-2 demo by default.\n\n# In this repo\npython moge\u002Fscripts\u002Fapp.py   # --share for Gradio public sharing\n```\n\nSee also [`moge\u002Fscripts\u002Fapp.py`](moge\u002Fscripts\u002Fapp.py) \n\n\n### Inference | `moge infer`\n\nRun the script `moge\u002Fscripts\u002Finfer.py` via the following command:\n\n```bash\n# Save the output [maps], [glb] and [ply] files\nmoge infer -i IMAGES_FOLDER_OR_IMAGE_PATH -o OUTPUT_FOLDER --maps --glb --ply\n\n# Show the result in a window (requires pyglet \u003C 2.0, e.g. pip install pyglet==1.5.29)\nmoge infer -i IMAGES_FOLDER_OR_IMAGE_PATH -o OUTPUT_FOLDER --show\n```\n\nFor detailed options, run `moge infer --help`:\n\n```\nUsage: moge infer [OPTIONS]\n\n  Inference script\n\nOptions:\n  -i, --input PATH            Input image or folder path. \"jpg\" and \"png\" are\n                              supported.\n  --fov_x FLOAT               If camera parameters are known, set the\n                              horizontal field of view in degrees. Otherwise,\n                              MoGe will estimate it.\n  -o, --output PATH           Output folder path\n  --pretrained TEXT           Pretrained model name or path. If not provided,\n                              the corresponding default model will be chosen.\n  --version [v1|v2]           Model version. 
Defaults to \"v2\"\n  --device TEXT               Device name (e.g. \"cuda\", \"cuda:0\", \"cpu\").\n                              Defaults to \"cuda\"\n  --fp16                      Use fp16 precision for much faster inference.\n  --resize INTEGER            Resize the image(s) & output maps to a specific\n                              size. Defaults to None (no resizing).\n  --resolution_level INTEGER  An integer [0-9] for the resolution level for\n                              inference. Higher value means more tokens and\n                              the finer details will be captured, but\n                              inference can be slower. Defaults to 9. Note\n                              that it is irrelevant to the output size, which\n                              is always the same as the input size.\n                              `resolution_level` actually controls\n                              `num_tokens`. See `num_tokens` for more details.\n  --num_tokens INTEGER        number of tokens used for inference. A integer\n                              in the (suggested) range of `[1200, 2500]`.\n                              `resolution_level` will be ignored if\n                              `num_tokens` is provided. Default: None\n  --threshold FLOAT           Threshold for removing edges. Defaults to 0.01.\n                              Smaller value removes more edges. \"inf\" means no\n                              thresholding.\n  --maps                      Whether to save the output maps (image, point\n                              map, depth map, normal map, mask) and fov.\n  --glb                       Whether to save the output as a .glb file. The\n                              color will be saved as a texture.\n  --ply                       Whether to save the output as a .ply file. The\n                              color will be saved as vertex colors.\n  --show                      Whether show the output in a window. 
Note that\n                              this requires pyglet\u003C2 installed as required by\n                              trimesh.\n  --help                      Show this message and exit.\n```\n\nSee also [`moge\u002Fscripts\u002Finfer.py`](moge\u002Fscripts\u002Finfer.py)\n\n### 360° panorama images | `moge infer_panorama` \n\n> *NOTE: This is an experimental extension of MoGe.*\n\nThe script will split the 360-degree panorama image into multiple perspective views and infer on each view separately. \nThe output maps will be combined to produce a panorama depth map and point map. \n\nNote that the panorama image must have spherical parameterization (e.g., environment maps or equirectangular images). Other formats must be converted to spherical format before using this script. Run `moge infer_panorama --help` for detailed options.\n\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fpanorama_pipeline.png\" width=\"80%\">\n\nThe photo is from [this URL](https:\u002F\u002Fcommons.wikimedia.org\u002Fwiki\u002FCategory:360%C2%B0_panoramas_with_equirectangular_projection#\u002Fmedia\u002FFile:Braunschweig_Sankt-%C3%84gidien_Panorama_02.jpg)\n\u003C\u002Fdiv>\n\nSee also [`moge\u002Fscripts\u002Finfer_panorama.py`](moge\u002Fscripts\u002Finfer_panorama.py)\n\n## 🏋️‍♂️ Training & Finetuning\n\nSee [docs\u002Ftrain.md](docs\u002Ftrain.md)\n\n## 🧪 Evaluation\n\nSee [docs\u002Feval.md](docs\u002Feval.md)\n\n## ⚖️ License\n\nMoGe code is released under the MIT license, except for DINOv2 code in `moge\u002Fmodel\u002Fdinov2` which is released by Meta AI under the Apache 2.0 license. 
 \nSee [LICENSE](LICENSE) for more details.\n\n\n## 📜 Citation\n\nIf you find our work useful in your research, we gratefully request that you consider citing our paper:\n\n```\n@inproceedings{wang2025moge,\n  title={Moge: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision},\n  author={Wang, Ruicheng and Xu, Sicheng and Dai, Cassie and Xiang, Jianfeng and Deng, Yu and Tong, Xin and Yang, Jiaolong},\n  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},\n  pages={5261--5271},\n  year={2025}\n}\n\n@misc{wang2025moge2,\n      title={MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details}, \n      author={Ruicheng Wang and Sicheng Xu and Yue Dong and Yu Deng and Jianfeng Xiang and Zelong Lv and Guangzhong Sun and Xin Tong and Jiaolong Yang},\n      year={2025},\n      eprint={2507.02546},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.02546}, \n}\n```\n","MoGe is a powerful model for recovering 3D geometry from monocular open-domain images, producing metric point maps, depth maps, normal maps, and camera field of view. Its core features include high-precision 3D geometry estimation, optional ground-truth FOV input to further improve accuracy, support for various resolutions and aspect ratios, and fast optimized inference (60 ms latency per image on an A100 or RTX 3090). The tool is well suited to applications that need high-quality 3D information from a single image, such as augmented reality, virtual reality, and autonomous driving.",2,"2026-06-11 03:42:11","high_star"]