[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72384":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},72384,"mast3r","naver\u002Fmast3r","naver","Grounding Image Matching in 3D with MASt3R","",null,"Python",2989,269,32,94,0,11,24,91,33,29.29,"Other",false,"main",true,[],"2026-06-12 02:03:02","![banner](assets\u002Fmast3r.jpg)\n\nOfficial implementation of `Grounding Image Matching in 3D with MASt3R`  \n[[Project page](https:\u002F\u002Feurope.naverlabs.com\u002Fblog\u002Fmast3r-matching-and-stereo-3d-reconstruction\u002F)], [[MASt3R arxiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09756)], [[DUSt3R arxiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2312.14132)]  \n\n![Example of matching results obtained from MASt3R](assets\u002Fexamples.jpg)\n\n![High level overview of MASt3R's architecture](assets\u002Fmast3r_archi.jpg)\n\n```bibtex\n@misc{mast3r_eccv24,\n      title={Grounding Image Matching in 3D with MASt3R}, \n      author={Vincent Leroy and Yohann Cabon and Jerome Revaud},\n      booktitle = {ECCV},\n      year = {2024}\n}\n\n@misc{mast3r_arxiv24,\n      title={Grounding Image Matching in 3D with MASt3R}, \n      author={Vincent Leroy and Yohann Cabon and Jerome Revaud},\n      year={2024},\n      eprint={2406.09756},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n\n@inproceedings{dust3r_cvpr24,\n      title={DUSt3R: 
Geometric 3D Vision Made Easy}, \n      author={Shuzhe Wang and Vincent Leroy and Yohann Cabon and Boris Chidlovskii and Jerome Revaud},\n      booktitle = {CVPR},\n      year = {2024}\n}\n\n@inproceedings{\n    duisterhof2025mastrsfm,\n    title={{MAS}t3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion},\n    author={Bardienus Pieter Duisterhof and Lojze Zust and Philippe Weinzaepfel and Vincent Leroy and Yohann Cabon and Jerome Revaud},\n    booktitle={International Conference on 3D Vision 2025},\n    year={2025},\n    url={https:\u002F\u002Fopenreview.net\u002Fforum?id=5uw1GRBFoT}\n} \n```\n\n## Table of Contents\n\n- [Table of Contents](#table-of-contents)\n- [License](#license)\n- [Get Started](#get-started)\n  - [Installation](#installation)\n  - [Checkpoints](#checkpoints)\n    - [MASt3R Model](#mast3r-model)\n    - [Retrieval Model](#retrieval-model)\n    - [Dune Model](#dune-model)\n  - [MASt3R-SfM](#mast3r-sfm)\n  - [Interactive demo](#interactive-demo)\n  - [Interactive demo with docker](#interactive-demo-with-docker)\n- [Usage](#usage)\n  - [Usage MASt3R](#usage-mast3r)\n  - [Usage DUNE+MASt3R](#usage-dunemast3r)\n- [Training](#training)\n  - [Datasets](#datasets)\n  - [Demo](#demo)\n  - [Our Hyperparameters](#our-hyperparameters)\n- [Visual Localization](#visual-localization)\n  - [Dataset preparation](#dataset-preparation)\n  - [Example Commands](#example-commands)\n\n## License\n\nThe code is distributed under the CC BY-NC-SA 4.0 License.\nSee [LICENSE](LICENSE) for more information.\n\n```python\n# Copyright (C) 2024-present Naver Corporation. All rights reserved.\n# Licensed under CC BY-NC-SA 4.0 (non-commercial use only).\n```\n\n## Get Started\n\n### Installation\n\n1. Clone MASt3R.\n```bash\ngit clone --recursive https:\u002F\u002Fgithub.com\u002Fnaver\u002Fmast3r\ncd mast3r\n# if you have already cloned mast3r:\n# git submodule update --init --recursive\n```\n\n2. 
Create the environment; here we show an example using conda.\n```bash\nconda create -n mast3r python=3.11 cmake=3.14.0\nconda activate mast3r \nconda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the correct version of cuda for your system\npip install -r requirements.txt\npip install -r dust3r\u002Frequirements.txt\n# Optional: you can also install additional packages to:\n# - add support for HEIC images\n# - add required packages for visloc.py\npip install -r dust3r\u002Frequirements_optional.txt\n```\n\n3. Compile and install ASMK.\n```bash\npip install cython\n\ngit clone https:\u002F\u002Fgithub.com\u002Fjenicek\u002Fasmk\ncd asmk\u002Fcython\u002F\ncythonize *.pyx\ncd ..\npip install .  # or python3 setup.py build_ext --inplace\ncd ..\n```\n\n4. Optional: compile the CUDA kernels for RoPE (as in CroCo v2).\n```bash\n# DUSt3R relies on RoPE positional embeddings for which you can compile some cuda kernels for faster runtime.\ncd dust3r\u002Fcroco\u002Fmodels\u002Fcurope\u002F\npython setup.py build_ext --inplace\ncd ..\u002F..\u002F..\u002F..\u002F\n```\n\n### Checkpoints\n\n#### MASt3R Model\nYou can obtain the model checkpoints in two ways:\n\n1) You can use our huggingface_hub integration: the models will be downloaded automatically.\n\n2) Otherwise, download it from our server:\n\n| Modelname   | Training resolutions | Head | Encoder | Decoder |\n|-------------|----------------------|------|---------|---------|\n| [`MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric`](https:\u002F\u002Fdownload.europe.naverlabs.com\u002FComputerVision\u002FMASt3R\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth) | 512x384, 512x336, 512x288, 512x256, 512x160 | CatMLP+DPT | ViT-L | ViT-B |\n\nYou can check the hyperparameters we used to train these models in the [section: Our Hyperparameters](#our-hyperparameters)\nMake sure to check the license of the datasets we used. 
\n\nTo download `MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth`:\n```bash\nmkdir -p checkpoints\u002F\nwget https:\u002F\u002Fdownload.europe.naverlabs.com\u002FComputerVision\u002FMASt3R\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth -P checkpoints\u002F\n```\n\nMake sure to agree to the license of all the training datasets we used, in addition to CC-BY-NC-SA 4.0. \nThe mapfree dataset license in particular is very restrictive. For more information, check [CHECKPOINTS_NOTICE](CHECKPOINTS_NOTICE).\n\n#### Retrieval Model\nThis retrieval model is for `MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric` only.\nYou need to download both the `trainingfree.pth` and `codebook.pkl` files, and put them in the same directory.\n\n[`MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_trainingfree`](https:\u002F\u002Fdownload.europe.naverlabs.com\u002FComputerVision\u002FMASt3R\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_trainingfree.pth)  \n[`MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_codebook`](https:\u002F\u002Fdownload.europe.naverlabs.com\u002FComputerVision\u002FMASt3R\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_codebook.pkl)  \n\n```bash\nmkdir -p checkpoints\u002F\nwget https:\u002F\u002Fdownload.europe.naverlabs.com\u002FComputerVision\u002FMASt3R\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_trainingfree.pth -P checkpoints\u002F\nwget https:\u002F\u002Fdownload.europe.naverlabs.com\u002FComputerVision\u002FMASt3R\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric_retrieval_codebook.pkl -P checkpoints\u002F\n```\n\n#### Dune Model\n\nWe added partial support of the [Dune](https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdune) encoder. Check the associated [Dune License](https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdune\u002Fblob\u002Fmain\u002FProject%20NLE%20DUNE%20LICENSE.txt).  
\nYou can find the MASt3R decoder that goes with it here:\n\n[`dunemast3r_cvpr25_vitbase`](https:\u002F\u002Fdownload.europe.naverlabs.com\u002Fdune\u002Fdunemast3r_cvpr25_vitbase.pth)  \n[`dunemast3r_cvpr25_vitsmall`](https:\u002F\u002Fdownload.europe.naverlabs.com\u002Fdune\u002Fdunemast3r_cvpr25_vitsmall.pth)  \n\n```bash\nmkdir -p checkpoints\u002F\nwget https:\u002F\u002Fdownload.europe.naverlabs.com\u002Fdune\u002Fdunemast3r_cvpr25_vitbase.pth -P checkpoints\u002F\n```\n\nThis model has limited compatibility with the rest of the codebase, but we wanted to include it as it achieves impressive results on the Map-free Visual Relocalization benchmark.\nMake sure to check the [Usage DUNE+MASt3R](#usage-dunemast3r) section if you are interested.\n\n### MASt3R-SfM\n\nA few words about the addition of MASt3R-SfM to this repository. \n\nMASt3R-SfM refers to the make_pairs (retrieval) + sparse_global_alignment that you can find here: [demo.py#L142](mast3r\u002Fdemo.py#L142). \n\nIn this repository, you will also find `kapture_mast3r_mapping.py` and `demo_glomap.py`. These two scripts are unrelated to MASt3R-SfM. They are \"toys\" that attempt to use mast3r matches to do standard SfM reconstructions with colmap\u002Fglomap. As such, they were not extensively tested, and may fail on edge cases.\n\n### Interactive demo\n\nWe made one huggingface space running the new sparse global alignment in a simplified demo for small scenes: [naver\u002FMASt3R](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fnaver\u002FMASt3R)\nThere are two demos available to run locally:\n\n```\ndemo.py is the updated demo for MASt3R. 
It uses our new sparse global alignment method that allows you to reconstruct larger scenes\n\npython3 demo.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric\n\n# Use --weights to load a checkpoint from a local file, eg --weights checkpoints\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth\n# Use --retrieval_model and point to the retrieval checkpoint (*trainingfree.pth) to enable retrieval as a pairing strategy, asmk must be installed\n# Use --local_network to make it accessible on the local network, or --server_name to specify the url manually\n# Use --server_port to change the port, by default it will search for an available port starting at 7860\n# Use --device to use a different device, by default it's \"cuda\"\n\ndemo_dust3r_ga.py is the same demo as in dust3r (+ compatibility for MASt3R models)\nsee https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdust3r?tab=readme-ov-file#interactive-demo for details\n```\n\n### Interactive demo with docker\n\nTODO update with asmk\u002Fretrieval model\n\nTo run MASt3R using Docker, including with NVIDIA CUDA support, follow these instructions:\n\n1. **Install Docker**: If not already installed, download and install `docker` and `docker compose` from the [Docker website](https:\u002F\u002Fwww.docker.com\u002Fget-started).\n\n2. **Install NVIDIA Docker Toolkit**: For GPU support, install the NVIDIA Docker toolkit from the [Nvidia website](https:\u002F\u002Fdocs.nvidia.com\u002Fdatacenter\u002Fcloud-native\u002Fcontainer-toolkit\u002Flatest\u002Finstall-guide.html).\n\n3. 
**Build the Docker image and run it**: `cd` into the `.\u002Fdocker` directory and run the following commands: \n\n```bash\ncd docker\nbash run.sh --with-cuda --model_name=\"MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric\"\n```\n\nOr if you want to run the demo without CUDA support, run the following command:\n\n```bash \ncd docker\nbash run.sh --model_name=\"MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric\"\n```\n\nBy default, `demo.py` is launched with the option `--local_network`.  \nVisit `http:\u002F\u002Flocalhost:7860\u002F` to access the web UI (or replace `localhost` with the machine's name to access it from the network).  \n\n`run.sh` will launch docker-compose using either the [docker-compose-cuda.yml](docker\u002Fdocker-compose-cuda.yml) or [docker-compose-cpu.yml](docker\u002Fdocker-compose-cpu.yml) config file, then it starts the demo using [entrypoint.sh](docker\u002Ffiles\u002Fentrypoint.sh).\n\n___\n\n![demo](assets\u002Fdemo.jpg)\n\n## Usage\n### Usage MASt3R\n\n\u003Cdetails>\n\u003Csummary>\nCode sample to compute matches with MASt3R for a pair of images\n\u003C\u002Fsummary>\n\n```python\nfrom mast3r.model import AsymmetricMASt3R\nfrom mast3r.fast_nn import fast_reciprocal_NNs\n\nimport mast3r.utils.path_to_dust3r\nfrom dust3r.inference import inference\nfrom dust3r.utils.image import load_images\n\nif __name__ == '__main__':\n    device = 'cuda'\n    model_name = \"naver\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric\"\n    # you can put the path to a local checkpoint in model_name if needed\n    model = AsymmetricMASt3R.from_pretrained(model_name).to(device)\n    images = load_images(['dust3r\u002Fcroco\u002Fassets\u002FChateau1.png', 'dust3r\u002Fcroco\u002Fassets\u002FChateau2.png'], size=512)\n    output = inference([tuple(images)], model, device, batch_size=1, verbose=False)\n\n    # at this stage, you have the raw dust3r predictions\n    view1, pred1 = output['view1'], output['pred1']\n    view2, pred2 = output['view2'], 
output['pred2']\n\n    desc1, desc2 = pred1['desc'].squeeze(0).detach(), pred2['desc'].squeeze(0).detach()\n\n    # find 2D-2D matches between the two images\n    matches_im0, matches_im1 = fast_reciprocal_NNs(desc1, desc2, subsample_or_initxy1=8,\n                                                   device=device, dist='dot', block_size=2**13)\n\n    # ignore small border around the edge\n    H0, W0 = view1['true_shape'][0]\n    valid_matches_im0 = (matches_im0[:, 0] >= 3) & (matches_im0[:, 0] \u003C int(W0) - 3) & (\n        matches_im0[:, 1] >= 3) & (matches_im0[:, 1] \u003C int(H0) - 3)\n\n    H1, W1 = view2['true_shape'][0]\n    valid_matches_im1 = (matches_im1[:, 0] >= 3) & (matches_im1[:, 0] \u003C int(W1) - 3) & (\n        matches_im1[:, 1] >= 3) & (matches_im1[:, 1] \u003C int(H1) - 3)\n\n    valid_matches = valid_matches_im0 & valid_matches_im1\n    matches_im0, matches_im1 = matches_im0[valid_matches], matches_im1[valid_matches]\n\n    # visualize a few matches\n    import numpy as np\n    import torch\n    import torchvision.transforms.functional\n    from matplotlib import pyplot as pl\n\n    n_viz = 20\n    num_matches = matches_im0.shape[0]\n    match_idx_to_viz = np.round(np.linspace(0, num_matches - 1, n_viz)).astype(int)\n    viz_matches_im0, viz_matches_im1 = matches_im0[match_idx_to_viz], matches_im1[match_idx_to_viz]\n\n    image_mean = torch.as_tensor([0.5, 0.5, 0.5], device='cpu').reshape(1, 3, 1, 1)\n    image_std = torch.as_tensor([0.5, 0.5, 0.5], device='cpu').reshape(1, 3, 1, 1)\n\n    viz_imgs = []\n    for i, view in enumerate([view1, view2]):\n        rgb_tensor = view['img'] * image_std + image_mean\n        viz_imgs.append(rgb_tensor.squeeze(0).permute(1, 2, 0).cpu().numpy())\n\n    H0, W0, H1, W1 = *viz_imgs[0].shape[:2], *viz_imgs[1].shape[:2]\n    img0 = np.pad(viz_imgs[0], ((0, max(H1 - H0, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)\n    img1 = np.pad(viz_imgs[1], ((0, max(H0 - H1, 0)), (0, 0), (0, 0)), 'constant', 
constant_values=0)\n    img = np.concatenate((img0, img1), axis=1)\n    pl.figure()\n    pl.imshow(img)\n    cmap = pl.get_cmap('jet')\n    for i in range(n_viz):\n        (x0, y0), (x1, y1) = viz_matches_im0[i].T, viz_matches_im1[i].T\n        pl.plot([x0, x1 + W0], [y0, y1], '-+', color=cmap(i \u002F (n_viz - 1)), scalex=False, scaley=False)\n    pl.show(block=True)\n```\n\n\u003C\u002Fdetails>\n\n![matching example on croco pair](assets\u002Fmatching.jpg)\n\n### Usage DUNE+MASt3R\n\nAt the moment, you can only do two things:\n\n1) Extract matches, following the subset of code below\n2) Run the `demo_dust3r_ga.py` script with option `--weights checkpoints\u002Fdunemast3r_cvpr25_vitbase.pth --image_size 518`\n\n\u003Cdetails>\n\u003Csummary>\nCode sample to compute matches with DUNE+MASt3R for a pair of images\n\u003C\u002Fsummary>\n\n```python\nfrom mast3r.model import load_dune_mast3r_model\nfrom mast3r.fast_nn import fast_reciprocal_NNs\n\nimport mast3r.utils.path_to_dust3r  # noqa\nfrom dust3r.utils.image import load_images\nfrom dust3r.inference import inference\n\nimport torch\n\nif __name__ == '__main__':\n    device = torch.device('cuda:0')\n    model = load_dune_mast3r_model('checkpoints\u002Fdunemast3r_cvpr25_vitbase.pth', device)\n\n    images = load_images(['dust3r\u002Fcroco\u002Fassets\u002FChateau1.png', 'dust3r\u002Fcroco\u002Fassets\u002FChateau2.png'],\n                        size=518, patch_size=model.patch_size, square_ok=True)\n\n    output = inference([tuple(images)], model, device, batch_size=1, verbose=False)\n\n    # at this stage, you have the raw dust3r predictions\n    view1, pred1 = output['view1'], output['pred1']\n    view2, pred2 = output['view2'], output['pred2']\n\n    desc1, desc2 = pred1['desc'].squeeze(0).detach(), pred2['desc'].squeeze(0).detach()\n\n    # find 2D-2D matches between the two images\n    matches_im0, matches_im1 = fast_reciprocal_NNs(desc1, desc2, subsample_or_initxy1=8,\n                                        
        device=device, dist='dot', block_size=2**13)\n\n    # ignore small border around the edge\n    H0, W0 = view1['true_shape'][0]\n    valid_matches_im0 = (matches_im0[:, 0] >= 3) & (matches_im0[:, 0] \u003C int(W0) - 3) & (\n        matches_im0[:, 1] >= 3) & (matches_im0[:, 1] \u003C int(H0) - 3)\n\n    H1, W1 = view2['true_shape'][0]\n    valid_matches_im1 = (matches_im1[:, 0] >= 3) & (matches_im1[:, 0] \u003C int(W1) - 3) & (\n        matches_im1[:, 1] >= 3) & (matches_im1[:, 1] \u003C int(H1) - 3)\n\n    valid_matches = valid_matches_im0 & valid_matches_im1\n    matches_im0, matches_im1 = matches_im0[valid_matches], matches_im1[valid_matches]\n\n    # visualize a few matches\n    import numpy as np\n    import torch\n    import torchvision.transforms.functional\n    from matplotlib import pyplot as pl\n\n    n_viz = 20\n    num_matches = matches_im0.shape[0]\n    match_idx_to_viz = np.round(np.linspace(0, num_matches - 1, n_viz)).astype(int)\n    viz_matches_im0, viz_matches_im1 = matches_im0[match_idx_to_viz], matches_im1[match_idx_to_viz]\n\n    image_mean = torch.as_tensor([0.5, 0.5, 0.5], device='cpu').reshape(1, 3, 1, 1)\n    image_std = torch.as_tensor([0.5, 0.5, 0.5], device='cpu').reshape(1, 3, 1, 1)\n\n    viz_imgs = []\n    for i, view in enumerate([view1, view2]):\n        rgb_tensor = view['img'] * image_std + image_mean\n        viz_imgs.append(rgb_tensor.squeeze(0).permute(1, 2, 0).cpu().numpy())\n\n    H0, W0, H1, W1 = *viz_imgs[0].shape[:2], *viz_imgs[1].shape[:2]\n    img0 = np.pad(viz_imgs[0], ((0, max(H1 - H0, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)\n    img1 = np.pad(viz_imgs[1], ((0, max(H0 - H1, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)\n    img = np.concatenate((img0, img1), axis=1)\n    pl.figure()\n    pl.imshow(img)\n    cmap = pl.get_cmap('jet')\n    for i in range(n_viz):\n        (x0, y0), (x1, y1) = viz_matches_im0[i].T, viz_matches_im1[i].T\n        pl.plot([x0, x1 + W0], [y0, y1], '-+', color=cmap(i 
\u002F (n_viz - 1)), scalex=False, scaley=False)\n    pl.show(block=True)\n\n```\n\u003C\u002Fdetails>\n\n## Training\n\nIn this section, we present a short demonstration to get started with training MASt3R.\n\n### Datasets\n\nSee [Datasets section in DUSt3R](https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdust3r?tab=readme-ov-file#datasets)\n\n### Demo\n\nLike for the DUSt3R training demo, we're going to download and prepare the same subset of [CO3Dv2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fco3d) - [Creative Commons Attribution-NonCommercial 4.0 International](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fco3d\u002Fblob\u002Fmain\u002FLICENSE) and launch the training code on it.\nIt is the exact same process as DUSt3R.\nThe demo model will be trained for a few epochs on a very small dataset.\nIt will not be very good.\n\n```bash\n# download and prepare the co3d subset\nmkdir -p data\u002Fco3d_subset\ncd data\u002Fco3d_subset\ngit clone https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fco3d\ncd co3d\npython3 .\u002Fco3d\u002Fdownload_dataset.py --download_folder ..\u002F --single_sequence_subset\nrm ..\u002F*.zip\ncd ..\u002F..\u002F..\n\npython3 datasets_preprocess\u002Fpreprocess_co3d.py --co3d_dir data\u002Fco3d_subset --output_dir data\u002Fco3d_subset_processed  --single_sequence_subset\n\n# download the pretrained dust3r checkpoint\nmkdir -p checkpoints\u002F\nwget https:\u002F\u002Fdownload.europe.naverlabs.com\u002FComputerVision\u002FDUSt3R\u002FDUSt3R_ViTLarge_BaseDecoder_512_dpt.pth -P checkpoints\u002F\n\n# for this example we'll do fewer epochs, for the actual hyperparameters we used in the paper, see the next section: \"Our Hyperparameters\"\ntorchrun --nproc_per_node=4 train.py \\\n    --train_dataset \"1000 @ Co3d(split='train', ROOT='data\u002Fco3d_subset_processed', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], 
n_corres=8192, nneg=0.5, transform=ColorJitter)\" \\\n    --test_dataset \"100 @ Co3d(split='test', ROOT='data\u002Fco3d_subset_processed', resolution=(512,384), n_corres=1024, seed=777)\" \\\n    --model \"AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True)\" \\\n    --train_criterion \"ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean')\" \\\n    --test_criterion \"Regr3D_ScaleShiftInv(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + -1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288)\" \\\n    --pretrained \"checkpoints\u002FDUSt3R_ViTLarge_BaseDecoder_512_dpt.pth\" \\\n    --lr 0.0001 --min_lr 1e-06 --warmup_epochs 1 --epochs 10 --batch_size 4 --accum_iter 4 \\\n    --save_freq 1 --keep_freq 5 --eval_freq 1 --disable_cudnn_benchmark \\\n    --output_dir \"checkpoints\u002Fmast3r_demo\"\n\n```\n\n### Our Hyperparameters\nWe didn't release all the training datasets, but here are the commands we used for training our models:\n\n```bash\n# MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric - train mast3r with metric regression and matching loss\n# we used cosxl to generate variations of DL3DV: \"foggy\", \"night\", \"rainy\", \"snow\", \"sunny\" but we were not convinced by it.\n\ntorchrun --nproc_per_node=8 train.py \\\n    --train_dataset \"57_000 @ Habitat512(1_000_000, split='train', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 68_400 @ BlendedMVS(split='train', mask_sky=True, resolution=[(512, 
384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 68_400 @ MegaDepth(split='train', mask_sky=True, resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 45_600 @ ARKitScenes(split='train', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 22_800 @ Co3d(split='train', mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 22_800 @ StaticThings3D(mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 45_600 @ ScanNetpp(split='train', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 45_600 @ TartanAir(pairs_subset='', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 4_560 @ UnrealStereo4K(resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 1_140 @ VirtualKitti(optical_center_is_centered=True, resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 22_800 @ WildRgbd(split='train', mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 145_920 @ NianticMapFree(split='train', 
resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 57_000 @ DL3DV(split='nlight', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 57_000 @ DL3DV(split='not-nlight', cosxl_augmentations=None, resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 34_200 @ InternalUnreleasedDataset(resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5)\" \\\n    --test_dataset \"Habitat512(1_000, split='val', resolution=(512,384), seed=777, n_corres=1024) + 1_000 @ BlendedMVS(split='val', resolution=(512,384), mask_sky=True, seed=777, n_corres=1024) + 1_000 @ ARKitScenes(split='test', resolution=(512,384), seed=777, n_corres=1024) + 1_000 @ MegaDepth(split='val', mask_sky=True, resolution=(512,336), seed=777, n_corres=1024) + 1_000 @ Co3d(split='test', resolution=(512,384), mask_bg='rand', seed=777, n_corres=1024)\" \\\n    --model \"AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf))\" \\\n    --train_criterion \"ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2, loss_in_log=False) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean')\" \\\n    --test_criterion \"Regr3D(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + 
-1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288)\" \\\n    --pretrained \"checkpoints\u002FDUSt3R_ViTLarge_BaseDecoder_512_dpt.pth\" \\\n    --lr 0.0001 --min_lr 1e-06 --warmup_epochs 8 --epochs 50 --batch_size 4 --accum_iter 2 \\\n    --save_freq 1 --keep_freq 5 --eval_freq 1 --print_freq=10 --disable_cudnn_benchmark \\\n    --output_dir \"checkpoints\u002FMASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric\"\n\n```\n\n## Visual Localization\n\n### Dataset preparation\n\nSee [Visloc section in DUSt3R](https:\u002F\u002Fgithub.com\u002Fnaver\u002Fdust3r\u002Fblob\u002Fmain\u002Fdust3r_visloc\u002FREADME.md#dataset-preparation)\n\n### Example Commands\n\nWith `visloc.py` you can run our visual localization experiments on Aachen-Day-Night, InLoc, Cambridge Landmarks and 7 Scenes.\n\n\n```bash\n# Aachen-Day-Night-v1.1:\n# scene in 'day' 'night'\n# scene can also be 'all'\npython3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset \"VislocAachenDayNight('\u002Fpath\u002Fto\u002Fprepared\u002FAachen-Day-Night-v1.1\u002F', subscene='${scene}', pairsfile='fire_top50', topk=20)\" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir \u002Fpath\u002Fto\u002Foutput\u002FAachen-Day-Night-v1.1\u002F${scene}\u002Floc\n\n# or with coarse to fine:\n\npython3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset \"VislocAachenDayNight('\u002Fpath\u002Fto\u002Fprepared\u002FAachen-Day-Night-v1.1\u002F', subscene='${scene}', pairsfile='fire_top50', topk=20)\" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir \u002Fpath\u002Fto\u002Foutput\u002FAachen-Day-Night-v1.1\u002F${scene}\u002Floc --coarse_to_fine --max_batch_size 48 --c2f_crop_with_homography\n\n# InLoc\npython3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset \"VislocInLoc('\u002Fpath\u002Fto\u002Fprepared\u002FInLoc\u002F', 
pairsfile='pairs-query-netvlad40-temporal', topk=20)\" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir \u002Fpath\u002Fto\u002Foutput\u002FInLoc\u002Floc\n\n# or with coarse to fine:\n\npython3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset \"VislocInLoc('\u002Fpath\u002Fto\u002Fprepared\u002FInLoc\u002F', pairsfile='pairs-query-netvlad40-temporal', topk=20)\" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir \u002Fpath\u002Fto\u002Foutput\u002FInLoc\u002Floc --coarse_to_fine --max_image_size 1200 --max_batch_size 48 --c2f_crop_with_homography\n\n# 7-scenes:\n# scene in 'chess' 'fire' 'heads' 'office' 'pumpkin' 'redkitchen' 'stairs'\npython3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset \"VislocSevenScenes('\u002Fpath\u002Fto\u002Fprepared\u002F7-scenes\u002F', subscene='${scene}', pairsfile='APGeM-LM18_top20', topk=1)\" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir \u002Fpath\u002Fto\u002Foutput\u002F7-scenes\u002F${scene}\u002Floc\n\n# Cambridge Landmarks:\n# scene in 'ShopFacade' 'GreatCourt' 'KingsCollege' 'OldHospital' 'StMarysChurch'\npython3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset \"VislocCambridgeLandmarks('\u002Fpath\u002Fto\u002Fprepared\u002FCambridge_Landmarks\u002F', subscene='${scene}', pairsfile='APGeM-LM18_top50', topk=20)\" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir \u002Fpath\u002Fto\u002Foutput\u002FCambridge_Landmarks\u002F${scene}\u002Floc\n\n```\n","The MASt3R project provides a 3D reconstruction technique based on image matching. Its core function is to achieve high-quality 3D reconstruction by combining image matching with stereo vision, and it supports extracting depth information from multi-view images to generate accurate 3D models. The project is written in Python and provides a detailed installation guide and pretrained models so that users can get started quickly. It is especially suited to scenarios such as large-scale scene modeling, robot navigation, and augmented reality applications. It also includes an interactive demo that makes it easy to understand and test the algorithm.",2,"2026-06-11 03:41:50","high_star"]