[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71996":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":30,"readmeContent":31,"aiSummary":32,"trendingCount":16,"starSnapshotCount":16,"syncStatus":33,"lastSyncTime":34,"discoverSource":35},71996,"Depth-Anything","LiheYoung\u002FDepth-Anything","LiheYoung","[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation","https:\u002F\u002Fdepth-anything.github.io",null,"Python",8119,615,49,127,0,5,30,15,39.37,"Apache License 2.0",false,"main",true,[26,27,28,29],"depth-estimation","image-synthesis","metric-depth-estimation","monocular-depth-estimation","2026-06-12 02:02:57","\u003Cdiv align=\"center\">\n\u003Ch2>Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data\u003C\u002Fh2>\n\n[**Lihe Yang**](https:\u002F\u002Fliheyoung.github.io\u002F)\u003Csup>1\u003C\u002Fsup> · [**Bingyi Kang**](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=NmHgX-wAAAAJ)\u003Csup>2&dagger;\u003C\u002Fsup> · [**Zilong Huang**](http:\u002F\u002Fspeedinghzl.github.io\u002F)\u003Csup>2\u003C\u002Fsup> · [**Xiaogang Xu**](https:\u002F\u002Fxiaogang00.github.io\u002F)\u003Csup>3,4\u003C\u002Fsup> · [**Jiashi Feng**](https:\u002F\u002Fsites.google.com\u002Fsite\u002Fjshfeng\u002F)\u003Csup>2\u003C\u002Fsup> · [**Hengshuang 
Zhao**](https:\u002F\u002Fhszhao.github.io\u002F)\u003Csup>1*\u003C\u002Fsup>\n\n\u003Csup>1\u003C\u002Fsup>HKU&emsp;&emsp;&emsp;&emsp;\u003Csup>2\u003C\u002Fsup>TikTok&emsp;&emsp;&emsp;&emsp;\u003Csup>3\u003C\u002Fsup>CUHK&emsp;&emsp;&emsp;&emsp;\u003Csup>4\u003C\u002Fsup>ZJU\n\n&dagger;project lead&emsp;*corresponding author\n\n**CVPR 2024**\n\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.10891\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Depth Anything-red' alt='Paper PDF'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fdepth-anything.github.io'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-Depth Anything-green' alt='Project Page'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FLiheYoung\u002FDepth-Anything'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2401.10891'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Paper-yellow'>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\nThis work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and **62M+ unlabeled images**.\n\n![teaser](assets\u002Fteaser.png)\n\n\u003Cdiv align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FDepthAnything\u002FDepth-Anything-V2\">\u003Cb>Try our latest Depth Anything V2 models!\u003C\u002Fb>\u003C\u002Fa>\u003Cbr>\n\u003C\u002Fdiv>\n\n## News\n\n* **2024-06-14:** [Depth Anything V2](https:\u002F\u002Fgithub.com\u002FDepthAnything\u002FDepth-Anything-V2) is released.\n* **2024-02-27:** Depth Anything is accepted by CVPR 2024.\n* **2024-02-05:** [Depth Anything Gallery](.\u002Fgallery.md) is released. 
Thank all the users!\n* **2024-02-02:** Depth Anything serves as the default depth processor for [InstantID](https:\u002F\u002Fgithub.com\u002FInstantID\u002FInstantID) and [InvokeAI](https:\u002F\u002Fgithub.com\u002Finvoke-ai\u002FInvokeAI\u002Freleases\u002Ftag\u002Fv3.6.1).\n* **2024-01-25:** Support [video depth visualization](.\u002Frun_video.py). An [online demo for video](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FJohanDL\u002FDepth-Anything-Video) is also available.\n* **2024-01-23:** The new ControlNet based on Depth Anything is integrated into [ControlNet WebUI](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet) and [ComfyUI's ControlNet](https:\u002F\u002Fgithub.com\u002FFannovel16\u002Fcomfyui_controlnet_aux).\n* **2024-01-23:** Depth Anything [ONNX](https:\u002F\u002Fgithub.com\u002Ffabio-sim\u002FDepth-Anything-ONNX) and [TensorRT](https:\u002F\u002Fgithub.com\u002Fspacewalk01\u002Fdepth-anything-tensorrt) versions are supported.\n* **2024-01-22:** Paper, project page, code, models, and demo ([HuggingFace](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FLiheYoung\u002FDepth-Anything), [OpenXLab](https:\u002F\u002Fopenxlab.org.cn\u002Fapps\u002Fdetail\u002Fyyfan\u002Fdepth_anything)) are released.\n\n\n## Features of Depth Anything\n\n***If you need other features, please first check [existing community supports](#community-support).***\n\n- **Relative depth estimation**:\n    \n    Our foundation models listed [here](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FLiheYoung\u002FDepth-Anything\u002Ftree\u002Fmain\u002Fcheckpoints) can provide relative depth estimation for any given image robustly. Please refer [here](#running) for details.\n\n- **Metric depth estimation**\n\n    We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. It offers strong capabilities of both in-domain and zero-shot metric depth estimation. 
Please refer [here](.\u002Fmetric_depth) for details.\n\n\n- **Better depth-conditioned ControlNet**\n\n    We re-train **a better depth-conditioned ControlNet** based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet. Please refer [here](.\u002Fcontrolnet\u002F) for details. You can also use our new ControlNet based on Depth Anything in [ControlNet WebUI](https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet) or [ComfyUI's ControlNet](https:\u002F\u002Fgithub.com\u002FFannovel16\u002Fcomfyui_controlnet_aux).\n\n- **Downstream high-level scene understanding**\n\n    The Depth Anything encoder can be fine-tuned to downstream high-level perception tasks, *e.g.*, semantic segmentation, 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K. Please refer [here](.\u002Fsemseg\u002F) for details.\n\n\n## Performance\n\nHere we compare our Depth Anything with the previously best MiDaS v3.1 BEiT\u003Csub>L-512\u003C\u002Fsub> model.\n\nPlease note that the latest MiDaS is also trained on KITTI and NYUv2, while we do not.\n\n| Method | Params | KITTI || NYUv2 || Sintel || DDAD || ETH3D || DIODE ||\n|-|-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|\n| | | AbsRel | $\\delta_1$ | AbsRel | $\\delta_1$ | AbsRel | $\\delta_1$ | AbsRel | $\\delta_1$ | AbsRel | $\\delta_1$ | AbsRel | $\\delta_1$ |\n| MiDaS | 345.0M | 0.127 | 0.850 | 0.048 | *0.980* | 0.587 | 0.699 | 0.251 | 0.766 | 0.139 | 0.867 | 0.075 | 0.942 | \n| **Ours-S** | 24.8M | 0.080 | 0.936 | 0.053 | 0.972 | 0.464 | 0.739 | 0.247 | 0.768 | 0.127 | **0.885** | 0.076 | 0.939 |\n| **Ours-B** | 97.5M | *0.080* | *0.939* | *0.046* | 0.979 | **0.432** | *0.756* | *0.232* | *0.786* | **0.126** | *0.884* | *0.069* | *0.946* |\n| **Ours-L** | 335.3M | **0.076** | **0.947** | **0.043** | **0.981** | *0.458* | **0.760** | **0.230** | **0.789** | *0.127* | 0.882 | **0.066** | **0.952** |\n\nWe highlight the **best** and *second best* results in **bold** and *italic* respectively 
(**better results**: AbsRel $\\downarrow$ , $\\delta_1 \\uparrow$).\n\n## Pre-trained models\n\nWe provide three models of varying scales for robust relative depth estimation:\n\n| Model | Params | Inference Time on V100 (ms) | A100 | RTX4090 ([TensorRT](https:\u002F\u002Fgithub.com\u002Fspacewalk01\u002Fdepth-anything-tensorrt)) |\n|:-|-:|:-:|:-:|:-:|\n| Depth-Anything-Small | 24.8M | 12 | 8 | 3 |\n| Depth-Anything-Base | 97.5M | 13 | 9 | 6 |\n| Depth-Anything-Large | 335.3M | 20 | 13 | 12 |\n\nNote that the V100 and A100 inference time (*without TensorRT*) is computed by excluding the pre-processing and post-processing stages, whereas the last column RTX4090 (*with TensorRT*) is computed by including these two stages (please refer to [Depth-Anything-TensorRT](https:\u002F\u002Fgithub.com\u002Fspacewalk01\u002Fdepth-anything-tensorrt)).\n\nYou can easily load our pre-trained models by:\n```python\nfrom depth_anything.dpt import DepthAnything\n\nencoder = 'vits' # can also be 'vitb' or 'vitl'\ndepth_anything = DepthAnything.from_pretrained('LiheYoung\u002Fdepth_anything_{:}14'.format(encoder))\n```\n\nDepth Anything is also supported in [``transformers``](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers). 
You can use it for depth prediction within [3 lines of code](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fmodel_doc\u002Fdepth_anything) (credit to [@niels](https:\u002F\u002Fhuggingface.co\u002Fnielsr)).\n\n### *No network connection, cannot load these models?*\n\n\u003Cdetails>\n\u003Csummary>Click here for solutions\u003C\u002Fsummary>\n\n- First, manually download the three checkpoints: [depth-anything-large](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FLiheYoung\u002FDepth-Anything\u002Fblob\u002Fmain\u002Fcheckpoints\u002Fdepth_anything_vitl14.pth), [depth-anything-base](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FLiheYoung\u002FDepth-Anything\u002Fblob\u002Fmain\u002Fcheckpoints\u002Fdepth_anything_vitb14.pth), and [depth-anything-small](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FLiheYoung\u002FDepth-Anything\u002Fblob\u002Fmain\u002Fcheckpoints\u002Fdepth_anything_vits14.pth).\n\n- Second, upload the folder containing the checkpoints to your remote server.\n\n- Lastly, load the model locally:\n```python\nfrom depth_anything.dpt import DepthAnything\n\nmodel_configs = {\n    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},\n    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},\n    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]}\n}\n\nencoder = 'vitl' # or 'vitb', 'vits'\ndepth_anything = DepthAnything(model_configs[encoder])\ndepth_anything.load_state_dict(torch.load(f'.\u002Fcheckpoints\u002Fdepth_anything_{encoder}14.pth'))\n```\nNote that in this locally loading manner, you also do not have to install the ``huggingface_hub`` package. 
In this way, please feel free to delete this [line](https:\u002F\u002Fgithub.com\u002FLiheYoung\u002FDepth-Anything\u002Fblob\u002Fe7ef4b4b7a0afd8a05ce9564f04c1e5b68268516\u002Fdepth_anything\u002Fdpt.py#L5) and the ``PyTorchModelHubMixin`` in this [line](https:\u002F\u002Fgithub.com\u002FLiheYoung\u002FDepth-Anything\u002Fblob\u002Fe7ef4b4b7a0afd8a05ce9564f04c1e5b68268516\u002Fdepth_anything\u002Fdpt.py#L169).\n\u003C\u002Fdetails>\n\n\n## Usage \n\n### Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FLiheYoung\u002FDepth-Anything\ncd Depth-Anything\npip install -r requirements.txt\n```\n\n### Running\n\n```bash\npython run.py --encoder \u003Cvits | vitb | vitl> --img-path \u003Cimg-directory | single-img | txt-file> --outdir \u003Coutdir> [--pred-only] [--grayscale]\n```\nArguments:\n- ``--img-path``: you can either 1) point it to an image directory storing all interested images, 2) point it to a single image, or 3) point it to a text file storing all image paths.\n- ``--pred-only`` is set to save the predicted depth map only. Without it, by default, we visualize both image and its depth map side by side.\n- ``--grayscale`` is set to save the grayscale depth map. 
Without it, by default, we apply a color palette to the depth map.\n\nFor example:\n```bash\npython run.py --encoder vitl --img-path assets\u002Fexamples --outdir depth_vis\n```\n\n**If you want to use Depth Anything on videos:**\n```bash\npython run_video.py --encoder vitl --video-path assets\u002Fexamples_video --outdir video_depth_vis\n```\n\n### Gradio demo \u003Ca href='https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgradio-app\u002Fgradio'>\u003C\u002Fa> \n\nTo use our gradio demo locally:\n\n```bash\npython app.py\n```\n\nYou can also try our [online demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FLiheYoung\u002FDepth-Anything).\n\n### Import Depth Anything to your project\n\nIf you want to use Depth Anything in your own project, you can simply follow [``run.py``](run.py) to load our models and define data pre-processing. \n\n\u003Cdetails>\n\u003Csummary>Code snippet (note the difference between our data pre-processing and that of MiDaS)\u003C\u002Fsummary>\n\n```python\nfrom depth_anything.dpt import DepthAnything\nfrom depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet\n\nimport cv2\nimport torch\nfrom torchvision.transforms import Compose\n\nencoder = 'vits' # can also be 'vitb' or 'vitl'\ndepth_anything = DepthAnything.from_pretrained('LiheYoung\u002Fdepth_anything_{:}14'.format(encoder)).eval()\n\ntransform = Compose([\n    Resize(\n        width=518,\n        height=518,\n        resize_target=False,\n        keep_aspect_ratio=True,\n        ensure_multiple_of=14,\n        resize_method='lower_bound',\n        image_interpolation_method=cv2.INTER_CUBIC,\n    ),\n    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n    PrepareForNet(),\n])\n\nimage = cv2.cvtColor(cv2.imread('your image path'), cv2.COLOR_BGR2RGB) \u002F 255.0\nimage = transform({'image': image})['image']\nimage = 
torch.from_numpy(image).unsqueeze(0)\n\n# depth shape: 1xHxW\ndepth = depth_anything(image)\n```\n\u003C\u002Fdetails>\n\n### Do not want to define image pre-processing or download model definition files?\n\nEasily use Depth Anything through [``transformers``](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) within 3 lines of code! Please refer to [these instructions](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fmodel_doc\u002Fdepth_anything) (credit to [@niels](https:\u002F\u002Fhuggingface.co\u002Fnielsr)).\n\n**Note:** If you encounter ``KeyError: 'depth_anything'``, please install the latest [``transformers``](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) from source:\n```bash\npip install git+https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers.git\n```\n\u003Cdetails>\n\u003Csummary>Click here for a brief demo:\u003C\u002Fsummary>\n\n```python\nfrom transformers import pipeline\nfrom PIL import Image\n\nimage = Image.open('Your-image-path')\npipe = pipeline(task=\"depth-estimation\", model=\"LiheYoung\u002Fdepth-anything-small-hf\")\ndepth = pipe(image)[\"depth\"]\n```\n\u003C\u002Fdetails>\n\n## Community Support\n\n**We sincerely appreciate all the extensions built on our Depth Anything from the community. 
Thank you a lot!**\n\nHere we list the extensions we have found:\n- Depth Anything TensorRT: \n    - https:\u002F\u002Fgithub.com\u002Fspacewalk01\u002Fdepth-anything-tensorrt\n    - https:\u002F\u002Fgithub.com\u002Fthinvy\u002FDepthAnythingTensorrtDeploy\n    - https:\u002F\u002Fgithub.com\u002Fdaniel89710\u002Ftrt-depth-anything\n- Depth Anything ONNX: https:\u002F\u002Fgithub.com\u002Ffabio-sim\u002FDepth-Anything-ONNX\n- Depth Anything in Transformers.js (3D visualization): https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FXenova\u002Fdepth-anything-web\n- Depth Anything for video (online demo): https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FJohanDL\u002FDepth-Anything-Video\n- Depth Anything in ControlNet WebUI: https:\u002F\u002Fgithub.com\u002FMikubill\u002Fsd-webui-controlnet\n- Depth Anything in ComfyUI's ControlNet: https:\u002F\u002Fgithub.com\u002FFannovel16\u002Fcomfyui_controlnet_aux\n- Depth Anything in X-AnyLabeling: https:\u002F\u002Fgithub.com\u002FCVHub520\u002FX-AnyLabeling\n- Depth Anything in OpenXLab: https:\u002F\u002Fopenxlab.org.cn\u002Fapps\u002Fdetail\u002Fyyfan\u002Fdepth_anything\n- Depth Anything in OpenVINO: https:\u002F\u002Fgithub.com\u002Fopenvinotoolkit\u002Fopenvino_notebooks\u002Ftree\u002Fmain\u002Fnotebooks\u002F280-depth-anything\n- Depth Anything ROS:\n    - https:\u002F\u002Fgithub.com\u002Fscepter914\u002FDepthAnything-ROS\n    - https:\u002F\u002Fgithub.com\u002Fpolatztrk\u002Fdepth_anything_ros\n- Depth Anything Android:\n    - https:\u002F\u002Fgithub.com\u002FFeiGeChuanShu\u002Fncnn-android-depth_anything\n    - https:\u002F\u002Fgithub.com\u002Fshubham0204\u002FDepth-Anything-Android\n- Depth Anything in TouchDesigner: https:\u002F\u002Fgithub.com\u002Folegchomp\u002FTDDepthAnything\n- LearnOpenCV research article on Depth Anything: https:\u002F\u002Flearnopencv.com\u002Fdepth-anything\n- Learn more about the DPT architecture we used: https:\u002F\u002Fgithub.com\u002Fheyoeyo\u002Fmuggled_dpt\n- Depth Anything in 
NVIDIA Jetson Orin: https:\u002F\u002Fgithub.com\u002FZhuYaoHui1998\u002Fjetson-examples\u002Fblob\u002Fmain\u002FreComputer\u002Fscripts\u002Fdepth-anything\n\n\nIf you have a project that supports or improves (*e.g.*, speeds up) Depth Anything, please feel free to open an issue. We will add it here.\n\n\n## Acknowledgement\n\nWe would like to express our deepest gratitude to [AK(@_akhaliq)](https:\u002F\u002Ftwitter.com\u002F_akhaliq) and the awesome HuggingFace team ([@niels](https:\u002F\u002Fhuggingface.co\u002Fnielsr), [@hysts](https:\u002F\u002Fhuggingface.co\u002Fhysts), and [@yuvraj](https:\u002F\u002Fhuggingface.co\u002Fysharma)) for helping improve the online demo and build the HF models.\n\nWe also thank the [MagicEdit](https:\u002F\u002Fmagic-edit.github.io\u002F) team for providing some video examples for video depth estimation, and [Tiancheng Shen](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=iRY1YVoAAAAJ) for evaluating the depth maps with MagicEdit.\n\n## Citation\n\nIf you find this project useful, please consider citing:\n\n```bibtex\n@inproceedings{depthanything,\n      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, \n      author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},\n      booktitle={CVPR},\n      year={2024}\n}\n```\n","Depth Anything is a foundation model for monocular depth estimation, trained on a combination of 1.5 million labeled images and more than 62 million unlabeled images. Its core function is to use large-scale unlabeled data to improve the accuracy and robustness of depth estimation, applying advanced deep learning techniques aimed specifically at depth prediction from monocular images. It also supports video depth visualization and has been integrated into several well-known AI platforms as the default depth processor. It is suited to applications that need high-precision depth information, such as autonomous driving, augmented reality, and 3D reconstruction.",2,"2026-06-11 03:39:53","high_star"]