[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71997":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},71997,"Depth-Anything-V2","DepthAnything\u002FDepth-Anything-V2","DepthAnything","[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation","https:\u002F\u002Fdepth-anything-v2.github.io",null,"Python",8260,851,53,219,0,34,58,170,102,39.79,"Apache License 2.0",false,"main",true,[27],"monocular-depth-estimation","2026-06-12 02:02:57","\u003Cdiv align=\"center\">\n\u003Ch1>Depth Anything V2\u003C\u002Fh1>\n\n[**Lihe Yang**](https:\u002F\u002Fliheyoung.github.io\u002F)\u003Csup>1\u003C\u002Fsup> · [**Bingyi Kang**](https:\u002F\u002Fbingykang.github.io\u002F)\u003Csup>2&dagger;\u003C\u002Fsup> · [**Zilong Huang**](http:\u002F\u002Fspeedinghzl.github.io\u002F)\u003Csup>2\u003C\u002Fsup>\n\u003Cbr>\n[**Zhen Zhao**](http:\u002F\u002Fzhaozhen.me\u002F) · [**Xiaogang Xu**](https:\u002F\u002Fxiaogang00.github.io\u002F) · [**Jiashi Feng**](https:\u002F\u002Fsites.google.com\u002Fsite\u002Fjshfeng\u002F)\u003Csup>2\u003C\u002Fsup> · [**Hengshuang Zhao**](https:\u002F\u002Fhszhao.github.io\u002F)\u003Csup>1*\u003C\u002Fsup>\n\n\u003Csup>1\u003C\u002Fsup>HKU&emsp;&emsp;&emsp;\u003Csup>2\u003C\u002Fsup>TikTok\n\u003Cbr>\n&dagger;project lead&emsp;*corresponding author\n\n\u003Ca 
href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.09414\">\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Depth Anything V2-red' alt='Paper PDF'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fdepth-anything-v2.github.io'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-Depth Anything V2-green' alt='Project Page'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdepth-anything\u002FDepth-Anything-V2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Demo-blue'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fdepth-anything\u002FDA-2K'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBenchmark-DA--2K-yellow' alt='Benchmark'>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\nThis work presents Depth Anything V2. It significantly outperforms [V1](https:\u002F\u002Fgithub.com\u002FLiheYoung\u002FDepth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.\n\n![teaser](assets\u002Fteaser.png)\n\n\n## News\n- **2025-01-22:** [Video Depth Anything](https:\u002F\u002Fvideodepthanything.github.io) has been released. It generates consistent depth maps for super-long videos (e.g., over 5 minutes).\n- **2024-12-22:** [Prompt Depth Anything](https:\u002F\u002Fpromptda.github.io\u002F) has been released. It supports 4K resolution metric depth estimation when low-res LiDAR is used to prompt the DA models.\n- **2024-07-06:** Depth Anything V2 is supported in [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002F). 
See the [instructions](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fdepth_anything_v2) for convenient usage.\n- **2024-06-25:** Depth Anything is integrated into [Apple Core ML Models](https:\u002F\u002Fdeveloper.apple.com\u002Fmachine-learning\u002Fmodels\u002F). See the instructions ([V1](https:\u002F\u002Fhuggingface.co\u002Fapple\u002Fcoreml-depth-anything-small), [V2](https:\u002F\u002Fhuggingface.co\u002Fapple\u002Fcoreml-depth-anything-v2-small)) for usage.\n- **2024-06-22:** We release [smaller metric depth models](https:\u002F\u002Fgithub.com\u002FDepthAnything\u002FDepth-Anything-V2\u002Ftree\u002Fmain\u002Fmetric_depth#pre-trained-models) based on Depth-Anything-V2-Small and Base.\n- **2024-06-20:** Our repository and project page were flagged by GitHub and removed from public view for 6 days. Sorry for the inconvenience.\n- **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.\n\n\n## Pre-trained Models\n\nWe provide **four models** of varying scales for robust relative depth estimation:\n\n| Model | Params | Checkpoint |\n|:-|-:|:-:|\n| Depth-Anything-V2-Small | 24.8M | [Download](https:\u002F\u002Fhuggingface.co\u002Fdepth-anything\u002FDepth-Anything-V2-Small\u002Fresolve\u002Fmain\u002Fdepth_anything_v2_vits.pth?download=true) |\n| Depth-Anything-V2-Base | 97.5M | [Download](https:\u002F\u002Fhuggingface.co\u002Fdepth-anything\u002FDepth-Anything-V2-Base\u002Fresolve\u002Fmain\u002Fdepth_anything_v2_vitb.pth?download=true) |\n| Depth-Anything-V2-Large | 335.3M | [Download](https:\u002F\u002Fhuggingface.co\u002Fdepth-anything\u002FDepth-Anything-V2-Large\u002Fresolve\u002Fmain\u002Fdepth_anything_v2_vitl.pth?download=true) |\n| Depth-Anything-V2-Giant | 1.3B | Coming soon |\n\n\n## Usage\n\n### Preparation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FDepthAnything\u002FDepth-Anything-V2\ncd Depth-Anything-V2\npip install -r 
requirements.txt\n```\n\nDownload the checkpoints listed [here](#pre-trained-models) and put them under the `checkpoints` directory.\n\n### Use our models\n```python\nimport cv2\nimport torch\n\nfrom depth_anything_v2.dpt import DepthAnythingV2\n\nDEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'\n\nmodel_configs = {\n    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},\n    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},\n    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},\n    'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}\n}\n\nencoder = 'vitl' # or 'vits', 'vitb', 'vitg'\n\nmodel = DepthAnythingV2(**model_configs[encoder])\nmodel.load_state_dict(torch.load(f'checkpoints\u002Fdepth_anything_v2_{encoder}.pth', map_location='cpu'))\nmodel = model.to(DEVICE).eval()\n\nraw_img = cv2.imread('your\u002Fimage\u002Fpath')\ndepth = model.infer_image(raw_img) # HxW raw depth map in numpy\n```\n\nIf you do not want to clone this repository, you can also load our models through [Transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002F). Below is a simple code snippet. Please refer to the [official page](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fdepth_anything_v2) for more details.\n\n- Note 1: Make sure you can connect to Hugging Face and have installed the latest Transformers.\n- Note 2: Due to the [upsampling difference](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fpull\u002F31522#issuecomment-2184123463) between OpenCV (we used) and Pillow (HF used), predictions may differ slightly. 
We therefore recommend using our models through the approach introduced above.\n```python\nfrom transformers import pipeline\nfrom PIL import Image\n\npipe = pipeline(task=\"depth-estimation\", model=\"depth-anything\u002FDepth-Anything-V2-Small-hf\")\nimage = Image.open('your\u002Fimage\u002Fpath')\ndepth = pipe(image)[\"depth\"]\n```\n\n### Running script on *images*\n\n```bash\npython run.py \\\n  --encoder \u003Cvits | vitb | vitl | vitg> \\\n  --img-path \u003Cpath> --outdir \u003Coutdir> \\\n  [--input-size \u003Csize>] [--pred-only] [--grayscale]\n```\nOptions:\n- `--img-path`: You can either 1) point it to a directory containing all images of interest, 2) point it to a single image, or 3) point it to a text file listing all image paths.\n- `--input-size` (optional): By default, we use input size `518` for model inference. ***You can increase the size for even more fine-grained results.***\n- `--pred-only` (optional): Only save the predicted depth map, without the raw image.\n- `--grayscale` (optional): Save the grayscale depth map, without applying a color palette.\n\nFor example:\n```bash\npython run.py --encoder vitl --img-path assets\u002Fexamples --outdir depth_vis\n```\n\n### Running script on *videos*\n\n```bash\npython run_video.py \\\n  --encoder \u003Cvits | vitb | vitl | vitg> \\\n  --video-path assets\u002Fexamples_video --outdir video_depth_vis \\\n  [--input-size \u003Csize>] [--pred-only] [--grayscale]\n```\n\n***Our larger model has better temporal consistency on videos.***\n\n### Gradio demo\n\nTo use our Gradio demo locally:\n\n```bash\npython app.py\n```\n\nYou can also try our [online demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FDepth-Anything\u002FDepth-Anything-V2).\n\n***Note: Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https:\u002F\u002Fgithub.com\u002FLiheYoung\u002FDepth-Anything\u002Fissues\u002F81)).*** In V1, we *unintentionally* used features from the 
last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https:\u002F\u002Fgithub.com\u002FDepthAnything\u002FDepth-Anything-V2\u002Fblob\u002F2cbc36a8ce2cec41d38ee51153f112e87c8e42d8\u002Fdepth_anything_v2\u002Fdpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.\n\n\n## Fine-tuned to Metric Depth Estimation\n\nPlease refer to [metric depth estimation](.\u002Fmetric_depth).\n\n\n## DA-2K Evaluation Benchmark\n\nPlease refer to [DA-2K benchmark](.\u002FDA-2K.md).\n\n\n## Community Support\n\n**We sincerely appreciate all the community support for our Depth Anything series. Thank you a lot!**\n\n- Apple Core ML:\n    - https:\u002F\u002Fdeveloper.apple.com\u002Fmachine-learning\u002Fmodels\n    - https:\u002F\u002Fhuggingface.co\u002Fapple\u002Fcoreml-depth-anything-v2-small\n    - https:\u002F\u002Fhuggingface.co\u002Fapple\u002Fcoreml-depth-anything-small\n- Transformers:\n    - https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fdepth_anything_v2\n    - https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fen\u002Fmodel_doc\u002Fdepth_anything\n- TensorRT:\n    - https:\u002F\u002Fgithub.com\u002Fspacewalk01\u002Fdepth-anything-tensorrt\n    - https:\u002F\u002Fgithub.com\u002Fzhujiajian98\u002FDepth-Anythingv2-TensorRT-python\n- ONNX: https:\u002F\u002Fgithub.com\u002Ffabio-sim\u002FDepth-Anything-ONNX\n- ComfyUI: https:\u002F\u002Fgithub.com\u002Fkijai\u002FComfyUI-DepthAnythingV2\n- Transformers.js (real-time depth in web): https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FXenova\u002Fwebgpu-realtime-depth-estimation\n- Android:\n  - https:\u002F\u002Fgithub.com\u002Fshubham0204\u002FDepth-Anything-Android\n  - https:\u002F\u002Fgithub.com\u002FFeiGeChuanShu\u002Fncnn-android-depth_anything\n\n\n## Acknowledgement\n\nWe are sincerely grateful to the awesome Hugging Face team ([@Pedro 
Cuenca](https:\u002F\u002Fhuggingface.co\u002Fpcuenq), [@Niels Rogge](https:\u002F\u002Fhuggingface.co\u002Fnielsr), [@Merve Noyan](https:\u002F\u002Fhuggingface.co\u002Fmerve), [@Amy Roberts](https:\u002F\u002Fhuggingface.co\u002Famyeroberts), et al.) for their huge efforts in supporting our models in Transformers and Apple Core ML.\n\nWe also thank the [DINOv2](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdinov2) team for contributing such impressive models to our community.\n\n\n## LICENSE\n\nDepth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base\u002FLarge\u002FGiant models are under the CC-BY-NC-4.0 license.\n\n\n## Citation\n\nIf you find this project useful, please consider citing:\n\n```bibtex\n@article{depth_anything_v2,\n  title={Depth Anything V2},\n  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},\n  journal={arXiv:2406.09414},\n  year={2024}\n}\n\n@inproceedings{depth_anything_v1,\n  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data}, \n  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},\n  booktitle={CVPR},\n  year={2024}\n}\n```\n","Depth Anything V2 is a foundation model for monocular depth estimation. With improved algorithms, it significantly outperforms its predecessor in fine-grained detail and robustness, and compared with SD-based models it offers faster inference, fewer parameters, and higher depth accuracy. The project provides four pre-trained models of different scales to suit different application needs. These properties make Depth Anything V2 well suited to applications that need to extract depth from a single image efficiently and accurately, such as autonomous driving, augmented reality, and robot navigation. The model is also integrated into the Hugging Face Transformers library and supports the Apple Core ML platform, which makes it easy for developers to deploy applications quickly.",2,"2026-06-11 03:39:53","high_star"]