[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72045":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72045,"YOLO-World","AILab-CVC\u002FYOLO-World","AILab-CVC","[CVPR 2024] Real-Time Open-Vocabulary Object Detection","https:\u002F\u002Fwww.yoloworld.cc",null,"Python",6403,610,47,407,0,7,19,58,21,39.36,"GNU General Public License v3.0",false,"master",[],"2026-06-12 02:02:57","\u003Cdiv align=\"center\">\n\u003Cimg src=\".\u002Fassets\u002Fyolo_logo.png\" width=60%>\n\u003Cbr>\n\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?hl=zh-CN&user=PH8rJHYAAAAJ\">Tianheng Cheng\u003C\u002Fa>\u003Csup>\u003Cspan>2,3,*\u003C\u002Fspan>\u003C\u002Fsup>, \n\u003Ca href=\"https:\u002F\u002Flinsong.info\u002F\">Lin Song\u003C\u002Fa>\u003Csup>\u003Cspan>1,📧,*\u003C\u002Fspan>\u003C\u002Fsup>,\n\u003Ca href=\"https:\u002F\u002Fyxgeee.github.io\u002F\">Yixiao Ge\u003C\u002Fa>\u003Csup>\u003Cspan>1,🌟,2\u003C\u002Fspan>\u003C\u002Fsup>,\n\u003Ca href=\"http:\u002F\u002Feic.hust.edu.cn\u002Fprofessor\u002Fliuwenyu\u002F\"> Wenyu Liu\u003C\u002Fa>\u003Csup>\u003Cspan>3\u003C\u002Fspan>\u003C\u002Fsup>,\n\u003Ca href=\"https:\u002F\u002Fxwcv.github.io\u002F\">Xinggang Wang\u003C\u002Fa>\u003Csup>\u003Cspan>3,📧\u003C\u002Fspan>\u003C\u002Fsup>,\n\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=4oXBp9UAAAAJ&hl=en\">Ying 
Shan\u003C\u002Fa>\u003Csup>\u003Cspan>1,2\u003C\u002Fspan>\u003C\u002Fsup>\n\u003C\u002Fbr>\n\n\\* Equal contribution 🌟 Project lead 📧 Corresponding author\n\n\u003Csup>1\u003C\u002Fsup> Tencent AI Lab,  \u003Csup>2\u003C\u002Fsup> ARC Lab, Tencent PCG\n\u003Csup>3\u003C\u002Fsup> Huazhong University of Science and Technology\n\u003Cbr>\n\u003Cdiv>\n\n[![project page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-green)](https:\u002F\u002Fwondervictor.github.io\u002F)\n[![arxiv paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-red)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2401.17270)\n\u003Ca href=\"https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FAILab-CVC\u002FYOLO-World\u002Fblob\u002Fmaster\u002Finference.ipynb\">\u003Cimg src=\"https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg\" alt=\"Open In Colab\">\u003C\u002Fa>\n[![demo](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗HugginngFace-Spaces-orange)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fstevengrove\u002FYOLO-World)\n[![Replicate](https:\u002F\u002Freplicate.com\u002Fzsxkib\u002Fyolo-world\u002Fbadge)](https:\u002F\u002Freplicate.com\u002Fzsxkib\u002Fyolo-world)\n[![hfpaper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗HugginngFace-Paper-yellow)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2401.17270)\n[![license](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-GPLv3.0-blue)](LICENSE)\n[![yoloworldseg](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FYOLOWorldxEfficientSAM-🤗Spaces-orange)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FSkalskiP\u002FYOLO-World)\n[![yologuide](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F📖Notebook-roboflow-purple)](https:\u002F\u002Fsupervision.roboflow.com\u002Fdevelop\u002Fnotebooks\u002Fzero-shot-object-detection-with-yolo-world)\n[![deploy](https:\u002F\u002Fmedia.roboflow.com\u002Fdeploy.svg)](https:\u002F\u002Finference.roboflow.com\u002Ffoundation\u002Fyolo_
world\u002F)\n\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n## Notice\n\n**YOLO-World is still under active development!**\n\nWe recommend that everyone **use English to communicate on issues**, as this helps developers from around the world discuss, share experiences, and answer questions together.\n\nFor business licensing and other related inquiries, don't hesitate to contact `yixiaoge@tencent.com`.\n\n## 🔥 Updates \n`[2025-2-8]:` We release YOLO-World-V2.1, which includes new pre-trained weights and training code for image prompts. See the [YOLO-World-V2.1-Blog](.\u002Fdocs\u002Fupdate_20250123.md) for details.\\\n`[2024-11-5]:` We update `YOLO-World-Image`, which you can try on HuggingFace: [YOLO-World-Image (Preview Version)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fwondervictor\u002FYOLO-World-Image). It's a *preview* version and we are still improving it! Detailed documentation on training and few-shot inference is coming soon.\\\n`[2024-7-8]:` YOLO-World has now been integrated into [ComfyUI](https:\u002F\u002Fgithub.com\u002FStevenGrove\u002FComfyUI-YOLOWorld)! Try adding YOLO-World to your workflow; you can access it at [StevenGrove\u002FComfyUI-YOLOWorld](https:\u002F\u002Fgithub.com\u002FStevenGrove\u002FComfyUI-YOLOWorld)!  \n`[2024-5-18]:` YOLO-World models have been [integrated with the FiftyOne computer vision toolkit](https:\u002F\u002Fdocs.voxel51.com\u002Fintegrations\u002Fultralytics.html#open-vocabulary-detection) for streamlined open-vocabulary inference across image and video datasets.  \n`[2024-5-16]:` Hey guys! Long time no see! This update contains (1) a [fine-tuning guide](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World?#highlights--introduction) and (2) [TFLite export](.\u002Fdocs\u002Ftflite_deploy.md) with INT8 quantization.  
\n`[2024-5-9]:` This update contains the real [`reparameterization`](.\u002Fdocs\u002Freparameterize.md) 🪄, which is better suited to fine-tuning on custom datasets and improves training\u002Finference efficiency 🚀!  \n`[2024-4-28]:` Long time no see! This update contains bug fixes and improvements: (1) an ONNX demo; (2) an image demo (supporting tensor input); (3) new pre-trained models; (4) image prompts; (5) a simple version for fine-tuning \u002F deployment; (6) an installation guide (including a `requirements.txt`).  \n`[2024-3-28]:` We provide: (1) more high-resolution pre-trained models (e.g., S, M, X) ([#142](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F142)); (2) pre-trained models with CLIP-Large text encoders. Most importantly, we provide a preliminary fix for **fine-tuning without `mask-refine`** and explore a new fine-tuning setting ([#160](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F160), [#76](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F76)). In addition, fine-tuning YOLO-World with `mask-refine` also yields significant improvements; see [configs\u002Ffinetune_coco](.\u002Fconfigs\u002Ffinetune_coco\u002F) for details.  \n`[2024-3-16]:` We fix bugs in the demo ([#110](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F110), [#94](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F94), [#129](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F129), [#125](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F125)) with visualizations of segmentation masks, and release [**YOLO-World with Embeddings**](.\u002Fdocs\u002Fprompt_yolo_world.md), which supports prompt tuning, text prompts, and image prompts.  \n`[2024-3-3]:` We add **high-resolution YOLO-World**, which supports `1280x1280` resolution with higher accuracy and better performance on small objects!  
\n`[2024-2-29]:` We release the newest version, [**YOLO-World-v2**](.\u002Fdocs\u002Fupdates.md), with higher accuracy and faster speed! We hope the community will join us in improving YOLO-World!  \n`[2024-2-28]:` Excited to announce that YOLO-World has been accepted by **CVPR 2024**! We're continuing to make YOLO-World faster, stronger, and easier for everyone to use.  \n`[2024-2-22]:` We sincerely thank [RoboFlow](https:\u002F\u002Froboflow.com\u002F) and [@Skalskip92](https:\u002F\u002Ftwitter.com\u002Fskalskip92) for the [**Video Guide**](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=X7gKBGVz4vs) on YOLO-World. Nice work!  \n`[2024-2-18]:` We thank [@Skalskip92](https:\u002F\u002Ftwitter.com\u002Fskalskip92) for developing a wonderful segmentation demo that connects YOLO-World and EfficientSAM. You can try it now on [🤗 HuggingFace Spaces](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FSkalskiP\u002FYOLO-World).  \n`[2024-2-17]:` The largest YOLO-World model, **X**, is released, with better zero-shot performance!  \n`[2024-2-17]:` We release the code and models for **YOLO-World-Seg**! YOLO-World now supports open-vocabulary \u002F zero-shot object segmentation!  \n`[2024-2-15]:` The pre-trained YOLO-World-L with CC3M-Lite is released!  \n`[2024-2-14]:` We provide the [`image_demo`](demo.py) for inference on images or directories.  \n`[2024-2-10]:` We provide the [fine-tuning](.\u002Fdocs\u002Ffinetuning.md) and [data](.\u002Fdocs\u002Fdata.md) details for fine-tuning YOLO-World on COCO or custom datasets!  \n`[2024-2-3]:` The repo now includes a `Gradio` demo, so you can build the YOLO-World demo on your own device!  \n`[2024-2-1]:` We've released the code and weights of YOLO-World!  \n`[2024-2-1]:` We deploy the YOLO-World demo on [HuggingFace 🤗](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fstevengrove\u002FYOLO-World); you can try it now!  
\n`[2024-1-31]:` We are excited to launch **YOLO-World**, a cutting-edge real-time open-vocabulary object detector.  \n\n\n## TODO\n\nYOLO-World is under active development, so please stay tuned ☕️! \nIf you have suggestions 📃 or ideas 💡, **we would love for you to bring them up in the [Roadmap](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fissues\u002F109)** ❤️!\n\n## [FAQ (Frequently Asked Questions)](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fdiscussions\u002F149)\n\nWe have set up an FAQ for YOLO-World in GitHub Discussions, where we collect common questions. Feel free to raise questions or share solutions there; we hope it helps everyone find answers quickly.\n\n\n## Highlights & Introduction\n\nThis repo contains the PyTorch implementation, pre-trained weights, and pre-training\u002Ffine-tuning code for YOLO-World.\n\n* YOLO-World is pre-trained on large-scale datasets, including detection, grounding, and image-text datasets.\n\n* YOLO-World is the next-generation YOLO detector, with strong open-vocabulary detection and grounding capabilities.\n\n* YOLO-World presents a *prompt-then-detect* paradigm for efficient user-vocabulary inference: it re-parameterizes vocabulary embeddings into the model as parameters, which yields superior inference speed. 
You can export your own detection model without extra training or fine-tuning in our [online demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fstevengrove\u002FYOLO-World)!\n\n\n\u003Cdiv align=\"center\">\n\u003Cimg width=800px src=\".\u002Fassets\u002Fyolo_arch.png\">\n\u003C\u002Fdiv>\n\n### Zero-shot Evaluation Results for Pre-trained Models\nWe evaluate all YOLO-World-V2.1 models on LVIS, LVIS-mini, and COCO in a zero-shot manner and compare them with the previous version (improvements are shown as superscripts).\n\n\u003Ctable>\n    \u003Ctr>\n        \u003Cth rowspan=\"2\">Model\u003C\u002Fth>\u003Cth rowspan=\"2\">Resolution\u003C\u002Fth>\u003Cth colspan=\"4\" style=\"border-right: 1px solid\">LVIS\u003C\u002Fth>\u003Cth colspan=\"4\">LVIS-mini\u003C\u002Fth>\u003Cth colspan=\"3\" style=\"border-left: 1px solid\">COCO\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>AP\u003C\u002Ftd>\u003Ctd>AP\u003Csub>r\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd>AP\u003Csub>c\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">AP\u003Csub>f\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd>AP\u003C\u002Ftd>\u003Ctd>AP\u003Csub>r\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd>AP\u003Csub>c\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd>AP\u003Csub>f\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px solid\">AP\u003C\u002Ftd>\u003Ctd>AP\u003Csub>50\u003C\u002Fsub>\u003C\u002Ftd>\u003Ctd>AP\u003Csub>75\u003C\u002Fsub>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        \u003Ctd>YOLO-World-S\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>18.5\u003Csup>+1.2\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>12.6\u003C\u002Ftd>\u003Ctd>15.8\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">24.1\u003C\u002Ftd>\u003Ctd>23.6\u003Csup>+0.9\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>16.4\u003C\u002Ftd>\u003Ctd>21.5\u003C\u002Ftd>\u003Ctd>26.6\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px 
solid\">36.6\u003C\u002Ftd>\u003Ctd>51.0\u003C\u002Ftd>\u003Ctd>39.7\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>YOLO-World-S\u003C\u002Ftd>\u003Ctd>1280\u003C\u002Ftd>\u003Ctd>19.7\u003Csup>+0.9\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>13.5\u003C\u002Ftd>\u003Ctd>16.3\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">26.3\u003C\u002Ftd>\u003Ctd>25.5\u003Csup>+1.4\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>19.1\u003C\u002Ftd>\u003Ctd>22.6\u003C\u002Ftd>\u003Ctd>29.3\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px solid\">38.2\u003C\u002Ftd>\u003Ctd>54.2\u003C\u002Ftd>\u003Ctd>41.6\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        \u003Ctd>YOLO-World-M\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>24.1\u003Csup>+0.6\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>16.9\u003C\u002Ftd>\u003Ctd>21.1\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">30.6\u003C\u002Ftd>\u003Ctd>30.6\u003Csup>+0.6\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>19.7\u003C\u002Ftd>\u003Ctd>29.0\u003C\u002Ftd>\u003Ctd>34.1\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px solid\">43.0\u003C\u002Ftd>\u003Ctd>58.6\u003C\u002Ftd>\u003Ctd>46.7\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>YOLO-World-M\u003C\u002Ftd>\u003Ctd>1280\u003C\u002Ftd>\u003Ctd>26.0\u003Csup>+0.7\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>19.9\u003C\u002Ftd>\u003Ctd>22.5\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">32.7\u003C\u002Ftd>\u003Ctd>32.7\u003Csup>+1.1\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>24.4\u003C\u002Ftd>\u003Ctd>30.2\u003C\u002Ftd>\u003Ctd>36.4\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px solid\">43.8\u003C\u002Ftd>\u003Ctd>60.3\u003C\u002Ftd>\u003Ctd>47.7\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        
\u003Ctd>YOLO-World-L\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>26.8\u003Csup>+0.7\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>19.8\u003C\u002Ftd>\u003Ctd>23.6\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">33.4\u003C\u002Ftd>\u003Ctd>33.8\u003Csup>+0.9\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>24.5\u003C\u002Ftd>\u003Ctd>32.3\u003C\u002Ftd>\u003Ctd>36.8\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px solid\">44.9\u003C\u002Ftd>\u003Ctd>60.4\u003C\u002Ftd>\u003Ctd>48.9\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>YOLO-World-L\u003C\u002Ftd>\u003Ctd>800\u003C\u002Ftd>\u003Ctd>28.3\u003C\u002Ftd>\u003Ctd>22.5\u003C\u002Ftd>\u003Ctd>24.4\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">35.1\u003C\u002Ftd>\u003Ctd>35.2\u003C\u002Ftd>\u003Ctd>27.8\u003C\u002Ftd>\u003Ctd>32.6\u003C\u002Ftd>\u003Ctd>38.8\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px solid\">47.4\u003C\u002Ftd>\u003Ctd>63.3\u003C\u002Ftd>\u003Ctd>51.8\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>YOLO-World-L\u003C\u002Ftd>\u003Ctd>1280\u003C\u002Ftd>\u003Ctd>28.7\u003Csup>+1.1\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>22.9\u003C\u002Ftd>\u003Ctd>24.9\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">35.4\u003C\u002Ftd>\u003Ctd>35.5\u003Csup>+1.2\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>24.4\u003C\u002Ftd>\u003Ctd>34.0\u003C\u002Ftd>\u003Ctd>38.8\u003C\u002Ftd>\u003Ctd style=\"border-left: 1px solid\">46.0\u003C\u002Ftd>\u003Ctd>62.5\u003C\u002Ftd>\u003Ctd>50.0\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        \u003Ctd>YOLO-World-X\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>28.6\u003Csup>+0.2\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>22.0\u003C\u002Ftd>\u003Ctd>25.6\u003C\u002Ftd>\u003Ctd style=\"border-right: 1px solid\">34.9\u003C\u002Ftd>\u003Ctd>35.8\u003Csup>+0.4\u003C\u002Fsup>\u003C\u002Ftd>\u003Ctd>31.0\u003C\u002Ftd>\u003Ctd>33.7\u003C\u002Ftd>\u003Ctd>38.5\u003C\u002Ftd>\u003Ctd 
style=\"border-left: 1px solid\">46.7\u003C\u002Ftd>\u003Ctd>62.5\u003C\u002Ftd>\u003Ctd>51.0\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd colspan=\"13\">YOLO-World-X-1280 is coming soon.\u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### Model Card\n\n\u003Ctable>\n    \u003Ctr>\n        \u003Cth>Model\u003C\u002Fth>\u003Cth>Resolution\u003C\u002Fth>\u003Cth>Training\u003C\u002Fth>\u003Cth>Data\u003C\u002Fth>\u003Cth>Model Weights\u003C\u002Fth>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        \u003Ctd>YOLO-World-S\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>PT (100e)\u003C\u002Ftd>\u003Ctd>O365v1+GoldG+CC-LiteV2\u003C\u002Ftd>\u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fwondervictor\u002FYOLO-World-V2.1\u002Fresolve\u002Fmain\u002Fx_stage1-62b674ad.pth\"> 🤗 HuggingFace\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>YOLO-World-S\u003C\u002Ftd>\u003Ctd>1280\u003C\u002Ftd>\u003Ctd>CPT (40e)\u003C\u002Ftd>\u003Ctd>O365v1+GoldG+CC-LiteV2\u003C\u002Ftd>\u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fwondervictor\u002FYOLO-World-V2.1\u002Fresolve\u002Fmain\u002Fs_stage2-4466ab94.pth\"> 🤗 HuggingFace\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        \u003Ctd>YOLO-World-M\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>PT (100e)\u003C\u002Ftd>\u003Ctd>O365v1+GoldG+CC-LiteV2\u003C\u002Ftd>\u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fwondervictor\u002FYOLO-World-V2.1\u002Fresolve\u002Fmain\u002Fm_stage1-7e1e5299.pth\"> 🤗 HuggingFace\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>YOLO-World-M\u003C\u002Ftd>\u003Ctd>1280\u003C\u002Ftd>\u003Ctd>CPT (40e)\u003C\u002Ftd>\u003Ctd>O365v1+GoldG+CC-LiteV2\u003C\u002Ftd>\u003Ctd>\u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fwondervictor\u002FYOLO-World-V2.1\u002Fresolve\u002Fmain\u002Fm_stage2-9987dcb1.pth\"> 🤗 HuggingFace\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        \u003Ctd>YOLO-World-L\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>PT (100e)\u003C\u002Ftd>\u003Ctd>O365v1+GoldG+CC-LiteV2\u003C\u002Ftd>\u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fwondervictor\u002FYOLO-World-V2.1\u002Fresolve\u002Fmain\u002Fl_stage1-7d280586.pth\"> 🤗 HuggingFace\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n        \u003Ctd>YOLO-World-L\u003C\u002Ftd>\u003Ctd>800 \u002F 1280\u003C\u002Ftd>\u003Ctd>CPT (40e)\u003C\u002Ftd>\u003Ctd>O365v1+GoldG+CC-LiteV2\u003C\u002Ftd>\u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fwondervictor\u002FYOLO-World-V2.1\u002Fresolve\u002Fmain\u002Fl_stage2-b3e3dc3f.pth\"> 🤗 HuggingFace\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr style=\"border-top: 2px solid\">\n        \u003Ctd>YOLO-World-X\u003C\u002Ftd>\u003Ctd>640\u003C\u002Ftd>\u003Ctd>PT (100e)\u003C\u002Ftd>\u003Ctd>O365v1+GoldG+CC-LiteV2\u003C\u002Ftd>\u003Ctd>\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fwondervictor\u002FYOLO-World-V2.1\u002Fresolve\u002Fmain\u002Fx_stage1-62b674ad.pth\"> 🤗 HuggingFace\u003C\u002Fa>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n**Notes:**\n* PT: pre-training; CPT: continued pre-training.\n* CC-LiteV2: a newly annotated subset of CC3M containing 250k images.\n\n\n## Getting started\n\n### 1. Installation\n\nYOLO-World is built on `torch==1.11.0`, `mmyolo==0.6.0`, and `mmdetection==3.0.0`. 
See [docs\u002Finstallation](.\u002Fdocs\u002Finstallation.md) for more details about `requirements` and `mmcv`.\n\n#### Clone Project \n\n```bash\ngit clone --recursive https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World.git\n```\n\n#### Install\n\n```bash\npip install torch wheel -q\npip install -e .\n```\n\n### 2. Preparing Data\n\nDetails about the pre-training data are provided in [docs\u002Fdata](.\u002Fdocs\u002Fdata.md).\n\n\n## Training & Evaluation\n\nWe use the default [training](.\u002Ftools\u002Ftrain.py) and [evaluation](.\u002Ftools\u002Ftest.py) scripts from [mmyolo](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmyolo).\nThe configs for pre-training and fine-tuning are provided in `configs\u002Fpretrain` and `configs\u002Ffinetune_coco`.\nTraining YOLO-World is easy:\n\n```bash\nchmod +x tools\u002Fdist_train.sh\n# sample pre-training command on 8 GPUs; --amp enables automatic mixed-precision (AMP) training\n.\u002Ftools\u002Fdist_train.sh configs\u002Fpretrain\u002Fyolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py 8 --amp\n```\n**NOTE:** YOLO-World is pre-trained on 4 nodes with 8 GPUs per node (32 GPUs in total). For multi-node pre-training, `node_rank` and `nnodes` must be specified. 
\n\nEvaluating YOLO-World is also easy:\n\n```bash\nchmod +x tools\u002Fdist_test.sh\n# arguments: config file, checkpoint, number of GPUs\n.\u002Ftools\u002Fdist_test.sh path\u002Fto\u002Fconfig path\u002Fto\u002Fweights 8\n```\n\n**NOTE:** For pre-training, we mainly evaluate performance on LVIS-minival.\n\n## Fine-tuning YOLO-World\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\".\u002Fassets\u002Ffinetune_yoloworld.png\" width=800px>\n\u003C\u002Fdiv>\n\n\n\u003Cdiv align=\"center\">\n\u003Cp>\u003Cb>Choose your pre-trained YOLO-World and fine-tune it!\u003C\u002Fb>\u003C\u002Fp> \n\u003C\u002Fdiv>\n\n\nYOLO-World supports **zero-shot inference** and three types of **fine-tuning recipes**: **(1) normal fine-tuning**, **(2) prompt tuning**, and **(3) reparameterized fine-tuning**.\n\n* Normal Fine-tuning: see [docs\u002Ffine-tuning](.\u002Fdocs\u002Ffinetuning.md) for details on fine-tuning YOLO-World.\n\n* Prompt Tuning: see [docs\u002Fprompt_yolo_world](.\u002Fdocs\u002Fprompt_yolo_world.md) for more details on prompt tuning.\n\n* Reparameterized Fine-tuning: reparameterized YOLO-World is better suited to specific domains that are far from generic scenes. 
You can find more details in [docs\u002Freparameterize](.\u002Fdocs\u002Freparameterize.md).\n\n## Deployment\n\nDetails on deploying YOLO-World in downstream applications are provided in [docs\u002Fdeployment](.\u002Fdocs\u002Fdeploy.md).\nYou can also download the ONNX model directly from the online [demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fstevengrove\u002FYOLO-World) on HuggingFace Spaces 🤗.\n\n- [x] ONNX export and demo: [docs\u002Fdeploy](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fblob\u002Fmaster\u002Fdocs\u002Fdeploy.md)\n- [x] TFLite and INT8 quantization: [docs\u002Ftflite_deploy](https:\u002F\u002Fgithub.com\u002FAILab-CVC\u002FYOLO-World\u002Fblob\u002Fmaster\u002Fdocs\u002Ftflite_deploy.md)\n- [ ] TensorRT: coming soon.\n- [ ] C++: coming soon.\n\n## Demo\n\nSee [`demo`](.\u002Fdemo) for more details.\n\n- [x] `gradio_demo.py`: Gradio demo and ONNX export.\n- [x] `image_demo.py`: inference on images or a directory of images.\n- [x] `simple_demo.py`: a simple demo of YOLO-World that takes an `array` as input (instead of an image path).\n- [x] `video_demo.py`: YOLO-World inference on videos.\n- [x] `inference.ipynb`: Jupyter notebook for YOLO-World.\n- [x] [Google Colab Notebook](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1F_7S5lSaFM06irBCZqjhbN7MpUXo6WwO?usp=sharing): We sincerely thank [Onuralp](https:\u002F\u002Fgithub.com\u002Fonuralpszr) for sharing the [Colab Demo](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1F_7S5lSaFM06irBCZqjhbN7MpUXo6WwO?usp=sharing). Give it a try 😊!\n\n## Acknowledgement\n\nWe sincerely thank [mmyolo](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmyolo), [mmdetection](https:\u002F\u002Fgithub.com\u002Fopen-mmlab\u002Fmmdetection), [GLIP](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FGLIP), and [transformers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers) for sharing their wonderful code with the community!\n\n## Citations\nIf you find YOLO-World 
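useful in your research or applications, please consider giving us a star 🌟 and citing it.\n\n```bibtex\n@inproceedings{Cheng2024YOLOWorld,\n  title={YOLO-World: Real-Time Open-Vocabulary Object Detection},\n  author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},\n  booktitle={Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},\n  year={2024}\n}\n```\n\n## License\nYOLO-World is released under the GPL-v3 license, which permits commercial use. If you need a commercial license for YOLO-World, please feel free to contact us.\n","YOLO-World is a project for real-time open-vocabulary object detection. Built on the YOLO family of algorithms, it supports detecting objects of previously unknown categories, offering high flexibility and practicality. Its core technical features include fast inference, strong generalization, and an easily extensible architecture. YOLO-World is suited to applications that need to recognize many kinds of objects quickly, such as intelligent surveillance, autonomous driving, and image retrieval. The project also provides a rich set of pre-trained models and detailed documentation, making it easy for developers to get started and customize it.",2,"2026-06-11 03:40:07","high_star"]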