[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2213":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},2213,"MiniGPT-4","Vision-CAIR\u002FMiniGPT-4","Vision-CAIR","Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https:\u002F\u002Fminigpt-4.github.io, https:\u002F\u002Fminigpt-v2.github.io\u002F)","https:\u002F\u002Fminigpt-4.github.io",null,"Python",25679,2892,213,358,0,5,70.5,"BSD 3-Clause \"New\" or \"Revised\" License",false,"main",true,[],"2026-06-12 04:00:13","# MiniGPT-V\n\n\u003Cfont size='5'>**MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning**\u003C\u002Ffont>\n\nJun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong☨, Mohamed Elhoseiny☨\n\n☨equal last author\n\n\u003Ca href='https:\u002F\u002Fminigpt-v2.github.io'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa> \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.09478.pdf'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa>  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FVision-CAIR\u002FMiniGPT-v2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'> \u003Ca 
href='https:\u002F\u002Fminigpt-v2.github.io'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGradio-Demo-blue'>\u003C\u002Fa> [![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=atFCwV2hSY4)\n\n\n\u003Cfont size='5'> **MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models**\u003C\u002Ffont>\n\nDeyao Zhu*, Jun Chen*, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny\n\n*equal contribution\n\n\u003Ca href='https:\u002F\u002Fminigpt-4.github.io'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa>  \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2304.10592'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa> \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FVision-CAIR\u002Fminigpt4'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'>\u003C\u002Fa> \u003Ca href='https:\u002F\u002Fhuggingface.co\u002FVision-CAIR\u002FMiniGPT-4'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-blue'>\u003C\u002Fa> [![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https:\u002F\u002Fbadges.aleen42.com\u002Fsrc\u002Fyoutube.svg)](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=__tftoxpBAw&feature=youtu.be)\n\n*King Abdullah University of Science and Technology*\n\n## 💡 Get help - [Q&A](https:\u002F\u002Fgithub.com\u002FVision-CAIR\u002FMiniGPT-4\u002Fdiscussions\u002Fcategories\u002Fq-a) or [Discord 💬](https:\u002F\u002Fdiscord.gg\u002F5WdJkjbAeE)\n\n\u003Cfont size='4'> **Example Community Efforts Built on Top of MiniGPT-4 ** \u003C\u002Ffont> \n  \n* \u003Ca 
href='https:\u002F\u002Fgithub.com\u002Fwaltonfuture\u002FInstructionGPT-4?tab=readme-ov-file'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa> **InstructionGPT-4**: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4, Lai Wei, Zihao Jiang, Weiran Huang, Lichao Sun, arXiv, 2023\n\n* \u003Ca href='https:\u002F\u002Fopenaccess.thecvf.com\u002Fcontent\u002FICCV2023W\u002FCLVL\u002Fpapers\u002FAubakirova_PatFig_Generating_Short_and_Long_Captions_for_Patent_Figures_ICCVW_2023_paper.pdf'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa> **PatFig**: Generating Short and Long Captions for Patent Figures, Dana Aubakirova, Kim Gerdes, Lufei Liu, ICCVW, 2023 \n\n\n* \u003Ca href='https:\u002F\u002Fgithub.com\u002FJoshuaChou2018\u002FSkinGPT-4'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa> **SkinGPT-4**: An Interactive Dermatology Diagnostic System with Visual Large Language Model, Juexiao Zhou, Xiaonan He, Liyuan Sun, Jiannan Xu, Xiuying Chen, Yuetan Chu, Longxi Zhou, Xingyu Liao, Bin Zhang, Xin Gao, arXiv, 2023 \n\n\n* \u003Ca href='https:\u002F\u002Fhuggingface.co\u002FTyrannosaurus\u002FArtGPT-4'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa> **ArtGPT-4**: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4, Zhengqing Yuan, Huiwen Xue, Xinyi Wang, Yongming Liu, Zhuanzhe Zhao, Kun Wang, arXiv, 2023 \n\n\n\u003C\u002Ffont>\n\n## News\n[Oct.31 2023] We release the evaluation code of our MiniGPT-v2.  \n\n[Oct.24 2023] We release the finetuning code of our MiniGPT-v2.\n\n[Oct.13 2023] Breaking! 
We release the first major update with our MiniGPT-v2.\n\n[Aug.28 2023] We now provide a Llama 2 version of MiniGPT-4.\n\n## Online Demo\n\nClick the image to chat with MiniGPT-v2 about your images\n[![demo](figs\u002Fminigpt2_demo.png)](https:\u002F\u002Fminigpt-v2.github.io\u002F)\n\nClick the image to chat with MiniGPT-4 about your images\n[![demo](figs\u002Fonline_demo.png)](https:\u002F\u002Fminigpt-4.github.io)\n\n\n## MiniGPT-v2 Examples\n\n![MiniGPT-v2 demos](figs\u002Fdemo.png)\n\n\n\n## MiniGPT-4 Examples\n  |   |   |\n:-------------------------:|:-------------------------:\n![find wild](figs\u002Fexamples\u002Fwop_2.png) |  ![write story](figs\u002Fexamples\u002Fad_2.png)\n![solve problem](figs\u002Fexamples\u002Ffix_1.png)  |  ![write Poem](figs\u002Fexamples\u002Frhyme_1.png)\n\nMore examples can be found on the [project page](https:\u002F\u002Fminigpt-4.github.io).\n\n\n\n## Getting Started\n### Installation\n\n**1. Prepare the code and the environment**\n\nClone our repository, create a Python environment, and activate it with the following commands:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FVision-CAIR\u002FMiniGPT-4.git\ncd MiniGPT-4\nconda env create -f environment.yml\nconda activate minigptv\n```\n\n\n**2. Prepare the pretrained LLM weights**\n\n**MiniGPT-v2** is based on Llama 2 Chat 7B. 
For **MiniGPT-4**, we provide both Vicuna V0 and Llama 2 versions.\nDownload the corresponding LLM weights from the following Hugging Face repositories by cloning them with git-lfs.\n\n|                            Llama 2 Chat 7B                             |                                           Vicuna V0 13B                                           |                                          Vicuna V0 7B                                          |\n:------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:\n[Download](https:\u002F\u002Fhuggingface.co\u002Fmeta-llama\u002FLlama-2-7b-chat-hf\u002Ftree\u002Fmain) | [Download](https:\u002F\u002Fhuggingface.co\u002FVision-CAIR\u002Fvicuna\u002Ftree\u002Fmain) | [Download](https:\u002F\u002Fhuggingface.co\u002FVision-CAIR\u002Fvicuna-7b\u002Ftree\u002Fmain) \n\n\nThen, set the variable *llama_model* in the model config file to the LLM weight path.\n\n* For MiniGPT-v2, set the LLM path \n[here](minigpt4\u002Fconfigs\u002Fmodels\u002Fminigpt_v2.yaml#L15) at Line 14.\n\n* For MiniGPT-4 (Llama2), set the LLM path \n[here](minigpt4\u002Fconfigs\u002Fmodels\u002Fminigpt4_llama2.yaml#L15) at Line 15.\n\n* For MiniGPT-4 (Vicuna), set the LLM path \n[here](minigpt4\u002Fconfigs\u002Fmodels\u002Fminigpt4_vicuna0.yaml#L18) at Line 18.\n\n**3. 
Prepare the pretrained model checkpoints**\n\nDownload the pretrained model checkpoints\n\n\n| MiniGPT-v2 (after stage-2) | MiniGPT-v2 (after stage-3) | MiniGPT-v2 (online developing demo)| \n|------------------------------|------------------------------|------------------------------|\n| [Download](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1Vi_E7ZtZXRAQcyz4f8E6LtLh2UXABCmu\u002Fview?usp=sharing) |[Download](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1HkoUUrjzFGn33cSiUkI-KcT-zysCynAz\u002Fview?usp=sharing) | [Download](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1aVbfW7nkCSYx99_vCRyP1sOlQiWVSnAl\u002Fview?usp=sharing) |\n\n\nFor **MiniGPT-v2**, set the path to the pretrained checkpoint in the evaluation config file \nin [eval_configs\u002Fminigptv2_eval.yaml](eval_configs\u002Fminigptv2_eval.yaml#L10) at Line 8.\n\n\n\n| MiniGPT-4 (Vicuna 13B) | MiniGPT-4 (Vicuna 7B) | MiniGPT-4 (LLaMA-2 Chat 7B) |\n|----------------------------|---------------------------|---------------------------------|\n| [Download](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1a4zLvaiDBr-36pasffmgpvH5P7CKmpze\u002Fview?usp=share_link) | [Download](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R\u002Fview?usp=sharing) | [Download](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F11nAPjEok8eAGGEG1N2vXo3kBLCg0WgUk\u002Fview?usp=sharing) |\n\nFor **MiniGPT-4**, set the path to the pretrained checkpoint in the evaluation config file \nin [eval_configs\u002Fminigpt4_eval.yaml](eval_configs\u002Fminigpt4_eval.yaml#L10) at Line 8 for Vicuna version or [eval_configs\u002Fminigpt4_llama2_eval.yaml](eval_configs\u002Fminigpt4_llama2_eval.yaml#L10) for LLama2 version.   
\n\n\n\n### Launching Demo Locally\n\nFor MiniGPT-v2, run\n```\npython demo_v2.py --cfg-path eval_configs\u002Fminigptv2_eval.yaml  --gpu-id 0\n```\n\nFor MiniGPT-4 (Vicuna version), run\n\n```\npython demo.py --cfg-path eval_configs\u002Fminigpt4_eval.yaml  --gpu-id 0\n```\n\nFor MiniGPT-4 (Llama2 version), run\n\n```\npython demo.py --cfg-path eval_configs\u002Fminigpt4_llama2_eval.yaml  --gpu-id 0\n```\n\n\nTo save GPU memory, the LLM is loaded in 8-bit by default, with a beam search width of 1. \nThis configuration requires about 23 GB of GPU memory for a 13B LLM and 11.5 GB for a 7B LLM. \nOn more powerful GPUs, you can run the model\nin 16-bit by setting `low_resource` to `False` in the relevant config file:\n\n* MiniGPT-v2: [minigptv2_eval.yaml](eval_configs\u002Fminigptv2_eval.yaml#6) \n* MiniGPT-4 (Llama2): [minigpt4_llama2_eval.yaml](eval_configs\u002Fminigpt4_llama2_eval.yaml#6)\n* MiniGPT-4 (Vicuna): [minigpt4_eval.yaml](eval_configs\u002Fminigpt4_eval.yaml#6)\n\nThanks to [@WangRongsheng](https:\u002F\u002Fgithub.com\u002FWangRongsheng), you can also run MiniGPT-4 on [Colab](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing).\n\n\n### Training\nFor training details of MiniGPT-4, check [here](MiniGPT4_Train.md).\n\nFor finetuning details of MiniGPT-v2, check [here](MiniGPTv2_Train.md).\n\n\n### Evaluation\nFor evaluation details of MiniGPT-v2, check [here](eval_scripts\u002FEVAL_README.md).  \n\n\n## Acknowledgement\n\n+ [BLIP2](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Ftransformers\u002Fmain\u002Fmodel_doc\u002Fblip-2) The model architecture of MiniGPT-4 follows BLIP-2. Be sure to check out this great open-source work if you haven't seen it before!\n+ [Lavis](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FLAVIS) This repository is built upon Lavis!\n+ [Vicuna](https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat) The fantastic language ability of Vicuna with only 13B parameters is just amazing. 
And it is open-source!\n+ [LLaMA](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fllama) The strong open-sourced LLaMA 2 language model.\n\n\nIf you're using MiniGPT-4\u002FMiniGPT-v2 in your research or applications, please cite using this BibTeX:\n```bibtex\n\n\n@article{chen2023minigptv2,\n      title={MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning}, \n      author={Chen, Jun and Zhu, Deyao and Shen, Xiaoqian and Li, Xiang and Liu, Zechun and Zhang, Pengchuan and Krishnamoorthi, Raghuraman and Chandra, Vikas and Xiong, Yunyang and Elhoseiny, Mohamed},\n      year={2023},\n      journal={arXiv preprint arXiv:2310.09478},\n}\n\n@article{zhu2023minigpt,\n  title={MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models},\n  author={Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed},\n  journal={arXiv preprint arXiv:2304.10592},\n  year={2023}\n}\n```\n\n\n## License\nThis repository is under the [BSD 3-Clause License](LICENSE.md).\nMuch of the code is based on [Lavis](https:\u002F\u002Fgithub.com\u002Fsalesforce\u002FLAVIS), which is under the \nBSD 3-Clause License [here](LICENSE_Lavis.md).\n","MiniGPT-4 and MiniGPT-v2 are large language models for multimodal vision-language tasks. By combining large language models with visual processing, the project enables joint understanding and generation over images and text, and supports cross-modal applications such as image captioning and question answering. Its core features include an efficient fine-tuning mechanism built on pretrained models and strong multi-task learning capability, which let the model adapt to a wide range of downstream tasks. The project is implemented in Python and releases its complete codebase so that researchers can reproduce the experimental results or build on it. It suits applications that need to combine vision and natural language processing, such as intelligent customer service, content moderation systems, and assisted medical diagnosis.",2,"2026-06-11 02:48:55","top_language"]