[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72196":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},72196,"DeepSeek-VL","deepseek-ai\u002FDeepSeek-VL","deepseek-ai","DeepSeek-VL: Towards Real-World Vision-Language Understanding","https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002FDeepSeek-VL-7B",null,"Python",4126,592,37,44,0,5,9,21,15,76.92,"MIT License",false,"main",[26,27,28],"foundation-models","vision-language-model","vision-language-pretraining","2026-06-12 04:01:04","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"images\u002Flogo.svg\" width=\"60%\" alt=\"DeepSeek LLM\" \u002F>\n\u003C\u002Fdiv>\n\u003Chr>\n\u003Cdiv align=\"center\">\n\n  \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\">\n    \u003Cimg alt=\"Homepage\" src=\"images\u002Fbadge.svg\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002FDeepSeek-VL-7B\" target=\"_blank\">\n    \u003Cimg alt=\"Chat\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖%20Chat-DeepSeek%20VL-536af5?color=536af5&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\" 
target=\"_blank\">\n    \u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\n\u003Cdiv align=\"center\">\n\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\" target=\"_blank\">\n    \u003Cimg alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"images\u002Fqr.jpeg\" target=\"_blank\">\n    \u003Cimg alt=\"Wechat\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fdeepseek_ai\" target=\"_blank\">\n    \u003Cimg alt=\"Twitter Follow\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-deepseek_ai-white?logo=x&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n  \u003Ca href=\"LICENSE-CODE\">\n    \u003Cimg alt=\"Code License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode_License-MIT-f5de53?&color=f5de53\">\n  \u003C\u002Fa>\n  \u003Ca href=\"LICENSE-MODEL\">\n    \u003Cimg alt=\"Model License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel_License-Model_Agreement-f5de53?&color=f5de53\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#3-model-downloads\">\u003Cb>📥 Model Download\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"#4-quick-start\">\u003Cb>⚡ Quick Start\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"#5-license\">\u003Cb>📜 License\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"#6-citation\">\u003Cb>📖 Citation\u003C\u002Fb>\u003C\u002Fa> \u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05525\">\u003Cb>📄 Paper Link\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2403.05525\">\u003Cb>🤗 Huggingface Paper Link\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002FDeepSeek-VL-7B\">\u003Cb>👁️ Demo\u003C\u002Fb>\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n## 1. Introduction\n\nIntroducing DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.\n\n[DeepSeek-VL: Towards Real-World Vision-Language Understanding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2403.05525)\n\nHaoyu Lu*, Wen Liu*, Bo Zhang**, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan (*Equal Contribution, **Project Lead)\n\n![](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-VL\u002Fblob\u002Fmain\u002Fimages\u002Fsample.jpg)\n\n## 2. Release\n\n\u003Cdetails>\n\u003Csummary>✅ \u003Cb>2024-03-14\u003C\u002Fb>: Demo for DeepSeek-VL-7B available on \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002FDeepSeek-VL-7B\">Hugging Face\u003C\u002Fa>.\u003C\u002Fsummary>\n\u003Cbr>Check out the gradio demo of DeepSeek-VL-7B at \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002FDeepSeek-VL-7B\">https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002FDeepSeek-VL-7B\u003C\u002Fa>. 
Experience its capabilities firsthand!\n\u003C\u002Fdetails>\n\n\n\u003Cdetails>\n\u003Csummary>✅ \u003Cb>2024-03-13\u003C\u002Fb>: Support DeepSeek-VL gradio demo.\u003C\u002Fsummary>\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>✅ \u003Cb>2024-03-11\u003C\u002Fb>: DeepSeek-VL family released, including \u003Ccode>DeepSeek-VL-7B-base\u003C\u002Fcode>, \u003Ccode>DeepSeek-VL-7B-chat\u003C\u002Fcode>, \u003Ccode>DeepSeek-VL-1.3B-base\u003C\u002Fcode>, and \u003Ccode>DeepSeek-VL-1.3B-chat\u003C\u002Fcode>.\u003C\u002Fsummary>\n\u003Cbr>The release includes a diverse set of models tailored for various applications within the DeepSeek-VL family. The models come in two sizes: 7B and 1.3B parameters, each offering base and chat variants to cater to different needs and integration scenarios.\n\n\u003C\u002Fdetails>\n\n## 3. Model Downloads\n\nWe release the DeepSeek-VL family, including the 1.3B-base, 1.3B-chat, 7B-base, and 7B-chat models, to the public\nto support a broader and more diverse range of research within both academic and commercial communities.\nPlease note that the use of these models is subject to the terms outlined in the [License section](#5-license). 
Commercial usage is\npermitted under these terms.\n\n### Huggingface\n\n| Model                 | Sequence Length | Download                                                                    |\n|-----------------------|-----------------|-----------------------------------------------------------------------------|\n| DeepSeek-VL-1.3B-base | 4096            | [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002Fdeepseek-vl-1.3b-base) |\n| DeepSeek-VL-1.3B-chat | 4096            | [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002Fdeepseek-vl-1.3b-chat) |\n| DeepSeek-VL-7B-base   | 4096            | [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002Fdeepseek-vl-7b-base)   |\n| DeepSeek-VL-7B-chat   | 4096            | [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002Fdeepseek-vl-7b-chat)   |\n\n\n\n## 4. Quick Start\n\n### Installation\n\nIn a `Python >= 3.8` environment, install the necessary dependencies by running the following command:\n\n```shell\npip install -e .\n```\n\n### Simple Inference Example\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\nfrom deepseek_vl.models import VLChatProcessor, MultiModalityCausalLM\nfrom deepseek_vl.utils.io import load_pil_images\n\n\n# specify the path to the model\nmodel_path = \"deepseek-ai\u002Fdeepseek-vl-7b-chat\"\nvl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)\ntokenizer = vl_chat_processor.tokenizer\n\nvl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)\nvl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()\n\n## single image conversation example\nconversation = [\n    {\n        \"role\": \"User\",\n        \"content\": \"\u003Cimage_placeholder>Describe each stage of this image.\",\n        \"images\": [\".\u002Fimages\u002Ftraining_pipelines.jpg\"],\n    },\n    {\"role\": \"Assistant\", \"content\": 
\"\"},\n]\n\n## multiple images (or in-context learning) conversation example\n# conversation = [\n#     {\n#         \"role\": \"User\",\n#         \"content\": \"\u003Cimage_placeholder>A dog wearing nothing in the foreground, \"\n#                    \"\u003Cimage_placeholder>a dog wearing a santa hat, \"\n#                    \"\u003Cimage_placeholder>a dog wearing a wizard outfit, and \"\n#                    \"\u003Cimage_placeholder>what's the dog wearing?\",\n#         \"images\": [\n#             \"images\u002Fdog_a.png\",\n#             \"images\u002Fdog_b.png\",\n#             \"images\u002Fdog_c.png\",\n#             \"images\u002Fdog_d.png\",\n#         ],\n#     },\n#     {\"role\": \"Assistant\", \"content\": \"\"}\n# ]\n\n# load images and prepare for inputs\npil_images = load_pil_images(conversation)\nprepare_inputs = vl_chat_processor(\n    conversations=conversation,\n    images=pil_images,\n    force_batchify=True\n).to(vl_gpt.device)\n\n# run image encoder to get the image embeddings\ninputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)\n\n# run the model to get the response\noutputs = vl_gpt.language_model.generate(\n    inputs_embeds=inputs_embeds,\n    attention_mask=prepare_inputs.attention_mask,\n    pad_token_id=tokenizer.eos_token_id,\n    bos_token_id=tokenizer.bos_token_id,\n    eos_token_id=tokenizer.eos_token_id,\n    max_new_tokens=512,\n    do_sample=False,\n    use_cache=True\n)\n\nanswer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)\nprint(f\"{prepare_inputs['sft_format'][0]}\", answer)\n```\n\n### CLI Chat\n```bash\npython cli_chat.py --model_path \"deepseek-ai\u002Fdeepseek-vl-7b-chat\"\n\n# or local path\npython cli_chat.py --model_path \"local model path\"\n```\n\n### Gradio Demo\n```bash\npip install -e .[gradio]\n\npython deepseek_vl\u002Fserve\u002Fapp_deepseek.py\n```\n![](.\u002Fimages\u002Fgradio_demo.png)\n\nHave Fun!\n\n## 5. 
License\n\nThis code repository is licensed under [the MIT License](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-LLM\u002Fblob\u002FHEAD\u002FLICENSE-CODE). The use of DeepSeek-VL Base\u002FChat models is subject to the [DeepSeek Model License](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-LLM\u002Fblob\u002FHEAD\u002FLICENSE-MODEL). The DeepSeek-VL series (including Base and Chat) supports commercial use.\n\n## 6. Citation\n\n```\n@misc{lu2024deepseekvl,\n      title={DeepSeek-VL: Towards Real-World Vision-Language Understanding},\n      author={Haoyu Lu and Wen Liu and Bo Zhang and Bingxuan Wang and Kai Dong and Bo Liu and Jingxiang Sun and Tongzheng Ren and Zhuoshu Li and Hao Yang and Yaofeng Sun and Chengqi Deng and Hanwei Xu and Zhenda Xie and Chong Ruan},\n      year={2024},\n      eprint={2403.05525},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI}\n}\n```\n\n## 7. Contact\n\nIf you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).\n","DeepSeek-VL is an open-source model for real-world vision and language understanding applications. It offers multimodal understanding capabilities, handling logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. It is developed in Python and uses pretraining to improve performance on cross-modal tasks. It suits scenarios that require combined analysis of image and text information, such as document parsing, online education tools, and research literature management.",2,"2026-06-11 03:40:49","high_star"]