[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72094":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},72094,"DeepSeek-VL2","deepseek-ai\u002FDeepSeek-VL2","deepseek-ai","DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding",null,"Python",5298,1810,81,101,0,3,19,9,75.17,"MIT License",false,"main",[],"2026-06-12 04:01:03","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"images\u002Flogo.svg\" width=\"60%\" alt=\"DeepSeek AI\" \u002F>\n\u003C\u002Fdiv>\n\u003Chr>\n\u003Cdiv align=\"center\">\n\n  \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\">\n    \u003Cimg alt=\"Homepage\" src=\"images\u002Fbadge.svg\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002Fdeepseek-vl2-small\" target=\"_blank\">\n    \u003Cimg alt=\"Chat\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖%20Chat-DeepSeek%20VL-536af5?color=536af5&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\" target=\"_blank\">\n    \u003Cimg alt=\"Hugging Face\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\n\u003Cdiv align=\"center\">\n\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\" target=\"_blank\">\n    \u003Cimg alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"images\u002Fqr.jpeg\" target=\"_blank\">\n    \u003Cimg alt=\"Wechat\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fdeepseek_ai\" target=\"_blank\">\n    \u003Cimg alt=\"Twitter Follow\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-deepseek_ai-white?logo=x&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n  \u003Ca href=\"LICENSE-CODE\">\n    \u003Cimg alt=\"Code License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode_License-MIT-f5de53?&color=f5de53\">\n  \u003C\u002Fa>\n  \u003Ca href=\"LICENSE-MODEL\">\n    \u003Cimg alt=\"Model License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel_License-Model_Agreement-f5de53?&color=f5de53\">\n  \u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-VL2\u002Ftree\u002Fmain?tab=readme-ov-file#3-model-download\">\u003Cb>📥 Model Download\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-VL2\u002Ftree\u002Fmain?tab=readme-ov-file#4-quick-start\">\u003Cb>⚡ Quick Start\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-VL2\u002Ftree\u002Fmain?tab=readme-ov-file#5-license\">\u003Cb>📜 
License\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-VL2\u002Ftree\u002Fmain?tab=readme-ov-file#6-citation\">\u003Cb>📖 Citation\u003C\u002Fb>\u003C\u002Fa> \u003Cbr>\n  \u003Ca href=\".\u002FDeepSeek_VL2_paper.pdf\">\u003Cb>📄 Paper Link\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.10302\">\u003Cb>📄 arXiv Paper Link\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002Fdeepseek-vl2-small\">\u003Cb>👁️ Demo\u003C\u002Fb>\u003C\u002Fa>\n\u003C\u002Fp>\n\n## 1. Introduction\n\nIntroducing DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document\u002Ftable\u002Fchart understanding, and visual grounding. Our model series is composed of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1.0B, 2.8B, and 4.5B activated parameters, respectively.\nDeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing open-source dense and MoE-based models.\n\n\n[DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding](https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.10302)\n\nZhiyu Wu*, Xiaokang Chen*, Zizheng Pan*, Xingchao Liu*, Wen Liu**, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, Chong Ruan*** (* Equal Contribution, ** Project Lead, *** Corresponding author)\n\n![](.\u002Fimages\u002Fvl2_teaser.jpeg)\n\n## 2. 
Release\n✅ \u003Cb>2025-2-6\u003C\u002Fb>: Naive Gradio Demo Implementation on Hugging Face Space [deepseek-vl2-small](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fdeepseek-ai\u002Fdeepseek-vl2-small).\n\n✅ \u003Cb>2024-12-25\u003C\u002Fb>: Gradio Demo Example, Incremental Prefilling, and VLMEvalKit Support.\n\n✅ \u003Cb>2024-12-13\u003C\u002Fb>: DeepSeek-VL2 family released, including \u003Ccode>DeepSeek-VL2-tiny\u003C\u002Fcode>, \u003Ccode>DeepSeek-VL2-small\u003C\u002Fcode>, and \u003Ccode>DeepSeek-VL2\u003C\u002Fcode>.\n\n## 3. Model Download\n\nWe release the DeepSeek-VL2 family, including \u003Ccode>DeepSeek-VL2-tiny\u003C\u002Fcode>, \u003Ccode>DeepSeek-VL2-small\u003C\u002Fcode>, and \u003Ccode>DeepSeek-VL2\u003C\u002Fcode>, to support a broader and more diverse range of research within both academic and commercial communities.\nPlease note that the use of these models is subject to the terms outlined in the [License section](#5-license).\n\n### Hugging Face\n\n| Model | Sequence Length | Download |\n|-------|-----------------|----------|\n| DeepSeek-VL2-tiny | 4096 | [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002Fdeepseek-vl2-tiny) |\n| DeepSeek-VL2-small | 4096 | [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002Fdeepseek-vl2-small) |\n| DeepSeek-VL2 | 4096 | [🤗 Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002Fdeepseek-vl2) |\n\n\n## 4. 
Quick Start\n\n### Installation\n\nIn a `Python >= 3.8` environment, install the necessary dependencies by running the following command:\n\n```shell\npip install -e .\n```\n\n### Simple Inference Example with One Image\n\n**Note: You may need 80GB of GPU memory to run this script with deepseek-vl2-small, and even more for deepseek-vl2.**\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\nfrom deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM\nfrom deepseek_vl2.utils.io import load_pil_images\n\n\n# specify the path to the model\nmodel_path = \"deepseek-ai\u002Fdeepseek-vl2-tiny\"\nvl_chat_processor: DeepseekVLV2Processor = DeepseekVLV2Processor.from_pretrained(model_path)\ntokenizer = vl_chat_processor.tokenizer\n\nvl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)\nvl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()\n\n## single image conversation example\n## Please note that \u003C|ref|> and \u003C|\u002Fref|> are designed specifically for the object localization feature. These special tokens are not required for normal conversations.\n## If you would like to try the grounded captioning functionality (responses that include both object localization and reasoning), add the special token \u003C|grounding|> at the beginning of the prompt. 
Examples can be found in Figure 9 of our paper.\nconversation = [\n    {\n        \"role\": \"\u003C|User|>\",\n        \"content\": \"\u003Cimage>\\n\u003C|ref|>The giraffe at the back.\u003C|\u002Fref|>.\",\n        \"images\": [\".\u002Fimages\u002Fvisual_grounding_1.jpeg\"],\n    },\n    {\"role\": \"\u003C|Assistant|>\", \"content\": \"\"},\n]\n\n# load images and prepare for inputs\npil_images = load_pil_images(conversation)\nprepare_inputs = vl_chat_processor(\n    conversations=conversation,\n    images=pil_images,\n    force_batchify=True,\n    system_prompt=\"\"\n).to(vl_gpt.device)\n\n# run image encoder to get the image embeddings\ninputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)\n\n# run the model to get the response\noutputs = vl_gpt.language.generate(\n    inputs_embeds=inputs_embeds,\n    attention_mask=prepare_inputs.attention_mask,\n    pad_token_id=tokenizer.eos_token_id,\n    bos_token_id=tokenizer.bos_token_id,\n    eos_token_id=tokenizer.eos_token_id,\n    max_new_tokens=512,\n    do_sample=False,\n    use_cache=True\n)\n\nanswer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=False)\nprint(f\"{prepare_inputs['sft_format'][0]}\", answer)\n```\n\nThe output looks something like this:\n```\n\u003C|User|>: \u003Cimage>\n\u003C|ref|>The giraffe at the back.\u003C|\u002Fref|>.\n\n\u003C|Assistant|>: \u003C|ref|>The giraffe at the back.\u003C|\u002Fref|>\u003C|det|>[[580, 270, 999, 900]]\u003C|\u002Fdet|>\u003C｜end▁of▁sentence｜>\n```\n\n### Simple Inference Example with Multiple Images\n\n**Note: You may need 80GB of GPU memory to run this script with deepseek-vl2-small, and even more for deepseek-vl2.**\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\nfrom deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM\nfrom deepseek_vl2.utils.io import load_pil_images\n\n\n# specify the path to the model\nmodel_path = \"deepseek-ai\u002Fdeepseek-vl2-tiny\"\nvl_chat_processor: 
DeepseekVLV2Processor = DeepseekVLV2Processor.from_pretrained(model_path)\ntokenizer = vl_chat_processor.tokenizer\n\nvl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)\nvl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()\n\n# multiple images\u002Finterleaved image-text\nconversation = [\n    {\n        \"role\": \"\u003C|User|>\",\n        \"content\": \"This is image_1: \u003Cimage>\\n\"\n                   \"This is image_2: \u003Cimage>\\n\"\n                   \"This is image_3: \u003Cimage>\\n Can you tell me what are in the images?\",\n        \"images\": [\n            \"images\u002Fmulti_image_1.jpeg\",\n            \"images\u002Fmulti_image_2.jpeg\",\n            \"images\u002Fmulti_image_3.jpeg\",\n        ],\n    },\n    {\"role\": \"\u003C|Assistant|>\", \"content\": \"\"}\n]\n\n# load images and prepare for inputs\npil_images = load_pil_images(conversation)\nprepare_inputs = vl_chat_processor(\n    conversations=conversation,\n    images=pil_images,\n    force_batchify=True,\n    system_prompt=\"\"\n).to(vl_gpt.device)\n\n# run image encoder to get the image embeddings\ninputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)\n\n# run the model to get the response\noutputs = vl_gpt.language.generate(\n    inputs_embeds=inputs_embeds,\n    attention_mask=prepare_inputs.attention_mask,\n    pad_token_id=tokenizer.eos_token_id,\n    bos_token_id=tokenizer.bos_token_id,\n    eos_token_id=tokenizer.eos_token_id,\n    max_new_tokens=512,\n    do_sample=False,\n    use_cache=True\n)\n\nanswer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=False)\nprint(f\"{prepare_inputs['sft_format'][0]}\", answer)\n```\n\nThe output looks something like this:\n```\n\u003C|User|>: This is image_1: \u003Cimage>\nThis is image_2: \u003Cimage>\nThis is image_3: \u003Cimage>\n Can you tell me what are in the images?\n\n\u003C|Assistant|>: The images show three different types of vegetables. 
Image_1 features carrots, which are orange with green tops. Image_2 displays corn cobs, which are yellow with green husks. Image_3 contains raw pork ribs, which are pinkish-red with some marbling.\u003C｜end▁of▁sentence｜>\n```\n\n### Simple Inference Example with Incremental Prefilling\n\n**Note: We use incremental prefilling to run inference with deepseek-vl2-small on a single 40GB GPU.**\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\n\nfrom deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM\nfrom deepseek_vl2.utils.io import load_pil_images\n\n\n# specify the path to the model\nmodel_path = \"deepseek-ai\u002Fdeepseek-vl2-small\"\nvl_chat_processor: DeepseekVLV2Processor = DeepseekVLV2Processor.from_pretrained(model_path)\ntokenizer = vl_chat_processor.tokenizer\n\nvl_gpt: DeepseekVLV2ForCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)\nvl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()\n\n# multiple images\u002Finterleaved image-text\nconversation = [\n    {\n        \"role\": \"\u003C|User|>\",\n        \"content\": \"This is image_1: \u003Cimage>\\n\"\n                   \"This is image_2: \u003Cimage>\\n\"\n                   \"This is image_3: \u003Cimage>\\n Can you tell me what are in the images?\",\n        \"images\": [\n            \"images\u002Fmulti_image_1.jpeg\",\n            \"images\u002Fmulti_image_2.jpeg\",\n            \"images\u002Fmulti_image_3.jpeg\",\n        ],\n    },\n    {\"role\": \"\u003C|Assistant|>\", \"content\": \"\"}\n]\n\n# load images and prepare for inputs\npil_images = load_pil_images(conversation)\nprepare_inputs = vl_chat_processor(\n    conversations=conversation,\n    images=pil_images,\n    force_batchify=True,\n    system_prompt=\"\"\n).to(vl_gpt.device)\n\nwith torch.no_grad():\n    # run image encoder to get the image embeddings\n    inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)\n\n    # incremental_prefilling when using 40G 
GPU for vl2-small\n    inputs_embeds, past_key_values = vl_gpt.incremental_prefilling(\n        input_ids=prepare_inputs.input_ids,\n        images=prepare_inputs.images,\n        images_seq_mask=prepare_inputs.images_seq_mask,\n        images_spatial_crop=prepare_inputs.images_spatial_crop,\n        attention_mask=prepare_inputs.attention_mask,\n        chunk_size=512,  # prefilling chunk size\n    )\n\n    # run the model to get the response\n    outputs = vl_gpt.generate(\n        inputs_embeds=inputs_embeds,\n        input_ids=prepare_inputs.input_ids,\n        images=prepare_inputs.images,\n        images_seq_mask=prepare_inputs.images_seq_mask,\n        images_spatial_crop=prepare_inputs.images_spatial_crop,\n        attention_mask=prepare_inputs.attention_mask,\n        past_key_values=past_key_values,\n\n        pad_token_id=tokenizer.eos_token_id,\n        bos_token_id=tokenizer.bos_token_id,\n        eos_token_id=tokenizer.eos_token_id,\n        max_new_tokens=512,\n\n        do_sample=False,\n        use_cache=True,\n    )\n\n    answer = tokenizer.decode(outputs[0][len(prepare_inputs.input_ids[0]):].cpu().tolist(), skip_special_tokens=False)\n\nprint(f\"{prepare_inputs['sft_format'][0]}\", answer)\n```\n\nThe output looks something like this:\n```\n\u003C|User|>: This is image_1: \u003Cimage>\nThis is image_2: \u003Cimage>\nThis is image_3: \u003Cimage>\n Can you tell me what are in the images?\n\n\u003C|Assistant|>: The first image contains carrots. The second image contains corn. 
The third image contains meat.\u003C｜end▁of▁sentence｜>\n```\n\nTo parse the bounding box coordinates, please refer to [parse_ref_bbox](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-VL2\u002Fblob\u002Fmain\u002Fdeepseek_vl2\u002Fserve\u002Fapp_modules\u002Futils.py#L270-L298).\n\n\n### Full Inference Example\n```shell\n# without incremental prefilling\nCUDA_VISIBLE_DEVICES=0 python inference.py --model_path \"deepseek-ai\u002Fdeepseek-vl2\"\n\n# with incremental prefilling, when using a 40GB GPU for vl2-small\nCUDA_VISIBLE_DEVICES=0 python inference.py --model_path \"deepseek-ai\u002Fdeepseek-vl2-small\" --chunk_size 512\n\n```\n\n\n### Gradio Demo\n\n* Install the necessary dependencies:\n```shell\npip install -e \".[gradio]\"\n```\n\n* Then run the following command:\n\n```shell\n# vl2-tiny, 3.37B-MoE in total, activated 1B, can be run on a single GPU \u003C 40GB\nCUDA_VISIBLE_DEVICES=2 python web_demo.py \\\n--model_name \"deepseek-ai\u002Fdeepseek-vl2-tiny\"  \\\n--port 37914\n\n\n# vl2-small, 16.1B-MoE in total, activated 2.4B\n# If running on an A100 40GB GPU, set `--chunk_size 512` to enable incremental prefilling and save memory; responses may be slower.\n# If running on a GPU with more than 40GB, you can omit `--chunk_size 512` for faster responses.\nCUDA_VISIBLE_DEVICES=2 python web_demo.py \\\n--model_name \"deepseek-ai\u002Fdeepseek-vl2-small\"  \\\n--port 37914 \\\n--chunk_size 512\n\n# vl2, 27.5B-MoE in total, activated 4.2B\nCUDA_VISIBLE_DEVICES=2 python web_demo.py \\\n--model_name \"deepseek-ai\u002Fdeepseek-vl2\"  \\\n--port 37914\n```\n\n* **Important**: This is a basic, naive demo implementation without any deployment optimizations, which may result in slower performance. For production environments, consider optimized deployment solutions such as vLLM, SGLang, or LMDeploy, which can provide faster response times and better cost efficiency.\n\n## 5. 
License\n\nThis code repository is licensed under the [MIT License](.\u002FLICENSE-CODE). The use of DeepSeek-VL2 models is subject to the [DeepSeek Model License](.\u002FLICENSE-MODEL). The DeepSeek-VL2 series supports commercial use.\n\n## 6. Citation\n\n```\n@misc{wu2024deepseekvl2mixtureofexpertsvisionlanguagemodels,\n      title={DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding},\n      author={Zhiyu Wu and Xiaokang Chen and Zizheng Pan and Xingchao Liu and Wen Liu and Damai Dai and Huazuo Gao and Yiyang Ma and Chengyue Wu and Bingxuan Wang and Zhenda Xie and Yu Wu and Kai Hu and Jiawei Wang and Yaofeng Sun and Yukun Li and Yishi Piao and Kang Guan and Aixin Liu and Xin Xie and Yuxiang You and Kai Dong and Xingkai Yu and Haowei Zhang and Liang Zhao and Yisong Wang and Chong Ruan},\n      year={2024},\n      eprint={2412.10302},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.10302},\n}\n```\n\n## 7. Contact\n\nIf you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).\n","DeepSeek-VL2 is an advanced multimodal understanding model series that uses a Mixture-of-Experts (MoE) architecture to significantly improve performance on vision-language tasks. Its core capabilities include visual question answering, optical character recognition, document\u002Ftable\u002Fchart understanding, and visual grounding, making it suitable for applications that require cross-modal information processing. The series comes in three sizes: DeepSeek-VL2-Tiny (1.0B activated parameters), DeepSeek-VL2-Small (2.8B activated parameters), and DeepSeek-VL2 (4.5B activated parameters), covering needs from lightweight to high-performance. It is developed in Python; the code is released under the MIT License and the model weights are governed by the DeepSeek Model License, which permits commercial use. It is well suited for researchers and developers who want to integrate it quickly into real-world projects.",2,"2026-06-11 03:40:21","high_star"]