[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72336":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":16,"starSnapshotCount":16,"syncStatus":37,"lastSyncTime":38,"discoverSource":39},72336,"Skywork-R1V","SkyworkAI\u002FSkywork-R1V","SkyworkAI","Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.","https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.06167",null,"Python",3158,281,154,38,0,1,59.45,"MIT License",false,"main",[23,24,25,26,27,28,29,30,31,32,33],"deepseek-r1","grpo","llm","multimodal-r1","multimodal-understanding","r1v","reasoning","reinforcement-learning","skywork-r1v","vlm","vlm-r1","2026-06-12 04:01:04","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FSkyworkAI\u002FSkywork-R1V\u002Fblob\u002Fmain\u002Fimgs\u002Fskywork_logo.png\" alt=\"Skywork Logo\" width=\"400\">\n  \u003Ch1>\u003Cstrong>Skywork-R1V4\u003C\u002Fstrong>\u003C\u002Fh1>\n\u003C\u002Fdiv>\n\n\u003Cfont size=7>\u003Cdiv align='center' >  [[📖 Skywork-R1V4 Report](https:\u002F\u002Fgithub.com\u002FSkyworkAI\u002FSkywork-R1V\u002Fblob\u002Fmain\u002FSkywork_R1V4.pdf)] \u003C\u002Fdiv>\u003C\u002Ffont>\n\nWelcome to the Skywork-R1V repository! 
Here, you'll find a series of state-of-the-art multimodal reasoning models with powerful agentic capabilities. From open-source versions with model weights and inference code to our latest closed-source offerings, the Skywork-R1V series delivers exceptional performance across vision understanding, code execution, and deep research tasks.\n\n## 🔥 News\n\n**💥 November 18, 2025**: We released **Skywork-R1V4-Lite**, a lightweight and ultra-fast closed-source multimodal reasoning model that achieves exceptional image understanding capabilities through code execution tools. R1V4-Lite features blazing-fast inference speed and can be integrated with search tools to enable deep research capabilities. Available now on [Skywork Platform](https:\u002F\u002Fdocs.skyworkmodel.ai\u002Fr1v4\u002Fapi-reference\u002Fcompletions.html), and coming soon to OpenRouter. Stay tuned!\n\n**July 15, 2025**: We've released quantized versions of Skywork-R1V3 for efficient inference:\n* AWQ Quantization: [🤗 Skywork-R1V3-38B-AWQ](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FSkywork-R1V3-38B-AWQ) -- Supports single-GPU inference (VRAM ≥ 30GB).\n* GGUF Quantization (4-bit & 8-bit): [🤗 Skywork-R1V3-38B-GGUF](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FSkywork-R1V3-38B-GGUF) -- Optimized for CPU-based inference.\n\n**July 9, 2025**: We released Skywork-R1V3-38B [[🤗 Skywork-R1V3-38B](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FSkywork-R1V3-38B)], the latest and most powerful open-source multimodal reasoning model in the Skywork series, pushing the boundaries of multimodal and cross-disciplinary intelligence. Primarily through reinforcement learning in post-training, R1V3 significantly improves multimodal reasoning ability and achieves open-source state-of-the-art (SOTA) performance across multiple multimodal reasoning benchmarks, e.g. 
76.0 on MMMU.\n\n**April 28, 2025**: We released an AWQ-quantized version of Skywork R1V2 [[🤗 Skywork-R1V2-38B-AWQ](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FSkywork-R1V2-38B-AWQ)], supporting single-card inference (above 30GB).\n\n**April 24, 2025**: We released **Skywork-R1V2**, an advanced open-source multimodal reasoning model that demonstrates strong performance across a range of multimodal reasoning benchmarks including MMMU, MMMU-Pro, MathVista, and OlympiadBench. [[🤗 Skywork-R1V2-38B](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FSkywork-R1V2-38B)][[📖 R1V2 Report](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16656)]\n\n**April 9, 2025**: Our technical report is available on arXiv: [[Skywork-R1V: Pioneering Multimodal Reasoning with CoT](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.05599)].\n\n**Mar 26, 2025**: We released an AWQ-quantized version of Skywork R1V [[🤗 Skywork-R1V-38B-AWQ](https:\u002F\u002Fhuggingface.co\u002FSkywork\u002FSkywork-R1V-38B-AWQ)], supporting single-card inference (above 30GB).\n\n**Mar 18, 2025**: We are thrilled to introduce Skywork R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities, pushing the boundaries of AI-driven vision and logical inference! 
🚀\n\n\n## 📊 Evaluation\nSkywork-R1V4-Lite demonstrates state-of-the-art performance on various multimodal tasks, particularly excelling in perception and deep research capabilities.\n\n**Comparison of Skywork-R1V4 with Leading Multimodal Models**\n\n| Benchmark | Split | Skywork-R1V4\u003Cbr\u002F>30B(A3B) | Qwen3-VL\u003Cbr\u002F>30B(A3B) | Qwen3-VL\u003Cbr\u002F>235B(A22B) | Gemini 2.5 Flash | Gemini 2.5 Pro |\n|-----------|-------|:-------------------------:|:---------------------:|:-----------------------:|:----------------:|:--------------:|\n| **Perception** |\n| HIRbench-4K | FSP | **91.8** | 88.5 | 89.0 | 81.5 | 85.5 |\n| | FCP | 73.8 | 68.5 | **77.0** | 74.0 | 82.3 |\n| | Overall | **82.8** | 78.5 | 83.0 | 77.5 | 83.9 |\n| HIRbench-8K | FSP | **88.8** | 80.3 | 83.0 | 75.8 | 83.0 |\n| | FCP | 70.8 | 68.3 | **77.3** | 71.8 | 80.0 |\n| | Overall | **79.8** | 74.2 | 80.4 | 73.7 | 81.5 |\n| MME-Real | Perception | **73.4** | 70.4 | 74.3 | 62.3 | 73.1 |\n| | Reasoning | 56.4 | 47.7 | 52.5 | 51.0 | **58.2** |\n| | Overall | **71.4** | 67.7 | 71.6 | 60.9 | 71.3 |\n| MME-Real-CN | Perception | **76.3** | 72.6 | 76.0 | 65.8 | 74.5 |\n| | Reasoning | **59.4** | 45.0 | 53.8 | 51.3 | 58.3 |\n| | Overall | **70.8** | 63.7 | 68.8 | 61.2 | 69.3 |\n| MME-Real-Lite | Perception | **63.2** | 58.0 | 60.2 | 50.4 | 59.9 |\n| | Reasoning | **53.2** | 46.3 | 50.7 | 49.9 | 55.1 |\n| | Overall | **59.3** | 53.2 | 56.5 | 50.2 | 58.3 |\n| V* | Attribute | **90.4** | 81.7 | 79.1 | 77.3 | 86.8 |\n| | Spatial | **84.2** | 82.9 | 82.9 | 64.4 | 68.4 |\n| | Overall | **88.0** | 82.2 | 80.6 | 72.3 | 79.1 |\n| TreeBench | Overall | 48.4 | 42.7 | 49.6 | 45.9 | **54.6** |\n| Visual Probe | Hard | 42.4 | 30.1 | **42.4** | 28.3 | 33.9 |\n| | Medium | 42.9 | 35.8 | 39.1 | 31.3 | **35.4** |\n| | Easy | **66.7** | 65.2 | 65.9 | 45.3 | 49.6 |\n| **Deep Research** |\n| MMSearch | Overall | **66.1** | 18.7 | 48.0 | 64.9 | 71.9 |\n| FVQA | Overall | **67.2** | 53.3 | 54.4 | 60.7 | 72.0 |\n| 
BrowseComp-VL | Overall | 38.4 | 30.0 | 31.6 | 40.8 | **45.4** |\n\n**Key Highlights:**\n- 🏆 Skywork-R1V4 achieves **top performance** among 30B-class models across most perception benchmarks\n- 🚀 **Outstanding FSP scores** on HIRbench-4K (91.8) and HIRbench-8K (88.8), demonstrating exceptional high-resolution image understanding\n- 🔍 **Strong deep research capabilities** with competitive performance on MMSearch (66.1) and FVQA (67.2)\n\n \n## 🚀 How to Use Skywork-R1V4-Lite\n\nSkywork-R1V4-Lite is available as an API service. You can access it through [Skywork Platform](https:\u002F\u002Fplatform.skyworkmodel.ai) or [OpenRouter](https:\u002F\u002Fopenrouter.ai) (coming soon).\n\n### 1. Get API Access\n\nVisit [Skywork Platform](https:\u002F\u002Fplatform.skyworkmodel.ai) to obtain your API key.\n\n### 2. Quick Start with Python\n\n```python\nimport requests\nimport base64\n\ndef image_to_base64(image_path):\n    with open(image_path, \"rb\") as f:\n        image_data = f.read()\n        return base64.b64encode(image_data).decode(\"utf-8\")\n\n# API configuration\nbase_url = \"https:\u002F\u002Fapi.skyworkmodel.ai\"\napi_key = \"your_api_key_here\"\n\n# Prepare the request\nimage_base64 = image_to_base64(\"path\u002Fto\u002Fyour\u002Fimage.jpg\")\ncontent = [\n    {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image\u002Fjpeg;base64,{image_base64}\"}},\n    {\"type\": \"text\", \"text\": \"What's in this image?\"}\n]\n\n# Call the API\nresponse = requests.post(\n    f\"{base_url}\u002Fv1\u002Fchat\u002Fcompletions\",\n    headers={\n        \"Authorization\": f\"Bearer {api_key}\",\n        \"Content-Type\": \"application\u002Fjson\"\n    },\n    json={\n        \"model\": \"skywork\u002Fr1v4-lite\",\n        \"messages\": [{\"role\": \"user\", \"content\": content}],\n        \"stream\": False,\n        \"enable_search\": False  # Set to True for deep research capabilities\n    
}\n)\n\nprint(response.json()[\"choices\"][0][\"message\"][\"content\"])\n```\n\n### 3. Batch Testing with Our Tool Suite\n\nWe provide a comprehensive testing toolkit in the `r1v4` folder for batch processing and result visualization.\n\n#### Clone and Setup\n\n```shell\ngit clone https:\u002F\u002Fgithub.com\u002FSkyworkAI\u002FSkywork-R1V.git\ncd Skywork-R1V\u002Fr1v4\npip install -r requirements.txt\n```\n\n#### Prepare Test Cases\n\nEdit `test_cases.jsonl` with your test cases (one JSON per line):\n\n```json\n{\"image\": \".\u002Fdemo_image\u002Fdemo_1.png\", \"question\": \"What's in this image?\"}\n{\"image\": \"\", \"question\": \"This is a text-only question\"}\n```\n\n#### Run Batch Tests\n\n```shell\n# Non-streaming mode (default)\npython3 batch_nonstream.py\n\n# Streaming mode\npython3 batch_stream.py\n\n# With custom input\u002Foutput files\npython3 batch_nonstream.py input.jsonl output.jsonl\n\n# Using planner model for task planning\npython3 batch_planner_nonstream.py\n```\n\n#### Visualize Results\n\n```shell\n# Start the web viewer\npython3 visual.py\n\n# Then open browser and input result file path (e.g., result_nonstream.jsonl)\n```\n\n#### Parse Structured Responses\n\n```python\nfrom parse_utils import parse_full_response\n\n# Parse the response to extract reasoning steps, tool calls, and observations\nparsed = parse_full_response(response_text)\n\n# Access structured data\nfor round_data in parsed['rounds']:\n    print(f\"Round {round_data['round_num']}\")\n    print(f\"Thinking: {round_data['think']}\")\n    print(f\"Tool: {round_data['tool_call']['name']}\")\n```\n\n### 4. 
Features\n\n- **Code Execution**: R1V4-Lite can write and execute Python code for complex tasks\n- **Deep Research**: Set `enable_search` to `True` in the request body to add web search capabilities\n- **Multi-turn Reasoning**: Automatic multi-step reasoning with tool usage\n- **Streaming Support**: Real-time response streaming for better user experience\n\n## License\nThis code repository is licensed under [the MIT License](https:\u002F\u002Fgithub.com\u002FSkyworkAI\u002FSkywork-R1V\u002Fblob\u002Fmain\u002FLICENSE).\n\n✅ Commercial use permitted\n\n✅ Modification allowed\n\n✅ Distribution allowed\n\n❌ No liability\n\nSkywork-R1V4-Lite is built on [Qwen3-VL-30B-A3B-Instruct](https:\u002F\u002Fhuggingface.co\u002FQwen\u002FQwen3-VL-30B-A3B-Instruct), which is licensed under the Apache 2.0 License.\n\n## Acknowledgments\n\nWe would like to express our gratitude to the following open-source projects that have been instrumental in our work:\n\n- [MS-SWIFT](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fswift): A powerful framework for model training and fine-tuning that greatly facilitated our model development process.\n- [VLMEvalKit](https:\u002F\u002Fgithub.com\u002Fopen-compass\u002FVLMEvalKit): A comprehensive evaluation toolkit for vision-language models that enabled our extensive benchmarking.\n\n## 🔮 Future Directions\n\nWe are excited to share our vision for the future development of the Skywork-R1V series:\n\n- **Skywork-R1V4-Pro**: We are developing a more powerful model with enhanced capabilities across all benchmarks. 
Stay tuned for the upcoming release!\n- **Reinforcement Learning Research**: We are actively exploring the application of reinforcement learning techniques to advance multimodal reasoning and agentic capabilities, pushing the boundaries of what's possible in vision-language AI.\n\n## ❤️Misc\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=SkyworkAI\u002FSkywork-R1V&type=Date)](https:\u002F\u002Fstar-history.com\u002F#SkyworkAI\u002FSkywork-R1V&Date)\n\n## Citation\nIf you use Skywork-R1V in your research, please cite:\n```\n@misc{zhang2025skyworkr1v4agenticmultimodalintelligence,\n      title={Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch}, \n      author={Yifan Zhang and Liang Hu and Haofeng Sun and Peiyu Wang and Yichen Wei and Shukang Yin and Jiangbo Pei and Wei Shen and Peng Xia and Yi Peng and Tianyidan Xie and Eric Li and Yang Liu and Xuchen Song and Yahui Zhou},\n      year={2025},\n      eprint={2512.02395},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.02395}, \n}\n```\n```\n@misc{shen2025skyworkr1v3technicalreport,\n      title={Skywork-R1V3 Technical Report}, \n      author={Wei Shen and Jiangbo Pei and Yi Peng and Xuchen Song and Yang Liu and Jian Peng and Haofeng Sun and Yunzhuo Hao and Peiyu Wang and Jianhao Zhang and Yahui Zhou},\n      year={2025},\n      eprint={2507.06167},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.06167}, \n}\n```\n```\n@misc{wang2025skyworkr1v2multimodalhybrid,\n      title={Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning}, \n      author={Peiyu Wang and Yichen Wei and Yi Peng and Xiaokun Wang and Weijie Qiu and Wei Shen and Tianyidan Xie and Jiangbo Pei and Jianhao Zhang and Yunzhuo Hao and Xuchen Song and Yang Liu and Yahui Zhou},\n      year={2025},\n      
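eprint={2504.16656},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.16656}, \n}\n```\n\n```\n@misc{peng2025skyworkr1vpioneeringmultimodal,\n      title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought}, \n      author={Yi Peng and Peiyu Wang and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},\n      year={2025},\n      eprint={2504.05599},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.05599}, \n}\n```\n","Skywork-R1V is a series of advanced multimodal AI models developed by Skywork AI, focused on vision-language reasoning. Its core capabilities include strong multimodal understanding, code execution, and deep research task handling, and it uses reinforcement learning in post-training to significantly improve multimodal reasoning performance. The project offers open-source versions as well as a closed-source lightweight version, supports efficient single-GPU or CPU inference, and uses quantization to improve deployment efficiency. Skywork-R1V suits applications that need advanced image understanding and cross-disciplinary intelligence, such as research assistance and automated content generation.",2,"2026-06-11 03:41:25","high_star"]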