[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74113":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},74113,"DeepSeek-OCR-2","deepseek-ai\u002FDeepSeek-OCR-2","deepseek-ai","Visual Causal Flow","",null,"Python",2947,258,19,50,0,16,43,117,48,109.24,"Apache License 2.0",false,"main",[],"2026-06-12 04:01:13","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.svg\" width=\"60%\" alt=\"DeepSeek AI\" \u002F>\n\u003C\u002Fdiv>\n\n\n\u003Chr>\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\">\n    \u003Cimg alt=\"Homepage\" src=\"assets\u002Fbadge.svg\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-OCR-2\" target=\"_blank\">\n    \u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTc7c45Zzu5\" target=\"_blank\">\n    \u003Cimg alt=\"Discord\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fdeepseek_ai\" target=\"_blank\">\n    \u003Cimg alt=\"Twitter Follow\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-deepseek_ai-white?logo=x&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-OCR-2\">\u003Cb>📥 Model Download\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-OCR-2\u002Fblob\u002Fmain\u002FDeepSeek_OCR2_paper.pdf\">\u003Cb>📄 Paper Link\u003C\u002Fb>\u003C\u002Fa> |\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2601.20552\">\u003Cb>📄 arXiv Paper Link\u003C\u002Fb>\u003C\u002Fa> |\n\u003C\u002Fp>\n\n\u003Ch2>\n\u003Cp align=\"center\">\n  \u003Ca href=\"\">DeepSeek-OCR 2: Visual Causal Flow\u003C\u002Fa>\n\u003C\u002Fp>\n\u003C\u002Fh2>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Ffig1.png\" style=\"width: 600px\" align=center>\n\u003C\u002Fp>\n\u003Cp align=\"center\">\n\u003Ca href=\"\">Explore more human-like visual encoding.\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n## Contents\n- [Install](#install)\n- [vLLM Inference](#vllm-inference)\n- [Transformers Inference](#transformers-inference)\n\n## Install\n>Our environment is CUDA 11.8 + torch 2.6.0.\n1. Clone this repository and navigate to the DeepSeek-OCR-2 folder\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-OCR-2.git\ncd DeepSeek-OCR-2\n```\n2. Conda\n```Shell\nconda create -n deepseek-ocr2 python=3.12.9 -y\nconda activate deepseek-ocr2\n```\n3. 
Packages\n\n- Download the vllm-0.8.5 [whl](https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\u002Freleases\u002Ftag\u002Fv0.8.5), then install:\n```Shell\npip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\npip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl\npip install -r requirements.txt\npip install flash-attn==2.7.3 --no-build-isolation\n```\n**Note:** if you run the vLLM and Transformers code in the same environment, you can ignore installation errors such as: vllm 0.8.5+cu118 requires transformers>=4.51.1\n\n## vLLM-Inference\n- vLLM:\n>**Note:** change INPUT_PATH\u002FOUTPUT_PATH and other settings in DeepSeek-OCR2-master\u002FDeepSeek-OCR2-vllm\u002Fconfig.py\n```Shell\ncd DeepSeek-OCR2-master\u002FDeepSeek-OCR2-vllm\n```\n1. Image: streaming output\n```Shell\npython run_dpsk_ocr2_image.py\n```\n2. PDF: concurrent processing (speed on par with DeepSeek-OCR)\n```Shell\npython run_dpsk_ocr2_pdf.py\n```\n3. Batch evaluation on benchmarks (e.g., OmniDocBench v1.5)\n```Shell\npython run_dpsk_ocr2_eval_batch.py\n```\n\n## Transformers-Inference\n- Transformers\n```python\nfrom transformers import AutoModel, AutoTokenizer\nimport torch\nimport os\n\n# Make only GPU 0 visible to this process\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = '0'\nmodel_name = 'deepseek-ai\u002FDeepSeek-OCR-2'\n\ntokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)\nmodel = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)\n# Switch to inference mode, move to GPU, cast to bfloat16\nmodel = model.eval().cuda().to(torch.bfloat16)\n\n# Alternative prompt for plain OCR without layout:\n# prompt = \"\u003Cimage>\\nFree OCR. \"\nprompt = \"\u003Cimage>\\n\u003C|grounding|>Convert the document to markdown. 
\"\nimage_file = 'your_image.jpg'\noutput_path = 'your\u002Foutput\u002Fdir'\n\nres = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=768, crop_mode=True, save_results=True)\n```\nAlternatively, run the script:\n```Shell\ncd DeepSeek-OCR2-master\u002FDeepSeek-OCR2-hf\npython run_dpsk_ocr2.py\n```\n## Support-Modes\n- Dynamic resolution\n  - Default: (0-6)×768×768 + 1×1024×1024 → (0-6)×144 + 256 visual tokens ✅\n\n## Main Prompts\n```python\n# document: \u003Cimage>\\n\u003C|grounding|>Convert the document to markdown.\n# without layout: \u003Cimage>\\nFree OCR.\n```\n\n## Acknowledgement\n\nWe would like to thank [DeepSeek-OCR](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-OCR\u002F), [Vary](https:\u002F\u002Fgithub.com\u002FUcas-HaoranWei\u002FVary\u002F), [GOT-OCR2.0](https:\u002F\u002Fgithub.com\u002FUcas-HaoranWei\u002FGOT-OCR2.0\u002F), [MinerU](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FMinerU), [PaddleOCR](https:\u002F\u002Fgithub.com\u002FPaddlePaddle\u002FPaddleOCR) for their valuable models.\n\nWe also appreciate the [OmniDocBench](https:\u002F\u002Fgithub.com\u002Fopendatalab\u002FOmniDocBench) benchmark.\n\n## Citation\n\n```bibtex\n@article{wei2025deepseek,\n  title={DeepSeek-OCR: Contexts Optical Compression},\n  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},\n  journal={arXiv preprint arXiv:2510.18234},\n  year={2025}\n}\n@article{wei2026deepseek,\n  title={DeepSeek-OCR 2: Visual Causal Flow},\n  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},\n  journal={arXiv preprint arXiv:2601.20552},\n  year={2026}\n}\n","DeepSeek-OCR-2 is an optical character recognition (OCR) project built on Visual Causal Flow. It uses a Transformer-based deep learning model to extract text from images and PDF documents. The project supports inference through vLLM or the Transformers library, and its vLLM PDF pipeline processes pages concurrently at a speed on par with DeepSeek-OCR. It suits applications that need high-accuracy text recognition, such as document digitization and information retrieval systems. Installation is relatively straightforward and requires a Python environment with the related deep learning frameworks.",2,"2026-06-11 03:48:52","high_star"]