[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72580":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":14,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72580,"MiMo","XiaomiMiMo\u002FMiMo","XiaomiMiMo","MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining","",null,"Python",2159,94,9,38,0,3,17,54,27.93,"Apache License 2.0",false,"main",[],"2026-06-12 02:03:05","\u003Cdiv align=\"center\">\n  \u003Cpicture>\n    \u003Csource srcset=\"https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002FMiMo\u002Fraw\u002Fmain\u002Ffigures\u002FXiaomi_MiMo_darkmode.png?raw=true\" media=\"(prefers-color-scheme: dark)\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002FMiMo\u002Fraw\u002Fmain\u002Ffigures\u002FXiaomi_MiMo.png?raw=true\" width=\"60%\" alt=\"Xiaomi-MiMo\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv>\n\n\u003Ch3 align=\"center\">\n  \u003Cb>\n    \u003Cspan>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u003C\u002Fspan>\n    \u003Cbr\u002F>\n    Unlocking the Reasoning Potential of Language Model\u003Cbr\u002F>From Pretraining to Posttraining\n    \u003Cbr\u002F>\n    \u003Cspan>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u003C\u002Fspan>\n    \u003Cbr\u002F>\n  \u003C\u002Fb>\n\u003C\u002Fh3>\n\n\u003Cbr\u002F>\n\n\u003Cdiv align=\"center\" style=\"line-height: 1;\">\n  |\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo\" 
target=\"_blank\">🤗 HuggingFace\u003C\u002Fa>\n  &nbsp;|\n  \u003Ca href=\"https:\u002F\u002Fwww.modelscope.cn\u002Forganization\u002FXiaomiMiMo\" target=\"_blank\">🤖️ ModelScope\u003C\u002Fa>\n  &nbsp;|\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.07608\" target=\"_blank\">📔 Technical Report\u003C\u002Fa>\n  &nbsp;|\n  \u003Cbr\u002F>\n\u003C\u002Fdiv>\n\n\u003Cbr\u002F>\n\n---\n\n## Updates\n\n[2025.05.30] By scaling the SFT dataset from approximately 500K to 6M instances and progressively expanding the RL training window size from 32K to 48K, we continuously improved the performance of [MiMo-7B-RL-0530](https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo\u002FMiMo-7B-RL-0530) on AIME24, which eventually surpasses that of DeepSeek R1 (79.8).\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Benchmark\u003C\u002Fth>\n      \u003Cth>MiMo-7B-RL\u003C\u002Fth>\n      \u003Cth>MiMo-7B-RL-0530\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd colspan=\"3\">\u003Cstrong>Mathematics\u003C\u002Fstrong>\u003C\u002Ftd>\n      \u003Cp align=\"center\">\n        \u003Ctd rowspan=\"11\">\u003Cimg width=\"80%\" src=\"https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002FMiMo\u002Fraw\u002Fmain\u002Ffigures\u002Flength.jpg?raw=true\">\u003C\u002Ftd>\n      \u003C\u002Fp>\n    \u003C\u002Ftr>\n    \u003Ctr>\u003Ctd>MATH500\u003Cbr\u002F>(Pass@1)\u003C\u002Ftd>\u003Ctd>95.8\u003C\u002Ftd>\u003Ctd>97.2\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd>AIME 2024\u003Cbr\u002F>(Pass@1)\u003C\u002Ftd>\u003Ctd>68.2\u003C\u002Ftd>\u003Ctd>80.1\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd>AIME 2025\u003Cbr\u002F>(Pass@1)\u003C\u002Ftd>\u003Ctd>55.4\u003C\u002Ftd>\u003Ctd>70.2\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd colspan=\"3\">\u003Cstrong>Code\u003C\u002Fstrong>\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd>LiveCodeBench 
v5\u003Cbr\u002F>(Pass@1)\u003C\u002Ftd>\u003Ctd>57.8\u003C\u002Ftd>\u003Ctd>60.9\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd>LiveCodeBench v6\u003Cbr\u002F>(Pass@1)\u003C\u002Ftd>\u003Ctd>49.3\u003C\u002Ftd>\u003Ctd>52.2\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd colspan=\"3\">\u003Cstrong>STEM\u003C\u002Fstrong>\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd>GPQA-Diamond\u003Cbr\u002F>(Pass@1)\u003C\u002Ftd>\u003Ctd>54.4\u003C\u002Ftd>\u003Ctd>60.6\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd colspan=\"3\">\u003Cstrong>General\u003C\u002Fstrong>\u003C\u002Ftd>\u003C\u002Ftr>\n    \u003Ctr>\u003Ctd>Alignbench1.1\u003Cbr\u002F>(Evaluated by GPT4.1)\u003C\u002Ftd>\u003Ctd>6.9\u003C\u002Ftd>\u003Ctd>7.4\u003C\u002Ftd>\u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n---\n\n## I. Introduction\n\nCurrently, most successful RL works, including open-source research, rely on relatively large base models, e.g., 32B models, particularly for enhancing code reasoning capabilities. Moreover, it was widely considered that achieving uniform and simultaneous improvements in both mathematical and code capabilities within a small model is challenging. Nonetheless, we believe that the effectiveness of the RL trained reasoning model relies on the inherent reasoning potential of the base model. To fully unlock the reasoning potential of language models, efforts must focus not only on post-training but also on pre-training strategies tailored to reasoning.\n\nIn this work, we present MiMo-7B, a series of models trained from scratch and born for reasoning tasks. Our RL experiments from MiMo-7B-Base show that our model possesses extraordinary reasoning potential, even surpassing much larger 32B models. 
Additionally, we perform RL training on a cold-started SFT model, resulting in MiMo-7B-RL, which demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"80%\" src=\"https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002FMiMo\u002Fraw\u002Fmain\u002Ffigures\u002Fcurve.png?raw=true\">\n\u003C\u002Fp>\n\nWe open-source the MiMo-7B series, including checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model.\nWe believe this report, along with the models, will provide valuable insights for developing powerful reasoning LLMs that benefit the larger community.\n\n### 🌟 Highlights\n\n- **Pre-Training: Base Model Born for Reasoning**\n  - We optimize the data preprocessing pipeline, enhancing text extraction toolkits and applying multi-dimensional data filtering to increase reasoning pattern density in pre-training data. We also employ multiple strategies to generate massive, diverse synthetic reasoning data.\n  - We adopt a three-stage data mixture strategy for pre-training. Overall, MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.\n  - We incorporate Multiple-Token Prediction as an additional training objective, which enhances model performance and accelerates inference.\n\n- **Post-Training Recipe: Pioneering Reasoning Model**\n    - We curate 130K mathematics and code problems as RL training data, all of which can be verified by rule-based verifiers. Each problem undergoes careful cleaning and difficulty assessment to ensure quality. We employ only rule-based accuracy rewards to avoid potential reward hacking.\n    - To mitigate the sparse reward issue for challenging code problems, we introduce a test-difficulty-driven code reward. 
By assigning fine-grained scores for test cases with varying difficulty levels, the policy can be more effectively optimized via a dense reward signal.\n    - We implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, particularly in the later phases of RL training.\n\n- **RL Infrastructure**\n    - We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\\times$ faster training and $1.96\\times$ faster validation.\n    - We support MTP in vLLM and enhance the robustness of the inference engine in the RL system.\n\n## II. Model Details\n\nThe MTP layers of MiMo-7B are tuned during pretraining and SFT and frozen during RL. With one MTP layer for speculative decoding, the acceptance rate is about 90%.\n\n\u003Cp align=\"center\">\n  \u003Cimg width=\"80%\" src=\"https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002FMiMo\u002Fraw\u002Fmain\u002Ffigures\u002Farchitecture.png?raw=true\">\n\u003C\u002Fp>\n\n> Models are available at [https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo](https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo) and [https:\u002F\u002Fwww.modelscope.cn\u002Forganization\u002FXiaomiMiMo](https:\u002F\u002Fwww.modelscope.cn\u002Forganization\u002FXiaomiMiMo)\n\n|    **Model**    |                                **Description**                                |                            **Download (HuggingFace)**                             |                                  **Download (ModelScope)**                                  |\n| :-------------: | :---------------------------------------------------------------------------: | :-------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------: |\n|  MiMo-7B-Base   |      
         Base model with extraordinary reasoning potential               |    [🤗 XiaomiMiMo\u002FMiMo-7B-Base](https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo\u002FMiMo-7B-Base)    |    [🤖️ XiaomiMiMo\u002FMiMo-7B-Base](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXiaomiMiMo\u002FMiMo-7B-Base)    |\n| MiMo-7B-RL-Zero |                       RL model trained from base model                        | [🤗 XiaomiMiMo\u002FMiMo-7B-RL-Zero](https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo\u002FMiMo-7B-RL-Zero) | [🤖️ XiaomiMiMo\u002FMiMo-7B-RL-Zero](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXiaomiMiMo\u002FMiMo-7B-RL-Zero) |\n|   MiMo-7B-SFT   |                       SFT model trained from base model                       |     [🤗 XiaomiMiMo\u002FMiMo-7B-SFT](https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo\u002FMiMo-7B-SFT)     |     [🤖️ XiaomiMiMo\u002FMiMo-7B-SFT](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXiaomiMiMo\u002FMiMo-7B-SFT)     |\n|   MiMo-7B-RL    | RL model trained from SFT model, superior performance matching OpenAI o1-mini |      [🤗 XiaomiMiMo\u002FMiMo-7B-RL](https:\u002F\u002Fhuggingface.co\u002FXiaomiMiMo\u002FMiMo-7B-RL)      |      [🤖️ XiaomiMiMo\u002FMiMo-7B-RL](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FXiaomiMiMo\u002FMiMo-7B-RL)      |\n\n## III. 
Evaluation Results\n\n| Benchmark                     | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |\n| ----------------------------- | :---------: | :--------------------: | :------------: | :-------------: | :-----------------: | :----------------: | :--------: |\n| **General**                   |             |                        |                |                 |                     |                    |            |\n| GPQA Diamond\u003Cbr\u002F>(Pass@1)     |    49.9     |          65.0          |      60.0      |      54.5       |        59.1         |        49.1        |    54.4    |\n| SuperGPQA\u003Cbr\u002F>(Pass@1)        |    42.4     |          48.2          |      45.2      |      43.6       |        40.6         |        28.9        |    40.5    |\n| DROP\u003Cbr\u002F>(3-shot F1)          |    83.7     |          88.3          |      83.9      |      71.2       |        85.5         |        77.0        |    78.7    |\n| MMLU-Pro\u003Cbr\u002F>(EM)             |    72.6     |          78.0          |      80.3      |      52.0       |        68.8         |        53.5        |    58.6    |\n| IF-Eval\u003Cbr\u002F>(Prompt Strict)   |    84.3     |          86.5          |      84.8      |      40.4       |        78.3         |        60.5        |    61.0    |\n| **Mathematics**               |             |                        |                |                 |                     |                    |            |\n| MATH-500\u003Cbr\u002F>(Pass@1)         |    74.6     |          78.3          |      90.0      |      90.6       |        93.9         |        92.8        |    95.8    |\n| AIME 2024\u003Cbr\u002F>(Pass@1)        |     9.3     |          16.0          |      63.6      |      50.0       |        69.7         |        55.5        |    68.2    |\n| AIME 2025\u003Cbr\u002F>(Pass@1)        |    11.6     |          7.4           |      50.7 
     |      32.4       |        48.2         |        38.8        |    55.4    |\n| **Code**                      |             |                        |                |                 |                     |                    |            |\n| LiveCodeBench v5\u003Cbr\u002F>(Pass@1) |    32.9     |          38.9          |      53.8      |      41.9       |        53.1         |        37.6        |    57.8    |\n| LiveCodeBench v6\u003Cbr\u002F>(Pass@1) |    30.9     |          37.2          |      46.8      |      39.1       |        31.9         |        23.9        |    49.3    |\n\nMiMo-7B series\n\n| Benchmark                     | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |\n| ----------------------------- | :----------: | :-------------: | :---------: | :--------: |\n| **Mathematics**               |              |                 |             |            |\n| MATH500\u003Cbr\u002F>(Pass@1)          |     37.4     |      93.6       |    93.0     |    95.8    |\n| AIME 2024\u003Cbr\u002F>(Pass@1)        |     32.9     |      56.4       |    58.7     |    68.2    |\n| AIME 2025\u003Cbr\u002F>(Pass@1)        |     24.3     |      46.3       |    44.3     |    55.4    |\n| **Code**                      |              |                 |             |            |\n| LiveCodeBench v5\u003Cbr\u002F>(Pass@1) |     32.9     |      49.1       |    52.3     |    57.8    |\n| LiveCodeBench v6\u003Cbr\u002F>(Pass@1) |     29.1     |      42.9       |    45.5     |    49.3    |\n\n> [!IMPORTANT]\n> The evaluations are conducted with `temperature=0.6`.\n> \n> AIME24 and AIME25 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond, and IF-Eval scores are averaged over 8 repetitions. MATH500 and SuperGPQA are evaluated with a single run.\n\n## IV. 
Deployment\n\n### SGLang Inference\n\nThanks to the [MiMo model support](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F5921) and [MTP](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang\u002Fpull\u002F6059) contributions from the SGLang team, MiMo is supported in the SGLang main branch.\n\nExample Script\n\n```bash\n# Install the latest SGLang from the main branch\npython3 -m uv pip install \"sglang[all] @ git+https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang.git\u002F@main#egg=sglang&subdirectory=python\"\n\n# Launch SGLang Server\npython3 -m sglang.launch_server --model-path XiaomiMiMo\u002FMiMo-7B-RL-0530 --host 0.0.0.0 --trust-remote-code\n\n# Launch MTP Server\npython3 -m sglang.launch_server --model-path XiaomiMiMo\u002FMiMo-7B-RL-0530 --trust-remote-code \\\n--speculative-algorithm EAGLE --speculative-num-steps 1 --speculative-eagle-topk 1 \\\n--speculative-num-draft-tokens 2  --mem-fraction 0.5\n```\n\nDetailed usage can be found in the [SGLang documentation](https:\u002F\u002Fdocs.sglang.ai\u002Fbackend\u002Fsend_request.html).\n\n### vLLM inference\n\n1. 
[Recommended] We officially support inference with MiMo-MTP using [our fork of vLLM](https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002Fvllm\u002Ftree\u002Ffeat_mimo_mtp_stable_073).\n\nExample script\n\n```py\nfrom vllm import LLM, SamplingParams\n\nmodel_path = \"\u002Fpath\u002Fto\u002FMiMo\"\nllm = LLM(\n    model=model_path,\n    trust_remote_code=True,\n    num_speculative_tokens=1,\n    disable_log_stats=False\n)\nsampling_params = SamplingParams(temperature=0.6)\n\nconversation = [\n    {\n        \"role\": \"system\",\n        \"content\": \"\"\n    },\n    {\n        \"role\": \"user\",\n        \"content\": \"Write an essay about the importance of higher education.\",\n    },\n]\n\noutputs = llm.chat(conversation,\n                   sampling_params=sampling_params,\n                   use_tqdm=False)\n\nfor output in outputs:\n    prompt = output.prompt\n    generated_text = output.outputs[0].text\n    print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n\nprint(\"=\" * 80)\n```\n\n2. 
Or, you can register a vLLM loader for MiMo without loading MTP parameters.\n\nYou can copy [`registry\u002Fregister_mimo_in_vllm.py`](https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002FMiMo\u002Fblob\u002Fmain\u002Fregistry\u002Fregister_mimo_in_vllm.py) to your directory and import it with:\n\n```py\nimport register_mimo_in_vllm\n\nfrom vllm import LLM, SamplingParams\n\nmodel_path = \"\u002Fpath\u002Fto\u002FMiMo\"\nllm = LLM(\n    model=model_path,\n    trust_remote_code=True,\n    # num_speculative_tokens=1,\n    disable_log_stats=False\n)\nsampling_params = SamplingParams(temperature=0.6)\n```\n\n### HuggingFace inference\n\nExample script\n\n```py\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel_id = \"XiaomiMiMo\u002FMiMo-7B-RL-0530\"\nmodel = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)\ntokenizer = AutoTokenizer.from_pretrained(model_id)\ninputs = tokenizer([\"Today is\"], return_tensors='pt')\noutput = model.generate(**inputs, max_new_tokens=100)\nprint(tokenizer.decode(output.tolist()[0]))\n```\n\n### Recommended environment and prompts\n\n- We recommend using [our fork of vLLM](https:\u002F\u002Fgithub.com\u002FXiaomiMiMo\u002Fvllm\u002Ftree\u002Ffeat_mimo_mtp_stable_073), which is based on vLLM 0.7.3.\n- We recommend using an empty system prompt.\n\n> We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the HuggingFace repo 💻.\n\n## V. Citation\n\n```bibtex\n@misc{coreteam2025mimounlockingreasoningpotential,\n      title={MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining}, \n      author={LLM-Core-Team Xiaomi},\n      year={2025},\n      eprint={2505.07608},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.07608}, \n}\n```\n\n\n## VI. 
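Contact\n\nPlease contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com) or open an issue if you have any questions.\n","The MiMo project aims to unlock the reasoning potential of language models, improving model performance across the board from pre-training to post-training. It optimizes the model by scaling up the supervised fine-tuning (SFT) dataset and enlarging the reinforcement learning (RL) training window, achieving notable gains in problem solving across mathematics, programming, and STEM. Technically, MiMo is developed in Python, and on multiple benchmarks its 7B-parameter models outperform larger models. The project suits applications that require strong logical reasoning, such as educational assistance software and automatic code generation tools.",2,"2026-06-11 03:42:39","high_star"]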