[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79923":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":21,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},79923,"SU-01","Simplified-Reasoning\u002FSU-01","Simplified-Reasoning","SU-01: Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling","https:\u002F\u002Fsimplified-reasoning.github.io\u002FSU-01\u002F",null,"Python",90,5,1,0,3,2.33,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:55","\u003Cdiv align=\"center\">\n\n\u003Ch1 style=\"display: flex; justify-content: center; align-items: center; gap: 10px; margin: 0;\">\n  SU-01: Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling\n\u003C\u002Fh1>\n\u003Cp align=\"center\">\u003Cem>A compact 30B-A3B reasoning model for rigorous mathematical and scientific olympiad problem solving.\u003C\u002Fem>\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"page\u002Fsource_html\u002Fsimplex-pipeline-hires.png\" alt=\"SU-01 training and inference pipeline\" style=\"width: 88%; height: auto;\">\n\u003C\u002Fdiv>\n\n[![Technical Report](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTechnical_Report-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](http:\u002F\u002Farxiv.org\u002Fabs\u002F2605.13301)\n[![Project 
Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-4285F4?style=for-the-badge&logo=googlechrome&logoColor=white)](https:\u002F\u002Fsimplified-reasoning.github.io\u002FSU-01\u002F)\n[![GitHub](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSU--01-000000?style=for-the-badge&logo=github&logoColor=white)](https:\u002F\u002Fgithub.com\u002FSimplified-Reasoning\u002FSU-01)\n[![Hugging Face Model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSU--01-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https:\u002F\u002Fhuggingface.co\u002FSimplified-Reasoning\u002FSU-01)\n[![Hugging Face Daily Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF--Daily--Paper-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.13301)\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\" style=\"font-family: Arial, sans-serif;\">\n  \u003Cp>\n    \u003Ca href=\"#news\" style=\"text-decoration: none; font-weight: bold;\">📢 News\u003C\u002Fa> •\n    \u003Ca href=\"#introduction\" style=\"text-decoration: none; font-weight: bold;\">📖 Introduction\u003C\u002Fa> •\n    \u003Ca href=\"#key-highlights\" style=\"text-decoration: none; font-weight: bold;\">🏆 Key Highlights\u003C\u002Fa> •\n    \u003Ca href=\"#released-model\" style=\"text-decoration: none; font-weight: bold;\">🤗 Released Model\u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp>\n    \u003Ca href=\"#getting-started\" style=\"text-decoration: none; font-weight: bold;\">🚀 Getting Started\u003C\u002Fa> •\n    \u003Ca href=\"#training-code\" style=\"text-decoration: none; font-weight: bold;\">🔧 Training Code\u003C\u002Fa> •\n    \u003Ca href=\"#test-time-scaling\" style=\"text-decoration: none; font-weight: bold;\">🧪 Test-Time Scaling\u003C\u002Fa> •\n    \u003Ca href=\"#evaluation\" style=\"text-decoration: none; font-weight: bold;\">📊 Evaluation\u003C\u002Fa>\n  \u003C\u002Fp>\n  \u003Cp>\n    \u003Ca href=\"#acknowledgement\" style=\"text-decoration: 
none; font-weight: bold;\">✨ Acknowledgement\u003C\u002Fa> •\n    \u003Ca href=\"#citation\" style=\"text-decoration: none; font-weight: bold;\">📝 Citation\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n---\n\n\u003Ca id=\"news\">\u003C\u002Fa>\n# 📢 News\n\n- **[2026\u002F05\u002F15]** The technical report is available on [arXiv](http:\u002F\u002Farxiv.org\u002Fabs\u002F2605.13301) and [Hugging Face Daily Papers](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.13301).\n- **[2026\u002F05\u002F13]** The project page is available at [https:\u002F\u002Fsimplified-reasoning.github.io\u002FSU-01\u002F](https:\u002F\u002Fsimplified-reasoning.github.io\u002FSU-01\u002F).\n- **[2026\u002F05\u002F13]** SU-01 model weights are available on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FSimplified-Reasoning\u002FSU-01).\n\n---\n\n\u003Ca id=\"introduction\">\u003C\u002Fa>\n# 📖 Introduction\n\n**SU-01** is a 30B-A3B olympiad reasoning model trained with a simple and unified post-training recipe for mathematical and scientific problem solving. The goal is to turn a broadly capable post-trained reasoning backbone into a rigorous long-horizon proof solver without relying on external tools, code execution, or dedicated symbolic solvers.\n\nThe recipe first applies **reverse-perplexity curriculum SFT** on roughly **338K sub-8K-token** trajectories to install explicit, proof-oriented reasoning behavior. It then uses **200 steps of two-stage reinforcement learning** to improve both answer-seeking ability and complete-proof quality. Finally, SU-01 uses a multi-round **generate-verify-revise** loop at inference time, enabling coherent natural-language reasoning trajectories beyond **100K tokens** on difficult olympiad problems.\n\nIn competition-style evaluations, test-time scaling brings SU-01 to **35 points on IMO 2025** and **35 points on USAMO 2026**, reaching gold-medal-level performance. 
SU-01 also exceeds the gold cutoff on **IPhO 2024\u002F2025** and substantially improves over similarly sized models on proof-level benchmarks such as **IMO-ProofBench**.\n\n---\n\n\u003Ca id=\"key-highlights\">\u003C\u002Fa>\n# 🏆 Key Highlights\n\n- **Reverse-perplexity curriculum SFT**: sorts long-CoT training examples by descending PPL within each epoch, exposing the model first to teacher trajectories most mismatched with the current policy.\n- **Two-stage RL**: starts with verifiable-reward training for answer-seeking behavior, then shifts to proof-quality optimization with self-refinement and experience replay.\n- **Long-horizon proof repair**: uses iterative generation, verification, issue localization, and refinement to produce complete olympiad-style solutions.\n- **Gold-medal-level results**: reaches 35 points on both IMO 2025 and USAMO 2026 with test-time scaling, and passes IPhO 2024\u002F2025 gold lines.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"page\u002Fsource_png\u002Fproofbench_overall.png\" alt=\"SU-01 ProofBench overview\" style=\"width: 88%; height: auto;\">\n\u003C\u002Fdiv>\n\n## Gold-Medal Competition Results\n\n### IMO 2025\n\n| **Model** | **P1** | **P2** | **P3** | **P4** | **P5** | **P6** | **Total** |\n|-----------|-------:|-------:|-------:|-------:|-------:|-------:|----------:|\n| SU-01 | 1 | 7 | 1 | 6 | 6 | 0 | 21 |\n| **SU-01 w\u002F TTS** | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **0**\u003Csup>*\u003C\u002Fsup> | **35**\u003Csup>*\u003C\u002Fsup> 🥇 |\n\n### USAMO 2026\n\n| **Model** | **P1** | **P2** | **P3** | **P4** | **P5** | **P6** | **Total** |\n|-----------|-------:|-------:|-------:|-------:|-------:|-------:|----------:|\n| SU-01 | 7 | 0 | 0 | 7 | 0 | 1 | 15 |\n| **SU-01 w\u002F TTS** | **7**\u003Csup>*\u003C\u002Fsup> | **0**\u003Csup>*\u003C\u002Fsup> | 
**7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **35**\u003Csup>*\u003C\u002Fsup> 🥇 |\n\n`*` denotes results graded by human experts. Medal lines for IMO 2025 are 35\u002F28\u002F19 points for gold\u002Fsilver\u002Fbronze, and medal lines for USAMO 2026 are 25\u002F18\u002F11 points.\n\n---\n\n\u003Ca id=\"released-model\">\u003C\u002Fa>\n# 🤗 Released Model\n\n| **Model** | **Hugging Face** | **Base \u002F Backbone** | **Notes** |\n|-----------|-------------------|----------------------|-----------|\n| SU-01 | [Simplified-Reasoning\u002FSU-01](https:\u002F\u002Fhuggingface.co\u002FSimplified-Reasoning\u002FSU-01) | P1-30B-A3B backbone | Final SU-01 release model trained with SFT, coarse RL, refined RL, and evaluated with optional TTS. |\n\n---\n\n\u003Ca id=\"getting-started\">\u003C\u002Fa>\n# 🚀 Getting Started\n\n## Installation\n\nWe use the slime Docker image [slimerl\u002Fslime:nightly-dev-20260202c](https:\u002F\u002Fhub.docker.com\u002Flayers\u002Fslimerl\u002Fslime\u002Fnightly-dev-20260202c).\n\n```bash\ndocker pull slimerl\u002Fslime:nightly-dev-20260202c\n\ndocker run --gpus all --ipc=host --network=host -it \\\n  -v \"$PWD\":\u002Fworkspace\u002FSU-01 \\\n  -w \u002Fworkspace\u002FSU-01\u002Fsu01-train-slime \\\n  slimerl\u002Fslime:nightly-dev-20260202c \\\n  \u002Fbin\u002Fbash\n```\n\nInside the container, install the local training package:\n\n```bash\npip install -e . 
--no-deps --no-index --disable-pip-version-check --no-build-isolation\n```\n\nAdjust cluster mounts, model paths, data paths, Ray environment variables, and reward-server URLs according to your infrastructure.\n\n---\n\n\u003Ca id=\"training-code\">\u003C\u002Fa>\n# 🔧 Training Code\n\nThe released training code contains the three major training stages used by SU-01:\n\n```text\nsu01-train-slime\u002Fscripts\n├── sft.sh          # Stage 1: reverse-perplexity curriculum SFT\n├── coarse_rl.sh    # Stage 2: coarse RL with verifiable rewards\n└── refined_rl.sh   # Stage 3: refined RL with proof rewards, self-refinement, and experience replay\n```\n\n## Stage 1: SFT\n\nThe SFT stage reshapes the backbone toward explicit, disciplined, proof-oriented long-form reasoning. It uses a filtered mixture of mathematical, scientific, instruction-following, coding, self-verification, and self-refinement trajectories. Training is implemented with slime, uses four epochs, batch size 128, Adam with learning rate `1e-5`, cosine decay to `1e-6`, and rollout shuffling disabled to preserve the curriculum order.\n\n```bash\ncd su01-train-slime\nbash scripts\u002Fsft.sh\n```\n\n## Stage 2: Coarse RL\n\nCoarse RL trains on verifiable prompts with reinforcement learning from verifiable rewards. The stage uses Group Sequence Policy Optimization (GSPO), complete-response-level reward assignment, dynamic sampling, partial rollout, trajectory importance sampling, and answer verification through a layered reward pipeline.\n\n```bash\ncd su01-train-slime\nbash scripts\u002Fcoarse_rl.sh\n```\n\n## Stage 3: Refined RL\n\nRefined RL shifts optimization from final-answer correctness to proof quality. It mixes verifiable prompts, proof-reward prompts, self-refinement prompts, and replayed successful proof trajectories. 
The stage uses process-level proof rewards, a self-refinement ratio of `0.2`, and an experience replay ratio of `0.25`.\n\n```bash\ncd su01-train-slime\nbash scripts\u002Frefined_rl.sh\n```\n\n---\n\n\u003Ca id=\"test-time-scaling\">\u003C\u002Fa>\n# 🧪 Test-Time Scaling\n\nSU-01 runs a model-internal verification-and-refinement loop, following the method in this [repo](https:\u002F\u002Fgithub.com\u002Flyang36\u002FIMO25):\n\n1. Generate an initial complete solution.\n2. Verify the full proof and produce a structured critique or bug report.\n3. Refine the solution conditioned on the critique.\n4. Repeat until the solution is accepted or the refinement budget is exhausted.\n\nThis expands the model's own natural-language proof-search computation rather than calling an external theorem prover, symbolic solver, or code executor. In the reported USAMO 2026 TTS traces, initial solution generations have a median length of approximately **106K tokens**, while refinement stages have a median length of approximately **83K tokens**.\n\nThe released TTS implementation is in `su01-eval\u002Fdecode`, including direct decoding, TTS decoding, batch decoding, and SGLang server helpers. The shared prompt template is provided in [`su01-eval\u002Fdecode\u002Fgeneral_prompt.txt`](su01-eval\u002Fdecode\u002Fgeneral_prompt.txt). See [`su01-eval\u002Fdecode\u002FREADME.md`](su01-eval\u002Fdecode\u002FREADME.md) for launch commands, input layout, decoding options, and smoke tests.\n\n### SU-01 Prompt Template (Direct Decoding)\n\nRendered for readability, the default prompt is:\n\n```text\nPlease solve the following mathematical olympiad problem. Show your complete reasoning and proof.\n1. Please use LaTeX format to represent the variables and formulas used in the solution process and results.\n2. If the problem asks you to find specific values, please put the final answer(s) in \\boxed{}.\n3. 
If the problem requires a proof, present a clear and rigorous argument.\n```\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"page\u002Fsource_png\u002Ftts_action_length_distribution_1.png\" alt=\"Test-time scaling action length distribution\" style=\"width: 70%; height: auto;\">\n\u003C\u002Fdiv>\n\n---\n\n\u003Ca id=\"evaluation\">\u003C\u002Fa>\n# 📊 Evaluation\n\nEvaluation code is released under `su01-eval`. Use `su01-eval\u002Fdecode` to generate direct or TTS predictions, and use `su01-eval\u002Fverifiable_bench` to score answer-verifiable benchmarks and FrontierScience Olympiad predictions. See [`su01-eval\u002Fdecode\u002FREADME.md`](su01-eval\u002Fdecode\u002FREADME.md) and [`su01-eval\u002Fverifiable_bench\u002FREADME.md`](su01-eval\u002Fverifiable_bench\u002FREADME.md) for commands, input formats, output formats, and configuration options.\n\n## Table 1: Performance on Answer-Verifiable Reasoning Tasks\n\nAnswerBench, AMO-Bench, AIME 25\u002F26, and FrontierScience-Olympiad are averaged over 4, 8, 8, and 4 runs, respectively. Avg. 
is the mean of AnswerBench, AMO-Bench, AIME 2025, AIME 2026, and FrontierScience-Olympiad.\n\n| **Model** | **AnswerBench** | **AMO-Bench** | **AIME 25\u002F26** | **FS-O Physics** | **FS-O Chemistry** | **FS-O Biology** | **FS-O Overall** | **Avg.** |\n|-----------|----------------:|--------------:|---------------:|-----------------:|-------------------:|-----------------:|-----------------:|---------:|\n| P1-30B-A3B | 69.3% | 41.3% | 90.4%\u002F89.6% | 57.5% | 57.5% | \u003Cu>27.5%\u003C\u002Fu> | 54.5% | 69.0% |\n| GLM-4.7-Flash | 73.8% | 53.8% | 91.3%\u002F88.3% | 54.5% | 60.0% | 17.5% | 53.0% | 72.0% |\n| Nemotron-Cascade-2 | **80.5%** | 40.8% | \u003Cu>94.2%\u003C\u002Fu>\u002F90.0% | 56.0% | 56.3% | **30.0%** | 53.5% | 71.8% |\n| Qwen3.6-35B-A3B | \u003Cu>78.0%\u003C\u002Fu> | \u003Cu>58.8%\u003C\u002Fu> | 92.5%\u002F\u003Cu>92.9%\u003C\u002Fu> | \u003Cu>65.5%\u003C\u002Fu> | **74.4%** | 25.0% | **65.0%** | **77.4%** |\n| Gemma-4-31B | 74.0% | 39.3% | 88.8%\u002F91.3% | **69.0%** | 61.9% | \u003Cu>27.5%\u003C\u002Fu> | 61.0% | 70.9% |\n| **SU-01** | 77.5% | **59.8%** | **94.6%**\u002F**93.3%** | 62.5% | \u003Cu>69.4%\u003C\u002Fu> | 25.0% | \u003Cu>61.5%\u003C\u002Fu> | \u003Cu>77.3%\u003C\u002Fu> |\n\n## Table 2: Performance on Non-Verifiable Benchmarks\n\nFrontierScience-Research refers to the research subset of FrontierScience. 
For SU-01, `x\u002Fy` reports scores without and with TTS on IMO-ProofBench.\n\n| **Model** | **ProofBench Basic** | **ProofBench Advanced** | **ProofBench Overall** | **FS-R Physics** | **FS-R Chemistry** | **FS-R Biology** | **FS-R Overall** |\n|-----------|---------------------:|------------------------:|-----------------------:|-----------------:|-------------------:|-----------------:|-----------------:|\n| Gemini 3.1 Pro Thinking | \u003Cu>95.2%\u003C\u002Fu> | \u003Cu>50.0%\u003C\u002Fu> | \u003Cu>72.6%\u003C\u002Fu> | 0.0% | \u003Cu>30.0%\u003C\u002Fu> | 10.0% | 13.3% |\n| GPT-5.5-High | **96.7%** | **64.8%** | **80.7%** | **25.0%** | **40.0%** | **45.0%** | **36.7%** |\n| DeepSeek-V3.2-Speciale | 77.6% | 34.3% | 56.0% | \u003Cu>10.0%\u003C\u002Fu> | 20.0% | \u003Cu>15.0%\u003C\u002Fu> | \u003Cu>15.0%\u003C\u002Fu> |\n| P1-30B-A3B | 33.8% | 6.2% | 20.0% | 0.0% | **10.0%** | 0.0% | 3.3% |\n| GLM-4.7-Flash | 51.0% | 16.7% | 33.8% | 0.0% | 0.0% | 0.0% | 0.0% |\n| Nemotron-Cascade-2 | \u003Cu>77.1%\u003C\u002Fu> | 28.6% | 52.9% | \u003Cu>5.0%\u003C\u002Fu> | 5.0% | **20.0%** | \u003Cu>10.0%\u003C\u002Fu> |\n| Qwen3.6-35B-A3B | 39.1% | 7.1% | 23.1% | 0.0% | 5.0% | 10.0% | 5.0% |\n| Gemma-4-31B | 46.7% | 16.2% | 31.4% | 0.0% | **10.0%** | 5.0% | 5.0% |\n| **SU-01** | \u003Cu>77.1%\u003C\u002Fu>\u002F**91.0%** | \u003Cu>38.1%\u003C\u002Fu>\u002F**49.5%** | \u003Cu>57.6%\u003C\u002Fu>\u002F**70.2%** | **10.0%** | **10.0%** | \u003Cu>15.0%\u003C\u002Fu> | **11.7%** |\n\n## Table 3: Performance on Olympiad Competition Problems\n\nFor IPhO, `x\u002Fy` reports scores without and with TTS. Gold lines for IPhO 2024\u002F2025 are 20.8\u002F19.7 points. 
Medal lines for IMO 2025 are 35\u002F28\u002F19 points, and medal lines for USAMO 2026 are 25\u002F18\u002F11 points.\n\n### IPhO 2024\u002F2025\n\n| **Model** | **IPhO 2024** | **IPhO 2025** |\n|-----------|--------------:|--------------:|\n| P1-30B-A3B | 23.1 | 17.7 |\n| GLM-4.7-Flash | 22.2 | 19.5 |\n| Nemotron-Cascade-2 | 21.2 | 16.7 |\n| Qwen3.6-35B-A3B | 24.3 | 19.9 |\n| Gemma-4-31B | \u003Cu>24.4\u003C\u002Fu> | \u003Cu>20.3\u003C\u002Fu> |\n| **SU-01** | 23.5 \u002F **25.3** | \u003Cu>20.3\u003C\u002Fu> \u002F **21.7** |\n\n### IMO 2025\n\n| **Model** | **P1** | **P2** | **P3** | **P4** | **P5** | **P6** | **Total** |\n|-----------|-------:|-------:|-------:|-------:|-------:|-------:|----------:|\n| SU-01 | 1 | 7 | 1 | 6 | 6 | 0 | 21 |\n| **SU-01 w\u002F TTS** | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **0**\u003Csup>*\u003C\u002Fsup> | **35**\u003Csup>*\u003C\u002Fsup> 🥇 |\n\n### USAMO 2026\n\n| **Model** | **P1** | **P2** | **P3** | **P4** | **P5** | **P6** | **Total** |\n|-----------|-------:|-------:|-------:|-------:|-------:|-------:|----------:|\n| SU-01 | 7 | 0 | 0 | 7 | 0 | 1 | 15 |\n| **SU-01 w\u002F TTS** | **7**\u003Csup>*\u003C\u002Fsup> | **0**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **7**\u003Csup>*\u003C\u002Fsup> | **35**\u003Csup>*\u003C\u002Fsup> 🥇 |\n\n`*` denotes TTS results graded by human experts.\n\n---\n\n\u003Ca id=\"acknowledgement\">\u003C\u002Fa>\n# ✨ Acknowledgement\n\nThis work was supported by the Shanghai Artificial Intelligence Laboratory.\n\nWe thank the authors and maintainers of prior open research and infrastructure that made this work possible. 
In particular, we are grateful to DeepSeek for open-sourcing strong reasoning policies and generative reward models, which provided an important reference point for our work. IMO-Bench, AMO-Bench, and FrontierScience helped guide the overall system optimization by offering challenging mathematical and scientific reasoning benchmarks and evaluation protocols.\n\nWe also thank prior data efforts that supported our SFT and RL data curation, including DeepMath, NaturalReasoning, Eurus, OpenCodeReasoning, P1, and OPC, as well as the many public problem sources and communities that cannot all be listed here. We further acknowledge the broader open-source infrastructure ecosystem, including slime for training and SGLang for efficient inference and serving.\n\n---\n\n\u003Ca id=\"citation\">\u003C\u002Fa>\n# 📝 Citation\n\nIf you find SU-01 useful, please cite the project:\n\n```bibtex\n@misc{su012026,\n  title={Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling},\n  author={Yafu Li and Runzhe Zhan and Haoran Zhang and Shunkai Zhang and Yizhuo Li and Zhilin Wang and Jiacheng Chen and Futing Wang and Xuyang Hu and Yuchen Fan and Bangjie Xu and Yucheng Su and Xinmiao Han and Chenxi Li and Haodi Lei and Yufeng Zhao and Zejin Lin and Qianjia Cheng and Tong Zhu and Xiaoye Qu and Ganqu Cui and Peng Ye and Yun Luo and Zhouchen Lin and Yu Qiao and Bowen Zhou and Ning Ding and Yu Cheng},\n  year={2026},\n  url={http:\u002F\u002Farxiv.org\u002Fabs\u002F2605.13301}\n}\n```\n","SU-01 is a reasoning model for solving mathematical and scientific olympiad problems. The project achieves gold-medal-level reasoning through a simple, unified scaling approach, and its core is a compact 30B-A3B model. The model is developed in Python and uses an efficient pipeline design for training and inference. SU-01 is particularly suited to applications that require high-precision logical reasoning, such as intelligent tutoring systems in education and automated problem-solving tools. In addition, the project provides detailed documentation and a technical report to help developers understand and apply it.",2,"2026-06-11 03:58:34","CREATED_QUERY"]