[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72299":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":27,"discoverSource":28},72299,"SkyThought","NovaSky-AI\u002FSkyThought","NovaSky-AI","Sky-T1: Train your own O1 preview model within $450","https:\u002F\u002Fnovasky-ai.github.io\u002F",null,"Python",3390,345,39,21,0,2,11,29.62,"Apache License 2.0",false,"main",[],"2026-06-12 02:03:01","\u003Cdiv align=\"center\">\n\n# SkyThought\n\n[![Github](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSkyThought-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white)](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyThought) [![Twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNovaSky-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white)](https:\u002F\u002Fx.com\u002FNovaSkyAI) [![Hugging Face Collection](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNovaSky-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor)](https:\u002F\u002Fhuggingface.co\u002FNovaSky-AI) [![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNovaSky-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002FkexQXy5yA3)\n\n\n\u003Cdiv align=\"center\" style=\"font-family: Arial, sans-serif;\">\n  \u003Cp>\n    \u003Ca href=\"#news\" style=\"text-decoration: none; font-weight: bold;\">News\u003C\u002Fa> 
•\n    \u003Ca href=\"#links\" style=\"text-decoration: none; font-weight: bold;\">Links\u003C\u002Fa> •\n    \u003Ca href=\"#getting-started\" style=\"text-decoration: none; font-weight: bold;\">Getting Started\u003C\u002Fa> •\n    \u003Ca href=\"#evaluation\" style=\"text-decoration: none; font-weight: bold;\">Evaluation\u003C\u002Fa> •\n    \u003Ca href=\"#citation\" style=\"text-decoration: none; font-weight: bold;\">Citation\u003C\u002Fa> •\n    \u003Ca href=\"#acknowledgement\" style=\"text-decoration: none; font-weight: bold;\">Acknowledgement\u003C\u002Fa> \n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003C\u002Fdiv>\n\n\n# News\n- **[2025\u002F02\u002F21]** 🎉 We released S*: Test time scaling for code generation ([paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2502.14382), [code](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyThought\u002Ftree\u002Fmain\u002Fskythought\u002Ftest-time-scaling)), a simple and extensible test time scaling framework for code generation.\n- **[2025\u002F02\u002F11]** 🎉 We released Sky-T1-7B ([model](https:\u002F\u002Fhuggingface.co\u002FNovaSky-AI\u002FSky-T1-7B)) and Sky-T1-mini ([model](https:\u002F\u002Fhuggingface.co\u002FNovaSky-AI\u002FSky-T1-mini)) to demonstrate the potential of RL in further enhancing model's capability beyond distillation.\n- **[2025\u002F01\u002F23]** ⚡️ We released Sky-T1-32B-Flash ([model](https:\u002F\u002Fhuggingface.co\u002FNovaSky-AI\u002FSky-T1-32B-Flash), [data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FNovaSky-AI\u002FSky-T1_preference_data_10k)) to tackle overthinking and reduce reasoning sequence lengths while maintaining accuracy.\n- **[2025\u002F01\u002F19]** 🎉 [Chat demo](http:\u002F\u002F164.152.23.196:3000\u002F) for Sky-T1-32B-Preview is alive! 
Please check it out!\n- **[2025\u002F01\u002F10]** 🎉 We have released our Sky-T1-32B-Preview [model](https:\u002F\u002Fhuggingface.co\u002FNovaSky-AI\u002FSky-T1-32B-Preview) and [data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FNovaSky-AI\u002FSky-T1_data_17k) through [HuggingFace](https:\u002F\u002Fhuggingface.co\u002FNovaSky-AI)!\n\n\n# Links\n\n- 📜 [Sky-T1-7B and Sky-T1-mini Blog Post](https:\u002F\u002Fnovasky-ai.github.io\u002Fposts\u002Fsky-t1-7B\u002F)\n- 📜 [Sky-T1-32B-Flash Blog Post](https:\u002F\u002Fnovasky-ai.github.io\u002Fposts\u002Freduce-overthinking\u002F)\n- 📜 [Sky-T1-32B-Preview model Blog Post](https:\u002F\u002Fnovasky-ai.github.io\u002Fposts\u002Fsky-t1\u002F)\n- 🤗 [Sky-T1-32B-Preview model](https:\u002F\u002Fhuggingface.co\u002FNovaSky-AI)\n\n# Getting Started\n\nWe open-source the code and scripts we used for data curation, training, and evaluation of Sky-T1-32B-Preview. More details are available in each directory:\n- [`recipes`](.\u002Frecipes\u002F): Recipes (data curation steps and training strategies) for building our `Sky-T1-32B-Flash`, `Sky-T1-32B-Preview`, and `Sky-T1-7B` series models.\n- [`skythought\u002Fevals`](.\u002Fskythought\u002Fevals\u002F): Our data generation and evaluation library. We provide a convenient CLI for evaluation as well as a `Scorer` API for scoring during data curation and training ([example](.\u002Fexamples\u002Fscoring.ipynb)).\n- [`skythought\u002Ftrain`](.\u002Fskythought\u002Ftrain\u002F): Training scripts for Sky-T1. We use [Llama-Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory) to perform training. 
\n- [`skythought\u002Fskythought-rl`](.\u002Fskythought\u002Fskythought-rl\u002F): RL training code for Sky-T1-7B and Sky-T1-mini.\n\n# Evaluation\n\n## Usage\n\nYou can install the latest release from PyPI or from [source](#installing-from-source):\n\n```shell\npip install skythought\n```\n\n### Installing from source\n\n```shell\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyThought.git\ncd SkyThought\n\n# Create and activate a virtual environment (using uv here)\nuv venv --python 3.10\nsource .venv\u002Fbin\u002Factivate\n\n# Install the package in editable mode\nuv pip install -e .\n```\n\nRunning evaluation is as simple as:\n\n```bash\nskythought evaluate --model NovaSky-AI\u002FSky-T1-32B-Preview --task aime24\n```\n\nWe support a wide variety of datasets in mathematics, science, and coding:\n\n- AIME'24\n- MATH500\n- GPQADiamond\n- MMLU\n- ARC-Challenge\n- OlympiadBench\n- AMC'23\n- TACO\n- APPS\n- LiveCodeBench\n- MMLU Pro\n- MinervaMath\n- GSM8K\n- AIME'25\n\nFor more details, please refer to our [evaluation guide](examples\u002Fevaluate.ipynb) and the [evaluation README](skythought\u002Fevals\u002FREADME.md).\n\n\n### Evaluation results\nBelow are our evaluation results for the Sky-T1-32B-Preview model across math, coding, and science benchmarks.\n\n| Metric                | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ   | o1-preview |\n|-----------------------|---------------------|--------|-------|------------|\n| Math500              | 86.4                    | 81.4    | 92.2 | 81.4       |\n| AIME2024             | 43.3                    | 16.7    | 50.0  | 40.0       |\n| LiveCodeBench-Easy   | 86.3                    | 84.6   | 90.7  | 92.9       |\n| LiveCodeBench-Medium | 56.8                    | 40.8   | 56.3  | 54.9       |\n| LiveCodeBench-Hard   | 17.9                    | 9.8   | 17.1  | 16.3       |\n| GPQA-Diamond         | 56.8                    | 45.5   | 52.5  | 75.2       |\n| 
OlympiadBench (Math, EN) | 59.79 | 46.74 | 62.17 | 59.2 |\n\n#### Results on non-reasoning benchmarks\n\nWe also evaluate on non-reasoning benchmarks (benchmarks for instruction following, QA, etc.) to test whether the model has traded off capability in other domains for better performance on reasoning-related benchmarks.\n\n\n| Metric | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ-32B-Preview | Eval Implementation |\n|---------|-------------------|---------------------|-----------------|-------------------|\n| MMLU (0 shot; no CoT) | **78.36** | 74.14 | 71.23 | [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) |\n| MMLU (5 shot; no CoT) | 82.46 | **82.62** | 82.32 | [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) |\n| ARC-C (0 shot; no CoT) | **49.49** | 49.4 | 49.66 | [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) |\n| IFEval | 75.79 | **78.74** | 42.51 | [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) |\n| LLM-as-a-Judge | 9.12 | **9.19** | 8.30 | [fastchat](https:\u002F\u002Fgithub.com\u002Flm-sys\u002FFastChat\u002Ftree\u002Fmain\u002Ffastchat\u002Fllm_judge) |\n| MGSM (0 shot; `direct`) | 33 | **42.3** | 19.07 | [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) |\n| MGSM (8-shot; `direct`) | 58.4 | **61.47** | 58.5 | [lm_eval](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness) |\n| BFCL-v3 | 53.18 | **58.92** | 17.41 | [BFCL](https:\u002F\u002Fgithub.com\u002FShishirPatil\u002Fgorilla\u002Ftree\u002Fmain\u002Fberkeley-function-call-leaderboard) |\n| Arena-Hard | **74.79** | 66.51 | 52.6 | [Arena-Hard-Auto](https:\u002F\u002Fgithub.com\u002Flmarena\u002Farena-hard-auto) |\n\nFor more details, see [here](.\u002Fskythought\u002Fevals\u002Fbase_instruct_evals.md).\n\n# Fully Open-source: Driving Progress Together\nWe believe that 
open-source collaboration drives progress, and with Sky-T1-32B-Preview, we are fully committed to empowering the community. We open-source all details (i.e., data, codes, model weights) to enable the community to replicate and improve on our results *easily*:\n\n\u003Ctable>\n  \u003Cthead>\n    \u003Ctr>\n      \u003Cth>Model\u003C\u002Fth>\n      \u003Cth style=\"background-color: #f2f2f2;\">\u003Cdiv align=\"center\">Sky-T1-32B-Preview\u003C\u002Fdiv>\u003C\u002Fth>\n      \u003Cth>\u003Cdiv align=\"center\">STILL-2\u003C\u002Fdiv>\u003C\u002Fth>\n      \u003Cth>\u003Cdiv align=\"center\">Journey\u003C\u002Fdiv>\u003C\u002Fth>\n      \u003Cth>\u003Cdiv align=\"center\">QwQ\u003C\u002Fdiv>\u003C\u002Fth>\n      \u003Cth>\u003Cdiv align=\"center\">o1\u003C\u002Fdiv>\u003C\u002Fth>\n    \u003C\u002Ftr>\n  \u003C\u002Fthead>\n  \u003Ctbody>\n    \u003Ctr>\n      \u003Ctd>Data\u003C\u002Ftd>\n      \u003Ctd style=\"background-color: #f2f2f2;\">\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>Code\u003C\u002Ftd>\n      \u003Ctd style=\"background-color: #f2f2f2;\">\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>Report\u003C\u002Ftd>\n      \u003Ctd style=\"background-color: #f2f2f2;\">\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n   
   \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>Math domain\u003C\u002Ftd>\n      \u003Ctd style=\"background-color: #f2f2f2;\">\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>Coding domain\u003C\u002Ftd>\n      \u003Ctd style=\"background-color: #f2f2f2;\">\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n    \u003Ctr>\n      \u003Ctd>Model Weights\u003C\u002Ftd>\n      \u003Ctd style=\"background-color: #f2f2f2;\">\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">✅\u003C\u002Fdiv>\u003C\u002Ftd>\n      \u003Ctd>\u003Cdiv align=\"center\">❌\u003C\u002Fdiv>\u003C\u002Ftd>\n    \u003C\u002Ftr>\n  \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\n# Citation\nThe code in this repository is mostly described in the post below. 
Please consider citing this work if you find the repository helpful. \n\n```bibtex\n@misc{sky_t1_2025,\n  author       = {NovaSky Team},\n  title        = {Sky-T1: Train your own O1 preview model within $450},\n  howpublished = {https:\u002F\u002Fnovasky-ai.github.io\u002Fposts\u002Fsky-t1},\n  note         = {Accessed: 2025-01-09},\n  year         = {2025}\n}\n```\n\n# Acknowledgement\nThis work was done at the [Berkeley Sky Computing Lab](https:\u002F\u002Fsky.cs.berkeley.edu\u002F), with generous compute support from [Lambda Labs](https:\u002F\u002Flambdalabs.com\u002Fservice\u002Fgpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5), [Anyscale](https:\u002F\u002Fwww.anyscale.com\u002F), and [Databricks](https:\u002F\u002Fwww.databricks.com\u002F). We would like to express our gratitude for the valuable academic feedback and support from the [Still-2 Team](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2412.09413), and Junyang Lin from the [Qwen Team](https:\u002F\u002Fqwenlm.github.io\u002F).\n\n\n","SkyThought is a project that lets users train their own O1-preview-style model at low cost. Its core capability is enhancing models through reinforcement learning, while offering model options at several scales (such as 7B and 32B) and supporting features such as code generation. The project is written in Python, is extensible and practical, and suits researchers or developers who need customized AI models on a limited budget. SkyThought also provides detailed documentation and community support to help users get started quickly.","2026-06-11 03:41:14","high_star"]