[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72534":3},{"id":4,"name":5,"fullName":6,"owner":5,"repo":5,"description":7,"homepage":8,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":15,"starSnapshotCount":15,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},72534,"open-thoughts","open-thoughts\u002Fopen-thoughts","Fully open data curation for reasoning models","https:\u002F\u002Fopen-thoughts.ai",null,"Python",2274,189,30,6,0,3,7,18,9,28.84,"Apache License 2.0",false,"main",[25,26],"open-data","reasoning","2026-06-12 02:03:04","\u003C!-- markdownlint-disable first-line-h1 -->\n\u003C!-- markdownlint-disable html -->\n\u003C!-- markdownlint-disable no-duplicate-header -->\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"images\u002Fopen_thoughts.png\" width=\"60%\" alt=\"Open Thoughts GitHub Repository\" \u002F>\n\u003C\u002Fdiv>\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fopen-thoughts.ai\">\n    \u003Cimg alt=\"Static Badge\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHome-open--thoughts.ai-blue?style=flat&link=https%3A%2F%2Fopen-thoughts.ai\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\">\n    \u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Open%20Thoughts-blue?color=ffc107&logoColor=white&style=flat&link=https%3A%2F%2Fhuggingface.co\u002Fopen-thoughts\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002F9DsKjFtdXp\">\n    \u003Cimg 
alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Join%20Community-7289DA?style=flat&logo=discord&logoColor=white\">\n  \u003C\u002Fa>\n  \u003Cbr>\n  \u003Ci>Curating the best open reasoning datasets\u003C\u002Fi>\u003Cbr> \n  A collaboration led by \u003Ca href=\"https:\u002F\u002Fbespokelabs.ai\u002F\">Bespoke Labs\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fwww.datacomp.ai\u002F\">DataComp\u003C\u002Fa> community\n\n\u003C\u002Fp>\n\u003Chr>\n\nOur first goal is to curate a reasoning dataset to train state-of-the-art small reasoning models that surpass [DeepSeek-R1-Distill-Qwen-32B](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-R1-Distill-Qwen-32B) and [DeepSeek-R1-Distill-Qwen-7B](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-R1-Distill-Qwen-7B) on math and code reasoning benchmarks.\n\n\n# News\n- **[2025\u002F06\u002F10]** 🎉 [OpenThoughts3-1.2M dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts3-1.2M) is the #1 trending dataset on Hugging Face. \n- **[2025\u002F06\u002F04]** 🎉🎉🎉 We release our OpenThoughts [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04178)!\n- **[2025\u002F06\u002F04]** 🎉🎉🎉 [OpenThinker3](https:\u002F\u002Fwww.openthoughts.ai\u002Fblog\u002Fot3) is released! 
\n- **[2025\u002F05\u002F09]** 🎉 Join our [Discord community](https:\u002F\u002Fdiscord.gg\u002F9DsKjFtdXp) to discuss OpenThoughts and connect with other users!\n- **[2025\u002F04\u002F07]** 🎉 [OpenThoughts2-1M dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts2-1M) is the #1 trending dataset on Hugging Face.\n- **[2025\u002F04\u002F03]** 🎉 [OpenThinker2](https:\u002F\u002Fwww.open-thoughts.ai\u002Fblog\u002Fthinkagain) has arrived: [OpenThoughts2-1M](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts2-1M), [OpenThinker2-7B](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker2-7B), [OpenThinker2-32B](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker2-32B).\n- **[2025\u002F03\u002F13]** 🎉 We release [an analysis of reasoning models](https:\u002F\u002Fwww.open-thoughts.ai\u002Fblog\u002Faiw) on [Alice in Wonderland](https:\u002F\u002Fgithub.com\u002FLAION-AI\u002FAIW).\n- **[2025\u002F02\u002F16]** 🎉 [OpenThinker on Ollama](https:\u002F\u002Follama.com\u002Flibrary\u002Fopenthinker) reaches 400k downloads.\n- **[2025\u002F02\u002F14]** 🎉 Chat with OpenThinker in the [online playground](https:\u002F\u002Fplayground.bespokelabs.ai\u002F).\n- **[2025\u002F02\u002F13]** 🎉 OpenThinker is now [available on Ollama](https:\u002F\u002Follama.com\u002Flibrary\u002Fopenthinker) for easy local inference.\n- **[2025\u002F02\u002F12]** 🎉 We release [OpenThinker-32B](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker-32B), the [best open-data reasoning model](https:\u002F\u002Fwww.open-thoughts.ai\u002Fblog\u002Fscale).\n- **[2025\u002F02\u002F02]** 🎉 [OpenThoughts-114k dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) is the #1 trending dataset on Hugging Face.\n- **[2025\u002F01\u002F30]** 🎉 Reasoning benchmarks are added to [Evalchemy](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002FEvalchemy) and 
[compared](https:\u002F\u002Fwww.open-thoughts.ai\u002Fblog\u002Fmeasure) to publicly reported scores.\n- **[2025\u002F01\u002F28]** 🎉 [Open Thoughts](https:\u002F\u002Fwww.open-thoughts.ai\u002F) launches with [OpenThoughts-114k dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) and [OpenThinker-7B model](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker-7B).\n- **[2025\u002F01\u002F27]** 🎉 [Bespoke-Stratos-17k dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbespokelabs\u002FBespoke-Stratos-17k) is the #2 trending dataset on Hugging Face.\n- **[2025\u002F01\u002F22]** 🎉 [Bespoke-Stratos-17k dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbespokelabs\u002FBespoke-Stratos-17k) and [Bespoke-Stratos-32B model](https:\u002F\u002Fhuggingface.co\u002Fbespokelabs\u002FBespoke-Stratos-32B) are [announced](https:\u002F\u002Fwww.bespokelabs.ai\u002Fblog\u002Fbespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation).\n\n# Results\nOur [OpenThinker3-7B](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker3-7B) model trained on [OpenThoughts3-1.2M](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts3-1.2M) is the state-of-the-art open-data 7B reasoning model.\nThe numbers reported in the table below are evaluated with our open-source tool [Evalchemy](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002FEvalchemy).\n\n| Model                                                                                           | Data  | AIME24 | AIME25 |  AMC23 | MATH500 | HMMT 02\u002F25 | LCB 06\u002F24-01\u002F25 | CodeElo | CodeForces | GPQA-D | JEEBench |\n| ----------------------------------------------------------------------------------------------- | ----- | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- |\n| 
[OpenThinker-7B](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker-7B)                           | ✅    |  30.7  |  22.0  |  72.5  |   82.8  |   15.7     |    26.1         |  11.1   |  14.9      |  38.6  |  45.3    |\n| [OpenThinker2-7B](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker2-7B)                         | ✅    |  60.7  |  38.7  |  89.8  |   87.6  |   24.7     |    40.6         |  22.8   |  26.6      |  47.0  |  65.1    |\n| **[OpenThinker3-7B](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker3-7B)**                     | ✅    |**69.0**|**53.3**|**93.5**| **90.0**|   **42.7** |    **51.7**     |  31.0   |**32.2**    |  53.7  |**72.4**  |\n| [DeepSeek-R1-Distill-Qwen-32B](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-R1-Distill-Qwen-32B) | ❌    |  51.3  |  38.0  |  92.0  |   88.0  |   25.0     |    34.5         |  19.9   |  21.1      |  33.2  |  50.4    |\n| [OpenR1-Distill-7B](https:\u002F\u002Fhuggingface.co\u002Fopen-r1\u002FOpenR1-Distill-7B)                           | ✅    |  57.7  |  39.7  |  87.0  |   88.0  |   25.7     |    30.7         |  30.1   |  29.3      |**58.9**|  68.7    |\n| [Llama-3.1-Nemotron-Nano-8B-v1](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002FLlama-3.1-Nemotron-Nano-8B-v1)    | ✅    |  62.0  |  48.0  |**94.0**|   89.4  |   26.7     |    **50.9**     |  30.9   |**32.9**    |  52.9  |  70.7    |\n| [AceReason-Nemotron-7B](https:\u002F\u002Fhuggingface.co\u002Fnvidia\u002FAceReason-Nemotron-7B)                    | ✅    |**71.0**|  50.7  |**93.8**|   89.8  |   33.3     |    44.3         |**32.9** |**30.9**    |  52.9  |  64.3    |\n\n\nTo mitigate variance in evaluation accuracy, we compute average scores over multiple evaluation runs with different seeds. More details can be found in our OpenThoughts [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04178).\n\n\nWe are fully open-source. 
Our [model weights](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts), [datasets](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts), [data generation code](https:\u002F\u002Fgithub.com\u002Fopen-thoughts\u002Fopen-thoughts), [evaluation code](https:\u002F\u002Fgithub.com\u002Fmlfoundations\u002FEvalchemy), and [training code](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory) are all publicly available. \n\n# Installation\n```\nmake install\npoetry shell\n```\nSet the DeepSeek API key:\n```\nexport DEEPSEEK_API_KEY=your_api_key\n```\n\nSet HF_ORG to your organization id. Set HF_PRIVATE=true if you want to push to a private repo.\n```\nexport HF_ORG=your_org_id\nexport HF_PRIVATE=false\n```\n# OpenThoughts3-1.2M Data Generation\nThe [OpenThoughts3-1.2M](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts3-1.2M) dataset consists of 850,000 math questions, 250,000 code questions, and 100,000 science questions. Unlike previous OpenThoughts datasets, which used DeepSeek-R1 annotations, the reasoning traces in OpenThoughts3 are generated with QwQ-32B. This dataset is the result of more than 1,000 experiments testing design choices in dataset curation. More details can be found in our [OpenThoughts paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04178). 
\n\n\u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: light)\" width=\"100%\" srcset=\"images\u002Fopenthoughts3-diagram.png\">\n    \u003Cimg alt=\"Data Curation Recipe\" width=\"100%\" src=\"images\u002Fopenthoughts3-diagram_dark.png\">\n\u003C\u002Fpicture>\n\n\n# OpenThoughts2-1M Data Generation\nThe [OpenThoughts2-1M](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts2-1M) dataset is a combination of [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k), [OpenR1-Math](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-r1\u002FOpenR1-Math-Raw), and our newly generated math and code reasoning data. We generate the additional math and code data by ablating on 26 different question generation methodologies and sampling from the highest performing ones.\n\nThe recipe is outlined below:\n\u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: light)\" width=\"100%\" srcset=\"images\u002Fopenthoughts2-diagram.png\">\n    \u003Cimg alt=\"Data Curation Recipe\" width=\"100%\" src=\"images\u002Fopenthoughts2-diagram_dark.png\">\n\u003C\u002Fpicture>\n\nMore details can be found in our [blog post](https:\u002F\u002Fwww.open-thoughts.ai\u002Fblog\u002Fthinkagain). \n\n\n# OpenThoughts-114k Data Generation\n\nFor OpenThoughts-114k, we generate data for the following domains:\n1. Code\n2. Math\n3. Science\n4. 
Puzzle\n\nThe recipe is outlined below:\n\u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: light)\" width=\"100%\" srcset=\"images\u002Fdiagram.png\">\n    \u003Cimg alt=\"Data Curation Recipe\" width=\"100%\" src=\"images\u002Fdiagram_dark.png\">\n\u003C\u002Fpicture>\n\nMore instructions are in [open_thoughts\u002FREADME.md](open_thoughts\u002FREADME.md).\n\n\n# Training and Evaluation\nTraining and evaluation code coming soon.\n\n# Links\n- 📝 [OpenThoughts Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04178)\n- 📊 [OpenThoughts3-1.2M and OpenThinker3-7B Blog Post](https:\u002F\u002Fwww.open-thoughts.ai\u002Fblog\u002Fot3)\n- 💻 [Open Thoughts GitHub Repository](https:\u002F\u002Fgithub.com\u002Fopen-thoughts\u002Fopen-thoughts)\n- 🧠 [OpenThoughts3-1.2M dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts3-1.2M)\n- 🤖 [OpenThinker3-7B model](https:\u002F\u002Fhuggingface.co\u002Fopen-thoughts\u002FOpenThinker3-7B)\n\n\n# About Us\n\nWe are a team of researchers and engineers from [Bespoke Labs](https:\u002F\u002Fwww.bespokelabs.ai\u002F), Stanford, University of California, Berkeley, University of Washington, UT Austin, Juelich Supercomputing Center (JSC), LAION, UCLA, UNC Chapel Hill, and Toyota Research Institute united around building the best datasets (and thus the best models). 
See our previous works at [datacomp.ai](https:\u002F\u002Fwww.datacomp.ai\u002F) and [mlfoundations](https:\u002F\u002Fgithub.com\u002Fmlfoundations).\n\n# Sponsors\nOpen Thoughts is supported by \n- [Bespoke Labs](https:\u002F\u002Fwww.bespokelabs.ai\u002F)\n- [Toyota Research Institute](https:\u002F\u002Fwww.tri.global)\n- [Lambda Labs](https:\u002F\u002Flambdalabs.com\u002F)\n- [NSF IFML](https:\u002F\u002Fwww.ifml.institute\u002F)\n- [UT Austin Machine Learning Lab](https:\u002F\u002Fml.utexas.edu\u002F)\n- [Juelich Supercomputing Center](https:\u002F\u002Fwww.fz-juelich.de\u002Fen\u002Fias\u002Fjsc)\n\n\n# Community\n[Make an edit](https:\u002F\u002Fgithub.com\u002Fopen-thoughts\u002Fopen-thoughts\u002Fedit\u002Fmain\u002FREADME.md) to add your project!\n\nJoin our [Discord community](https:\u002F\u002Fdiscord.gg\u002F9DsKjFtdXp) to discuss OpenThoughts and connect with other users!\n\nWhat the open source community is building with OpenThoughts:\n\n- [Light-R1-SFT](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fqihoo360\u002FLight-R1-SFTData) includes examples from [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) and is used to train [Light-R1-14B-DS](https:\u002F\u002Fhuggingface.co\u002Fqihoo360\u002FLight-R1-14B-DS), [Light-R1-32B](https:\u002F\u002Fhuggingface.co\u002Fqihoo360\u002FLight-R1-32B), [Light-R1-7B-DS](https:\u002F\u002Fhuggingface.co\u002Fqihoo360\u002FLight-R1-7B-DS), [Light-R1-32B-DS](https:\u002F\u002Fhuggingface.co\u002Fqihoo360\u002FLight-R1-32B-DS)\n- [Traceback-12B](https:\u002F\u002Fhuggingface.co\u002Fsecemp9\u002FTraceBack-12b) is a reasoning model trained on a [dataset](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fsecemp9\u002Finstruction_solution_thought) that includes [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) and 
[Bespoke-Stratos-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbespokelabs\u002FBespoke-Stratos-17k)\n- [190+ public models on Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fmodels?dataset=dataset:open-thoughts\u002FOpenThoughts-114k) have been trained using [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k)\n- [100+ public models on Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fmodels?dataset=dataset:bespokelabs\u002FBespoke-Stratos-17k) have been trained using [Bespoke-Stratos-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbespokelabs\u002FBespoke-Stratos-17k)\n- [Sky-T1](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.07374) uses [Bespoke-Stratos-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbespokelabs\u002FBespoke-Stratos-17k) for their R1 SFT experiments\n- Ollama has [created quantized versions](https:\u002F\u002Follama.com\u002Flibrary\u002Fopenthinker) of the OpenThinker-7B and OpenThinker-32B models, for running locally on your laptop\n- [CuratedThoughts](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbethgelab\u002FCuratedThoughts) is a filtered version of [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) to make it suitable for RL training\n- [OpenThoughts-114k-math](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-r1\u002FOpenThoughts-114k-math) is a filtered version of the math subset in [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) using [Math-Verify](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002FMath-Verify) verification on top of our LLM Judge with GT verification\n- [SmallThoughts](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FSmallDoge\u002FSmallThoughts) regenerates a 50k version of [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) using a 
[fork](https:\u002F\u002Fgithub.com\u002FSmallDoges\u002Fsmall-thoughts) of this repo\n- [AM-DeepSeek-R1-Distilled-1.4M](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fa-m-team\u002FAM-DeepSeek-R1-Distilled-1.4M) is a state of the art reasoning dataset mix containing [OpenThoughts-114k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fopen-thoughts\u002FOpenThoughts-114k) and [Bespoke-Stratos-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbespokelabs\u002FBespoke-Stratos-17k)\n- [Marin 8B](https:\u002F\u002Fhuggingface.co\u002Fmarin-community\u002Fmarin-8b-instruct) of the Stanford Marin Project, a collaborative effort to develop open-source foundation models, is trained on [Bespoke-Stratos-17k](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fbespokelabs\u002FBespoke-Stratos-17k).\n\n\n# Citation\n```\n@misc{guha2025openthoughtsdatarecipesreasoning,\n  title={OpenThoughts: Data Recipes for Reasoning Models}, \n  author={Etash Guha and Ryan Marten and Sedrick Keh and Negin Raoof and Georgios Smyrnis and Hritik Bansal and Marianna Nezhurina and Jean Mercat and Trung Vu and Zayne Sprague and Ashima Suvarna and Benjamin Feuer and Liangyu Chen and Zaid Khan and Eric Frankel and Sachin Grover and Caroline Choi and Niklas Muennighoff and Shiye Su and Wanjia Zhao and John Yang and Shreyas Pimpalgaonkar and Kartik Sharma and Charlie Cheng-Jie Ji and Yichuan Deng and Sarah Pratt and Vivek Ramanujan and Jon Saad-Falcon and Jeffrey Li and Achal Dave and Alon Albalak and Kushal Arora and Blake Wulfe and Chinmay Hegde and Greg Durrett and Sewoong Oh and Mohit Bansal and Saadia Gabriel and Aditya Grover and Kai-Wei Chang and Vaishaal Shankar and Aaron Gokaslan and Mike A. Merrill and Tatsunori Hashimoto and Yejin Choi and Jenia Jitsev and Reinhard Heckel and Maheswaran Sathiamoorthy and Alexandros G. 
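Dimakis and Ludwig Schmidt},\n  year={2025},\n  eprint={2506.04178},\n  archivePrefix={arXiv},\n  primaryClass={cs.LG},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04178}, \n}\n```\n","Open Thoughts is a project dedicated to fully open data curation for reasoning models. By collecting and curating high-quality open reasoning datasets, it supports training small reasoning models that surpass existing models (such as DeepSeek-R1-Distill-Qwen-32B and 7B), with especially strong results on math and code reasoning benchmarks. The project is developed in Python and released under the Apache License 2.0. It is suited to researchers and developers who need high-quality reasoning datasets, and to any individual or team interested in improving AI reasoning capabilities.",2,"2026-06-11 03:42:29","high_star"]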