[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11085":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":16,"lastSyncTime":26,"discoverSource":27},11085,"HeavySkill","wjn1996\u002FHeavySkill","wjn1996","HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness",null,"Python",122,16,30,0,1,2,35,3,3.69,false,"main",[],"2026-06-12 02:02:29","# HeavySkill\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.02396-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.02396)\n[![PDF](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-PDF-green.svg)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2605.02396)\n\n**Heavy Thinking as the Inner Skill in Agentic Harness**\n\nHeavySkill is a test-time scaling technique that decomposes complex reasoning into two stages:\n1. **Parallel Reasoning** — Generate K independent reasoning trajectories concurrently\n2. 
**Sequential Deliberation** — Synthesize trajectories through critical analysis into a superior final answer\n\nThis repository provides two modes of use:\n\n| Mode | Description | Use Case |\n|------|-------------|----------|\n| **Workflow** | Python async pipeline with CLI | Batch evaluation, research experiments, custom deployments |\n| **Skill** | Pure prompt file for Claude Code \u002F agentic harness | Interactive reasoning in AI-native IDEs |\n\n## Key Results\n\n- Heavy thinking consistently outperforms Best-of-N (majority voting) strategies\n- Stronger LLMs can approach Pass@N performance through deliberation\n- The depth (iterations) and width (K) of heavy thinking are scalable via RLVR\n\n## Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fwjn1996\u002FHeavySkill.git\ncd HeavySkill\npip install -e .\n```\n\n## Quick Start\n\n### Mode 1: Workflow (Python Pipeline)\n\n```bash\npython scripts\u002Frun_heavyskill.py \\\n    --query \"Find the number of paths of length 16 on an 8x8 grid that change direction exactly four times.\" \\\n    --model \"deepseek-r1\" \\\n    --api_base \"http:\u002F\u002Flocalhost:8080\" \\\n    --reason_k 8 \\\n    --summary_k 4 \\\n    --prompt_type \"stem\" \\\n    --output \"outputs\u002Fresult.json\" \\\n    --verbose\n```\n\n**Parameters:**\n- `--reason_k`: Number of parallel reasoning trajectories (default: 8)\n- `--summary_k`: Number of deliberation samples (default: 4)\n- `--iterations`: Iterative deliberation rounds (default: 1)\n- `--prompt_type`: `\"general\"` or `\"stem\"`\n- `--language`: `\"en\"` or `\"cn\"`\n\n**Using a separate deliberation model:**\n```bash\npython scripts\u002Frun_heavyskill.py \\\n    --query \"Your problem here\" \\\n    --model \"r1-distill-qwen-7b\" \\\n    --api_base \"http:\u002F\u002Flocalhost:8080\" \\\n    --summary_model \"qwen3-32b\" \\\n    --summary_api_base \"http:\u002F\u002Flocalhost:8081\" \\\n    --reason_k 16 \\\n    --summary_k 4\n```\n\n**Batch 
mode:**\n```bash\npython scripts\u002Frun_heavyskill.py \\\n    --input_file \"examples\u002Fexample_math.json\" \\\n    --model \"deepseek-r1\" \\\n    --api_base \"http:\u002F\u002Flocalhost:8080\" \\\n    --output \"outputs\u002Fbatch_result.json\"\n```\n\n### Mode 2: Skill (Claude Code \u002F Agentic Harness)\n\nCopy the skill file into your Claude Code skills directory:\n\n```bash\ncp skill\u002Fheavyskill.md ~\u002F.claude\u002Fskills\u002Fheavyskill.md\n```\n\nThen in Claude Code, the heavy thinking protocol will be available for complex reasoning tasks. The skill instructs the model to:\n1. Spawn multiple independent reasoning agents in parallel\n2. Collect diverse reasoning trajectories\n3. Perform critical meta-analysis and deliberation\n4. Output the synthesized final answer\n\n## Project Structure\n\n```\nHeavySkill\u002F\n├── workflow\u002F                    # Mode 1: Python async pipeline\n│   ├── config.py               # Configuration dataclass\n│   ├── parallel_reasoning.py   # Stage 1: Parallel trajectory generation\n│   ├── sequential_deliberation.py  # Stage 2: Synthesis & deliberation\n│   ├── memory_cache.py         # Trajectory storage & selection\n│   ├── prompts.py              # Prompt templates (general, STEM, CN\u002FEN)\n│   ├── pipeline.py             # Full pipeline orchestration\n│   ├── utils.py                # Utilities (clipping, extraction, etc.)\n│   └── agent\u002F\n│       ├── base.py             # Abstract agent interface\n│       └── openai_compatible.py # OpenAI-compatible async API client\n├── scripts\u002F\n│   ├── run_heavyskill.py       # CLI entry point\n│   ├── run_heavyskill.sh       # Example shell script\n│   └── evaluate.py             # Simple accuracy evaluation\n├── skill\u002F\n│   └── heavyskill.md           # Pure prompt skill for agentic harness\n├── examples\u002F\n│   └── example_math.json       # Example input data\n├── paper\u002F\n│   └── heavyskill.pdf          # Paper\n├── requirements.txt\n└── 
pyproject.toml\n```\n\n## How It Works\n\n```\n┌─────────────────────────────────────────────────────────┐\n│                      User Query                          │\n└─────────────────────┬───────────────────────────────────┘\n                      │\n                      ▼\n┌─────────────────────────────────────────────────────────┐\n│            Stage 1: Parallel Reasoning                   │\n│                                                         │\n│   ┌──────────┐ ┌──────────┐ ┌──────────┐    ┌──────┐  │\n│   │ Thinker 1│ │ Thinker 2│ │ Thinker 3│ ...│  K   │  │\n│   └────┬─────┘ └────┬─────┘ └────┬─────┘    └──┬───┘  │\n│        │             │             │             │      │\n└────────┼─────────────┼─────────────┼─────────────┼──────┘\n         │             │             │             │\n         ▼             ▼             ▼             ▼\n┌─────────────────────────────────────────────────────────┐\n│                    Memory Cache                          │\n│         (Store & organize K trajectories)                │\n└─────────────────────┬───────────────────────────────────┘\n                      │\n                      ▼\n┌─────────────────────────────────────────────────────────┐\n│          Stage 2: Sequential Deliberation                │\n│                                                         │\n│   - Analyze answer distribution across trajectories      │\n│   - Cross-validate reasoning chains                      │\n│   - Identify logical errors & correct approaches         │\n│   - Synthesize final answer with critical thinking       │\n│                                                         │\n│              ┌─── Iterative Update (optional) ◄──┐      │\n│              └───────────────────────────────────┘      │\n└─────────────────────┬───────────────────────────────────┘\n                      │\n                      ▼\n┌─────────────────────────────────────────────────────────┐\n│                    Final Answer            
              │\n└─────────────────────────────────────────────────────────┘\n```\n\n## API Compatibility\n\nThe workflow supports any OpenAI-compatible API endpoint:\n- **vLLM** serving (`--api_base http:\u002F\u002Flocalhost:8000`)\n- **DeepSeek API** (`--api_base https:\u002F\u002Fapi.deepseek.com`)\n- **Together AI** (`--api_base https:\u002F\u002Fapi.together.xyz`)\n- **OpenRouter** (`--api_base https:\u002F\u002Fopenrouter.ai\u002Fapi`)\n- **Local Ollama** (`--api_base http:\u002F\u002Flocalhost:11434`)\n\n## Citation\n\n```bibtex\n@article{wang2026heavyskill,\n  title={HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness},\n  author={Wang, Jianing and Guo, Linsen and Chen, Zhengyu and Guo, Qi and Zang, Hongyu and Shi, Wenjie and Ma, Haoxiang and Xi, Xiangyu and Li, Xiaoyu and Wang, Wei and Cai, Xunliang},\n  journal={arXiv preprint arXiv:2605.02396},\n  year={2026},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.02396}\n}\n```\n\n## License\n\nApache-2.0\n","HeavySkill is a test-time scaling technique that decomposes complex reasoning into two stages, parallel reasoning and sequential deliberation, to produce a better final answer. Its core functionality is to generate multiple independent reasoning trajectories concurrently and synthesize them through critical analysis. It supports custom deployments, batch evaluation, and research experiments, and its performance can be tuned through parameters such as the number of reasoning trajectories and the number of deliberation rounds. It suits interactive reasoning in AI-native IDEs and batch workloads that need higher-quality reasoning from large language models.","2026-06-11 03:31:10","CREATED_QUERY"]