[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-4672":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},4672,"Ctx2Skill","S1s-Z\u002FCtx2Skill","S1s-Z","Code for \"From Context to Skills: Can Language Models Learn from Context Skillfully? \"","https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.27660",null,"Python",291,25,32,1,0,24,37,183,72,4.24,false,"main",[],"2026-06-12 02:01:03","# Ctx2Skill: From Context to Skills\n\n\n> **Can Language Models Learn from Context Skillfully?**\n\nThis repository contains the code for our paper \"**From Context to Skills: Can Language Models Learn from Context Skillfully?**\".\n\nCtx2Skill is a self-evolving framework that autonomously discovers, refines, and selects context-specific skills from complex contexts, requiring **no human annotation** and **no external feedback**. The resulting natural-language skills can be plugged into any language model at inference time to enhance context learning capability.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fintro.png\" alt=\"Ctx2Skill Intro\" width=\"80%\">\n\u003C\u002Fp>\n\n## Overview\n\nMany real-world tasks require language models to reason over complex contexts (e.g., technical documents, research papers, code repositories) that lie outside their parametric knowledge. 
An intuitive solution is **inference-time skill augmentation** — extracting rules and procedures from the context into explicit, natural-language skills. However, constructing such skills faces two fundamental challenges:\n\n1. **Prohibitive cost** of manual skill annotation for long, technically dense contexts\n2. **Lack of external feedback** for automated skill construction in context learning scenarios\n\nCtx2Skill addresses both challenges through a **multi-agent self-play loop**:\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Foverall.png\" alt=\"Ctx2Skill Overview\" width=\"90%\">\n\u003C\u002Fp>\n\n## Method\n\n### Multi-Agent Self-Play Loop\n\nThe core of Ctx2Skill is a self-play loop comprising five frozen-LM agent roles:\n\n| Agent | Role |\n|-------|------|\n| **Challenger** | Generates probing tasks and rubrics based on the context and its own evolving skill set |\n| **Reasoner** | Attempts to solve tasks guided by the context and its current skill set |\n| **Judge** | Provides binary per-rubric verdicts and partitions tasks into solved\u002Ffailed sets |\n| **Proposer** (one per side) | Diagnoses failure\u002Fsuccess patterns and synthesizes high-level skill update proposals |\n| **Generator** (one per side) | Materializes proposals into concrete skill set updates |\n\nBoth the Challenger and Reasoner co-evolve through accumulated natural-language skills: failed cases drive Reasoner skill updates, while easily solved cases drive Challenger skill updates, maintaining sustained adversarial pressure.\n\n### Cross-Time Replay Mechanism\n\nA key risk in self-play is **adversarial collapse** — the Challenger generates increasingly extreme tasks while the Reasoner's skills over-specialize. 
To address this, the Cross-Time Replay mechanism:\n\n- Collects representative hard\u002Feasy probe tasks during self-play\n- Re-evaluates all historical skill set candidates on these probes\n- Selects the skill set that maximizes the product of hard-set and easy-set solving rates, ensuring robust generalization\n\n## Results\n\nEvaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solve rates across backbone models:\n\n| Model | Without Skills | With Ctx2Skill | Improvement |\n|-------|---------------|----------------|-------------|\n| GPT-4.1 | 11.1% | 16.5% | +5.4% |\n| GPT-5.1 | 21.2% | 25.8% | +4.6% |\n| GPT-5.2 | 18.2% | 21.4% | +3.2% |\n\nWe conduct our experiments using newapi for GPT-4.1, and azure-api for GPT-5.1 and GPT-5.2. We provide the logs and generated responses in this [link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fssz1111\u002FCtx2Skill) for reproducibility and analysis. We recommend **GPT-5.2** for reproduction, as it yielded the most consistent results in our early experiments. 
We also provide our generated skills in this [link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fssz1111\u002FCtx2Skill-Skills).\n\n## Quick Start\n\n### Prerequisites\n\n- Python 3.8+\n- OpenAI-compatible API access\n\n### Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FS1s-Z\u002FCtx2Skill.git\ncd Ctx2Skill\n```\n\n### Data Preparation\n\nDownload the CL-bench dataset files from this [link](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fssz1111\u002FCtx2Skill) and place them in the project root:\n- `CL-bench-context-dedup.jsonl` — deduplicated contexts (used for skill generation)\n- `CL-bench-with-task-delimiter.jsonl` — tasks with delimiters (used for evaluation)\n- Evaluation logs and responses from GPT-4.1, GPT-5.1, and GPT-5.2.\n\n### Running the Self-Play Loop\n\n```bash\n# Configure API\nexport OPENAI_BASE_URL=\"your-api-base-url\"\nexport OPENAI_API_KEY=\"your-api-key\"\n\n# Run the self-play skill discovery loop\npython selfplay_loop.py \\\n    --challenger-model gpt-5.2 \\\n    --reasoner-model gpt-5.2 \\\n    --judge-model gpt-5.1 \\\n    --proposer-model gpt-5.2 \\\n    --generator-model gpt-5.2 \\\n    --input .\u002FCL-bench-context-dedup.jsonl \\\n    --output outputs\u002Floop_data\u002Floop_output.jsonl \\\n    --num-iterations 5 \\\n    --num-tasks 5 \\\n    --skills-dir skills-output \\\n    --workers 32\n```\n\n### Inference with Discovered Skills\n\n```bash\npython infer.py \\\n    --model gpt-5.2 \\\n    --input .\u002FCL-bench-with-task-delimiter.jsonl \\\n    --workers 32 \\\n    --skills-dir skills-output\u002Freasoner \\\n    --output outputs\u002Finference_output.jsonl\n```\n\n### Evaluation\n\n```bash\npython eval_ignore_none.py \\\n    --input outputs\u002Finference_output.jsonl \\\n    --judge-model gpt-5.1 \\\n    --workers 32\n```\n\n## Project Structure\n\n```\nCtx2Skill\u002F\n├── selfplay_loop.py       # Main self-play loop with all five agents\n├── challenger.py           # Challenger agent 
implementation\n├── infer.py                # Inference script with skill augmentation\n├── eval.py                 # Evaluation script\n├── eval_ignore_none.py     # Evaluation script (ignoring None responses)\n├── prompts\u002F                # Prompt templates for each agent role\n│   ├── challenger.txt\n│   ├── challenger_generator.txt\n│   ├── challenger_proposer.txt\n│   ├── reasoner_generator.txt\n│   └── reasoner_proposer.txt\n└── run.sh                  # Example run script\n```\n\n## Citation\n\n```bibtex\n@misc{si2026contextskillslanguagemodels,\n      title={From Context to Skills: Can Language Models Learn from Context Skillfully?}, \n      author={Shuzheng Si and Haozhe Zhao and Yu Lei and Qingyi Wang and Dingwei Chen and Zhitong Wang and Zhenhailong Wang and Kangyang Luo and Zheng Wang and Gang Chen and Fanchao Qi and Minjia Zhang and Maosong Sun},\n      year={2026},\n      eprint={2604.27660},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.27660}, \n}\n```\n\n\n## License\n\nThis project is released under the MIT License.\n","Ctx2Skill is a framework that autonomously discovers, refines, and selects context-specific skills from complex contexts, with no human annotation or external feedback. Its core components are a multi-agent self-play loop and a Cross-Time Replay mechanism: five frozen language-model agent roles collaborate to generate and optimize skills, so that the selected skill set generalizes robustly on challenging tasks. The project is particularly suited to settings where a language model must understand and handle complex contexts beyond its parametric knowledge, such as technical documents, research papers, and code repositories, and it can effectively improve the model's ability to learn from context at inference time.",2,"2026-06-11 03:00:06","CREATED_QUERY"]