[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9793":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":41,"readmeContent":42,"aiSummary":43,"trendingCount":15,"starSnapshotCount":15,"syncStatus":44,"lastSyncTime":45,"discoverSource":46},9793,"Chronos","Kodezi\u002FChronos","Kodezi","Kodezi Chronos is a debugging-first language model that achieves state-of-the-art results on SWE-bench Lite (80.33%) and 67% real-world fix accuracy, over six times better than GPT-4. Built with Adaptive Graph-Guided Retrieval and Persistent Debug Memory. 
Model available Q1 2026 via Kodezi OS.","https:\u002F\u002Fchronos.so\u002F",null,"Java",4935,215,38,0,1,8,60.3,"Other",false,"main",[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40],"artificial-intelligence","autonomous-debugging","benchmark","benchmark-report","bug-fixing","chronos","code","code-analysis","code-analysis-tool","code-debugger","code-understanding","debugging","developer-tools","kodezi","language-model","machine-learning","program-repair","software-engineering","2026-06-12 04:00:46","\u003Cdiv align=\"center\">\n\n# Kodezi Chronos\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"results\u002Ffigures\u002Fchronos_hero.png\" alt=\"Introducing Kodezi Chronos-1\" width=\"100%\">\n\u003C\u002Fp>\n\n## The World's First Debugging-First Language Model for Repository-Scale Code Understanding\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2507.12482-b31b1b.svg?style=for-the-badge)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12482)\n[![Model Access](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModel-Chronos%20Waitlist-4B7BFF.svg?style=for-the-badge)](https:\u002F\u002Fchronos.so)\n[![Research](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FResearch-Paper-orange.svg?style=for-the-badge)](paper\u002Fchronos-research.md)\n[![Benchmark](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBenchmark-SWE--bench%20Lite-purple.svg?style=for-the-badge)](evaluation\u002Flite\u002F)\n[![Leaderboard](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLeaderboard-Results-gold.svg?style=for-the-badge)](LEADERBOARD.md)\n\n### Performance Badges\n\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSWE--bench%20Lite-80.33%25-gold?style=for-the-badge\" alt=\"SWE-bench Lite\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDebug%20Success-67.3%25-brightgreen?style=for-the-badge\" alt=\"Debug Success Rate\">\n\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuman%20Preference-89%25-blue?style=for-the-badge\" alt=\"Human Preference\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FImprovement-4--5x-yellow?style=for-the-badge\" alt=\"Improvement over GPT-4.1\">\n\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTime%20Reduction-40%25-orange?style=for-the-badge\" alt=\"Time Reduction\">\n\n### Key Achievements\n\n**80.33% SWE-bench Lite** • **67.3% Autonomous Debugging** • **89% Human Preference** • **40% Time Reduction**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"results\u002Ffigures\u002Farchitecture_overview.svg\" alt=\"Chronos Architecture\" width=\"800\">\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n---\n\n## Table of Contents\n\n- [State-of-the-Art Results](#state-of-the-art-results)\n  - [SWE-bench Lite Performance](#swe-bench-lite-performance)\n  - [The Debugging Gap](#the-debugging-gap)\n  - [Repository-Specific Results](#repository-specific-results)\n- [MRR Benchmark Results](#mrr-benchmark-results)\n- [Key Innovations](#key-innovations)\n- [Architecture](#architecture)\n- [Benchmarks & Evaluation](#benchmarks--evaluation)\n- [Research Paper](#research-paper)\n- [Getting Started](#getting-started)\n- [Repository Structure](#repository-structure)\n- [Research Highlights](#research-highlights)\n- [Detailed Performance](#detailed-performance)\n- [Documentation](#documentation)\n- [Contributing](#contributing)\n- [Citation](#citation)\n- [License](#license)\n\n---\n\n## Model Access Notice\n\n\u003Cdiv align=\"center\">\n\n**Chronos is proprietary and available exclusively through Kodezi OS**\n\n| Timeline | Access | Details |\n|:--------:|:------:|:-------:|\n| **Q4 2025** | Beta | Limited enterprise access |\n| **Q1 2026** | GA | Via [Kodezi OS](https:\u002F\u002Fkodezi.com\u002Fos) |\n\n**This repository contains research paper, benchmarks, and evaluation results only.**\n\n**[Get Early Access](https:\u002F\u002Fchronos.so)** 
• **[Read Paper](paper\u002Fchronos-research.md)** • **[View Leaderboard](LEADERBOARD.md)** • **[Documentation](docs\u002F)**\n\n\u003C\u002Fdiv>\n\n---\n\n## 🏅 State-of-the-Art Results\n\n### 📈 SWE-bench Lite Performance\n\n\u003Cdiv align=\"center\">\n\n**Industry-Standard Benchmark Results**\n\n| Rank | System | Success Rate | Instances | Lead | Year |\n|:----:|:-------|:------------:|:---------:|:----:|:----:|\n| **1** | **Kodezi Chronos** | **80.33%** | **241\u002F300** | **+20.0pp** | **2025** |\n| 2 | ExpeRepair-v1.0 + Claude 4.5 Sonnet | 60.33% | 181\u002F300 | - | 2025 |\n| 3 | Claude 4.5 Sonnet (Bash Only) | ~14% | ~42\u002F300 | -66.3pp | 2025 |\n| 4 | Claude 4.1 Opus (Bash Only) | 14.2% | 43\u002F300 | -66.1pp | 2025 |\n| 5 | GPT-4.1 | 13.8% | 41\u002F300 | -66.5pp | 2025 |\n| 6 | Gemini 2.0 Pro | 13.4% | 40\u002F300 | -67.0pp | 2025 |\n\n**20 percentage point absolute lead over second place**\n\n\u003C\u002Fdiv>\n\n### The Debugging Gap\n\n\u003Cdiv align=\"center\">\n\n**General-Purpose Models: Code Generation vs Debugging Performance**\n\n| Model | SWE-bench Full\u003Cbr\u002F>(Code Gen) | SWE-bench Lite\u003Cbr\u002F>(Debugging) | Performance Gap |\n|:------|:-----------------------------:|:------------------------------:|:---------------:|\n| Claude 4.5 Sonnet | 72.7% | ~14% | **-58.7pp** |\n| Claude 4.1 Opus | 72.5% | 14.2% | **-58.3pp** |\n| Claude 4.1 Opus (Bash) | 67.60% | 14.2% | **-53.4pp** |\n| GPT-4.1 | 54.6% | 13.8% | **-40.8pp** |\n| **Kodezi Chronos** | **N\u002FA** | **80.33%** | **Specialized** |\n\n**Key Insight**: Even models achieving 70%+ on code generation drop to \u003C15% on debugging tasks, revealing a 50+ percentage point gap. 
**Chronos, purpose-built for debugging, achieves 80.33%**—demonstrating that debugging requires specialized architectures, not just larger context windows.\n\n\u003C\u002Fdiv>\n\n### Repository-Specific Results\n\n\u003Cdiv align=\"center\">\n\n**SWE-bench Lite: Domain-Specific Performance**\n\n| Repository | Domain | Chronos Success | Instances | Significance |\n|:-----------|:-------|:---------------:|:---------:|:-------------|\n| **sympy** | Symbolic Mathematics | **96.1%** | 51\u002F53 | Near-perfect mathematical reasoning |\n| **sphinx** | Documentation Systems | **93.8%** | 60\u002F64 | Exceptional doc generation bugs |\n| **django** | Web Frameworks | **90.4%** | 104\u002F115 | Complex framework debugging |\n| **Overall** | Mixed Domains | **80.33%** | **241\u002F300** | **State-of-the-art** |\n\n\u003C\u002Fdiv>\n\n---\n\n## 🔬 MRR Benchmark Results\n\n\u003Cdiv align=\"center\">\n\n### 📊 Overall Performance (5,000 Multi-Random Retrieval Scenarios - Sample Dataset of 500 Available)\n\n| Metric | **Chronos** | GPT-4.1 | Claude 4.1 Opus | Gemini 2.0 Pro | Improvement |\n|:-------|:-----------:|:-------:|:---------------:|:--------------:|:-----------:|\n| **Debug Success Rate** | **67.3% ± 2.1%** | 13.8% | 14.2% | 15.0% | **4.5x** |\n| **Root Cause Accuracy** | **89%*** | 12.3% ± 1.8% | 11.7% ± 2.0% | 15.8% ± 1.5% | **5.6-7.6x** |\n| **Retrieval Precision** | **92%*** | 68% ± 2.3% | 67% ± 2.4% | 74% ± 1.8% | **1.2-1.4x** |\n| **Retrieval Recall** | **85%** | 32% ± 2.1% | 34% ± 2.0% | 42% ± 1.9% | **2.0-2.7x** |\n| **Avg Fix Iterations** | **7.8** | 1-2 | 1-2 | 1-2 | More thorough |\n| **Time Reduction** | **40%** | - | - | - | 40% faster |\n\n***p \u003C 0.001 compared to best baseline (two-tailed t-test, n=5,000)** • Sample dataset (n=500) available now, full benchmark Q1 2026*\n\n\u003C\u002Fdiv>\n\n### 🐛 Performance by Bug Category\n\n\u003Cdiv align=\"center\">\n\n| Bug Category | Chronos | GPT-4.1 | Claude 4.1 Opus | Gemini 2.0 Pro | Chronos Advantage 
|\n|:-------------|:-------:|:-------:|:---------------:|:--------------:|:-----------------:|\n| **Syntax Errors** | 94.2% | 82.3% | 79.8% | 85.1% | 1.1x |\n| **Logic Bugs** | 72.8% | 12.1% | 10.7% | 15.3% | **6.0x** |\n| **Concurrency Issues** | 58.3% | 3.2% | 2.8% | 4.1% | **18.2x** |\n| **Memory Problems** | 61.7% | 5.7% | 4.3% | 6.9% | **10.8x** |\n| **API Misuse** | 79.1% | 18.9% | 16.2% | 22.4% | **4.2x** |\n| **Performance Bugs** | 65.4% | 7.4% | 6.1% | 9.8% | **8.8x** |\n\n\u003C\u002Fdiv>\n\n### 📏 Repository Scale Performance\n\n\u003Cdiv align=\"center\">\n\n| Repository Size | Chronos Success | Best Baseline | Baseline Model | Improvement |\n|:---------------:|:---------------:|:-------------:|:--------------:|:-----------:|\n| **\u003C10K LOC** | 71.2% ± 2.8% | 21.3% ± 3.5% | Gemini 2.0 Pro | **3.3x** |\n| **10K-100K LOC** | 68.9% ± 2.5% | 14.7% ± 3.2% | Gemini 2.0 Pro | **4.7x** |\n| **100K-1M LOC** | 64.3% ± 2.9% | 8.9% ± 2.8% | Gemini 2.0 Pro | **7.2x** |\n| **>1M LOC** | 59.7% ± 3.1% | 3.8% ± 1.9% | Gemini 2.0 Pro | **15.7x** |\n\n\u003C\u002Fdiv>\n\n---\n\n## 💡 Key Innovations\n\n### 1. **Debugging-First Architecture**\n- Trained on **42.5M real debugging examples** (not code completion)\n- Specialized for **root cause analysis** and **multi-file patches**\n- **89% root cause accuracy** vs 15.8% best baseline\n- **7-layer architecture** optimized for debugging workflows\n\n### 2. **Persistent Debug Memory (PDM)**\n- Repository-specific learning from **15M+ debugging sessions**\n- Improves from **35% → 65%** success rate over time\n- Cross-session pattern recognition and learning\n- **87% cache hit rate** for similar bugs\n- Temporal pattern learning across project lifecycles\n\n### 3. 
**Adaptive Graph-Guided Retrieval (AGR)**\n- **O(k log d)** complexity with dynamic k-hop expansion\n- **92% precision, 85% recall** on multi-file context\n- Handles **unlimited repository scale** intelligently\n- Multi-hop traversal with confidence-based termination\n- **3.8x faster** than traditional retrieval methods\n\n### 4. **Output-Optimized Design**\n- Optimized for **~3K output tokens** (fixes, tests, docs)\n- **47.2% output entropy density** vs 12.8% for completion models\n- Designed for **complex patch generation**\n- Template-aware generation for consistency\n- Confidence-guided output strategy\n\n### 5. **Autonomous Debugging Loop**\n- Average **7.8 iterations** to successful fix\n- **Propose → Test → Analyze → Refine** cycles\n- **67.3% fully autonomous** success rate\n- Execution sandbox with real-time feedback\n- Iterative refinement until validation succeeds\n\n---\n\n## 🏗️ Architecture\n\n### Seven-Layer System Design\n\n```\n┌─────────────────────────────────────────────┐\n│   7. Explainability Layer                   │  Human-readable root cause analysis\n├─────────────────────────────────────────────┤\n│   6. Execution Sandbox                      │  Isolated test validation\n├─────────────────────────────────────────────┤\n│   5. Persistent Debug Memory (PDM)          │  Repository-specific learning\n├─────────────────────────────────────────────┤\n│   4. Orchestration Controller               │  Autonomous debugging loop\n├─────────────────────────────────────────────┤\n│   3. Debug-Tuned LLM Core                   │  42.5M debugging examples\n├─────────────────────────────────────────────┤\n│   2. Adaptive Retrieval Engine (AGR)        │  Dynamic k-hop graph traversal\n├─────────────────────────────────────────────┤\n│   1. Multi-Source Input Layer               │  Code, logs, traces, tests, docs\n└─────────────────────────────────────────────┘\n```\n\n### Layer Descriptions\n\n1. 
**Multi-Source Input Layer**: Processes code, logs, traces, tests, docs simultaneously\n2. **Adaptive Retrieval Engine (AGR)**: Dynamic k-hop graph traversal (92% precision)\n3. **Debug-Tuned LLM Core**: 42.5M debugging examples, not code completion\n4. **Orchestration Controller**: Autonomous debugging loop management\n5. **Persistent Debug Memory (PDM)**: Repository-specific learning (35% → 65% improvement)\n6. **Execution Sandbox**: Isolated test validation environment\n7. **Explainability Layer**: Human-readable root cause analysis\n\n**[View Detailed Architecture Documentation →](architecture\u002FREADME.md)**\n\n---\n\n## 🧪 Benchmarks & Evaluation\n\n### 📋 Available Benchmarks\n\n| Benchmark | Type | Instances | Purpose | Results |\n|:----------|:-----|:---------:|:--------|:-------:|\n| **SWE-bench Lite** | Industry Standard | 300 | Real-world debugging | [**80.33%**](evaluation\u002Flite\u002F) |\n| **MRR Benchmark** | Custom | 5,000 (500 sample) | Multi-random retrieval | [**67.3%**](benchmarks\u002Fmulti-random-retrieval\u002F) |\n| **Repository Scale** | Custom | Varied | Large codebase testing | [**59.7-71.2%**](benchmarks\u002F) |\n| **Bug Categories** | Custom | 4,400+ | Bug type specialization | [**58.3-94.2%**](benchmarks\u002F) |\n\n### 🏆 SWE-bench Lite Evaluation Results\n\n**[View Complete SWE-bench Lite Submission →](evaluation\u002Flite\u002F20251111_kodezi_chronos_1\u002F)**\n\nThe evaluation directory contains:\n- **README.md**: Detailed submission results and methodology\n- **metadata.yaml**: Submission metadata and configuration\n- **all_preds.jsonl**: All 300 instance predictions\n- **Kodezi Chronos-1.hybrid_eval.json**: Complete evaluation metrics\n- **logs\u002F**: Execution logs for all instances\n- **results\u002F**: Per-instance results and analysis\n- **trajs\u002F**: Debugging trajectories and fix attempts\n\n### 🎯 Multi-Random Retrieval (MRR) Benchmark\n\n**MRR simulates real-world debugging complexity:**\n- **Spatial 
Distribution**: Bug context scattered across 10-50 files\n- **Temporal Dispersion**: Relevant information from 3-12 months of history\n- **Obfuscation Levels**: Low\u002Fmedium\u002Fhigh code complexity\n- **5,000 Scenarios**: Comprehensive evaluation across languages (sample dataset of 500 available now, full benchmark Q1 2026)\n\n| Metric | Chronos | GPT-4.1+RAG | Claude 4.1+VectorDB | Gemini 2.0+Graph |\n|:-------|:-------:|:-----------:|:-------------------:|:----------------:|\n| **Precision@10** | 92% | 42.3% | 48.1% | 51.7% |\n| **Recall@10** | 85% | 31.7% | 36.2% | 41.8% |\n| **Fix Accuracy** | 67.3% | 8.9% | 11.2% | 14.6% |\n| **Context Efficiency** | 0.71 | 0.23 | 0.28 | 0.31 |\n\n**[View Complete Benchmark Documentation →](benchmarks\u002FREADME.md)**\n\n---\n\n## 📚 Research Paper\n\n### Published Research\n\n**Title**: Kodezi Chronos: A Debugging-First Language Model for Repository-Scale Code Understanding\n\n**Authors**: Ishraq Khan, Assad Chowdary, Sharoz Haseeb, Urvish Patel, Yousuf Zaii\n\n**Institution**: Kodezi Inc.\n\n**Publication**: arXiv:2507.12482 (2025)\n\n### Paper Resources\n\n| Resource | Description | Link |\n|:---------|:------------|:----:|\n| **arXiv Paper** | Official publication | [View](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12482) |\n| **Full Paper (Markdown)** | Complete paper in markdown | [View](paper\u002Fchronos-research.md) |\n| **2025 Updates** | Latest research findings | [View](paper\u002Fchronos-research-2025.md) |\n| **Abstract** | Executive summary | [View](paper\u002Fabstract.md) |\n| **Methodology** | Research methodology | [View](paper\u002Fmethodology.md) |\n| **Related Work** | Literature review | [View](paper\u002Frelated_work.md) |\n| **Future Work** | Research directions | [View](paper\u002Ffuture_work.md) |\n\n### Key Contributions\n\n1. **Debugging-Specific Architecture**: First LM trained specifically on debugging workflows (42.5M examples)\n2. 
**Adaptive Graph-Guided Retrieval (AGR)**: Novel multi-hop retrieval with O(k log d) complexity\n3. **Persistent Debug Memory (PDM)**: Cross-session learning system for repository-specific patterns\n4. **Comprehensive Evaluation**: 12,500 real-world bugs across multiple benchmarks\n5. **State-of-the-Art Results**: 80.33% on SWE-bench Lite (20pp lead over second place)\n\n---\n\n## 🚀 Getting Started\n\n### Prerequisites\n\n```bash\n# Python 3.8+ required\npython --version\n\n# Git for cloning\ngit --version\n```\n\n### Quick Start: Running Benchmarks\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002Fkodezi\u002Fchronos-research.git\ncd chronos-research\n\n# Install dependencies\npip install -r requirements.txt\n\n# Run MRR benchmark on your model\npython benchmarks\u002Frun_mrr_benchmark_2025.py \\\n  --model your_model \\\n  --scenarios 100  # Start with subset for testing\n\n# Run full sample evaluation (500 scenarios available)\npython benchmarks\u002Frun_mrr_benchmark_2025.py \\\n  --model your_model \\\n  --scenarios 500\n\n# Analyze results\npython benchmarks\u002Fanalyze_results.py \\\n  --results_dir results\u002Fyour_model\n```\n\n### Model Access\n\n**The Chronos model is NOT included in this repository**\n\nThis repository contains:\n- Research paper and documentation\n- Benchmark suite and evaluation framework\n- Performance results and analysis\n- Chronos model (proprietary - NOT included)\n\n**To access Chronos model**:\n\n| Access Method | Availability | Details |\n|:--------------|:-------------|:--------|\n| **Kodezi OS** | Q4 2025 (Beta) | Enterprise beta access |\n| **Kodezi OS** | Q1 2026 (GA) | General availability |\n| **API Access** | Q1 2026 | API endpoints |\n\n**[Join Waitlist →](https:\u002F\u002Fchronos.so)** | **[Contact Sales →](mailto:sales@kodezi.com)**\n\n---\n\n## 📁 Repository Structure\n\n```\nchronos-research\u002F\n│\n├── benchmarks\u002F                    # Benchmark Suite\n│   ├── 
multi-random-retrieval\u002F      # 5,000 scenario MRR benchmark (500 sample available)\n│   ├── comprehensive_benchmarks\u002F    # Extended test scenarios\n│   ├── debug_categories\u002F            # Bug type categorization (6 types)\n│   ├── evaluation_metrics\u002F          # Custom metrics implementation\n│   ├── run_mrr_benchmark_2025.py    # Main benchmark runner\n│   └── analyze_results.py           # Results analysis tools\n│\n├── evaluation\u002F                    # Evaluation Results\n│   └── lite\u002F                        # SWE-bench Lite results (80.33%)\n│       └── 20251111_kodezi_chronos_1\u002F  # Official submission\n│           ├── all_preds.jsonl      # All 300 predictions\n│           ├── logs\u002F                # 300+ execution logs\n│           ├── results\u002F             # Per-instance results\n│           └── trajs\u002F               # Debugging trajectories\n│\n├── paper\u002F                         # Research Paper\n│   ├── chronos-research.md          # Full paper (arXiv:2507.12482)\n│   ├── chronos-research-2025.md     # 2025 updates\n│   ├── abstract.md                  # Executive summary\n│   ├── methodology.md               # Research methodology\n│   └── figures\u002F                     # Visualizations\n│\n├── architecture\u002F                  # Architecture Documentation\n│   ├── README.md                    # Architecture overview\n│   ├── AGR_ALGORITHM.md             # Adaptive Graph-Guided Retrieval\n│   ├── memory_engine.md             # Persistent Debug Memory (PDM)\n│   └── debugging_loop.md            # Autonomous loop design\n│\n├── results\u002F                       # Performance Data\n│   ├── figures\u002F                     # 15+ SVG visualizations\n│   ├── ablation_studies\u002F            # Component impact analysis\n│   ├── case_studies\u002F                # Real-world debugging examples\n│   └── raw_data\u002F                    # Benchmark outputs (CSV\u002FJSON)\n│\n├── 
reference_implementations\u002F     # Algorithm Reference Code\n│   ├── algorithms\u002F                  # AGR, PDM reference implementations\n│   └── NOTICE.md                    # Proprietary notice\n│\n├── docs\u002F                          # Documentation\n│   ├── getting_started.md           # Quick start guide\n│   ├── API_DOCUMENTATION.md         # API reference (Q1 2026)\n│   ├── faq.md                       # Frequently asked questions\n│   └── limitations.md               # Known constraints\n│\n├── LEADERBOARD.md                 # Performance rankings\n├── CITATION.cff                   # Citation information (BibTeX)\n├── CONTRIBUTING.md                # Contribution guidelines\n├── LICENSE                        # MIT License + proprietary notice\n└── requirements.txt               # Python dependencies\n```\n\n**Key Directories:**\n- **benchmarks\u002F**: 5,000 scenario MRR benchmark (500 sample available), multi-language support, automated evaluation\n- **evaluation\u002F**: SWE-bench Lite results (80.33%, 241\u002F300 instances)\n- **paper\u002F**: Complete research paper and documentation (arXiv:2507.12482)\n- **architecture\u002F**: 7-layer system design, AGR\u002FPDM documentation\n- **results\u002F**: 12,500+ bug resolutions, visualizations, statistical analysis\n- **reference_implementations\u002F**: Algorithm reference code (NOT the actual model)\n---\n\n## 🔬 Research Highlights\n\n### Training Dataset Composition\n\n| Data Source | Volume | Description |\n|:------------|:------:|:------------|\n| **Debugging Examples** | 42.5M | Complete debugging workflows |\n| **GitHub Issues** | 15M | Issues with verified fixes |\n| **Stack Traces** | 8M | Error traces with resolutions |\n| **CI\u002FCD Logs** | 3M | Build and deployment debugging |\n| **Production Sessions** | 2.5M | Real-world production bugs |\n| **Curated Benchmarks** | 14M | Defects4J, SWE-bench, BugsInPy |\n\n**Total Training Data**: 42.5M debugging-specific examples (not code 
completion)\n\n### AGR Performance by Depth\n\n| Retrieval Strategy | Success Rate | Avg Time (s) | Use Case |\n|:-------------------|:------------:|:------------:|:---------|\n| k=1 hop | 58.2% | 12.3 | Simple bugs |\n| k=2 hops | 72.4% | 18.7 | Multi-file bugs |\n| k=3 hops | 83.1% | 24.5 | Complex dependencies |\n| k=adaptive | **87.1%** | 23.4 | **Optimal strategy** |\n| Flat retrieval | 23.4% | 45.2 | Baseline comparison |\n\n### PDM Learning Curve\n\n| Sessions | Success Rate | Token Efficiency | Memory Size |\n|:--------:|:------------:|:----------------:|:-----------:|\n| Initial | 35% | 1.0x | 0 GB |\n| 100 sessions | 52% | 3.2x | 2.1 GB |\n| 500 sessions | **65%** | **7.3x** | 8.7 GB |\n| 1000+ sessions | 67% | 8.1x | 15.2 GB |\n\n**Key Insight**: PDM enables continuous improvement through cross-session learning\n\n---\n\n## 📊 Detailed Performance\n\n### Language-Specific Performance\n\n\u003Cdiv align=\"center\">\n\n| Language | Chronos | GPT-4.1 | Claude 4.1 Opus | Gemini 2.0 Pro | Test Cases |\n|:--------:|:-------:|:-------:|:---------------:|:--------------:|:----------:|\n| **Python** | 68.7% ± 2.1% | 11.2% ± 2.8% | 10.3% ± 2.9% | 14.6% ± 2.6% | 1,823 bugs |\n| **JavaScript** | 64.2% ± 2.3% | 7.8% ± 2.5% | 6.9% ± 2.6% | 10.1% ± 2.4% | 1,547 bugs |\n| **Java** | 63.9% ± 2.2% | 6.3% ± 2.2% | 5.7% ± 2.3% | 9.2% ± 2.1% | 1,630 bugs |\n| **Go** | 66.8% ± 2.4% | 9.1% ± 2.6% | 8.4% ± 2.7% | 12.3% ± 2.5% | 892 bugs |\n| **C++** | 61.2% ± 2.6% | 5.2% ± 2.1% | 4.8% ± 2.2% | 7.9% ± 2.0% | 1,108 bugs |\n| **Rust** | 59.8% ± 2.7% | 4.1% ± 1.9% | 3.7% ± 2.0% | 6.3% ± 1.8% | 687 bugs |\n\n\u003C\u002Fdiv>\n\n### Debugging Cycle Efficiency\n\n\u003Cdiv align=\"center\">\n\n| Iteration | Chronos Success | GPT-4.1 Success | Time Saved | Cumulative |\n|:---------:|:---------------:|:---------------:|:----------:|:----------:|\n| 1st Attempt | 42.3% | 3.2% | -87% | 42.3% |\n| 2nd Attempt | +16.4% (58.7%) | +1.9% (5.1%) | -83% | 58.7% |\n| 3rd Attempt | +6.6% (65.3%) | 
+1.7% (6.8%) | -79% | 65.3% |\n| 4th+ Attempts | +2.0% (67.3%) | +1.7% (8.5%) | -74% | 67.3% |\n\n**Note**: Chronos performs more thorough iterations (7.8 avg) vs competitors (1-2 avg)\n\n\u003C\u002Fdiv>\n\n### Context Window Efficiency\n\n\u003Cdiv align=\"center\">\n\n| Model | Context Size | Debug Success | Cost per Bug | Note |\n|:------|:------------:|:-------------:|:------------:|:-----|\n| GPT-4.1 (32K) | 32K tokens | 7.2% | $5.53 | More context ≠ better debugging |\n| Claude 4.1 (200K) | 200K tokens | 9.8% | $4.89 | Attention dilution at scale |\n| Gemini 2.0 Pro (1M) | 1M tokens | 14.3% | $4.25 | Best traditional model |\n| **Chronos** | **Unlimited*** | **71.2%** | **$1.36** | *Via intelligent retrieval |\n\n\u003C\u002Fdiv>\n\n### Ablation Studies\n\n\u003Cdiv align=\"center\">\n\n| Configuration | Debug Success | Precision | Recall | Impact |\n|:--------------|:-------------:|:---------:|:------:|:-------|\n| **Full Chronos** | **67.3%** | **92%** | **85%** | Complete system |\n| w\u002Fo AGR (Flat Retrieval) | 28.7% | 42% | 31% | **-56%** (critical) |\n| w\u002Fo PDM (Static Memory) | 40.1% | 67% | 58% | **-39%** (major) |\n| w\u002Fo Orchestration Loop | 42.5% | 71% | 62% | **-35%** (major) |\n| w\u002Fo Multi-Code Association | 35.8% | 54% | 47% | **-45%** (critical) |\n| w\u002Fo Execution Sandbox | 48.2% | 78% | 69% | **-27%** (significant) |\n\n\u003C\u002Fdiv>\n\n---\n\n## 📖 Documentation\n\n\u003Cdiv align=\"center\">\n\n### Core Documentation\n\n| [Getting Started](docs\u002Fgetting_started.md) | [Architecture](architecture\u002FREADME.md) | [Benchmarks](benchmarks\u002FREADME.md) | [API Reference](docs\u002Fapi_reference.md) |\n|:---:|:---:|:---:|:---:|\n| Quick start guide | System design details | Evaluation methodology | Future API docs |\n\n### Performance & Analysis\n\n| [Performance](performance.md) | [Case Studies](results\u002Fcase_studies\u002F) | [FAQ](docs\u002Ffaq.md) | [Limitations](docs\u002Flimitations.md) 
|\n|:---:|:---:|:---:|:---:|\n| Detailed metrics | Real-world examples | Common questions | Known constraints |\n\n### Results & Rankings\n\n| [Leaderboard](LEADERBOARD.md) | [Evaluation Results](evaluation\u002Flite\u002F) | [Analysis](results\u002Fanalysis\u002F) | [Benchmarks](benchmarks\u002F) |\n|:---:|:---:|:---:|:---:|\n| Performance rankings | SWE-bench Lite | Statistical analysis | Full test suite |\n\n\u003C\u002Fdiv>\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions to the evaluation framework and benchmarks!\n\n### How to Contribute\n\n```bash\n# 1. Fork and clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002F[your-username]\u002Fchronos-research.git\ncd chronos-research\n\n# 2. Create a feature branch\ngit checkout -b feature\u002Fyour-contribution\n\n# 3. Make your changes\n# - Add new benchmarks\n# - Improve documentation\n# - Fix bugs in evaluation scripts\n\n# 4. Run tests\npython -m pytest tests\u002F\n\n# 5. Commit your changes\ngit add .\ngit commit -m \"feat: description of your changes\"\n\n# 6. 
Push and create PR\ngit push origin feature\u002Fyour-contribution\n```\n\n### Contribution Guidelines\n\n- Add tests for new features\n- Follow existing code style\n- Update documentation\n- Add benchmarks for new capabilities\n- Include performance analysis\n\nSee **[CONTRIBUTING.md](CONTRIBUTING.md)** for detailed guidelines.\n\n---\n\n## 📝 Citation\n\nIf you use this research in your work, please cite:\n\n```bibtex\n@article{khan2025chronos,\n  title={Kodezi Chronos: A Debugging-First Language Model for\n         Repository-Scale Code Understanding},\n  author={Khan, Ishraq and Chowdary, Assad and\n          Haseeb, Sharoz and Patel, Urvish and Zaii, Yousuf},\n  journal={arXiv preprint arXiv:2507.12482},\n  year={2025},\n  url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12482},\n  note={State-of-the-art: 80.33\\% on SWE-bench Lite}\n}\n```\n\n---\n\n## 🏢 About Kodezi\n\n\u003Cdiv align=\"center\">\n\n[Kodezi](https:\u002F\u002Fkodezi.com) is building the future of autonomous software maintenance. 
Our mission is to empower developers with AI that truly understands code at scale.\n\n### Our Products\n\n| Product | Description | Availability |\n|:--------|:------------|:------------:|\n| **[Kodezi Code Web-IDE](https:\u002F\u002Fkodezi.com\u002Fcode)** | AI-powered web-based code editor with real-time debugging | Available Now |\n| **[Kodezi Create](https:\u002F\u002Fkodezi.com\u002Fcreate)** | Generate full applications from natural language | Available Now |\n| **[Kodezi CLI](https:\u002F\u002Fkodezi.com\u002Fcli)** | Command-line interface for automated code analysis and fixes | Available Now |\n| **[Kodezi OS](https:\u002F\u002Fkodezi.com\u002Fos)** | Autonomous software maintenance platform with Chronos integration | Q4 2025 (Beta) |\n| **Chronos** | Debugging-first language model (80.33% SWE-bench Lite) | Via Kodezi OS |\n| **Enterprise API** | API access for teams and enterprise deployment | Q1 2026 |\n\n\u003C\u002Fdiv>\n\n---\n\n## 📧 Contact & Community\n\n\u003Cdiv align=\"center\">\n\n### Connect With Us\n\n[![Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-kodezi.com-blue?style=for-the-badge)](https:\u002F\u002Fkodezi.com)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv:2507.12482-red?style=for-the-badge)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12482)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-@Kodezi-1DA1F2?style=for-the-badge&logo=twitter)](https:\u002F\u002Ftwitter.com\u002Fkodezi)\n[![LinkedIn](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinkedIn-Kodezi-0077B5?style=for-the-badge&logo=linkedin)](https:\u002F\u002Flinkedin.com\u002Fcompany\u002Fkodezi)\n[![Email](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEmail-research@kodezi.com-D14836?style=for-the-badge&logo=gmail)](mailto:research@kodezi.com)\n\n### For Enterprise\n\n**Sales**: [sales@kodezi.com](mailto:sales@kodezi.com)\n**Support**: [support@kodezi.com](mailto:support@kodezi.com)\n**Partnerships**: 
[partnerships@kodezi.com](mailto:partnerships@kodezi.com)\n\n\u003C\u002Fdiv>\n\n---\n\n## 📄 License\n\n© Kodezi Inc. All rights reserved.\nUse is subject to Kodezi's Terms of Service.\n\n### MIT License\n\n**Copyright (c) 2025 Kodezi Inc.**\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and\u002For sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n\n### ⚠️ Important Notice\n\n**This license applies ONLY to the research paper, benchmarks, evaluation frameworks, and documentation contained in this repository.**\n\nThe **Kodezi Chronos model itself is proprietary technology** owned by Kodezi Inc. 
and is **NOT included** in this repository or covered by this license.\n\n### 📦 What's Included Under MIT License\n\n- **Research Paper**: arXiv publication and markdown versions\n- **Benchmark Suite**: MRR and evaluation frameworks\n- **Evaluation Results**: SWE-bench Lite results and analysis\n- **Documentation**: Architecture docs, guides, and references\n- **Reference Implementations**: Algorithm reference code (NOT the actual model)\n\n### 🔒 Proprietary Components\n\n- **Chronos Model**: NOT included in this repository\n- **Kodezi OS Integration**: Proprietary platform components\n- **Production APIs**: Enterprise deployment infrastructure\n\n### 🚀 Chronos Model Access\n\nThe Chronos model is available exclusively through Kodezi OS:\n- **Q4 2025**: Enterprise beta access\n- **Q1 2026**: General availability\n- **Learn more**: [chronos.so](https:\u002F\u002Fchronos.so)\n- **Early access**: [kodezi.com\u002Fos](https:\u002F\u002Fkodezi.com\u002Fos)\n\n---\n\n\u003Cdiv align=\"center\">\n\n### Research & Resources\n\n**[Join Waitlist →](https:\u002F\u002Fchronos.so)** | **[Read Paper →](https:\u002F\u002Farxiv.org\u002Fabs\u002F2507.12482)** | **[View Results →](LEADERBOARD.md)** | **[Learn More →](https:\u002F\u002Fkodezi.com)**\n\n---\n\n\u003Csub>Last Updated: November 2025 | Version: 2.0.0\u003C\u002Fsub>\n\n\u003C\u002Fdiv>\n","Kodezi Chronos is a debugging-focused language model designed to improve code understanding at repository scale. It reaches 80.33% on the SWE-bench Lite benchmark and 67% accuracy on real-world fixes, more than six times higher than GPT-4. Chronos uses Adaptive Graph-Guided Retrieval and Persistent Debug Memory to significantly improve the efficiency and accuracy of automated debugging. The model is particularly suited to code understanding and bug fixing during software development, helping developers locate and resolve problems more efficiently. Access is expected in Q1 2026 via Kodezi OS.",2,"2026-06-11 03:24:46","top_topic"]