[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2037":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},2037,"I-DLM","Introspective-Diffusion\u002FI-DLM","Introspective-Diffusion",null,"Python",148,16,3,1,0,2,5,12,6,3.69,"BSD 3-Clause \"New\" or \"Revised\" License",false,"main",[],"2026-06-12 02:00:36","# I-DLM: Introspective Diffusion Language Models\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"http:\u002F\u002Fintrospective-diffusion.github.io\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHomepage-Project-orange.svg?logo=googlehome\" alt=\"Project\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.11035\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-2604.11035-b31b1b.svg?logo=arXiv\" alt=\"arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fyifanyu\u002Fintrospective-diffusion-language-models-i-dlm\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Models-blue.svg?logo=huggingface\" alt=\"Hugging Face\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n**The first diffusion LLM to match same-scale AR model quality across 15 benchmarks, while achieving up to 3.8x higher serving throughput at large batch sizes.**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.png\" width=\"100%\" alt=\"I-DLM 
Overview\">\n\u003C\u002Fp>\n\n\u003Cdetails open>\n  \u003Csummary>\u003Cb>Demo: Quality + Speed Comparison\u003C\u002Fb>\u003C\u002Fsummary>\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5d72f3b7-468c-4087-a8f1-b551ec9622ec\n\n  \u003Cp align=\"center\">\u003Ci>I-DLM generates 3.8x more tokens than SDAR in the same wall-clock time while maintaining equivalent quality.\u003C\u002Fi>\u003C\u002Fp>\n\u003C\u002Fdetails>\n\n---\n\n## News\n\n- **2025-04-12**: Initial code release with training and inference support.\n- **2025-04-12**: Released I-DLM-8B, I-DLM-32B, and I-DLM-8B-LoRA on HuggingFace.\n\n## Highlights\n\n- **AR-quality diffusion LLM** — First diffusion LLM to match same-scale AR model quality across 15 benchmarks (knowledge, math, code, instruction following)\n- **Introspective Strided Decoding (ISD)** — Single-pass generation + verification algorithm with p\u002Fq acceptance criterion that mathematically guarantees AR-distribution output\n- **3.8x throughput over SDAR** — At concurrency=32 on a single H100, I-DLM achieves ~5,900 tok\u002Fs vs SDAR's ~1,600 tok\u002Fs\n- **AR-compatible serving** — Reuses standard AR inference stacks (paged KV cache, continuous batching, CUDA graphs) via SGLang integration\n- **Efficient training** — Only 4.5B tokens on 8 H100 GPUs to convert Qwen3-8B into I-DLM-8B\n\n---\n\n## Results\n\n### Quality (I-DLM-8B vs baselines)\n\n| Benchmark | I-DLM-8B | Qwen3-8B (AR) | LLaDA-2.1-mini (16B) | SDAR-8B |\n|-----------|----------|----------------|----------------------|---------|\n| ARC-C | 95.8 | 95.8 | 90.2 | 91.9 |\n| MMLU | 82.4 | 83.5 | 74.5 | 78.6 |\n| MMLU-Pro | 73.1 | 75.1 | 64.8 | 56.9 |\n| GPQA-D | 55.6 | 58.9 | 46.0 | 40.2 |\n| GPQA | 54.9 | 55.4 | 53.3 | - |\n| GSM8K | 95.0 | 96.0 | 89.0 | 91.7 |\n| MATH-500 | 96.8 | 95.8 | 85.0 | 78.6 |\n| MathBench | 89.1 | 93.1 | 84.2 | 76.9 |\n| AIME-24 | 69.6 | 73.1 | 43.3 | 10.0 |\n| AIME-25 | 60.8 | 65.4 | 43.3 | 10.0 |\n| HumanEval | 93.3 | 95.1 | 
86.0 | 78.7 |\n| MBPP | 92.2 | 93.4 | 82.1 | 72.0 |\n| LiveCodeBench-v6 | 45.7 | 50.3 | 30.4 | 16.6 |\n| IFEval | 84.7 | 84.7 | 83.2 | 61.4 |\n\n### Serving Throughput (Single H100, SGLang)\n\n| Concurrency | I-DLM-8B (tok\u002Fs\u002Freq) | LLaDA-2.1-mini (tok\u002Fs\u002Freq) | SDAR-8B (tok\u002Fs\u002Freq) |\n|-------------|----------------------|----------------|---------|\n| C=32 | 186-193 | 51-86 | 43-52 |\n| C=64 | 124-125 | 28-57 | 27-28 |\n\n---\n\n## Model Zoo\n\n| Model | HuggingFace | Description |\n|-------|-------------|-------------|\n| I-DLM-8B | [yifanyu\u002FI-DLM-8B](https:\u002F\u002Fhuggingface.co\u002Fyifanyu\u002FI-DLM-8B) | Converted from Qwen3-8B |\n| I-DLM-32B | [yifanyu\u002FI-DLM-32B](https:\u002F\u002Fhuggingface.co\u002Fyifanyu\u002FI-DLM-32B) | Converted from Qwen3-32B |\n| I-DLM-8B-LoRA | [yifanyu\u002FI-DLM-8B-lora-r128](https:\u002F\u002Fhuggingface.co\u002Fyifanyu\u002FI-DLM-8B-lora-r128) | LoRA adapter (rank=128) for lossless R-ISD |\n\n## Quick Start\n\n### Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FIntrospective-Diffusion\u002FI-DLM.git\ncd I-DLM\u002Finference\nbash install.sh\n```\n\n### Launch Server\n\n```bash\npython -m sglang.launch_server \\\n    --model-path yifanyu\u002FI-DLM-8B \\\n    --trust-remote-code --tp-size 1 --dtype bfloat16 \\\n    --mem-fraction-static 0.85 --max-running-requests 32 \\\n    --attention-backend flashinfer --dllm-algorithm IDLMBlockN \\\n    --dllm-algorithm-config inference\u002Fconfigs\u002Fidlm_blockN4_config.yaml \\\n    --port 30000\n```\n\n### Generate\n\n```bash\ncurl http:\u002F\u002Flocalhost:30000\u002Fv1\u002Fchat\u002Fcompletions \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\n    \"model\": \"default\",\n    \"messages\": [{\"role\": \"user\", \"content\": \"Prove that sqrt(2) is irrational.\"}],\n    \"max_tokens\": 4096,\n    \"temperature\": 1.0\n  }'\n```\n\nSee [inference\u002FREADME.md](inference\u002FREADME.md) for detailed setup, evaluation, and 
benchmarking.\n\n---\n\n## Method\n\n### Key Insight: Introspective Consistency\n\nAR models inherently agree with their own generations (introspective acceptance rate ~0.98). Standard diffusion LMs with bidirectional attention lack this property (~0.57-0.70). I-DLM recovers it through:\n\n1. **Strict causal masking** across both masked and clean tokens\n2. **Logit shift** (Dream shift): hidden state at position *i* predicts token *i*+1\n3. **All-masked training**: CE loss on both noisy (masked) and clean token positions\n\n### Training\n\nInput construction: concatenate a fully masked sequence with the clean sequence as `[x_t | x_0]`, apply strict causal attention uniformly across both halves, and compute CE loss on all non-padding positions.\n\n```\nL = CE_noisy + alpha * CE_clean   # CE_clean is computed on the clean region with shifted labels\n```\n\nSee [training\u002FREADME.md](training\u002FREADME.md) for setup and usage.\n\n### Inference: Introspective Strided Decoding (ISD)\n\nEach forward pass simultaneously:\n- **Generates** N new tokens from MASK positions (proposal distribution *q*)\n- **Verifies** previously generated tokens now visible as clean positions (anchor distribution *p*)\n\nAccepting each proposed token *x* with probability `min(1, p(x)\u002Fq(x))` guarantees that the output matches the base AR distribution.\n\nSee [inference\u002FREADME.md](inference\u002FREADME.md) for details.\n\n---\n\n## Repository Structure\n\n```\nI-DLM\u002F\n├── training\u002F                  # Training code and configs\n│   ├── README.md\n│   ├── run_train_b*-allmasked_idlm_sample.sh\n│   ├── model\u002F                 # Model configs\n│   └── llama_factory_sdar\u002F    # Modified LLaMA-Factory framework\n├── inference\u002F                 # Inference and serving via SGLang\n│   ├── README.md\n│   ├── configs\u002F               # Algorithm config YAMLs\n│   ├── eval\u002F                  # Evaluation scripts\n│   └── sglang\u002F                # SGLang integration code\n└── README.md\n```\n\n## Acknowledgments\n\nThis project builds upon:\n- 
[LLaMA-Factory](https:\u002F\u002Fgithub.com\u002Fhiyouga\u002FLLaMA-Factory) for training\n- [SDAR](https:\u002F\u002Fgithub.com\u002FJetAstra\u002FSDAR) for model architecture\n- [SGLang](https:\u002F\u002Fgithub.com\u002Fsgl-project\u002Fsglang) for inference and serving\n\n## Citation\n\n```bibtex\n@article{yu2026introspective,\n  title={Introspective Diffusion Language Models},\n  author={Yu, Yifan and Jian, Yuqing and Wang, Junxiong and Zhou, Zhongzhu\n          and Zhuang, Donglin and Fang, Xinyu and Yanamandra, Sri\n          and Wu, Xiaoxia and Wu, Qingyang and Song, Shuaiwen Leon\n          and Dao, Tri and Athiwaratkun, Ben and Zou, James\n          and Lai, Fan and Xu, Chenfeng},\n  journal={arXiv preprint arXiv:2604.11035},\n  year={2026}\n}\n```\n\n## License\n\nBSD 3-Clause License. See [LICENSE](LICENSE) for details.\n","I-DLM is a diffusion language model. It is the first to match the quality of same-scale autoregressive models across 15 benchmarks, and it achieves up to 3.8x higher serving throughput at large batch sizes. The project is written in Python. Its Introspective Strided Decoding (ISD) algorithm adds a verification step within a single generation pass, which ensures that the output follows the autoregressive distribution and substantially increases generation speed while maintaining high quality. It suits applications that need efficient text generation, such as large-scale content creation and code-assisted generation. The project has released several model variants on Hugging Face and provides both training and inference support.","2026-06-11 02:47:42","CREATED_QUERY"]