[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2246":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":15,"starSnapshotCount":15,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},2246,"DMax","czg1225\u002FDMax","czg1225","DMax: Aggressive Parallel Decoding for dLLMs","https:\u002F\u002Farxiv.org\u002Fpdf\u002F2604.08302",null,"Python",126,7,1,0,4,14,2.71,"Apache License 2.0",false,"main",[23,24,25,26,27,28],"acceleration","diffusion-language-models","efficiency","large-language-models","multi-token-prediction","parallel-decoding","2026-06-12 02:00:39","\u003Cdiv align=\"center\">\n\n# 🚀 DMax: Aggressive Parallel Decoding for dLLMs\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fczg1225\u002FDMax\u002Fblob\u002Fmain\u002FLICENSE\">\n    \u003Cimg alt=\"Apache\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache-4E94CE.svg\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.08302\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-darkred.svg\" alt=\"Paper\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FZigeng\u002Fdmax-models\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Models-FFB000.svg\" alt=\"Project\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FZigeng\u002Fdmax-training-data\">\n    
\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Dataset-FFB000.svg\" alt=\"Project\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp>\u003Cstrong>DMax is a new dLLM paradigm achieving aggressive parallel decoding while preserving generation quality.\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4856fa9e-9dae-41b7-9716-568f36a0f638\n\n> **DMax: Aggressive Parallel Decoding for dLLMs**  \n> [Zigeng Chen](https:\u002F\u002Fczg1225.github.io\u002Fchenzigeng99\u002F), [Gongfan Fang](https:\u002F\u002Ffangggf.github.io\u002F), [Xinyin Ma](https:\u002F\u002Fhorseee.github.io\u002F), [Ruonan Yu](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=UHP95egAAAAJ&hl=en), [Xinchao Wang](https:\u002F\u002Fsites.google.com\u002Fsite\u002Fsitexinchaowang\u002F)  \n> [xML Lab](https:\u002F\u002Fsites.google.com\u002Fview\u002Fxml-nus), National University of Singapore \\\n> Paper [Arxiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.08302)\n\n---\n\n\u003Ca id=\"updates\">\u003C\u002Fa>\n\n## ⭐ Updates\n\n- **[May 25, 2026]**: Our latest model, **DMax-16B**, is now available. It is a highly parallel, general-purpose dLLM that delivers superior efficiency across math, code, and general-purpose tasks. 
To run inference or evaluation, simply set the model path to `Zigeng\u002FDMax-16B`.\n- **[April 10, 2026]**: Our Arxiv paper is available now.\n- **[April 10, 2026]**: Code, model and dataset are released.\n\n---\n\n\u003Ca id=\"highliths\">\u003C\u002Fa>\n\n## 💪 Highlights\n\n- **Aggressive Decoding Parallelism**: Achieves 6.0 TPF on math and reasoning tasks and 6.6 TPF on code tasks while preserving accuracy.\n- **Self-Revising dLLM**: Extends a pretrained MDLM into a UDLM with an intrinsic ability to revise its own erroneous predictions during decoding.\n- **Soft Parallel Decoding**: Uses interpolation between mask and token embeddings to propagate confidence priors from previous steps.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Ftradeoff.png\" width=\"93%\" \u002F>\n  \u003Cbr>\n  \u003Cem>Superior Parallelism-Accuracy Trade-off, Increased TPF with Maintained Accuracy.\u003C\u002Fem>\n\u003C\u002Fdiv>\n\n---\n\n## 📚 Table of Contents\n\n- [💡 Introduction](#introduction)\n- [💻 Model and Datasets](#model-and-datasets)\n- [🚀 Quick Start](#quick-start)\n- [🔧 Installation](#installation)\n- [🔥 Training](#training)\n- [⚡ Evaluation](#evaluation)\n- [🔍 Decoding Process Visualization](#decoding-process-visualization)\n- [☀️ Acknowledgement](#acknowledgement)\n- [📖 Citation](#citations)\n\n---\n\n\u003Ca id=\"introduction\">\u003C\u002Fa>\n\n## 💡 Introduction\n\nWe present DMax, a new paradigm for efficient dLLMs. It mitigates error accumulation in parallel decoding, enabling aggressive decoding parallelism while preserving generation quality. Unlike conventional masked dLLMs that decode through a binary mask-to-token transition, DMax reformulates decoding as a progressive self-refinement from mask embeddings to token embeddings. 
At the core of our approach is On-Policy Uniform Training, a novel training strategy that efficiently unifies masked and uniform dLLMs, equipping the model to recover clean tokens from both masked inputs and its own erroneous predictions. Building on this foundation, we further introduce Soft Parallel Decoding. Extensive experiments across a variety of benchmarks demonstrate the effectiveness of DMax.\n\n\u003C!-- ![figure](assets\u002Fintro.png) -->\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Ftrain.png\" width=\"90%\" \u002F>\n  \u003Cbr>\n  \u003Cem>Overview of the On-Policy Uniform Training.\u003C\u002Fem>\n\u003C\u002Fdiv>\n\n---\n\n\u003Ca id=\"model-and-datasets\">\u003C\u002Fa>\n\n## 💻 Model and Datasets\n\n| Model | Description | Source Model | Link |\n| --- | --- | --- | --- |\n| 🤖 DMax-16B | Highly parallel general-purpose dLLM. | LLaDA-2.0-mini | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FZigeng\u002FDMax-16B)  |\n| 🤖 DMax-Math-16B | Highly parallel dLLM for math and reasoning. | LLaDA-2.0-mini | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FZigeng\u002FDMax-Math-16B) |\n| 🤖 DMax-Coder-16B | Highly parallel dLLM for code generation. 
| LLaDA-2.0-mini | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FZigeng\u002FDMax-Coder-16B) |\n\n| Dataset | Description | Link |\n| --- | --- | --- |\n| 📊 DMax-Math-Training-Data | Trajectories on math problems generated by LLaDA-2.0-mini | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FZigeng\u002FDMax-LLaDA-2.0-Mini-Math-Trajectories) |\n| 📊 DMax-Code-Training-Data | Trajectories on code problems generated by LLaDA-2.0-mini | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FZigeng\u002FDMax-LLaDA-2.0-Mini-Code-Trajectories) |\n\n---\n\n\n\u003Ca id=\"quick-start\">\u003C\u002Fa>\n\n## 🚀 Quick Start\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM\nfrom transformers import AutoTokenizer\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"Zigeng\u002FDMax-16B\", trust_remote_code=True, device_map=\"cuda:0\"\n)\nmodel = model.to(torch.bfloat16)\nmodel.eval()\ntokenizer = AutoTokenizer.from_pretrained(\"Zigeng\u002FDMax-16B\", trust_remote_code=True)\n\nprompt = \"A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?\" + \"\\nLet's think step by step\\n\"\n\ninput_ids = tokenizer.apply_chat_template(\n    [{\"role\": \"user\", \"content\": prompt}],\n    add_generation_prompt=True,\n    tokenize=True,\n    return_tensors=\"pt\",\n)\n\nnfe, generated_tokens = model.generate_spd(\n    inputs=input_ids,\n    gen_length=2048,\n    block_length=32,\n    threshold=0.0,\n)\n\ngenerated_answer = tokenizer.decode(\n    generated_tokens[0],\n    skip_special_tokens=True,\n)\n\nprint(generated_answer)\nprint(\"nfe:\", nfe, \"token length\", len(generated_tokens[0]))\n```\n\n---\n\n\u003Ca id=\"installation\">\u003C\u002Fa>\n\n## 🔧 Installation\n\n1. Clone the **DMax** repository:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fczg1225\u002FDMax.git --recursive\ncd DMax\n```\n\n2. 
Install the **dFactory** environment for training:\n\n```bash\ncd dFactory\nconda create -n dFactory python==3.11\nconda activate dFactory\npip install -e VeOmni\u002F\n```\n\n3. Install the **dInfer** environment for efficient evaluation:\n\n```bash\ncd dInfer\nconda create -n dInfer python==3.11\nconda activate dInfer\npip install .\npip install sglang==0.5.3.post1\npip install vllm==0.10.2\n```\n\n---\n\n\u003Ca id=\"training\">\u003C\u002Fa>\n\n## 🔥 Training\n\nOur training scripts are based on the dFactory repository.\n\n```bash\ncd dFactory\n```\n\n### 1. Download and Merge Model Weights\n\nThe training scripts require model weights in a \"merged-expert\" format for optimal performance. Before starting, you must download the standard weights and convert them.\n\n**Download the original model:** Use the helper script to download the weights from the Hugging Face Hub.\n\n```bash\n# Choose a destination for the original model files\npython scripts\u002Fdownload_hf_model.py \\\n  --repo_id inclusionAI\u002FLLaDA2.0-mini \\\n  --local_dir \u002Fpath\u002Fto\u002Fseparate_expert_model\n```\n\n**Convert to the merged format:** Run the following script to create the merged checkpoint required for training.\n\n```bash\n# Use the path from the previous step as the source\npython scripts\u002Fmoe_convertor.py \\\n  --input-path \u002Fpath\u002Fto\u002Fseparate_expert_model \\\n  --output-path \u002Fpath\u002Fto\u002Fsave\u002Fmerged_model \\\n  --mode merge\n```\n\n### 2. Prepare Training Data\n\nBefore training, the dataset must be converted into the conversational format expected by our training pipeline. The script below transforms the original `\"question\"` and `\"answer\"` fields into a `\"messages\"` field. 
Run the following command to perform the conversion.\n\n```bash\n# Prepare the math and reasoning training data\npython scripts\u002Fbuild_dataset_oput.py --dataset_path Zigeng\u002FDMax-LLaDA-2.0-Mini-Math-Trajectories\n# Or prepare the code training data\npython scripts\u002Fbuild_dataset_oput.py --dataset_path Zigeng\u002FDMax-LLaDA-2.0-Mini-Code-Trajectories\n```\n\n### 3. Modify Training Configs\n\nEdit `configs\u002Fsft\u002Fllada2_mini_bd_oput.yaml`:\n\n```yaml\nmodel:\n  model_path: \"\u002Fpath\u002Fto\u002Fsave\u002Fmerged_model\"\ndata:\n  train_path: \"\u002Fyour\u002Fdata\u002Fpath\"\ntrain:\n  output_dir: \"\u002Fyour\u002Foutput\u002Fpath\"\n```\n\n### 4. Run Training\n\nOnce all preparation steps are finished, you can launch the fine-tuning process with the following command.  \nThe default configuration uses distributed training across 8 GPUs.\n\n```bash\nPYTHONPATH=$(pwd)\u002FVeOmni:$PYTHONPATH sh train.sh tasks\u002Ftrain_llada2_bd_oput.py configs\u002Fsft\u002Fllada2_mini_bd_oput.yaml\n```\n\n### 5. Interact with the Trained Model\n\nTo interact with a trained model, complete the following two steps:\n\n#### Step 1: Convert the Checkpoint\n\nFirst, convert the checkpoint from the merged format used during training back to the standard Mixture-of-Experts (MoE) format.\n\n> **Note:** The `--input-path` should point to the saved Hugging Face checkpoint, **not** the root output directory specified during training. The checkpoint is typically located in a subdirectory such as:\n`TRAIN_OUTPUT_DIR\u002Fcheckpoints\u002Fglobal_step_XXX\u002Fhf_ckpt\u002F`\n\nRun the following command to perform the conversion:\n\n```bash\npython scripts\u002Fmoe_convertor.py \\\n  --input-path \u002Fpath\u002Fto\u002Fmerged_model \\\n  --output-path \u002Fpath\u002Fto\u002Fsave\u002Fseparate_expert_model \\\n  --mode split\n```\n\n#### Step 2: Copy the Modeling Files\n\nAfter the conversion, a final manual step is required. 
You must copy the DMax model's architecture files (`modeling_llada2_moe.py` and `configuration_llada2_moe.py`) into the newly created `separate_expert_model` directory. These files must come from the directory of your locally saved DMax model. The training and conversion processes only update the model weights, not the architecture files, which is why the DMax versions are needed.\n\n```bash\ncp \u002Fpath\u002Fto\u002Flocal_saved_DMax_model\u002Fmodeling_llada2_moe.py \u002Fpath\u002Fto\u002Fsave\u002Fseparate_expert_model\u002F\ncp \u002Fpath\u002Fto\u002Flocal_saved_DMax_model\u002Fconfiguration_llada2_moe.py \u002Fpath\u002Fto\u002Fsave\u002Fseparate_expert_model\u002F\n```\n\nWith the model converted and the modeling files in place, you are now ready to chat!\n\n---\n\n\u003Ca id=\"evaluation\">\u003C\u002Fa>\n\n## ⚡ Evaluation\n\nOur evaluation scripts are based on the dInfer repository.\n\n```bash\ncd dInfer\u002Fevaluations\n```\n\n**Download the DMax model:** Use the helper script to download the weights from the Hugging Face Hub.\n\n```bash\n# Choose a destination for the DMax model files\npython download_hf_model.py \\\n  --repo_id Zigeng\u002FDMax-Math-16B \\\n  --local_dir \u002Fpath\u002Fto\u002Flocal_saved_model\n```\n\n### 1. Evaluation on Math & Reasoning Benchmarks\n\nWe provide evaluation scripts for several math and reasoning benchmarks. Run the following command to launch the evaluation. You may modify the inference settings in `eval_llada_dmax_math.sh` as needed. 
Before running the script, please set `model_path` to the path of your locally saved model.\n\nThe current evaluation suite supports four benchmarks:\n\n- ✅ `GSM8K`\n- ✅ `MATH500`\n- ✅ `Minerva_Algebra`\n- ✅ `ASDIV`\n\n```bash\nbash eval_llada_dmax_math.sh\n```\n\nAfter generation, run the following scripts to extract answers from the generated responses and evaluate accuracy against the ground-truth labels.\n\n```bash\npython val_gsm8k.py       # postprocess and calculate accuracy on GSM8K\npython val_math.py        # postprocess and calculate accuracy on MATH500\npython val_algebra.py     # postprocess and calculate accuracy on Minerva_Algebra\npython val_asdiv.py       # postprocess and calculate accuracy on ASDIV\n```\n\n### 2. Evaluation on Code Benchmarks\n\nWe also provide evaluation scripts for code generation benchmarks. Run the following command to start the evaluation. You may modify the inference settings in `eval_llada_dmax_code.sh` as needed. Before running the script, please set `model_path` to the path of your locally saved model.\n\nThe current evaluation suite supports the following four benchmarks:\n\n- ✅ `HumanEval_Instruct`\n- ✅ `MBPP_Instruct`\n- ✅ `HumanEval_Instruct_Plus`\n- ✅ `MBPP_Instruct_Plus`\n\n```bash\nbash eval_llada_dmax_code.sh\n```\n\n---\n\n\u003Ca id=\"decoding-process-visualization\">\u003C\u002Fa>\n\n## 🔍 Decoding Process Visualization\n\nWe provide a script for visualizing the full decoding process. 
Run `demo.py` to generate an HTML file named `dllm_demo.html`. Then open this file in Chrome to view the decoding visualization.\n\n```bash\npython demo.py\n```\n\n![demo](assets\u002Fdemo.png)\n\n\n---\n\n\u003Ca id=\"acknowledgement\">\u003C\u002Fa>\n\n## ☀️ Acknowledgement\n\nOur code builds on [dFactory](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FdFactory) and [dInfer](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FdInfer), and we acknowledge these great works for laying the groundwork that made our approach possible.\n\n---\n\n\u003Ca id=\"citations\">\u003C\u002Fa>\n\n## 📚 Citation\nIf our research assists your work, please give us a star ⭐ or cite us using:\n```\n@article{chen2026dmax,\n  title={DMax: Aggressive Parallel Decoding for dLLMs},\n  author={Chen, Zigeng and Fang, Gongfan and Ma, Xinyin and Yu, Ruonan and Wang, Xinchao},\n  journal={arXiv preprint arXiv:2604.08302},\n  year={2026}\n}\n```","DMax is a new efficient decoding paradigm that aims to achieve aggressive parallel decoding for diffusion language models (dLLMs) while preserving generation quality. Its core features include a self-revising mechanism that corrects erroneous predictions during decoding and a soft parallel decoding technique that propagates confidence priors from previous steps, reaching 6.0 TPF on math tasks and 6.6 TPF on code tasks while maintaining accuracy. DMax suits applications that need efficient, high-quality text generation, such as large language model inference, code generation, and math problem solving. The project is written in Python and released under the Apache License 2.0.",2,"2026-06-11 02:49:05","CREATED_QUERY"]