[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72464":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},72464,"dllm","ZHZisZZ\u002Fdllm","ZHZisZZ","dLLM: Simple Diffusion Language Modeling","https:\u002F\u002Farxiv.org\u002Fpdf\u002F2602.22661",null,"Python",2559,267,17,14,0,6,13,70,18,29.28,"Apache License 2.0",false,"main",[26,27,28],"discrete-diffusion-models","llm","nlp","2026-06-12 02:03:03","\u003Ch1 align=\"center\">dLLM\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\nSimple Diffusion Language Modeling\n\u003C\u002Fp>\n\n\u003C!-- \u003Cdiv align=\"center\">\n\n[![Report](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-B31B1B?style=for-the-badge&logo=arxiv&logoColor=white)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22661)\n[![Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-yellow?style=for-the-badge&logo=huggingface&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002Fdllm-hub)\n\n\u003C\u002Fdiv> -->\n\n\u003Cp align=\"center\">\n    📃 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2602.22661\" target=\"_blank\">Report\u003C\u002Fa> | 🤗 \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdllm-hub\" target=\"_blank\">Models\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n\u003Cimg\n  src=\"assets\u002Flogo.gif\"\n  alt=\"dLLM logo\">\n\u003C\u002Fp>\n\n## Overview\n**dLLM** is a 
library that unifies the training and evaluation of **diffusion language models**, bringing transparency and reproducibility to the entire development pipeline:\n\n- dLLM provides scalable training pipelines (based on [`transformers`](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fblob\u002Fmain\u002Fsrc\u002Ftransformers) [Trainer](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers\u002Fblob\u002Fmain\u002Fsrc\u002Ftransformers\u002Ftrainer.py)), with support for [LoRA](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fpeft), [DeepSpeed](https:\u002F\u002Fgithub.com\u002Fdeepspeedai\u002FDeepSpeed), [FSDP](https:\u002F\u002Fpytorch.org\u002Fblog\u002Fintroducing-pytorch-fully-sharded-data-parallel-api\u002F) and beyond.\n\n- dLLM provides unified evaluation pipelines (based on [`lm-evaluation-harness`](https:\u002F\u002Fgithub.com\u002FEleutherAI\u002Flm-evaluation-harness)) that abstract away inference details and make customization simple.\n\n- Built on these components, dLLM provides minimal **training \u002F inference \u002F evaluation** recipes for open-weight models (e.g., [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) and [Dream](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15487)), and implementations of training algorithms (e.g., [MDLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.07524) (masked diffusion), [BD3LM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09573) (block diffusion), [Edit Flows](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09018) and so on).\n\n\u003C!-- > [!NOTE]\n> This repository is primarily for educational purposes and does not aim for 100% exact reproduction of official models (which is impossible). We hope it serves as a helpful reference for the community — contributions and improvements are always welcome! -->\n\n\n## News\n\n\u003C!-- **[2026\u002F02]** 📄 Checkout our **[`technical report`](assets\u002FdLLM.pdf)**! 
-->\n\n**[2026\u002F04] 🎯 [`diffu-GRPO`](https:\u002F\u002Fgithub.com\u002Fdllm-reasoning\u002Fd1)**: We support diffu-GRPO training for masked diffusion language models, validated on [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) and [Tiny-A2D](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fdllm-hub\u002Ftiny-a2d) across five reasoning tasks (GSM8K, MATH, Countdown, Sudoku, Code). See [`examples\u002Frl`](\u002Fexamples\u002Frl#grpo) for training instructions.\n\n**[2026\u002F02] ⚡[`Fast-dLLM`](https:\u002F\u002Fgithub.com\u002FNVlabs\u002FFast-dLLM)**: We support accelerated inference and evaluation of  [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) and [Dream](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15487) with [Fast-dLLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22618) (cache, confidence-threshold decoding). See [`examples\u002Ffastdllm`](\u002Fexamples\u002Ffastdllm) for inference \u002F evaluation instructions.\n\n**[2025\u002F12] 🤗[`Tiny-A2D`](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fdllm-hub\u002Ftiny-a2d)**: We released a collection of **SOTA** small (0.5B\u002F0.6B) diffusion models adapted from AR models, with fully open recipes for converting **ANY** AR model (e.g., Qwen, LLaMA, and GPT-2) into a diffusion model. See [`examples\u002Fa2d`](\u002Fexamples\u002Fa2d) for training \u002F inference \u002F evaluation instructions.\n\n**[2025\u002F11] 🤗[`BERT-Chat`](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fdllm-hub\u002Fbert-chat)**: We released a collection of BERTs finetuned to chat with diffusion, with open recipes for turning **ANY** BERT encoder (e.g., BERT, RoBERTa, ModernBERT) into a diffusion model. 
See [`examples\u002Fbert`](\u002Fexamples\u002Fbert) for training \u002F inference \u002F evaluation instructions.\n\n\n## Table of Contents\n- [Features](#features)\n- [Setup](#setup)\n- [Files](#files)\n- [Training](#training)\n- [Inference](#inference)\n- [Evaluation](#evaluation)\n- [Citation](#citation)\n\n\n## Features\n- [`examples\u002Fllada`](\u002Fexamples\u002Fllada): Pretraining, finetuning and evaluating [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) \u002F [LLaDA-MoE](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.24389).\n- [`examples\u002Fllada2`](\u002Fexamples\u002Fllada2): Inference of [LLaDA2.0](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15745).\n- [`examples\u002Fllada21`](\u002Fexamples\u002Fllada21): Inference of [LLaDA2.1](https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.08676).\n- [`examples\u002Fdream`](\u002Fexamples\u002Fdream): Pretraining, finetuning and evaluating [Dream](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15487).\n- [`examples\u002Fa2d`](\u002Fexamples\u002Fa2d): Finetuning any autoregressive model to generate text with [masked diffusion](https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.07524) \u002F [block diffusion](https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09573).\n- [`examples\u002Fbert`](\u002Fexamples\u002Fbert): Finetuning any [BERT](https:\u002F\u002Farxiv.org\u002Fabs\u002F1810.04805) to be lightweight Chatbots.\n    \u003C!-- \u003Cdetails>\n    \u003Csummary>🎬 Click to show BERT-Chat Demo\u003C\u002Fsummary>\n\n    \u003Cp align=\"center\">\n        \u003Cimg src=\"\u002Fexamples\u002Fbert\u002Fassets\u002Fchat.gif\" alt=\"chat\" width=\"80%\">\n    \u003C\u002Fp>\n    \u003Cp align=\"center\">\n    \u003Cem>\n        Chat with \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdllm-hub\u002FModernBERT-large-chat-v0.1\">\u003Ccode>ModernBERT-large-chat-v0.1\u003C\u002Fcode>\u003C\u002Fa>. 
See \u003Ca href=\"\u002Fexamples\u002Fbert\u002FREADME.md#inference\">Inference\u003C\u002Fa> for details.\n    \u003C\u002Fem>\n    \u003C\u002Fp>\n    \u003C\u002Fdetails> -->\n- [`examples\u002Feditflow`](\u002Fexamples\u002Feditflow): Educational reference for training [Edit Flows](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.09018) models, demonstrating how to extend existing DLLMs (e.g., LLaDA, Dream, BERT-Chat) with *edit operations*—insertion, deletion, and substitution—and how to pretrain or finetune Edit Flows models from scratch on public data.\n   \u003C!-- \u003Cdetails>\n   \u003Csummary>🎬 Click to show EditFlow Demo\u003C\u002Fsummary>\n\n   \u003Cp align=\"center\">\n     \u003Cimg src=\"\u002Fexamples\u002Feditflow\u002Fassets\u002Fall.gif\" alt=\"EditFlow demo\" width=\"100%\">\n   \u003C\u002Fp>\n   \u003Cp align=\"center\">\u003Cem>EditFlow performing insertion (blue), substitution from mask tokens (black), substitution from non-mask tokens (red), and deletion (strikethrough → removed) during sampling.\u003C\u002Fem>\u003C\u002Fp>\n\n   \u003C\u002Fdetails> -->\n- [`examples\u002Ffastdllm`](\u002Fexamples\u002Ffastdllm): Inferencing and evaluating [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) and [Dream](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15487) with [Fast-dLLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22618) (cache, confidence-threshold decoding, and beyond).\n- [`examples\u002Frl`](\u002Fexamples\u002Frl): [GRPO](https:\u002F\u002Fgithub.com\u002Fdllm-reasoning\u002Fd1) training for [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) and [Tiny-A2D](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fdllm-hub\u002Ftiny-a2d) diffusion language models across reasoning tasks (GSM8K, MATH, Countdown, Sudoku, Code).\n- More upcoming.\n\n\n## Setup\n### Installation\n```bash\n# create and activate conda environment\nconda create -n dllm python=3.10 -y\nconda activate dllm\n\n# install pytorch with 
CUDA 12.4 (other pytorch\u002Fcuda versions should also work)\nconda install cuda=12.4 -c nvidia\npip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \\\n    --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n\n# install dllm package\npip install -e .\n```\n### (optional) Evaluation setup\n\n```bash\n# initialize `lm-evaluation-harness` submodule\ngit submodule update --init --recursive\n\n# install submodule in editable mode with IFEval & Math dependencies\npip install -e \"lm-evaluation-harness[ifeval,math]\"\n```\n\n### (optional) Slurm setup\nFor [Slurm](https:\u002F\u002Fslurm.schedmd.com\u002F) users, update [`scripts\u002Ftrain.slurm.sh`](\u002Fscripts\u002Ftrain.slurm.sh) for your cluster:\n```diff\n- #SBATCH --partition=mllm_safety # Note: adjust this for your cluster\n- #SBATCH --quotatype=spot        # Note: adjust this for your cluster\n+ #SBATCH --partition=YOUR_PARTITION\n+ #SBATCH --quotatype=YOUR_QUOTATYPE\n```\nNext, create a directory for your job logs:\n```shell\nmkdir .logs\n```\nThis folder will store the log files generated by your sbatch jobs.\n\n## Files\n```\n# modules for training \u002F sampling\ndllm\n├── core                   # Core reusable modules shared across `dllm\u002Fpipelines` \n│   ├── samplers\n│   ├── schedulers\n│   └── trainers\n├── data\n├── pipelines              # Application-specific training & inference pipelines\n│   ├── a2d\n│   ├── bert\n│   ├── dream\n│   ├── editflow\n│   ├── fastdllm\n│   ├── llada\n│   │   ├── models         # Model architecture and configs \n│   │   ├── sampler.py     # Inference module\n│   │   ├── trainer.py     # Training module\n│   │   └── eval.py        # Evaluation module\n│   ├── llada2\n│   ├── llada21\n│   └── rl\n├── tools\n└── utils\n\n# entry points for training \u002F sampling\nexamples\n├── a2d\n├── bert\n├── dream\n├── editflow\n├── fastdllm\n├── llada\n│   ├── chat.py            # Interactive inference example\n│   ├── sample.py          # 
Inference example\n│   ├── pt.py              # Pretraining example\n│   ├── README.md          # Documentation\n│   ├── sft.py             # Supervised finetuning example\n│   └── eval.sh            # Evaluation script\n├── llada2\n├── llada21\n└── rl\n```\n\n## Training\n\nA typical training entry script (for example, [`examples\u002Fllada\u002Fsft.py`](\u002Fexamples\u002Fllada\u002Fsft.py)) looks like this:\n```python\nimport transformers\n\nimport dllm\n\nmodel_args, data_args, training_args = parser.parse_args_into_dataclasses()\n# ----- Model ------------------------------------------------------------------\nmodel = dllm.utils.get_model(model_args=model_args)\n# ----- Tokenizer --------------------------------------------------------------\ntokenizer = dllm.utils.get_tokenizer(model_args=model_args)\n# ----- Dataset ----------------------------------------------------------------\ndataset = \"...\"\n\n# ----- Training --------------------------------------------------------------\ntrainer = dllm.core.trainers.MDLMTrainer(\n    model=model,\n    tokenizer=tokenizer,\n    train_dataset=dataset[\"train\"],\n    eval_dataset=dataset[\"test\"],\n    args=training_args,\n    data_collator=transformers.DataCollatorForSeq2Seq(\n        tokenizer,\n        return_tensors=\"pt\",\n        padding=True,\n        label_pad_token_id=tokenizer.pad_token_id, \n    ),\n)\ntrainer.train()\n```\n\nYou can launch training job locally with `accelerate`, or submit it to a [Slurm](https:\u002F\u002Fslurm.schedmd.com\u002F) cluster using `sbatch`.\n```shell\n# Run locally (ZeRO-2 on 8 GPUs with 4bit quantization and LoRA)\naccelerate launch \\\n    --config_file scripts\u002Faccelerate_configs\u002Fzero2.yaml \\\n    examples\u002Fllada\u002Fsft.py \\\n    --num_train_epochs 4 \\\n    --load_in_4bit True --lora True\n```\n```shell\n# Submit to a Slurm cluster (FSDP on 1 node, 8 GPUs)\nsbatch --gres=gpu:8 scripts\u002Ftrain.slurm.sh \\\n    --accelerate_config \"fsdp\" \\\n    
--script_path \"examples\u002Fllada\u002Fsft.py\" \\\n    --num_train_epochs 4\n\n# Submit to a Slurm cluster (FSDP on 2 nodes, 16 GPUs)\nsbatch --nodes=2 --gres=gpu:8 scripts\u002Ftrain.slurm.sh \\\n    --accelerate_config \"fsdp\" \\\n    --script_path \"examples\u002Fllada\u002Fsft.py\" \\\n    --num_train_epochs 4\n```\nSee [Features](#features) for specific training recipes.\n\n\n\u003C!-- Here are some useful tips for training: -->\n#### Useful tips for training:\n- Use a subset of data:\n`--dataset_args \"allenai\u002Ftulu-3-sft-mixture[train:10000,test:1000]\"`\n- Concatenate datasets:\n`--dataset_args \"allenai\u002Ftulu-3-sft-mixture+HuggingFaceTB\u002Fsmoltalk\"`\n- Train with LoRA and 4bit quantization:\n`--load_in_4bit True --lora True`\n- Train with different distributed training methods:\n`--accelerate_config \"ddp,zero-{1,2,3},fsdp\"`\n- Load pretraining dataset in streaming mode:\n`--streaming True`\n- Preprocess SFT dataset before training (e.g., LLaDA):\n  \u003C!-- ```shell\n  # Preprocess SFT data\n  python dllm\u002Ftools\u002Fpreprocess_sft_dataset.py \\\n      --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Base\" \\\n      --sft_map_fn_path \"dllm.utils.default_sft_map_fn\" \\\n      --dataset_args \"allenai\u002Ftulu-3-sft-mixture\" \\\n      --output_dir \".data\u002Fsft\u002Fllada\u002Ftulu-3-sft-mixture\" \\\n      --num_proc 64\n  \n  # SFT with preprocessed data\n  accelerate launch \\\n      --config_file scripts\u002Faccelerate_configs\u002Ffsdp.yaml \\\n      examples\u002Fllada\u002Fsft.py \\\n      --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Base\" \\\n      --dataset_args \".data\u002Fsft\u002Fllada\u002Ftulu-3-sft-mixture\" \\\n      --load_preprocessed_data True \\\n      ...\n  ``` -->\n\n  ```diff\n  # Preprocess SFT data\n  + python dllm\u002Ftools\u002Fpreprocess_sft_dataset.py \\\n  +     --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Base\" \\\n  +     --sft_map_fn_path \"dllm.utils.default_sft_map_fn\" \\\n  +     
--dataset_args \"allenai\u002Ftulu-3-sft-mixture\" \\\n  +     --output_dir \".data\u002Fsft\u002Fllada\u002Ftulu-3-sft-mixture\" \\\n  +     --num_proc 64\n  \n  # SFT with preprocessed data\n  accelerate launch \\\n      --config_file scripts\u002Faccelerate_configs\u002Ffsdp.yaml \\\n      examples\u002Fllada\u002Fsft.py \\\n      --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Base\" \\\n  -   --dataset_args \"allenai\u002Ftulu-3-sft-mixture\" \\\n  +   --dataset_args \".data\u002Fsft\u002Fllada\u002Ftulu-3-sft-mixture\" \\\n  +   --load_preprocessed_data True \\\n      ...\n  ```\n\n## Inference\n\nWe provide unified [samplers](\u002Fdllm\u002Fcore\u002Fsamplers) that abstracts away inference details. \nA typical inference entry script (for example, [`examples\u002Fllada\u002Fsample.py`](\u002Fexamples\u002Fllada\u002Fsample.py)) looks like this:\n```python\nimport dllm\n\nmodel = dllm.utils.get_model(model_args=script_args).eval()\ntokenizer = dllm.utils.get_tokenizer(model_args=script_args)\nsampler = dllm.core.samplers.MDLMSampler(model=model, tokenizer=tokenizer)\n\nmessages = [\n    [{\"role\": \"user\", \"content\": \"Lily runs 12 km\u002Fh for 4 hours. 
How far in 8 hours?\"}],\n    [{\"role\": \"user\", \"content\": \"Please write an educational python function.\"}],\n]\n\ninputs = tokenizer.apply_chat_template(\n    messages,\n    add_generation_prompt=True,\n    tokenize=True,\n)\n\noutputs = sampler.sample(inputs, return_dict=True)\nsequences = dllm.utils.sample_trim(tokenizer, outputs.sequences.tolist(), inputs)\n```\n\nYou can also try the interactive chat script (for example, [`examples\u002Fllada\u002Fchat.py`](\u002Fexamples\u002Fllada\u002Fchat.py)) for visualized multi-turn dialogue:\n```shell\npython -u examples\u002Fllada\u002Fchat.py --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Instruct\"\n```\n\nYou can accelerate inference of [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) and [Dream](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15487) with [Fast-dLLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22618).\n```shell\npython examples\u002Ffastdllm\u002Fllada\u002Fsample.py --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Instruct\" --use_cache prefix --threshold 0.9\n```\n\n\u003Cp align=\"center\">\n    \u003Cimg src=\"\u002Fassets\u002Fchat.gif\" alt=\"chat\" width=\"80%\">\n\u003C\u002Fp>\n\u003C!-- \u003Cp align=\"center\">\u003Cem>EditFlow performing insertion (blue), substitution from mask tokens (black), substitution from non-mask tokens (red), and deletion (strikethrough → removed) during sampling.\u003C\u002Fem>\u003C\u002Fp> -->\n\n## Evaluation\n> Read [(optional) Evaluation setup](\u002FREADME.md#optional-evaluation-setup) before running evaluation. 
\n\nFor example, to evaluate [`LLaDA-8B-Instruct`](https:\u002F\u002Fhuggingface.co\u002FGSAI-ML\u002FLLaDA-8B-Instruct) on [`MMLU_Pro`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTIGER-Lab\u002FMMLU-Pro) using 4 GPUs, run:\n```shell\naccelerate launch --num_processes 4 \\\n    dllm\u002Fpipelines\u002Fllada\u002Feval.py \\\n    --tasks \"mmlu_pro\" \\\n    --model \"llada\" \\\n    --apply_chat_template \\\n    --num_fewshot 0 \\\n    --model_args \"pretrained=GSAI-ML\u002FLLaDA-8B-Instruct,is_check_greedy=False,mc_num=1,max_new_tokens=256,steps=256,block_size=256,cfg_scale=0.0\"\n```\n\nWe also provide scripts to automatically evaluate [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992), [Dream](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15487), and [BERT-Chat](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fdllm-hub\u002Fbert-chat) on all benchmarks.\nFor example, you can run [`examples\u002Fllada\u002Feval.sh`](\u002Fexamples\u002Fllada\u002Feval.sh) directly using the following commands:\n```shell\nbash examples\u002Fllada\u002Feval.sh --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Instruct\" --instruct True\nbash examples\u002Fllada\u002Feval.sh --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Base\" --instruct False\n```\n\nWe provide scripts to evaluate [LLaDA](https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.09992) and [Dream](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.15487) using [Fast-dLLM](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22618):\n```shell\nbash examples\u002Ffastdllm\u002Fllada\u002Feval.sh --model_name_or_path \"GSAI-ML\u002FLLaDA-8B-Instruct\" --instruct True --num_gpu 1\nbash examples\u002Ffastdllm\u002Fdream\u002Feval.sh --model_name_or_path \"Dream-org\u002FDream-v0-Base-7B\" --instruct False --num_gpu 1\n```\n\n\n## Citation\n```\n@misc{zhou2026dllm,\n      title={dLLM: Simple Diffusion Language Modeling}, \n      author={Zhanhui Zhou and Lingjie Chen and Hanghang Tong and Dawn Song},\n      year={2026},\n 
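     eprint={2602.22661},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.22661}, \n}\n```\n","dLLM is a library for training and evaluating diffusion language models, aiming to improve the transparency and reproducibility of the development pipeline. It provides scalable training pipelines built on the `transformers` framework, with support for LoRA, DeepSpeed, and FSDP, and unified evaluation pipelines built on `lm-evaluation-harness` that simplify inference details and customization. dLLM suits scenarios that call for efficient training, inference, and evaluation of open-weight models (such as LLaDA and Dream), as well as implementations of specific training algorithms (such as MDLM, BD3LM, and Edit Flows). The project is aimed at researchers and developers doing experiments and application development in natural language processing.",2,"2026-06-11 03:42:11","high_star"]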