[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76437":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":13,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},76437,"Cola-DLM","ByteDance-Seed\u002FCola-DLM","ByteDance-Seed","The codebase of Cola DLM","https:\u002F\u002Fhongcanguo.github.io\u002FCola-DLM\u002F",null,"Python",221,13,37,1,0,3,154,12,3.44,"Apache License 2.0",false,"main",[],"2026-06-12 02:03:41","\u003Cdiv align=\"center\">\n\n# Cola DLM\n\n**Continuous Latent Diffusion Language Model — a hierarchical latent-space text diffusion model with a block-causal DiT prior over a Text VAE.**\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.06548-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.06548)\n[![Model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingFace-Model-yellow.svg)](https:\u002F\u002Fhuggingface.co\u002FByteDance-Seed\u002FCola-DLM)\n[![HuggingFace Daily](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF-Daily%20Paper-yellow.svg)](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.06548)\n[![Project 
Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-Cola--DLM-1e90ff.svg)](https:\u002F\u002Fhongcanguo.github.io\u002FCola-DLM\u002F)\n[![Blog](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FBlog-Post-1e90ff.svg)](https:\u002F\u002Fhongcanguo.github.io\u002Fposts\u002F2026-cola-dlm.html)\n[![Zhihu](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FZhihu-Article-0084ff.svg)](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2038324180920313704)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache%202.0-blue.svg)](LICENSE)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.9%2B-blue.svg)](https:\u002F\u002Fwww.python.org\u002F)\n[![PyTorch](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpytorch-2.1%2B-ee4c2c.svg)](https:\u002F\u002Fpytorch.org\u002F)\n[![Transformers](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftransformers-4.40%2B-yellow.svg)](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Ftransformers)\n[![Code style: black](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fcode%20style-black-000000.svg)](https:\u002F\u002Fgithub.com\u002Fpsf\u002Fblack)\n\n[English](README.md) · [中文](README_zh.md)\n\n\u003C\u002Fdiv>\n\n> **Cola DLM** (`Co`ntinuous `La`tent `D`iffusion `L`anguage `M`odel) is the official, HuggingFace-Transformers-compatible open-source release of the paper [*Continuous Latent Diffusion Language Model*](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.06548). Cola DLM is a **hierarchical latent-variable language model**: a *Text VAE* learns a stable mapping `q_phi(z_0 | x)` between text and a continuous latent sequence; a *block-causal Diffusion Transformer (DiT)* models the latent prior `p_psi(z_0)` via Flow Matching; and the *conditional decoder* `p_theta(x | z_0)` realizes the actual tokens. 
From a unified Markov-path perspective, the diffusion process performs **latent prior transport** rather than token-level observation recovery, separating global semantic organization from local textual realization. This repository ships the trained checkpoint together with a no-padding (\"NA\") flatten-concat inference pipeline that runs natively under HuggingFace Transformers.\n\n---\n\n## Paper\n\n- **Title:** Continuous Latent Diffusion Language Model\n- **Authors:** Hongcan Guo, Qinyu Zhao, Yian Zhao, Shen Nie, Rui Zhu, Qiushan Guo, Feng Wang, Tao Yang, Hengshuang Zhao, Guoqiang Wei, Yan Zeng (ByteDance Seed et al.)\n- **arXiv:** [arxiv.org\u002Fabs\u002F2605.06548](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.06548)\n- **Model weights:** [huggingface.co\u002FByteDance-Seed\u002FCola-DLM](https:\u002F\u002Fhuggingface.co\u002FByteDance-Seed\u002FCola-DLM)\n- **HuggingFace daily paper:** [huggingface.co\u002Fpapers\u002F2605.06548](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.06548)\n- **Project page:** [hongcanguo.github.io\u002FCola-DLM](https:\u002F\u002Fhongcanguo.github.io\u002FCola-DLM\u002F)\n- **Blog post:** [hongcanguo.github.io\u002Fposts\u002F2026-cola-dlm.html](https:\u002F\u002Fhongcanguo.github.io\u002Fposts\u002F2026-cola-dlm.html)\n- **Zhihu article:** [zhuanlan.zhihu.com\u002Fp\u002F2038324180920313704](https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2038324180920313704)\n\n---\n\n## Method at a glance\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Ffigures\u002Fcola_main_fig.png\" alt=\"Overall workflow of Cola DLM: Text VAE pretraining, joint Text VAE + block-causal Text DiT training, and KV-cached inference.\" width=\"900\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cem>\u003Cstrong>Figure 1 — Overall workflow of Cola DLM.\u003C\u002Fstrong> \u003Cstrong>Stage 1\u003C\u002Fstrong>: Text VAE pretraining with reconstruction, BERT-style masking, and a KL regularizer to a base prior. 
\u003Cstrong>Stage 2\u003C\u002Fstrong>: joint Text VAE + block-causal Text DiT training; the DiT learns the latent prior \u003Ccode>p_psi(z_0)\u003C\u002Fcode> via Flow Matching under the visible set \u003Ccode>V_b\u003C\u002Fcode>. \u003Cstrong>Inference\u003C\u002Fstrong>: prefix encoding \u003Ccode>q_phi(z\u003Csup>pre\u003C\u002Fsup> | x\u003Csup>pre\u003C\u002Fsup>)\u003C\u002Fcode>, block-wise prior transport \u003Ccode>Phi\u003Csup>psi\u003C\u002Fsup>\u003Csub>0←1\u003C\u002Fsub>\u003C\u002Fcode> in latent space, and conditional decoding \u003Ccode>p_theta(x | z_0)\u003C\u002Fcode> with KV cache.\u003C\u002Fem>\u003C\u002Fp>\n\nCola DLM defines the joint generative distribution as\n\n```\np(x, z_0) = p_theta(x | z_0) * p_psi(z_0),    p(x) = ∫ p_theta(x | z_0) * p_psi(z_0) dz_0,\n```\n\nwhere `q_phi(z_0 | x)` is an inference (encoder) model used only at training and prefix-encoding time. The latent is decomposed into `B` blocks `z_0 = (z_0^(1), ..., z_0^(B))` with a block-causal factorization `p_psi(z_0) = p_psi(z_0^(1)) * ∏_{b≥2} p_psi(z_0^(b) | z_0^(\u003Cb))`, which directly mirrors the block-causal attention pattern of the DiT.\n\nTraining is two-stage:\n\n1. **Stage 1 — Text VAE pretraining.** Learns a stable text↔latent mapping (`q_phi`, `p_theta`) with reconstruction, BERT-style masking, and a KL regularizer to a base prior `p_base`.\n2. 
**Stage 2 — Joint Text VAE + block-causal Text DiT pretraining.** The DiT learns the latent prior `p_psi(z_0)` via conditional **Flow Matching** under the visible set `V_b = {sg(z_0^(\u003Cb)), z_t^(b)}`, while the VAE remains trainable under recon, mask, and a reference-encoder KL regularizer that prevents latent drift.\n\nInference (this repo) implements the paper's three-step recipe: **(i) prefix encode** `z^pre ~ q_phi(z^pre | x^pre)`; **(ii) block-wise generation** by transporting noise under the historical condition, `hat z_0^(b) = Phi^psi_{0←1}(eps^(b); z^pre, hat z_0^(\u003Cb))`, with `eps^(b) ~ N(0, I)`; **(iii) conditional decoding** `hat x^res ~ p_theta(x^res | z^pre, hat z_0^(1:B))`. See [`docs\u002Farchitecture.md`](docs\u002Farchitecture.md) for the full mapping between paper notation and code paths.\n\n---\n\n## Table of contents\n\n- [Highlights](#highlights)\n- [Installation](#installation)\n- [Quickstart](#quickstart)\n- [OpenAI-compatible deployment](#openai-compatible-deployment)\n- [Evaluation benchmarks](#evaluation-benchmarks)\n- [Unified text–image (preliminary)](#unified-textimage-preliminary)\n- [Project layout](#project-layout)\n- [Documentation](#documentation)\n- [Citation](#citation)\n- [License](#license)\n\n---\n\n## Highlights\n\n- **Hierarchical latent-variable model.** `ColaTextVAEModel` provides the inference encoder `q_phi` and the conditional decoder `p_theta`; `ColaDiTModel` parameterizes the block-causal latent prior `p_psi`. Diffusion is used to *transport the latent prior* (Eq. 
2.1.4 of the paper), not to recover tokens.\n- **HuggingFace-native.** `ColaDiTModel` and `ColaTextVAEModel` subclass `transformers.PreTrainedModel` and ship with matching `PretrainedConfig` classes, so `from_pretrained` \u002F `save_pretrained` \u002F `AutoConfig` all work out of the box.\n- **No-padding (\"NA\") inference.** Variable-length samples are concatenated along a single sequence axis with a companion `txt_shape: (B, 1)` describing per-sample lengths. RoPE, attention masks and the prior-transport loop are all driven by those lengths — no `max_len` padding is allocated at any point.\n- **Block-causal prior + classifier-free guidance.** The DiT realizes one block of `Phi^psi_{0←1}` per generation step under the block-causal visibility constraint `V_b`, alternating a conditional (prefix-aware) and unconditional (empty-prefix) pass exactly like the training objective.\n- **KV cache everywhere.** Both the DiT and the VAE decoder cache per-sample K\u002FV projections between blocks, so generating block `t+1` only pays attention to the newly appended block's Q.\n- **OpenAI-compatible serving.** [`openai_adapter\u002F`](openai_adapter\u002F) exposes Cola DLM through `POST \u002Fv1\u002Fchat\u002Fcompletions`, making it easy to deploy behind existing OpenAI-style clients, gateways, and evaluation tools.\n- **Reproducible benchmark.** [`scripts\u002Frun_benchmark.sh`](scripts\u002Frun_benchmark.sh) reproduces the 8-task evaluation (LAMBADA, MMLU, OBQA, HellaSwag, RACE, SIQA, SQuAD, Story Cloze) reported in the paper's RQ4 with a single shell command, including multi-GPU data-parallel sharding.\n\nSee [`docs\u002Farchitecture.md`](docs\u002Farchitecture.md) and [`docs\u002Fmodel_card.md`](docs\u002Fmodel_card.md) for a deeper technical description.\n\n---\n\n## Installation\n\nCola DLM targets **Python 3.9+** and **PyTorch 2.1+** on Linux \u002F macOS.\n\n### From source (recommended)\n\n```bash\ngit clone 
https:\u002F\u002Fgithub.com\u002Fyour-org\u002Fcola-dlm.git\ncd cola-dlm\n\n# Editable install with runtime dependencies\npip install -e .\n\n# Or with dev extras (pytest, ruff, black, pre-commit)\npip install -e \".[dev]\"\n```\n\n### From PyPI (once published)\n\n```bash\npip install cola-dlm\n```\n\n---\n\n## Quickstart\n\n### 1. Prepare model weights\n\nDownload the HuggingFace-format model weights from [ByteDance-Seed\u002FCola-DLM](https:\u002F\u002Fhuggingface.co\u002FByteDance-Seed\u002FCola-DLM), or place compatible local weights under `hf_models\u002Fcola_dlm\u002F`:\n\n```\nhf_models\u002F\n├── cola_dlm\u002F\n│   ├── cola_dit\u002F        # config.json + model.safetensors\n│   └── cola_vae\u002F        # config.json + model.safetensors\n└── tokenizer.json\n```\n\n### 2. Programmatic inference\n\n```python\nimport torch\nfrom tokenizers import Tokenizer\nfrom cola_dlm import (\n    ColaDiTModel,\n    ColaTextVAEModel,\n    generate_task_repaint_inference,\n)\n\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\ndit = ColaDiTModel.from_pretrained(\"hf_models\u002Fcola_dlm\u002Fcola_dit\").to(device)\nvae = ColaTextVAEModel.from_pretrained(\"hf_models\u002Fcola_dlm\u002Fcola_vae\").to(device)\ntokenizer = Tokenizer.from_file(\"hf_models\u002Ftokenizer.json\")\n\nprompts = [{\"question\": \"Question: What is the capital of France? Answer:\"}]\nresults = generate_task_repaint_inference(\n    dit=dit,\n    vae=vae,\n    tokenizer=tokenizer,\n    prompts=prompts,\n    task_name=\"lambada\",\n    device=device,\n    max_new_tokens=32,\n    temperature=0.0,\n    guidance_scale=7.0,\n    timestep_num=16,\n    pad_token_id=100277,\n)\nprint(results[0][\"generate\"])\n```\n\n`generate_task_repaint_inference` implements the paper's inference algorithm end-to-end: (i) prefix encode through the Text VAE, (ii) block-wise prior transport through the block-causal DiT, (iii) conditional decoding back to tokens. 
See [`examples\u002Fquickstart.py`](examples\u002Fquickstart.py) for a runnable, end-to-end script.\n\n### 3. CLI inference\n\n```bash\ncola-dlm-infer \\\n    --dit_path hf_models\u002Fcola_dlm\u002Fcola_dit \\\n    --vae_path hf_models\u002Fcola_dlm\u002Fcola_vae \\\n    --tokenizer_path hf_models\u002Ftokenizer.json \\\n    --input_jsonl generate_task_data\u002Flambada.jsonl \\\n    --output_dir eval_output\u002Fmy_run \\\n    --task_name lambada\n```\n\nRun `cola-dlm-infer --help` for the full argument list.\n\n---\n\n## OpenAI-compatible deployment\n\nThe [`openai_adapter\u002F`](openai_adapter\u002F) directory adds a lightweight HTTP service for serving Cola DLM through an OpenAI-compatible Chat Completions API:\n\n```text\nPOST \u002Fv1\u002Fchat\u002Fcompletions\n```\n\nInstall the adapter dependencies from the repository root:\n\n```bash\npip install -e .\npip install -r openai_adapter\u002Frequirements.txt\n```\n\nThen start the service with the model paths and optional bearer token:\n\n```bash\nexport COLA_DIT_PATH=hf_models\u002Fcola_dlm\u002Fcola_dit\nexport COLA_VAE_PATH=hf_models\u002Fcola_dlm\u002Fcola_vae\nexport COLA_TOKENIZER_PATH=hf_models\u002Ftokenizer.json\nexport COLA_MODEL_NAME=cola-dlm\nexport COLA_API_KEY=change-me\n\nuvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000\n```\n\nThe service supports `GET \u002Fhealth`, `GET \u002Fv1\u002Fmodels`, and non-streaming `POST \u002Fv1\u002Fchat\u002Fcompletions`. 
See [`openai_adapter\u002FREADME.md`](openai_adapter\u002FREADME.md) for request examples, environment variables, and production notes.\n\n---\n\n## Evaluation benchmarks\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Ffigures\u002Frq4_scaling_vs_ar_llada.png\" alt=\"Scaling curves across 8 benchmarks plus Task Average — Cola DLM (red) vs AR (blue) and LLaDA (orange), up to ~2000 EFLOPs.\" width=\"900\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cem>\u003Cstrong>Figure 2 — RQ4 headline scaling result.\u003C\u002Fstrong> Strictly matched ~2B-parameter setup, unified \u003Cem>generative\u003C\u002Fem> evaluation protocol, scaling curves up to ~2000 EFLOPs across 8 benchmarks plus Task Average. \u003Cstrong>Cola DLM (red)\u003C\u002Fstrong> reaches the best final Task Average — and the curve is \u003Cem>still rising\u003C\u002Fem> — with a clear lead on reasoning-heavy \u003Cstrong>MMLU, RACE, Story Cloze, OBQA\u003C\u002Fstrong>; SQuAD eventually surpasses AR and approaches LLaDA's strong region. 
The result is conservative: latent dimension \u003Ccode>d=16\u003C\u002Fcode>, no extended training, room to scale further.\u003C\u002Fem>\u003C\u002Fp>\n\nThe `scripts\u002F` folder contains a one-click reproduction of the 8-task evaluation pipeline used in the paper's RQ4 scaling comparison:\n\n```bash\n# Evaluate all 8 tasks (assumes hf_models\u002F and generate_task_data\u002F are populated)\nbash scripts\u002Frun_benchmark.sh\n\n# Single task, single GPU\nTASKS=\"lambada\" NUM_GPUS=1 bash scripts\u002Frun_benchmark.sh\n\n# Compute accuracy from evaluation outputs\npython scripts\u002Facc_calc.py\n```\n\nReference accuracy numbers (see [`eval_output\u002Faccuracy_summary.csv`](eval_output\u002Faccuracy_summary.csv)):\n\n| Task       | Accuracy (%) |\n|------------|--------------|\n| LAMBADA    | 50.80        |\n| MMLU       | 19.30        |\n| OBQA       | 23.00        |\n| HellaSwag  | 10.70        |\n| RACE       | 19.60        |\n| SIQA       | 28.90        |\n| SQuAD      | 30.90        |\n| Story Cloze| 30.77        |\n| **Tasks Average** | **26.75** |\n\n> **Note on open-source model and accuracy:**\n> The released model weights correspond to the **2000 EFLOPs** entry on the scaling curve in the paper's RQ4 — the largest training-compute checkpoint reported. Because the internal architecture used for evaluation in the paper differs slightly from the open-source HuggingFace Transformers-based implementation in this repository, per-task accuracy numbers may exhibit minor fluctuations, but the overall trend is consistent with the paper. 
Notably, the **Tasks Average (26.75%) measured here is slightly higher than the final average reported in the paper**.\n\n---\n\n## Unified text–image (preliminary)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Ffigures\u002Funified_samples.png\" alt=\"Unified text-image qualitative samples: text-only continuation, image-conditioned text generation, text-to-image samples, and shared MMDiT prior schematic.\" width=\"900\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\u003Cem>\u003Cstrong>Figure 3 — Towards unified text–image modeling.\u003C\u002Fstrong> Modality-specific VAE encoders\u002Fdecoders interface with a \u003Cem>shared\u003C\u002Fem> block-causal MMDiT prior over a joint latent state — the same hierarchical latent decomposition extends naturally from text to vision. \u003Cem>Left\u003C\u002Fem>: text-only continuation and image-conditioned text generation (image-to-text). \u003Cem>Middle\u003C\u002Fem>: text-to-image samples from in-house pretraining only (no SFT, no high-quality data curation). \u003Cem>Right\u003C\u002Fem>: schematic of the shared block-causal MMDiT prior. This is intentionally early-stage; comprehensive unified multimodal training is left for future work — see the paper's Discussion for the full set of qualitative samples.\u003C\u002Fem>\u003C\u002Fp>\n\n> The released open-source code in this repository covers the **text-only** Cola DLM pipeline (Text VAE + block-causal DiT prior). 
Unified text–image training and inference are reported in the paper's Discussion as preliminary experiments and are not included in this release.\n\n---\n\n## Project layout\n\n```\ncola-dlm\u002F\n├── cola_dlm\u002F                 # Importable Python package\n│   ├── __init__.py           # Public API re-exports\n│   ├── configuration_cola_dit.py   # ColaDiTConfig — block-causal DiT prior knobs\n│   ├── configuration_cola_vae.py   # ColaTextVAEConfig — Text VAE knobs\n│   ├── modeling_cola_dit.py  # ColaDiTModel — block-causal DiT prior p_psi(z_0)\n│   ├── modeling_cola_vae.py  # ColaTextVAEModel — encoder q_phi + decoder p_theta\n│   ├── attention_utils.py    # NA flatten-concat helpers + block-causal mask (visible set V_b)\n│   └── inference.py          # Batch benchmark CLI + generate_task_repaint_inference\n├── docs\u002F                     # Architecture, model card, inference docs\n├── examples\u002F                 # Minimal runnable examples\n├── openai_adapter\u002F            # OpenAI-compatible HTTP serving adapter\n├── scripts\u002F                  # Shell entry points (benchmark + accuracy)\n├── tests\u002F                    # Unit + smoke tests\n├── eval_output\u002F              # Reference benchmark outputs (CSV summary committed)\n├── generate_task_data\u002F       # Benchmark JSONL datasets\n├── pyproject.toml            # Build + metadata + dep spec\n├── requirements.txt          # Pinned runtime deps\n├── LICENSE                   # Apache-2.0\n├── NOTICE                    # Apache-2.0 attribution\n├── SECURITY.md               # Vulnerability reporting\n└── README.md \u002F README_zh.md  # Project documentation\n```\n\n---\n\n## Documentation\n\nLong-form docs live under [`docs\u002F`](docs\u002F):\n\n- [`docs\u002Farchitecture.md`](docs\u002Farchitecture.md) — hierarchical latent-variable framing, VAE + DiT architecture, block-wise prior-transport loop, CFG, NA flatten-concat layout, Stage 1 \u002F Stage 2 training reference.\n- 
[`docs\u002Fmodel_card.md`](docs\u002Fmodel_card.md) — intended use, training data, limitations, bias, responsible-AI notes.\n- [`docs\u002Finference.md`](docs\u002Finference.md) — how to run batch benchmarks and the Python API.\n- [`openai_adapter\u002FREADME.md`](openai_adapter\u002FREADME.md) — how to deploy the OpenAI-compatible HTTP service.\n\nFor security-sensitive reports, please follow [`SECURITY.md`](SECURITY.md).\n\n---\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F?repos=ByteDance-Seed%2FCola-DLM&type=date&legend=top-left\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=ByteDance-Seed\u002FCola-DLM&type=date&theme=dark&legend=top-left\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=ByteDance-Seed\u002FCola-DLM&type=date&legend=top-left\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=ByteDance-Seed\u002FCola-DLM&type=date&legend=top-left\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n---\n\n## Citation\n\nIf Cola DLM contributes to your research, please cite the paper:\n\n```bibtex\n@article{guo2026cola,\n  title   = {Continuous Latent Diffusion Language Model},\n  author  = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and\n             Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and\n             Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},\n  journal = {arXiv preprint arXiv:2605.06548},\n  year    = {2026},\n  url     = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.06548},\n}\n```\n\nYou may additionally cite this open-source release:\n\n```bibtex\n@software{cola_dlm_2026,\n  title   = {Cola DLM: Official Open-Source Inference Code for Continuous Latent Diffusion Language Model},\n  year    = {2026},\n  url     = 
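{https:\u002F\u002Fgithub.com\u002Fyour-org\u002Fcola-dlm},\n  version = {0.1.0}\n}\n```\n\n---\n\n## License\n\nCola DLM is released under the [Apache License 2.0](LICENSE). See [`NOTICE`](NOTICE) for third-party attributions.\n","Cola DLM is a continuous latent diffusion language model that generates text through a hierarchical latent-variable structure. Its core components are a Text VAE that learns a stable mapping between text and a continuous latent sequence, a block-causal Diffusion Transformer (DiT) that models the latent prior, and a conditional decoder that produces the output text. Technically, Cola DLM is built on PyTorch and HuggingFace Transformers and supports Python 3.9 and later. The model is particularly suited to applications that need high-quality text generation, such as creative-writing assistance, dialogue systems, and content creation.",2,"2026-06-11 03:55:03","CREATED_QUERY"]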