[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80814":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":11,"contributorsCount":11,"subscribersCount":11,"size":11,"stars1d":11,"stars7d":11,"stars30d":13,"stars90d":11,"forks30d":11,"starsTrendScore":11,"compositeScore":14,"rankGlobal":8,"rankLanguage":8,"license":15,"archived":16,"fork":16,"defaultBranch":17,"hasWiki":18,"hasPages":16,"topics":19,"createdAt":8,"pushedAt":8,"updatedAt":20,"readmeContent":21,"aiSummary":22,"trendingCount":11,"starSnapshotCount":11,"syncStatus":23,"lastSyncTime":24,"discoverSource":25},80814,"PA-BDM","SII-sc22mc\u002FPA-BDM","SII-sc22mc",null,"Python",41,0,38,3,37.3,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:30","\u003Cdiv align=\"center\">\n\n# Prefix-Adaptive Block Diffusion for Efficient Document Recognition\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-PA--BDM-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2605.16861)\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FSII-sc22mc\u002FPA-BDM\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Repository-black?logo=github\" alt=\"GitHub\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FMingxuChai\u002FPA-BDM\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-blue\" alt=\"Hugging Face\">\u003C\u002Fa>\n\n\u003C\u002Fdiv>\n\nPA-BDM is a document-recognition model built on Qwen2.5-VL and block diffusion. 
It targets text, formula, table, and diagram recognition, and is designed to keep the flexible-length generation and KV-cache benefits of Block Diffusion Models while improving their structural consistency and inference speed.\n\nThe paper identifies two bottlenecks in standard BDM decoding: tokens are only committed after a full block is finished, and bidirectional intra-block denoising conflicts with left-to-right inter-block generation. PA-BDM addresses them with:\n\n- **Causal intra-block denoising**: tokens inside a candidate block attend from prefix to suffix, matching autoregressive structural order.\n- **Confidence-gated Structural Loss (CSL)**: training supervision focuses on the longest reliable masked prefix and avoids noisy gradients from unstable continuations.\n- **Progressive Prefix Commitment (PPC)**: inference treats block size as a maximum candidate range, commits the longest reliable prefix into KV cache, and resets the unresolved suffix as a new candidate range.\n\n## Repository Layout\n\n```text\nPA-BDM\u002F\n|-- infer.ipynb                                      # Notebook inference demo\n|-- main.py                                         # CLI inference demo\n|-- run_train.sh                                    # One-command training launcher\n|-- requirements.txt                                # Pip dependencies except CUDA torch stack\n|-- environment.yml                                 # Conda environment with CUDA torch stack\n|-- init_env.sh                                     # Editable installs for train\u002F and eval\u002F\n|-- docs\u002F\n|   |-- INSTALLATION.md\n|   |-- INFERENCE.md\n|   `-- TRAINING_EVALUATION.md\n|-- train\u002F\n|   |-- llava\u002F                                      # PA-BDM \u002F DiffusionVL training code\n|   `-- scripts\u002F\n|       `-- diffusionvl_qwenvl_finetune_causal_64_include_tr.sh\n`-- eval\u002F\n    |-- scripts\u002F                                    # lmms-eval launchers\n    `-- 
lmms-eval\u002F\n```\n\n## Quick Start\n\n### 1. Create Environment\n\nConda is recommended for GPU training:\n\n```bash\nconda env create -f environment.yml\nconda activate pa-bdm\nbash init_env.sh\n```\n\nPip-only setup is also possible. Install the CUDA-matched PyTorch stack first, then install the remaining packages:\n\n```bash\nconda create -n pa-bdm python=3.10 -y\nconda activate pa-bdm\npip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\npip install -r requirements.txt\nbash init_env.sh\n```\n\n### 2. Download Models\n\nFor inference, download the released PA-BDM checkpoint:\n\n```bash\nhuggingface-cli download MingxuChai\u002FPA-BDM --local-dir \u002Fpath\u002Fto\u002Fmodels\u002FPA-BDM\n```\n\nFor training from Qwen2.5-VL, convert the base model to the DiffusionVL-QwenVL format:\n\n```bash\npython scripts\u002Fdiffusionvl_prepare\u002Fconvert_qwen2.5vl_to_diffusionvl.py \\\n  --source_path Qwen\u002FQwen2.5-VL-3B-Instruct \\\n  --dest_path \u002Fpath\u002Fto\u002Fmodels\u002FDiffusionVL-Qwen2.5VL-3B-causal\n```\n\n### 3. Run Inference\n\nNotebook:\n\n```bash\njupyter lab infer.ipynb\n```\n\nCLI:\n\n```bash\npython main.py \\\n  --model-path \u002Fpath\u002Fto\u002Fmodels\u002FPA-BDM \\\n  --image example_formula.jpg \\\n  --task formula \\\n  --gen-length 1024 \\\n  --steps 32 \\\n  --confidence-threshold 0.95\n```\n\nTask prompts:\n\n- `text`: `Text Recognition.`\n- `formula`: `Formula Recognition.`\n- `table`: `Table Recognition.`\n- `diagram`: `Diagram Recognition.`\n\n### 4. Prepare Training Data\n\nThe loader accepts a single JSON\u002FJSONL file or a YAML file that mixes multiple datasets. 
Each sample follows the LLaVA-style multimodal conversation format:\n\n```json\n{\n  \"id\": \"sample-000001\",\n  \"image\": \"images\u002Fsample.png\",\n  \"conversations\": [\n    {\"from\": \"human\", \"value\": \"\u003Cimage>\\nFormula Recognition.\"},\n    {\"from\": \"gpt\", \"value\": \"\\\\frac{x}{y}\"}\n  ]\n}\n```\n\nYAML example:\n\n```yaml\ndatasets:\n  - json_path: \u002Fpath\u002Fto\u002Fformula_train.json\n    image_root: \u002Fpath\u002Fto\u002Fformula_images\n    sampling_strategy: all\n  - json_path: \u002Fpath\u002Fto\u002Ftable_train.jsonl\n    image_root: \u002Fpath\u002Fto\u002Ftable_images\n    sampling_strategy: random:50000\n```\n\nSupported sampling strategies are `all`, `first:N`, `end:N`, `random:N`, and percentage forms such as `random:20%`.\n\n### 5. Run Training\n\nThe root launcher wraps `train\u002Fscripts\u002Fdiffusionvl_qwenvl_finetune_causal_64_include_tr.sh` and exposes the paths as environment variables:\n\n```bash\nPRETRAINED_CHECKPOINT=\u002Fpath\u002Fto\u002Fmodels\u002FDiffusionVL-Qwen2.5VL-3B-causal \\\nDATA_PATH=\u002Fpath\u002Fto\u002Ftrain_data.yaml \\\nIMAGE_FOLDER=\u002Fpath\u002Fto\u002Fimages \\\nOUTPUT_DIR=\u002Fpath\u002Fto\u002Foutputs \\\nGPU_NUM=4 \\\nRUN_NAME=pa-bdm-csl-64 \\\nBD3LM_BLOCK_SIZE=64 \\\nbash run_train.sh\n```\n\nFor multi-node training, set the standard torchrun variables:\n\n```bash\nNUM_NODES=4 \\\nGPU_NUM=8 \\\nMASTER_ADDR=10.0.0.1 \\\nMASTER_PORT=29199 \\\nRANK=0 \\\nbash run_train.sh\n```\n\nThe paper reports experiments with PA-BDM-1.2B\u002F3B, CSL and PPC confidence threshold `0.95`, and maximum candidate block size commonly set to `32` unless otherwise specified. 
This repository's `causal_64_include_tr` training script defaults to `BD3LM_BLOCK_SIZE=64`; set `BD3LM_BLOCK_SIZE=32` if you want the paper's default block-size setting.\n\n\n## Documentation\n\n| Document | Description |\n| :--- | :--- |\n| [Installation](docs\u002FINSTALLATION.md) | Environment, model download, and editable package setup |\n| [Training & Evaluation](docs\u002FTRAINING_EVALUATION.md) | Data format, one-command training, and evaluation scripts |\n| [Inference](docs\u002FINFERENCE.md) | Notebook and CLI inference |\n\n## Acknowledgements\n\nThis repo is mainly built on [Qwen2.5-VL](https:\u002F\u002Fgithub.com\u002FQwenLM\u002FQwen3-VL), [LLaDA-V](https:\u002F\u002Fgithub.com\u002FML-GSAI\u002FLLaDA-V), [BD3LMs](https:\u002F\u002Fgithub.com\u002Fkuleshov-group\u002Fbd3lms), and [DiffusionVL](https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.15713). We thank the authors for their open-source contributions.\n\n## Citation\n\nIf you find this work useful, please cite:\n\n```bibtex\n@misc{chai2026prefixadaptiveblockdiffusionefficient,\n      title={Prefix-Adaptive Block Diffusion for Efficient Document Recognition},\n      author={Mingxu Chai and Ziyu Shen and Chenyu Liu and Kaidi Zhang and Jiazheng Zhang and Dingwei Zhu and Zhiheng Xi and Ruoyu Chen and Jun Long and Jihua Kang and Tao Gui and Qi Zhang},\n      year={2026},\n      eprint={2605.16861},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.16861},\n}\n```\n","PA-BDM is a document-recognition model built on Qwen2.5-VL and block diffusion, focused on recognizing text, formulas, tables, and diagrams. Its core components are causal intra-block denoising, Confidence-gated Structural Loss (CSL), and Progressive Prefix Commitment (PPC), which aim to improve the model's structural consistency and inference speed. With these mechanisms, PA-BDM addresses two main bottlenecks in standard block diffusion decoding: tokens are not committed until a full block is finished, and bidirectional intra-block denoising conflicts with left-to-right inter-block generation. The model suits applications that need to extract multiple types of information from documents efficiently and accurately, such as academic paper parsing and enterprise report analysis.",2,"2026-06-11 04:02:25","CREATED_QUERY"]