[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80714":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":13,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":14,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},80714,"RAVEN","mvp-ai-lab\u002FRAVEN","mvp-ai-lab","Implementation of our paper \"RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO\"","",null,"Python",48,1,3,2,0,4,45.3,"Other",false,"main",true,[],"2026-06-12 04:01:29","# RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO\n\n[Yanzuo Lu](https:\u002F\u002Fyanzuo.lu\u002F) · [Ronglai Zuo](https:\u002F\u002F2000zrl.github.io\u002F) · [Jiankang Deng](https:\u002F\u002Fjiankangdeng.github.io\u002F) — Imperial College London\n\nProject page: \u003Chttps:\u002F\u002Fyanzuo.lu\u002Fraven>\n\n## TL;DR\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc1aa3b08-4a6e-431f-8b63-d7266774de3b\n\nCausal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. 
We introduce the **Real-time Autoregressive Video Extrapolation Network (RAVEN)**, a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose **Consistency-model Group Relative Policy Optimization (CM-GRPO)**, which reformulates a consistency sampling step as a conditional Gaussian transition and applies online Reinforcement Learning (RL) directly to this kernel, avoiding the Euler–Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments demonstrate that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic degree evaluations, and that CM-GRPO provides further gains when combined with RAVEN.\n\n## Setup\n\nThe base environment (Python 3.10, CUDA 12.8 toolkit, `uv`, system libs) is provisioned via conda from `tools\u002Fenvironment.yaml`. The project itself then lives in a `uv`-managed venv with Python dependencies pinned in `tools\u002Frequirements.lock` (torch 2.11+cu128, transformers 4.57, diffusers 0.37, wandb, clearml), plus locally-built flash-attention 2\u002F3 and magi-attention wheels.\n\n```sh\nconda env create -f tools\u002Fenvironment.yaml       # creates the `raven` conda env\nconda activate raven\nbash tools\u002Fprepare_venv.sh                        # builds .\u002Fvenv, syncs requirements.lock,\n                                                  # then builds + installs flash_attn{,_3}\n                                                  # and magi_attention wheels into .\u002Fassets\nsource venv\u002Fbin\u002Factivate\n```\n\nTargets Hopper (SM 9.0) by default — adjust `TORCH_CUDA_ARCH_LIST` \u002F `FLASH_ATTN_CUDA_ARCHS` in `tools\u002Fprepare_venv.sh` for other GPUs. 
Override `CONDA_ENV`, `CUDA_HOME`, or `MAX_JOBS` via env vars if your layout differs.\n\nDownload the corresponding model checkpoints (Wan2.1-T2V-1.3B base, our released RAVEN and CM-GRPO weights, and any upstream baseline weights referenced by the configs you intend to run) yourself and point the `weight` fields in each config at the local paths.\n\nCM-GRPO ships in three interchangeable flavors on [`mvp-lab\u002FRAVEN`](https:\u002F\u002Fhuggingface.co\u002Fmvp-lab\u002FRAVEN); pick the one that matches your config:\n\n- **LoRA adapter** (`cmgrpo_raven_lora.safetensors`) — adapter only. CM-GRPO was trained on top of RAVEN, so the backbone loads `raven_model.pt` as the base and the adapter on top:\n\n  ```jsonc\n  \"backbone\": {\n      \"weight\": \"\u002Fpath\u002Fto\u002Fraven_model.pt\",\n      \"lora\": {\n          \"enabled\": true,\n          \"weight\": \"\u002Fpath\u002Fto\u002Fcmgrpo_raven_lora.safetensors\"\n      }\n  }\n  ```\n\n- **Base + LoRA bundle** (`cmgrpo_raven_full.pt`) — RAVEN base and the LoRA adapter packed into a single PEFT-wrapped state dict (the raw output of our DCP→torch checkpoint conversion). Skip the separate base weight and load the bundle through `lora.weight`:\n\n  ```jsonc\n  \"backbone\": {\n      \"lora\": {\n          \"enabled\": true,\n          \"weight\": \"\u002Fpath\u002Fto\u002Fcmgrpo_raven_full.pt\"\n      }\n  }\n  ```\n\n- **Merged** (`cmgrpo_raven_merge.pt`) — full backbone with the adapter already baked into RAVEN. 
Drop the `lora` block entirely and load it as the base weight:\n\n  ```jsonc\n  \"backbone\": {\n      \"weight\": \"\u002Fpath\u002Fto\u002Fcmgrpo_raven_merge.pt\"\n  }\n  ```\n\n  This flavor is also compatible with the `third_party\u002F\u003Cbaseline>\u002F` inference entrypoints as well as the original upstream baseline implementations.\n\nRAVEN itself (`raven_model.pt`) is a single full backbone and follows the merged pattern.\n\n## Running\n\nEvery command dispatches through `tools\u002Fmulti_run.sh \u003Cjsonc>`, which wraps `torchrun` over `main.py`. Override `N` (procs per node), `NNODES`, `MASTER_ADDR`, `MASTER_PORT` via env vars; defaults autodetect from SLURM or local GPUs. Set `D=\u003Cn>` to launch under `debugpy` with `n` procs.\n\nTrain RAVEN:\n\n```sh\nbash tools\u002Fmulti_run.sh configs\u002Ftrials\u002Fdistribution_matching_distillation\u002Fcausal_wan2.1_1.3B_t2v\u002Fraven.jsonc\n```\n\nCM-GRPO on top of RAVEN:\n\n```sh\nbash tools\u002Fmulti_run.sh configs\u002Ftrials\u002Fgroup_relative_policy_optimization\u002Fcausal_wan2.1_1.3B_t2v\u002Fcmgrpo_raven_raft0.35ta2aq1iq1ms0.75.jsonc\n```\n\nSample the VBench prompt suite (videos only; scoring is in the next section):\n\n```sh\nbash tools\u002Fmulti_run.sh configs\u002Ftrials\u002Fvbench_t2v\u002Fcausal_wan2.1_1.3B_t2v\u002Fraven.jsonc\nbash tools\u002Fmulti_run.sh configs\u002Ftrials\u002Fvbench_t2v\u002Fcausal_wan2.1_1.3B_t2v\u002Fcmgrpo.jsonc\n```\n\nQualitative samples on the 100-prompt baseline set:\n\n```sh\nbash tools\u002Fmulti_run.sh configs\u002Ftrials\u002Fgenerate_t2v\u002Fcausal_wan2.1_1.3B_t2v\u002Fraven_baseline_prompts.jsonc\nbash tools\u002Fmulti_run.sh configs\u002Ftrials\u002Fgenerate_t2v\u002Fcausal_wan2.1_1.3B_t2v\u002Fcmgrpo_baseline_prompts.jsonc\n```\n\n## Baseline sampling\n\nBaseline methods compared in the paper (CausVid, Self Forcing, Reward Forcing, Causal Forcing, LongLive, Rolling Forcing) are vendored under `third_party\u002F\u003Cbaseline>\u002F` with a 
consistent `inference.py \u002F inference.sh` interface and identical sampling settings, so each baseline's output directory can be fed straight into the VBench scoring pipeline below. Each `inference.sh` wraps `torchrun -m third_party.\u003Cbaseline>.inference --config_path \u003Cyaml>`:\n\n```sh\nbash third_party\u002Fcausal_forcing\u002Finference.sh   third_party\u002Fcausal_forcing\u002Fconfigs\u002Fcausal_forcing_dmd_chunkwise_vbench.yaml\nbash third_party\u002Fcausvid\u002Finference.sh          third_party\u002Fcausvid\u002Fconfigs\u002Fwan_causal_dmd_vbench.yaml\nbash third_party\u002Flonglive\u002Finference.sh         third_party\u002Flonglive\u002Fconfigs\u002Flonglive_vbench.yaml\nbash third_party\u002Freward_forcing\u002Finference.sh   third_party\u002Freward_forcing\u002Fconfigs\u002Freward_forcing_vbench.yaml\nbash third_party\u002Frolling_forcing\u002Finference.sh  third_party\u002Frolling_forcing\u002Fconfigs\u002Frolling_forcing_dmd_vbench.yaml\nbash third_party\u002Fself_forcing\u002Finference.sh     third_party\u002Fself_forcing\u002Fconfigs\u002Fself_forcing_dmd_vbench.yaml\n```\n\nEach `*_vbench.yaml` references the upstream-released model checkpoint and writes mp4s into `runs\u002F\u003Cbaseline>_vbench_extended\u002Fvideos\u002F`. The prompt list comes from `assets\u002Fvbench_self_forcing_extended.txt` (945 prompts, shipped).\n\n## VBench evaluation\n\nScoring uses the official VBench harness, which lives in its own venv under `third_party\u002Fvbench\u002F` to avoid clashing with the project venv.\n\nInstall once:\n\n```sh\nbash third_party\u002Fvbench\u002Fprepare_venv.sh\n```\n\nCreates `third_party\u002Fvbench\u002Fvenv\u002F`, syncs `third_party\u002Fvbench\u002Frequirements.lock`, builds `detectron2` from source, and pre-downloads every VBench dimension submodule (DINO, RAFT, AMT, CLIP, etc.) 
into `$VBENCH_CACHE_DIR` (default `~\u002F.cache\u002Fvbench`).\n\nScore any video directory (RAVEN\u002FCM-GRPO outputs from the Running section, or any baseline output from above):\n\n```sh\nbash third_party\u002Fvbench\u002Feval.sh runs\u002F\u003Crun_name>\u002Fvideos\n```\n\nInternally this:\n1. **Static filters** the first 75 motion-dimension prompts via `vbench static_filter` to pick the highest-motion sample per prompt;\n2. Runs `vbench` evaluation across all 16 dimensions via `torchrun -m vbench.launch.evaluate`;\n3. Aggregates with `python -m third_party.vbench.cal_final_score` to produce the Total \u002F Quality \u002F Semantic scores reported in the paper.\n\nOutputs land alongside the input dir as `\u003Cvideos>_filtered\u002Fevaluation_results\u002F`. Override `VBENCH_SAMPLES_PER_PROMPT` (default 5), `STATIC_FILTER_SAMPLES_PER_PROMPT` (default 25), `N`, `NNODES`, etc. via env vars. The eval reads `assets\u002Fvbench_all_dimension.txt` (the canonical VBench prompt list, shipped).\n\n## Citation\n\nIf you find this work useful, please cite RAVEN. A BibTeX entry will be added when available.\n\n```bibtex\n@article{lu2026raven,\n  title = {RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO},\n  author = {Lu, Yanzuo and Zuo, Ronglai and Deng, Jiankang},\n  year = 2026,\n  journal = {arXiv preprint arXiv:2605.15190}\n}\n```\n","RAVEN is a project that implements real-time autoregressive video extrapolation, built on consistency-model GRPO. It repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states, which aligns training attention with inference-time extrapolation and improves generation quality over long horizons. The project is written in Python and depends on the CUDA 12.8 toolkit, torch 2.11, and related libraries. RAVEN is particularly suited to applications that need real-time streaming generation of future video chunks, such as online video content creation or virtual reality experiences. Experiments show that RAVEN outperforms recent causal video distillation baselines on quality, semantic, and dynamic degree evaluations.","2026-06-11 04:01:43","CREATED_QUERY"]