[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80682":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":13,"stars30d":13,"stars90d":14,"forks30d":14,"starsTrendScore":15,"compositeScore":16,"rankGlobal":8,"rankLanguage":8,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":8,"pushedAt":8,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":25,"discoverSource":26},80682,"Stream-R1","FrameX-AI\u002FStream-R1","FrameX-AI",null,"Python",49,5,47,2,0,6,44.53,"Apache License 2.0",false,"main",true,[],"2026-06-12 04:01:29","\u003Cdiv align=\"center\">\n\n\u003Ch1>Stream-R1: \u003Cbr> Reliability-Perplexity Aware Reward Distillation \u003Cbr> for Streaming Video Generation\u003C\u002Fh1>\n\n\u003Cdiv>\n  \u003Ca href=\"#\" target=\"_blank\">\u003Cstrong>Bin Wu\u003C\u002Fstrong>\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"https:\u002F\u002Fcorleone-huang.github.io\u002F\" target=\"_blank\">\u003Cstrong>Mengqi Huang\u003C\u002Fstrong>\u003C\u002Fa>\u003Csup>1,&dagger;,&Dagger;\u003C\u002Fsup>,\n  \u003Ca href=\"#\" target=\"_blank\">\u003Cstrong>Shaojin Wu\u003C\u002Fstrong>\u003C\u002Fa>\u003Csup>3,&Dagger;\u003C\u002Fsup>,\n  \u003Ca href=\"#\" target=\"_blank\">Weinan Jia\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"#\" target=\"_blank\">Yuxin Wang\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>,\n  \u003Ca href=\"#\" target=\"_blank\">Zhendong Mao\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n  \u003Ca href=\"#\" target=\"_blank\">Yongdong Zhang\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Cdiv>\n  \u003Csup>1\u003C\u002Fsup> University of Science and Technology 
of China,\n  \u003Csup>2\u003C\u002Fsup> FrameX.AI,\n  \u003Csup>3\u003C\u002Fsup> Independent Researcher\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n\u003Csub>\u003Csup>&dagger;\u003C\u002Fsup> Corresponding author &nbsp;&middot;&nbsp; \u003Csup>&Dagger;\u003C\u002Fsup> Project lead\u003C\u002Fsub>\n\n\u003Cbr>\u003Cbr>\n\n[![Project Page](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-blue)](https:\u002F\u002Fstream-r1.github.io\u002F)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-red)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.03849)\n[![Models](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗_Models-FrameX--AI%2FStream--R1-yellow)](https:\u002F\u002Fhuggingface.co\u002FFrameX-AI\u002FStream-R1)\n\n\u003C\u002Fdiv>\n\n## Overview\n\n> **TL;DR**: Existing distribution-matching distillation (DMD) methods for streaming video diffusion treat every rollout, frame, and pixel as equally informative supervision. **Stream-R1** instead reweights the DMD objective along two complementary axes — *Inter-Reliability* across rollouts and *Intra-Perplexity* across spatiotemporal regions — with a single shared video reward model. 
The student concentrates updates where the local reward landscape has not yet flattened, converging to the teacher's high-quality mode rather than its full mixture, and surpasses the multi-step Wan2.1 teacher on VBench Total\u002FSemantic at **23.1 FPS** with no architectural change and zero inference overhead.\n\nQualitative results across 30 s \u002F 60 s \u002F 2 min \u002F 3 min: \u003Chttps:\u002F\u002Fstream-r1.github.io\u002F#duration>.\n\n### Method\n\nStream-R1 modulates the standard DMD generator loss as\n\n$$\\mathcal{L}_{\\text{Stream-R1}} \\;=\\; \\underbrace{\\exp(\\beta \\cdot r_{\\text{final}})}_{\\mathbf{W}_{\\text{inter}}}\\;\\cdot\\;\\text{mean}\\!\\big(\\underbrace{\\mathbf{W}_{\\text{intra}}}_{F\\times H\\times W}\\,\\odot\\,\\mathcal{L}_{\\text{DMD}}\\big)$$\n\nwith three reward-guided components, all derived from one pretrained video reward model:\n\n1. **Inter-Reliability Weighting** — the DMD gradient `g = f_fake − f_real` varies in reliability across rollouts; we exponentially rescale each rollout's loss by `exp(β·r_final)`, so reliable rollouts dominate supervision while low-quality rollouts are attenuated.\n2. **Intra-Perplexity Weighting** — back-propagates through the reward model to obtain a per-pixel saliency volume `S ∈ R^{F×H×W}`, factorizes it into a temporal profile and per-frame spatial maps, and uses their product as `W_intra`. Optimization pressure concentrates on the regions and frames where the local reward landscape has not yet flattened, i.e., where further refinement yields the largest expected gain.\n3. 
**Adaptive Reward Balancing** — tracks per-axis improvement in visual quality (VQ), motion quality (MQ), and text alignment (TA) over a sliding window and subtracts the standard deviation of the per-axis deltas from the reward, keeping the three quality axes improving at similar rates.\n\nSaliency from the three axes is fused with an adaptive softmax weighting that allocates more attention to the currently weaker axis, so a single reward signal drives both `W_inter` and `W_intra`.\n\n### Shipped configuration (`configs\u002Fexp_stream_r1.yaml`)\n\n| Knob | Value | Role |\n|---|---|---|\n| `reward_mode` | `BalancedOverall` | Inter-Reliability + Adaptive Reward Balancing |\n| `spatial_reward` \u002F `spatial_reward_pixel_grad` | `true` \u002F `true` | Intra-Perplexity spatial (pixel-level gradient saliency) |\n| `temporal_saliency_weighting` | `true` | Intra-Perplexity temporal (per-frame importance) |\n| `spatial_reward_combination` | `adaptive` | adaptive saliency fusion across VQ\u002FMQ\u002FTA |\n| `spatial_reward_min_weight` | `0.15` | spatial floor (σ\\_min) |\n| `temporal_saliency_min_weight` | `0.2` | temporal floor (τ\\_min) |\n| `full_training_steps` × `gradient_accumulation_steps` | `1000 × 8` | 8000 raw steps on 8 GPUs |\n\n## Requirements\n\n- NVIDIA GPU: ≥24 GB for inference, ≥80 GB for training (8 GPUs recommended).\n- Linux, ≥64 GB RAM.\n\n## Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FFrameX-AI\u002FStream-R1.git\ncd Stream-R1\n\nconda create -n stream_r1 python=3.10\nconda activate stream_r1\n\npip install -r requirements.txt\npip install flash-attn --no-build-isolation\npip install -e .\n```\n\n## Pretrained Checkpoints\n\nRequired for **training** (teacher \u002F reward \u002F init) and **inference** (Stream-R1):\n\n| Model | Download |\n|-------|----------|\n| VideoReward | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FKlingTeam\u002FVideoReward) |\n| Wan2.1-T2V-1.3B | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002FWan2.1-T2V-1.3B) |\n| Wan2.1-T2V-14B | 
[Hugging Face](https:\u002F\u002Fhuggingface.co\u002FWan-AI\u002FWan2.1-T2V-14B) |\n| ODE Initialization | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fgdhe17\u002FSelf-Forcing\u002Fblob\u002Fmain\u002Fcheckpoints\u002Fode_init.pt) |\n| **Stream-R1 (T2V-1.3B)** | **[Hugging Face](https:\u002F\u002Fhuggingface.co\u002FFrameX-AI\u002FStream-R1)** |\n\nAfter downloading:\n```\ncheckpoints\u002F\n├── Videoreward\u002F\n├── Wan2.1-T2V-1.3B\u002F\n├── Wan2.1-T2V-14B\u002F\n├── Stream-R1-T2V-1.3B\u002F\n└── ode_init.pt\n```\n\nOr run the helper:\n```bash\npip install \"huggingface_hub[cli]\"\nbash download_checkpoints.sh\n```\n\n## Inference\n\nPlace the released Stream-R1 weights at `checkpoints\u002FStream-R1-T2V-1.3B\u002Fstream_r1.pt`\n(any filename works — pass it via `--checkpoint_path`). You can also run inference\non a checkpoint produced by your own training run\n(`output\u002F\u003Ctimestamp>_stream_r1\u002Fcheckpoint_model_*\u002Fgenerator.pt`).\n\n```bash\n# 5-second video\npython inference.py \\\n    --num_output_frames 21 \\\n    --config_path configs\u002Fstream_r1.yaml \\\n    --checkpoint_path checkpoints\u002FStream-R1-T2V-1.3B\u002Fstream_r1.pt \\\n    --output_folder videos\u002Fstream_r1-5s \\\n    --data_path prompts\u002FMovieGenVideoBench_extended.txt \\\n    --use_ema\n\n# 30-second video\npython inference.py \\\n    --num_output_frames 120 \\\n    --config_path configs\u002Fstream_r1.yaml \\\n    --checkpoint_path checkpoints\u002FStream-R1-T2V-1.3B\u002Fstream_r1.pt \\\n    --output_folder videos\u002Fstream_r1-30s \\\n    --data_path prompts\u002FMovieGenVideoBench_extended.txt \\\n    --use_ema\n```\n\n## Training\n\n```bash\nbash run_stream_r1.sh\n```\n\nThe launcher reads `configs\u002Fexp_stream_r1.yaml`, runs `train.py` via `torchrun` on 8 GPUs (1000 optimizer steps × grad-accum 8 → 8000 raw steps), then renders 20 evaluation videos from the final checkpoint. 
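Each of those optimizer steps minimizes the reweighted DMD objective described under Method. The NumPy sketch below shows one way its pieces could fit together; it is illustrative only and is not the repository implementation. The function names (`balanced_reward`, `fuse_saliency`, `intra_weight`, `stream_r1_loss`), the sliding-window length, the softmax temperature, the max-normalization with floors of the form `floor + (1 - floor) * s`, and the use of the summed VQ/MQ/TA rewards as the base reward are all assumptions. Only the floors σ_min = 0.15 and τ_min = 0.2 come from the shipped config.

```python
import numpy as np

# Hypothetical NumPy sketch of the Stream-R1 weighting; NOT the repository code.
# Assumptions: saliency volumes are (F, H, W) arrays, floors are applied as
# floor + (1 - floor) * normalized_saliency, and per-axis rewards are floats.


def balanced_reward(axis_rewards, history, window=16):
    # Adaptive Reward Balancing (assumed form): each axis delta is the current
    # reward minus that axis' sliding-window mean; the std of the deltas is
    # subtracted from the summed reward to penalize uneven progress.
    deltas = []
    for axis, r in axis_rewards.items():
        past = history[axis][-window:]
        deltas.append((r - float(np.mean(past))) if past else 0.0)
    return sum(axis_rewards.values()) - float(np.std(deltas))


def fuse_saliency(saliency_by_axis, axis_rewards, temperature=1.0):
    # Adaptive fusion (assumed form): softmax over negated rewards, so the
    # currently weaker axis contributes more to the fused saliency volume.
    axes = list(saliency_by_axis)
    logits = np.array([-axis_rewards[a] / temperature for a in axes])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return sum(w * saliency_by_axis[a] for w, a in zip(weights, axes))


def intra_weight(saliency, sigma_min=0.15, tau_min=0.2, eps=1e-8):
    # W_intra = temporal profile (F,) times per-frame spatial maps (F, H, W).
    s = np.abs(saliency)
    frame_mass = s.sum(axis=(1, 2))
    temporal = tau_min + (1.0 - tau_min) * frame_mass / (frame_mass.max() + eps)
    frame_peak = s.max(axis=(1, 2), keepdims=True)
    spatial = sigma_min + (1.0 - sigma_min) * s / (frame_peak + eps)
    return temporal[:, None, None] * spatial


def stream_r1_loss(dmd_loss_map, saliency, r_final, beta=1.0):
    # L = exp(beta * r_final) * mean(W_intra * L_DMD) for a single rollout.
    w_inter = np.exp(beta * r_final)
    return float(w_inter * np.mean(intra_weight(saliency) * dmd_loss_map))


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    rewards = {'VQ': 0.9, 'MQ': 0.4, 'TA': 0.6}
    history = {a: [0.5] * 8 for a in rewards}
    saliency = {a: rng.random((4, 8, 8)) for a in rewards}
    r_final = balanced_reward(rewards, history)
    fused = fuse_saliency(saliency, rewards)
    loss = stream_r1_loss(rng.random((4, 8, 8)), fused, r_final, beta=0.5)
    print(f'r_final={r_final:.3f} loss={loss:.4f}')
```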
Override defaults with environment variables, e.g.:\n\n```bash\nNUM_GPUS=4 CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_stream_r1.sh\n```\n\nManual launch:\n```bash\ntorchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=5235 --rdzv_backend=c10d \\\n    --rdzv_endpoint=localhost:$MASTER_PORT train.py \\\n    --config_path configs\u002Fexp_stream_r1.yaml \\\n    --logdir logs\u002Fstream_r1 \\\n    --disable-wandb\n```\n\nFor multi-node training, set `--nnodes`, `--node-rank`, and `--rdzv_endpoint=$MASTER_IP:$MASTER_PORT` accordingly.\n\n## Results\n\n### VBench (5-second, 832×480)\n\n| Model | Params | FPS↑ | Total↑ | Quality↑ | Semantic↑ |\n|-------|:------:|:----:|:------:|:--------:|:---------:|\n| Wan2.1 (multi-step teacher) | 1.3B | 0.78 | 84.26 | **85.30** | 80.09 |\n| LTX-Video | 1.9B | 8.98 | 80.00 | 82.30 | 70.79 |\n| SkyReels-V2 | 1.3B | 0.49 | 82.67 | 84.70 | 74.53 |\n| MAGI-1 | 4.5B | 0.19 | 79.18 | 82.04 | 67.74 |\n| NOVA | 0.6B | 0.88 | 80.12 | 80.39 | 79.05 |\n| Pyramid Flow | 2B | 6.7 | 81.72 | 84.74 | 69.62 |\n| CausVid | 1.3B | 17.0 | 82.88 | 83.93 | 78.69 |\n| Self Forcing | 1.3B | 17.0 | 83.80 | 84.59 | 80.64 |\n| LongLive | 1.3B | 20.7 | 83.22 | 83.68 | 81.37 |\n| Rolling Forcing | 1.3B | 17.5 | 81.22 | 84.08 | 69.78 |\n| Reward Forcing | 1.3B | 23.1 | 84.13 | 84.84 | 81.32 |\n| **Stream-R1 (Ours)** | **1.3B** | **23.1** | **84.40** | \u003Cu>85.14\u003C\u002Fu> | **81.44** |\n\nStream-R1 surpasses its multi-step Wan2.1 teacher on **Total** and **Semantic** while running ~30× faster, and trails it only slightly on **Quality** (85.14 vs. 85.30). This shows that reward-guided distillation can push the student beyond the teacher on the aggregate VBench scores. 
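The ~30× figure is the ratio of the FPS entries, 23.1 / 0.78 ≈ 29.6. As an illustrative check (a hypothetical helper, not part of the repository), the snippet below recomputes that ratio and the per-metric VBench margins against the teacher from the table values:

```python
# VBench values copied from the table (832x480, 5-second clips).
TEACHER = 'Wan2.1 (multi-step teacher)'
VBENCH = {
    TEACHER: {'fps': 0.78, 'total': 84.26, 'quality': 85.30, 'semantic': 80.09},
    'Reward Forcing': {'fps': 23.1, 'total': 84.13, 'quality': 84.84, 'semantic': 81.32},
    'Stream-R1': {'fps': 23.1, 'total': 84.40, 'quality': 85.14, 'semantic': 81.44},
}


def speedup(model, baseline=TEACHER):
    # Throughput ratio taken from the FPS column.
    return VBENCH[model]['fps'] / VBENCH[baseline]['fps']


def margin(model, metric, baseline=TEACHER):
    # Score difference in VBench points (positive means the model is ahead).
    return VBENCH[model][metric] - VBENCH[baseline][metric]


ratio = speedup('Stream-R1')
print(f'speedup vs. teacher: {ratio:.1f}x')
for metric in ('total', 'quality', 'semantic'):
    diff = margin('Stream-R1', metric)
    print(f'{metric}: {diff:+.2f} points vs. teacher')
```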
(Underlined = second best, **bold** = best.)\n\n### VideoReward (per-axis)\n\n| Model | Visual↑ | Dynamic↑ | Text↑ |\n|-------|:-------:|:--------:|:-----:|\n| SkyReels-V2 | 3.30 | 3.05 | 2.70 |\n| CausVid | 4.66 | 3.16 | 3.32 |\n| Self Forcing | 3.89 | 3.44 | 3.11 |\n| LongLive | 4.79 | 3.81 | 3.98 |\n| Reward Forcing | 4.82 | **4.18** | 4.04 |\n| **Stream-R1 (Ours)** | **4.92** | \u003Cu>4.04\u003C\u002Fu> | **4.11** |\n\nProject page and qualitative results: \u003Chttps:\u002F\u002Fstream-r1.github.io\u002F>\n\n## Citation\n\nA BibTeX entry will be added shortly. In the meantime, please cite the arXiv preprint at \u003Chttps:\u002F\u002Farxiv.org\u002Fabs\u002F2605.03849>.\n\n## Acknowledgements\n\nBuilt on [CausVid](https:\u002F\u002Fgithub.com\u002Ftianweiy\u002FCausVid), [Self Forcing](https:\u002F\u002Fgithub.com\u002Fguandeh17\u002FSelf-Forcing), [Wan2.1](https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.1), and [VideoAlign](https:\u002F\u002Fgithub.com\u002FKlingTeam\u002FVideoAlign). Stream-R1 extends the Reward Forcing codebase with the Inter-Reliability \u002F Intra-Perplexity formulation.\n\n## License\n\nSee [LICENSE](LICENSE).\n\n## Contact\n\nFor questions about the project, please open a GitHub issue or reach out to the corresponding author [Mengqi Huang](https:\u002F\u002Fcorleone-huang.github.io\u002F).\n","Stream-R1 is a reliability-perplexity aware reward distillation framework for streaming video generation. Its core idea is to reweight the distribution matching distillation (DMD) objective along two complementary dimensions, *Inter-Reliability* across rollouts and *Intra-Perplexity* across spatiotemporal regions, so that learning is more efficient and concentrates on the high-quality mode, without changing the model architecture or adding inference overhead. The technique is particularly suited to scenarios that require fast generation of high-quality video, such as live streaming and virtual reality, and it outperforms existing methods while running at 23.1 FPS. The project is written in Python and released under the Apache License 2.0.","2026-06-11 04:01:37","CREATED_QUERY"]