[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2507":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":14,"stars7d":15,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":16,"starSnapshotCount":16,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},2507,"UniVidX","houyuanchen111\u002FUniVidX","houyuanchen111","[SIGGRAPH 2026 \u002F TOG] Official code of the paper \"UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors\".","",null,"Python",226,9,1,4,0,38,3,false,"main",[],"2026-06-12 02:00:41","## ___***UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors***___\n\n\u003Cdiv align=\"center\">\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fhouyuanchen111' target='_blank'>Houyuan Chen\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002FLuh1124' target='_blank'>Hong Li\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>&emsp;       \n    \u003Ca href='https:\u002F\u002Frefkxh.github.io\u002F' target='_blank'>Xianghao Kong\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>&emsp;   \n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Ftrzhu11' target='_blank'>Tianrui Zhu\u003C\u002Fa>\u003Csup>3\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fguoyww.github.io\u002F' target='_blank'>Yuwei Guo\u003C\u002Fa>\u003Csup>4\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fgithub.com\u002Fzhuxing0' target='_blank'>Weiqing 
Xiao\u003C\u002Fa>\u003Csup>3\u003C\u002Fsup>\u003Cbr>\n    \u003Ca href='https:\u002F\u002Fhugoycj.github.io\u002F' target='_blank'>Chongjie Ye\u003C\u002Fa>\u003Csup>5\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Flllyasviel.github.io\u002Flvmin_zhang\u002F' target='_blank'>Lvmin Zhang\u003C\u002Fa>\u003Csup>6\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fsites.google.com\u002Fview\u002Ffromandto' target='_blank'>Hao Zhao\u003C\u002Fa>\u003Csup>7\u003C\u002Fsup>&emsp;\n    \u003Ca href='https:\u002F\u002Fanyirao.com\u002F' target='_blank'>Anyi Rao\u003C\u002Fa>\u003Csup>1,*\u003C\u002Fsup>\n\u003C\u002Fdiv>\n\n\u003Cdiv>\n\u003Cdiv align=\"center\">\n    \u003Csup>1\u003C\u002Fsup>MMLab, HKUST, &emsp;\n    \u003Csup>2\u003C\u002Fsup>BUAA&emsp;\n    \u003Csup>3\u003C\u002Fsup>NJU&emsp;\n    \u003Csup>4\u003C\u002Fsup>CUHK&emsp;\n    \u003Csup>5\u003C\u002Fsup>FNii, CUHKSZ&emsp;\n    \u003Csup>6\u003C\u002Fsup>Stanford&emsp;\n    \u003Csup>7\u003C\u002Fsup>AIR,THU\u003Cbr>\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2605.00658\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-PDF-b31b1b\" alt=\"Paper\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhouyuanchen111.github.io\u002FUniVidX.github.io\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002Fprs-eth\u002FMarigold\u002Fmain\u002Fdoc\u002Fbadges\u002Fbadge-website.svg\" alt=\"Website\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fhouyuanchen\u002FUniVidX\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗%20Hugging%20Face%20-Model-green\" alt=\"Hugging Face Model\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache--2.0-929292\" alt=\"License\">\n  
\u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fteaser.png\" width=\"800px\" alt=\"UniVidX Teaser\">\n  \u003Cbr>\n\u003C\u002Fdiv>\n\n\n## 📖 Overview\n\nWe introduce ***UniVidX***, a unified multimodal video diffusion framework that transcends the boundaries of task-specific models. \nBy incorporating Stochastic Condition Masking (SCM), Decoupled Gated LoRA (DGL), and Cross-Modal Self-Attention (CMSA), a single model can achieve versatile video generation and perception. Whether applied to Intrinsic tasks (**UniVid-Intrinsic**) or Alpha channel processing (**UniVid-Alpha**), our approach achieves outstanding performance with remarkable data efficiency (\u003C1k training videos).\n\n---\n\n## 🚀 News\n- **[2026\u002F05\u002F04]** Initial release of **UniVidX**.\n---\n\n## 🛠️ Installation\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002Fhouyuanchen111\u002FUniVidX.git\ncd UniVidX\n\n# Create environment\nconda create -n unividx python=3.10\nconda activate unividx\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n## 🤖 Model Zoo\n\nYou can download the weights of the Wan2.1-T2V-14B backbone from either **ModelScope** or **Hugging Face**. Both options below place the backbone under `.\u002Fmodels\u002FWan-AI`, which is the location the inference configs reference in `model_paths`.\n\n**Option 1: ModelScope**\n```bash\npip install modelscope\nmkdir -p .\u002Fmodels\u002FWan-AI\nmodelscope download Wan-AI\u002FWan2.1-T2V-14B --local_dir .\u002Fmodels\u002FWan-AI\u002FWan2.1-T2V-14B\n```\n\n**Option 2: Hugging Face**\n```bash\npip install \"huggingface_hub[cli]\"\nmkdir -p .\u002Fmodels\u002FWan-AI\nhuggingface-cli download Wan-AI\u002FWan2.1-T2V-14B --local-dir .\u002Fmodels\u002FWan-AI\u002FWan2.1-T2V-14B\n```\n\nThen, download the **UniVid-Intrinsic** and **UniVid-Alpha** checkpoints manually from Hugging Face, or let the scripts auto-download them.\n\n| Model Name | Link |\n| :--- | :--- |\n| UniVid-Intrinsic | [🤗 Download](https:\u002F\u002Fhuggingface.co\u002Fhouyuanchen\u002FUniVidX) |\n| UniVid-Alpha | [🤗 
Download](https:\u002F\u002Fhuggingface.co\u002Fhouyuanchen\u002FUniVidX) |\n\n---\n\n## 💻 Inference\nWe use YAML files to centrally manage inference parameters. Below are the configuration templates for **UniVid-Intrinsic** and **UniVid-Alpha**.\n\n#### UniVid-Intrinsic\n\n```yaml\n# configs\u002Funivid_intrinsic_inference.yaml\n\nexperiment_name: \"univid_intrinsic_inference\"   # Output folder name\nmode: \"t2RAIN\"                                   # Task Mode (One of the 15 supported tasks)\n\n# --- Conditional Inputs ---\n# Configure paths based on your chosen 'mode'. Set unused inputs to null.\ninference_rgb_path: null\ninference_albedo_path: null\ninference_irradiance_path: null\ninference_normal_path: null\n\n# --- Text Prompt ---\n# We recommend using Chinese prompts.\nprompt: \"一只小刺猬，穿着白色小围裙，头上戴着厨师帽，正站在小凳子上，双手举着一个小平底锅，锅里冒着热气，表情专注而自豪，位于一个现代化的迷你厨房中，不锈钢台面反射着明亮的光线，各种小厨具整齐地排列着, 镜头从右向左移动。\"\n\n# --- Model Settings ---\nmodel:\n  name: 'UniVidIntrinsic' \n  params:\n    # Path to Wan2.1 Backbone\n    model_paths: '[\"models\u002FWan-AI\u002FWan2.1-T2V-14B\u002Fmodels_t5_umt5-xxl-enc-bf16.pth\", \"models\u002FWan-AI\u002FWan2.1-T2V-14B\u002FWan2.1_VAE.pth\"]'\n    resume_from_checkpoint: \"checkpoints\u002Funivid_intrinsic.safetensors\"\n    \n    # LoRA Configuration\n    lora_base_model: \"dit\"\n    lora_target_modules: \"self_attn.q,self_attn.k,self_attn.v,self_attn.o,ffn.0,ffn.2\"\n    lora_rank: 32\n    lora_modalities: [\"rgb\", \"albedo\", \"irradiance\", \"normal\"]\n```\n\n#### UniVid-Alpha\n\n```yaml\n# configs\u002Funivid_alpha_inference.yaml\n\nexperiment_name: \"univid_alpha_inference\"   # Output folder name\nmode: \"R2PFB\"                                # Task Mode (One of the 15 supported tasks)\n\n\n# --- Conditional Inputs ---\n# Configure paths based on your chosen 'mode'. 
Set unused inputs to null.\ninference_rgb_path: \".\u002Fassets\u002FR2PFB\u002Fbl.mp4\"\ninference_pha_path: null\ninference_fgr_path: null\ninference_bgr_path: null\n\n# --- Text Prompt ---\nprompt: \"\"\n\n# --- Model Settings ---\nmodel:\n  name: 'UniVidAlpha' \n  params:\n    # Path to Wan2.1 Backbone\n    model_paths: '[\"models\u002FWan-AI\u002FWan2.1-T2V-14B\u002Fmodels_t5_umt5-xxl-enc-bf16.pth\", \"models\u002FWan-AI\u002FWan2.1-T2V-14B\u002FWan2.1_VAE.pth\"]'\n    resume_from_checkpoint: \"checkpoints\u002Funivid_alpha.safetensors\"\n    \n    # LoRA Configuration\n    lora_base_model: \"dit\"\n    lora_target_modules: \"self_attn.q,self_attn.k,self_attn.v,self_attn.o,ffn.0,ffn.2\"\n    lora_rank: 32\n    lora_modalities: [\"com\", \"pha\", \"fgr\", \"bgr\"]\n```\n\nOnce your YAML configuration is ready, run the corresponding inference script:\n```bash\n# univid_alpha_inference\npython scripts\u002Finference_univid_alpha.py --config configs\u002Funivid_alpha_inference.yaml\n\n# univid_intrinsic_inference\npython scripts\u002Finference_univid_intrinsic.py --config configs\u002Funivid_intrinsic_inference.yaml\n```\n\nBelow are the 15 tasks (modes) supported by **UniVid-Intrinsic** and **UniVid-Alpha**, along with their corresponding inputs and outputs:\n\n| Task Category | UniVid-Intrinsic | UniVid-Alpha |\n| :--- | :--- | :--- |\n| **Text $\\to$ X** | `t2RAIN` | `t2RPFB` |\n| **X $\\to$ X** | `R2AIN`, `RA2IN`, `RI2AN`, `RN2AI`, `RIN2A`, `RAN2I`, `RAI2N`, `AIN2R` | `R2PFB`, `RP2FB`, `RF2PB`, `RB2PF`, `FB2RP`, `PFB2R`, `RFB2P`, `RPB2F`, `RPF2B` |\n| **Text & X $\\to$ X** | `A2RIN`, `I2RAN`, `N2RAI`, `AI2RN`, `AN2RI`, `IN2RA` | `P2RFB`, `F2RPB`, `B2RPF`, `PF2RB`, `PB2RF` |\n\n\nDifferent tasks can be combined to enable interesting applications. For example:\n### 1. 
`t2RAIN` $\\to$ `IN2RA` (Prompt-driven Video Editing)\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd style=\"border: none; vertical-align: top;\">\n      \u003Ctable>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_editing\u002Frgb_gen.gif\" width=\"100%\">\u003C\u002Ftd>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_editing\u002Falbedo_gen.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_editing\u002Firradiance_gen.gif\" width=\"100%\">\u003C\u002Ftd>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_editing\u002Fnormal_gen.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd colspan=\"2\" style=\"text-align: center; font-weight: bold;\">\n             \"一个开放式的现代极简厨房...\"\u003Cbr>t2RAIN\n          \u003C\u002Ftd>\n        \u003C\u002Ftr>\n      \u003C\u002Ftable>\n    \u003C\u002Ftd>\n    \u003Ctd style=\"border: none; vertical-align: middle; font-size: 50px; padding: 0 20px;\">\n      &rarr;\n    \u003C\u002Ftd>\n    \u003Ctd style=\"border: none; vertical-align: top;\">\n      \u003Ctable>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_editing\u002Frgb_edit.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_editing\u002Falbedo_edit.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd style=\"text-align: center; font-weight: bold;\">\n            \"橙子放在不锈钢台上...\"\u003Cbr>IN2RA\n          \u003C\u002Ftd>\n        \u003C\u002Ftr>\n      \u003C\u002Ftable>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 2. 
`R2PFB` $\\to$ `PB2RF` (Prompt-driven Video Inpainting)\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd style=\"border: none; vertical-align: top;\">\n      \u003Ctable>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_inpainting\u002Finput_bl.gif\" width=\"100%\">\u003C\u002Ftd>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_inpainting\u002Falpha.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_inpainting\u002Ffg.gif\" width=\"100%\">\u003C\u002Ftd>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_inpainting\u002Fbg.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd colspan=\"2\" style=\"text-align: center; font-weight: bold;\">\n             \"\"\u003Cbr>R2PFB\n          \u003C\u002Ftd>\n        \u003C\u002Ftr>\n      \u003C\u002Ftable>\n    \u003C\u002Ftd>\n    \u003Ctd style=\"border: none; vertical-align: middle; font-size: 50px; padding: 0 20px;\">\n      &rarr;\n    \u003C\u002Ftd>\n    \u003Ctd style=\"border: none; vertical-align: top;\">\n      \u003Ctable>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_inpainting\u002Finpaint_bl.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd>\u003Cimg src=\"assets\u002Fvideo_inpainting\u002Finpaint_fg.gif\" width=\"100%\">\u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n          \u003Ctd style=\"text-align: center; font-weight: bold;\">\n            \"带墨镜，穿粉色西装...\"\u003Cbr>PB2RF\n          \u003C\u002Ftd>\n        \u003C\u002Ftr>\n      \u003C\u002Ftable>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\nMore applications await your exploration.\n\n---\n\n## 🏋️ Training\nWe utilize `accelerate` for distributed training. 
The training configurations are provided in `configs\u002Funivid_intrinsic_train.yaml` and `configs\u002Funivid_alpha_train.yaml`.\n\nRun the following commands to start training:\n\n```bash\n# 1. Train UniVid-Intrinsic\naccelerate launch --config_file \"configs\u002Faccelerate_config.yaml\" \\\n    \"scripts\u002Ftrain.py\" \\\n    --config \"configs\u002Funivid_intrinsic_train.yaml\"\n\n# 2. Train UniVid-Alpha\naccelerate launch --config_file \"configs\u002Faccelerate_config.yaml\" \\\n    \"scripts\u002Ftrain.py\" \\\n    --config \"configs\u002Funivid_alpha_train.yaml\"\n```\n\n\n## 📊 Citation\n\nIf you find this work useful, please cite:\n\n```bibtex\n@article{chen2026unividx,\n  title     = {UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors},\n  author    = {Chen, Houyuan and Li, Hong and Kong, Xianghao and Zhu, Tianrui and Xu, Shaocong and Xiao, Weiqing and Guo, Yuwei and Ye, Chongjie and Zhang, Lvmin and Zhao, Hao and Rao, Anyi},\n  journal   = {ACM Transactions on Graphics},\n  volume    = {45},\n  number    = {4},\n  articleno = {51},\n  year      = {2026},\n  month     = jul,\n  doi       = {10.1145\u002F3811304},\n  url       = {https:\u002F\u002Fdoi.org\u002F10.1145\u002F3811304}\n}\n```\n\n## 📝 Acknowledgements\n\nThe code is built on [DiffSynth-Studio](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FDiffSynth-Studio). Thanks to all the authors for their excellent contributions!\n\n## 📜 License\n\nThis project is released under the [Apache License 2.0](LICENSE).","UniVidX is a unified multimodal video generation framework that uses diffusion priors to handle a range of video generation tasks. It combines Stochastic Condition Masking (SCM), Decoupled Gated LoRA (DGL), and Cross-Modal Self-Attention (CMSA) so that a single model can serve diverse video generation needs, including intrinsic tasks (UniVid-Intrinsic) and alpha-channel processing (UniVid-Alpha). It is notably data-efficient, reaching strong performance with fewer than 1,000 training videos. The tool is well suited to applications that need flexible, high-quality video generation, such as creative design and visual effects for film and television.",2,"2026-06-11 02:50:08","CREATED_QUERY"]