[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74270":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},74270,"SoulX-LiveAct","Soul-AILab\u002FSoulX-LiveAct","Soul-AILab","Official inference code for SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory","",null,"Python",1115,106,47,12,0,4,15,19.09,false,"main",true,[],"2026-06-12 02:03:24","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\".\u002Fassets\u002Flogo.png\" alt=\"LiveAct Logo\" width=\"30%\">\n\n# SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory\n\n[Dingcheng Zhen*\u003Csup>&#9993;\u003C\u002Fsup>](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=jSLx3CcAAAAJ) · [Xu Zheng*](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=Ii1c51QAAAAJ) · [Ruixin Zhang*](https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Ruixin_Zhang5) · [Zhiqi Jiang*](https:\u002F\u002Fopenreview.net\u002Fprofile?id=~Zhiqi_Jiang3)\n\n[Yichao Yan]() · [Ming Tao]() · [Shunshun Yin]()\n\n\u003C\u002Fdiv>\n\n**SoulX-LiveAct** presents a novel framework that enables **lifelike, multimodal-controlled, high-fidelity** human animation video generation for real-time streaming interactions.\n\n(I) We identify diffusion-step-aligned neighbor latents as a key inductive bias for AR diffusion, providing a principled and 
theoretically grounded **Neighbor Forcing** for step-consistent AR video generation.\n\n(II) We introduce **ConvKV Memory**, a lightweight plug-in compression mechanism that enables constant-memory hour-scale video generation with negligible overhead.\n\n(III) We develop an optimized real-time system that achieves **20 FPS using only two H100\u002FH200 GPUs** with end-to-end adaptive FP8 precision, sequence parallelism, and operator fusion at 720×416 or 512×512 resolution.\n\n\n\u003Cdiv align=\"center\">\n  \u003Ca href='http:\u002F\u002Farxiv.org\u002Fabs\u002F2603.11746'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTechnical-Report-red'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fsoul-ailab.github.io\u002Fsoulx-liveact\u002F'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-green'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fgithub.com\u002FSoul-AILab\u002FSoulX-LiveAct'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGithub-Home-blue'>\u003C\u002Fa>\n  \u003Ca href='https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002FLiveAct'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-yellow'>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\n## 🔥🔥🔥 News\n\n* 📢 Mar 18, 2026: We now support consumer GPUs (e.g., RTX 4090, RTX 5090) with FP8 KV cache and CPU model offloading. In our tests, the 18B model (14B Wan2.1 + 4B audio module) achieves a throughput of 6 FPS on a single RTX 5090.\n* 👋 Mar 16, 2026: We release the inference code and model weights of SoulX-LiveAct.\n\n\n## 🎥 Demo\n\n[\u002F\u002F]: # (**Note:** Due to GitHub limitations, the videos are heavily compressed. 
Please refer to the [demo page]&#40;https:\u002F\u002Fdemopagedemo.github.io\u002FLiveAct\u002F&#41; for the original results.)\n\n### 👫 Podcast\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd>\u003Cvideo controls playsinline width=\"666\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7d50441c-2a90-48c7-a557-c375936f2b65\">\u003C\u002Fvideo>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n### 🎤 Music & Talk Show\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd>\u003Cvideo controls playsinline width=\"360\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F9fd4fbcf-3e76-48ca-a8e0-2a46da18da5c\">\u003C\u002Fvideo>\u003C\u002Ftd>\n    \u003Ctd>\u003Cvideo controls playsinline width=\"360\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F9ac3ad4b-db6a-470b-9f4f-6ab9d1c8d998\">\u003C\u002Fvideo>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n### 📱 FaceTime\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd>\u003Cvideo controls playsinline width=\"360\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F143bb565-078a-48ba-8daa-f2fb56616189\">\u003C\u002Fvideo>\u003C\u002Ftd>\n    \u003Ctd>\u003Cvideo controls playsinline width=\"360\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5619381e-bd8c-4aac-a1d6-2a1fdfe9d673\">\u003C\u002Fvideo>\u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\n## 📑 Open-source Plan\n\n  - [x] Release inference code and checkpoints\n  - [x] GUI demo support\n  - [x] End-to-end adaptive FP8 precision\n  - [x] Support model offloading for consumer GPUs (e.g., RTX 4090, RTX 5090) to reduce memory usage\n  - [ ] Support FP4 precision for B-series GPUs (e.g., RTX 5090, B100, B200)\n  - [ ] Release training code\n\n## ▶️ Quick Start\n\n### 🛠️ Dependencies and Installation\n\n#### Step 1: Install Basic Dependencies\n\n```bash\nconda create -n liveact python=3.10\nconda activate liveact\npip install -r requirements.txt\nconda 
install conda-forge::sox -y\n```\n\n#### Step 2: Install SageAttention\nTo enable the FP8 attention kernel, install SageAttention:\n* Install SageAttention:\n  ```bash\n  git clone https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention.git\n  cd SageAttention\n  git checkout v2.2.0\n  python setup.py install\n  ```\n\n* (Optional) Install the modified version of SageAttention:\n  To enable SageAttention with QKV operator fusion, install it with the following commands:\n\n  ```bash\n  git clone https:\u002F\u002Fgithub.com\u002FZhiqiJiang\u002FSageAttentionFusion.git\n  cd SageAttentionFusion\n  python setup.py install\n  ```\n\n#### Step 3: Install vLLM\n  To enable the FP8 GEMM kernel, install vLLM:\n  ```bash\n  pip install vllm==0.11.0\n  ```\n\n#### Step 4: Install LightVAE\n\n  ```bash\n  git clone https:\u002F\u002Fgithub.com\u002FModelTC\u002FLightX2V\n  cd LightX2V\n  python setup_vae.py install\n  ```\n\n\n### 🤗 Download Checkpoints\n\n### Model Cards\n| ModelName             | Download                                                                                                                       |\n|-----------------------|--------------------------------------------------------------------------------------------------------------------------------| \n| SoulX-LiveAct         | [🤗 Huggingface](https:\u002F\u002Fhuggingface.co\u002FSoul-AILab\u002FLiveAct),   [魔搭 ModelScope](https:\u002F\u002Fmodelscope.cn\u002Fmodels\u002FSoul-AILab\u002FLiveAct) |\n| chinese-wav2vec2-base | [🤗 Huggingface](https:\u002F\u002Fhuggingface.co\u002FTencentGameMate\u002Fchinese-wav2vec2-base)                                                 |\n\n\n### 🔑 Inference\n\n#### Usage of LiveAct\n\n#### 1. 
Run real-time streaming inference on two H100\u002FH200 GPUs\n\n```bash\nUSE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \\\ntorchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \\\n    generate.py \\\n    --size 416*720 \\\n    --ckpt_dir MODEL_PATH \\\n    --wav2vec_dir chinese-wav2vec2-base \\\n    --fps 20 \\\n    --dura_print \\\n    --input_json examples\u002Fexample.json \\\n    --steam_audio\n```\n\n#### 2. Run with action or emotion editing at real-time streaming performance\n\n```bash\nUSE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \\\ntorchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \\\n    generate.py \\\n    --size 512*512 \\\n    --ckpt_dir MODEL_PATH \\\n    --wav2vec_dir chinese-wav2vec2-base \\\n    --fps 24 \\\n    --input_json examples\u002Fexample_edit.json\n```\n\n#### 3. Run with the best performance settings\n\n```bash\nUSE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \\\ntorchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \\\n    generate.py \\\n    --size 480*832 \\\n    --ckpt_dir MODEL_PATH \\\n    --wav2vec_dir chinese-wav2vec2-base \\\n    --fps 24 \\\n    --input_json examples\u002Fexample.json\n```\n\n#### 4. Run on RTX 4090\u002FRTX 5090 GPUs\n**Note:** FP8 KV cache may slightly affect generation quality.\n```bash\nUSE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \\\npython generate.py \\\n    --size 416*720 \\\n    --ckpt_dir MODEL_PATH \\\n    --wav2vec_dir chinese-wav2vec2-base \\\n    --fps 24 \\\n    --input_json examples\u002Fexample.json \\\n    --fp8_kv_cache \\\n    --block_offload \\\n    --t5_cpu\n```\n\n#### 5. 
Run on a single GPU for evaluation\n\n```bash\nUSE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \\\npython generate.py \\\n    --size 480*832 \\\n    --ckpt_dir MODEL_PATH \\\n    --wav2vec_dir chinese-wav2vec2-base \\\n    --fps 24 \\\n    --input_json examples\u002Fexample.json \\\n    --audio_cfg 1.7 \\\n    --t5_cpu\n```\n\n\n### Command Line Arguments\n\n| Argument          | Type  | Required | Default | Description                                                                                   |\n|-------------------|-------|----------|---------|-----------------------------------------------------------------------------------------------|\n| `--size`          | str   | Yes      | -       | The width and height of the generated video (e.g., `416*720`).                                |\n| `--t5_cpu`        | bool  | No       | false   | Whether to place the T5 model on the CPU.                                                     |\n| `--offload_cache` | bool  | No       | -       | Whether to place the KV cache on the CPU.                                                     |\n| `--fps`           | int   | Yes      | -       | The target FPS of the generated video.                                                        |\n| `--audio_cfg`     | float | No       | 1.0     | Classifier-free guidance scale for audio control.                                             |\n| `--dura_print`    | bool  | No       | false   | Whether to print the duration of every block.                                                 |\n| `--input_json`    | str   | Yes      | -       | Path to the condition JSON file used to generate the video.                                   |\n| `--seed`          | int   | No       | 42      | The seed to use for generating the image or video.                                            |\n| `--steam_audio`   | bool  | No       | false   | Whether to run inference with streaming audio.                                                
|\n| `--mean_memory`   | bool  | No       | false   | Whether to use the mean memory strategy during inference for further performance improvement. |\n| `--fp8_kv_cache`   | bool  | No       | false   | Whether to store kv cache in FP8 and dequantize to BF16 on use. FP8 KV cache may slightly affect generation quality.|\n| `--block_offload`   | bool  | No       | false   | Whether to offload model blocks to CPU between block forwards.|\n\n\n### 💻 GUI demo\nRun SoulX-LiveAct inference on the GUI demo and evaluate real-time performance.\n\n\u003Cdiv>\n  \u003Cvideo controls playsInline src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7150345d-693f-4250-af07-e94daa6ef6ed\" width=\"50%\">\u003C\u002Fvideo>\n\u003C\u002Fdiv>\n\n**Note:** The first few blocks during the initial run require warm-up. Normal performance will be observed from the second run onward.\n\n#### 1. Run real-time streaming inference on two H100\u002FH200 GPUs\n\n```bash\nUSE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \\\ntorchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \\\n  demo.py \\\n  --ckpt_dir MODEL_PATH \\\n  --wav2vec_dir chinese-wav2vec2-base \\\n  --size 416*720 \\\n  --video_save_path .\u002Fgenerated_videos\n```\n\n#### 2. 
Run on RTX 4090\u002FRTX 5090 GPUs\n```bash\nUSE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \\\ntorchrun --nproc_per_node=1 --master_port=$(shuf -n 1 -i 10000-65535) \\\n  demo.py \\\n  --ckpt_dir MODEL_PATH \\\n  --wav2vec_dir chinese-wav2vec2-base \\\n  --size 416*720 \\\n  --fp8_kv_cache \\\n  --block_offload \\\n  --t5_cpu \\\n  --video_save_path .\u002Fgenerated_videos\n```\n\n## 📚 Citation\n\n```bibtex\n@misc{zhen2026soulxliveacthourscalerealtimehuman,\n      title={SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory}, \n      author={Dingcheng Zhen and Xu Zheng and Ruixin Zhang and Zhiqi Jiang and Yichao Yan and Ming Tao and Shunshun Yin},\n      year={2026},\n      eprint={2603.11746},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2603.11746}, \n}\n```\n## 📮 Contact Us\nIf you would like to leave a message about our work, feel free to email dingchengzhen@soulapp.cn.\n\nYou’re welcome to join our WeChat group or Soul group for technical discussions.\n\u003Cp align=\"center\">\n  \u003Cspan style=\"display: inline-block; margin-right: 10px;\">\n    \u003Cimg src=\"assets\u002FQRCode_WX.png\" width=\"200\" alt=\"WeChat Group QR Code\"\u002F>\n  \u003C\u002Fspan>\n  \u003Cspan style=\"display: inline-block;\">\n    \u003Cimg src=\"assets\u002FQRCode_Soul.png\" width=\"300\" alt=\"Soul Group QR Code\"\u002F>\n  \u003C\u002Fspan>\n\u003C\u002Fp>","SoulX-LiveAct is a framework for generating high-fidelity human animation video in real time, particularly suited to long-duration real-time streaming interaction. Its core features include step-consistent autoregressive video generation via Neighbor Forcing and ConvKV Memory, and it reaches 20 FPS using only two H100\u002FH200 GPUs. The system is also optimized for consumer GPUs (e.g., RTX 4090, RTX 5090), using FP8 KV cache and CPU model offloading to further improve performance. The project is well suited to applications that need high-quality, low-latency character animation, such as online podcasts, music performances, talk shows, and other multimedia content creation.",2,"2026-06-11 03:49:44","high_star"]