[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72440":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":32,"discoverSource":33},72440,"MimicMotion","Tencent\u002FMimicMotion","Tencent","High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance","https:\u002F\u002Ftencent.github.io\u002FMimicMotion\u002F",null,"Python",2600,236,30,89,0,2,7,15,6,29.12,"Other",false,"main",true,[27,28],"diffusion-models","video-generation","2026-06-12 02:03:03","# MimicMotion [ICML 2025]\n\n\u003Ca href='http:\u002F\u002Ftencent.github.io\u002FMimicMotion'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-Green'>\u003C\u002Fa> \u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.19680'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-red'>\u003C\u002Fa>\n\n\u003Cp align=\"center\">\n\u003Cb>MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance\u003C\u002Fb>\n\u003Cbr\u002F>\n\u003Ci>Yuang Zhang\u003Csup>1,2\u003C\u002Fsup>, Jiaxi Gu\u003Csup>1\u003C\u002Fsup>, Li-Wen Wang\u003Csup>1\u003C\u002Fsup>, Han Wang\u003Csup>1,2\u003C\u002Fsup>, Junqi Cheng\u003Csup>1\u003C\u002Fsup>, Yuefeng Zhu\u003Csup>1\u003C\u002Fsup>, Fangyuan Zou\u003Csup>1\u003C\u002Fsup>\u003C\u002Fi>\n\u003Cbr\u002F>\n[\u003Csup>1\u003C\u002Fsup>Tencent  \u003Csup>2\u003C\u002Fsup>Shanghai Jiao Tong 
University]\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Ffigures\u002Fpreview_1.gif\" width=\"100\" \u002F>\n  \u003Cimg src=\"assets\u002Ffigures\u002Fpreview_2.gif\" width=\"100\" \u002F>\n  \u003Cimg src=\"assets\u002Ffigures\u002Fpreview_3.gif\" width=\"100\" \u002F>\n  \u003Cimg src=\"assets\u002Ffigures\u002Fpreview_4.gif\" width=\"100\" \u002F>\n  \u003Cimg src=\"assets\u002Ffigures\u002Fpreview_5.gif\" width=\"100\" \u002F>\n  \u003Cimg src=\"assets\u002Ffigures\u002Fpreview_6.gif\" width=\"100\" \u002F>\n  \u003Cbr\u002F>\n  \u003Cspan>Highlights: \u003Cb>rich details\u003C\u002Fb>, \u003Cb>good temporal smoothness\u003C\u002Fb>, and \u003Cb>long video length\u003C\u002Fb>.\u003C\u002Fspan>\n\u003C\u002Fp>\n\n## Overview\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Ffigures\u002Fmodel_structure.png\" alt=\"model architecture\" width=\"640\"\u002F>\n  \u003Cbr\u002F>\n  \u003Ci>An overview of the framework of MimicMotion.\u003C\u002Fi>\n\u003C\u002Fp>\n\nIn recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in aspects such as controllability, video length, and richness of detail, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed *MimicMotion*, which can generate high-quality videos of arbitrary length with any motion guidance. Compared with previous methods, our approach has several highlights. Firstly, with confidence-aware pose guidance, temporal smoothness can be achieved and model robustness can be enhanced with large-scale training data. Secondly, regional loss amplification based on pose confidence significantly reduces image distortion. 
Lastly, for generating long smooth videos, a progressive latent fusion strategy is proposed. By this means, videos of arbitrary length can be generated with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in multiple aspects.\n\n## News\n\n* `[2025-05-03]`: 🎉 Our paper has been accepted to ICML 2025. Congratulations and many thanks to the co-authors!\n* `[2024-07-08]`: 🔥 [A superior model checkpoint](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FMimicMotion\u002Fblob\u002Fmain\u002FMimicMotion_1-1.pth) has been released as version 1.1. The maximum number of video frames has now been expanded from 16 to 72, significantly enhancing the video quality!\n* `[2024-07-01]`: Project page, code, technical report and [a basic model checkpoint](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FMimicMotion\u002Fblob\u002Fmain\u002FMimicMotion_1.pth) are released. A better checkpoint supporting higher quality video generation will be released very soon. Stay tuned!\n\n## Quickstart\n\nThe initial released version of the model checkpoint supports generating videos with a maximum of 72 frames at a 576x1024 resolution. If you run out of memory, reduce the number of frames.\n\n### Environment setup\n\nPython 3+ with torch 2.x is recommended; this setup has been validated on an Nvidia V100 GPU. Run the commands below to install all the Python dependencies:\n\n```\nconda env create -f environment.yaml\nconda activate mimicmotion\n```\n\n### Download weights\nIf you experience connection issues with Hugging Face, you can use the mirror endpoint by setting the environment variable: `export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com`.\nPlease download the weights manually as follows:\n```\ncd MimicMotion\u002F\nmkdir models\n```\n1. 
Download the DWPose pretrained model: [dwpose](https:\u002F\u002Fhuggingface.co\u002Fyzd-v\u002FDWPose\u002Ftree\u002Fmain)\n    ```\n    mkdir -p models\u002FDWPose\n    wget 'https:\u002F\u002Fhuggingface.co\u002Fyzd-v\u002FDWPose\u002Fresolve\u002Fmain\u002Fyolox_l.onnx?download=true' -O models\u002FDWPose\u002Fyolox_l.onnx\n    wget 'https:\u002F\u002Fhuggingface.co\u002Fyzd-v\u002FDWPose\u002Fresolve\u002Fmain\u002Fdw-ll_ucoco_384.onnx?download=true' -O models\u002FDWPose\u002Fdw-ll_ucoco_384.onnx\n    ```\n2. Download the pre-trained checkpoint of MimicMotion from [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FMimicMotion)\n    ```\n    wget -P models\u002F https:\u002F\u002Fhuggingface.co\u002Ftencent\u002FMimicMotion\u002Fresolve\u002Fmain\u002FMimicMotion_1-1.pth\n    ```\n3. The SVD model [stabilityai\u002Fstable-video-diffusion-img2vid-xt-1-1](https:\u002F\u002Fhuggingface.co\u002Fstabilityai\u002Fstable-video-diffusion-img2vid-xt-1-1) will be automatically downloaded.\n\nFinally, all the weights should be organized in `models\u002F` as follows:\n\n```\nmodels\u002F\n├── DWPose\n│   ├── dw-ll_ucoco_384.onnx\n│   └── yolox_l.onnx\n└── MimicMotion_1-1.pth\n```\n\n### Model inference\n\nA sample configuration for testing is provided as `test.yaml`. You can modify the configuration to suit your needs.\n\n```\npython inference.py --inference_config configs\u002Ftest.yaml\n```\n\nTip: if your GPU memory is limited, try setting the environment variable `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256`.\n\n### VRAM requirement and Runtime\n\nFor the 35s demo video, the 72-frame model requires 16GB VRAM (4060ti) and finishes in 20 minutes on a 4090 GPU.\n\nThe minimum VRAM requirement for the 16-frame U-Net model is 8GB; however, the VAE decoder demands 16GB. 
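You can optionally run the VAE decoder on the CPU.\n\n## Citation\n```bib\n@inproceedings{zhang2025mimicmotion,\n  title={MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance},\n  author={Yuang Zhang and Jiaxi Gu and Li-Wen Wang and Han Wang and Junqi Cheng and Yuefeng Zhu and Fangyuan Zou},\n  booktitle={International Conference on Machine Learning},\n  year={2025}\n}\n```\n","MimicMotion is a project for generating high-quality human motion videos using confidence-aware pose guidance. Its core features include confidence-aware pose guidance for temporal smoothness, regional loss amplification based on pose confidence to reduce image distortion, and a progressive latent fusion strategy for generating smooth videos of arbitrary length. These features allow MimicMotion to outperform existing methods in controllability and video quality. The project suits scenarios that require highly controllable, richly detailed human motion video generation, such as virtual reality, animation production, and motion analysis.","2026-06-11 03:42:03","high_star"]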