[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71008":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},71008,"RobustVideoMatting","PeterL1n\u002FRobustVideoMatting","PeterL1n","Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!","https:\u002F\u002Fpeterl1n.github.io\u002FRobustVideoMatting\u002F",null,"Python",9388,1197,134,114,0,4,10,47,12,82.94,"GNU General Public License v3.0",false,"master",true,[27,28,29,30,31],"ai","computer-vision","deep-learning","machine-learning","matting","2026-06-12 04:00:58","# Robust Video Matting (RVM)\n\n![Teaser](\u002Fdocumentation\u002Fimage\u002Fteaser.gif)\n\n\u003Cp align=\"center\">English | \u003Ca href=\"README_zh_Hans.md\">中文\u003C\u002Fa>\u003C\u002Fp>\n\nOfficial repository for the paper [Robust High-Resolution Video Matting with Temporal Guidance](https:\u002F\u002Fpeterl1n.github.io\u002FRobustVideoMatting\u002F). RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM can perform matting in real time on any video without additional inputs. It achieves **4K 76FPS** and **HD 104FPS** on an Nvidia GTX 1080 Ti GPU. 
The project was developed at [ByteDance Inc.](https:\u002F\u002Fwww.bytedance.com\u002F)\n\n\u003Cbr>\n\n## News\n\n* [Nov 03 2021] Fixed a bug in [train.py](https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Fcommit\u002F48effc91576a9e0e7a8519f3da687c0d3522045f).\n* [Sep 16 2021] Code is re-released under the GPL-3.0 license.\n* [Aug 25 2021] Source code and pretrained models are published.\n* [Jul 27 2021] Paper is accepted to WACV 2022.\n\n\u003Cbr>\n\n## Showreel\nWatch the showreel video ([YouTube](https:\u002F\u002Fyoutu.be\u002FJvzltozpbpk), [Bilibili](https:\u002F\u002Fwww.bilibili.com\u002Fvideo\u002FBV1Z3411B7g7\u002F)) to see the model's performance.\n\n\u003Cp align=\"center\">\n    \u003Ca href=\"https:\u002F\u002Fyoutu.be\u002FJvzltozpbpk\">\n        \u003Cimg src=\"documentation\u002Fimage\u002Fshowreel.gif\">\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\nAll footage in the video is available on [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1VFnWwuu-YXDKG-N6vcjK_nL7YZMFapMU?usp=sharing).\n\n\u003Cbr>\n\n## Demo\n* [Webcam Demo](https:\u002F\u002Fpeterl1n.github.io\u002FRobustVideoMatting\u002F#\u002Fdemo): Run the model live in your browser and visualize its recurrent states.\n* [Colab Demo](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F10z-pNKRnVNsp0Lq9tH1J_XPZ7CBC_uHm?usp=sharing): Test our model on your own videos with a free GPU.\n\n\u003Cbr>\n\n## Download\n\nWe recommend the MobileNetV3 models for most use cases. The ResNet50 models are the larger variant, with small improvements in matting quality. Our model is available for several inference frameworks. 
See the [inference documentation](documentation\u002Finference.md) for more instructions.\n\n\u003Ctable>\n    \u003Cthead>\n        \u003Ctr>\n            \u003Ctd>Framework\u003C\u002Ftd>\n            \u003Ctd>Download\u003C\u002Ftd>\n            \u003Ctd>Notes\u003C\u002Ftd>\n        \u003C\u002Ftr>\n    \u003C\u002Fthead>\n    \u003Ctbody>\n        \u003Ctr>\n            \u003Ctd>PyTorch\u003C\u002Ftd>\n            \u003Ctd>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3.pth\">rvm_mobilenetv3.pth\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_resnet50.pth\">rvm_resnet50.pth\u003C\u002Fa>\n            \u003C\u002Ftd>\n            \u003Ctd>\n                Official weights for PyTorch. \u003Ca href=\"documentation\u002Finference.md#pytorch\">Doc\u003C\u002Fa>\n            \u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>TorchHub\u003C\u002Ftd>\n            \u003Ctd>\n                Nothing to download.\n            \u003C\u002Ftd>\n            \u003Ctd>\n                The easiest way to use our model in your PyTorch project. 
\u003Ca href=\"documentation\u002Finference.md#torchhub\">Doc\u003C\u002Fa>\n            \u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>TorchScript\u003C\u002Ftd>\n            \u003Ctd>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_fp32.torchscript\">rvm_mobilenetv3_fp32.torchscript\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_fp16.torchscript\">rvm_mobilenetv3_fp16.torchscript\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_resnet50_fp32.torchscript\">rvm_resnet50_fp32.torchscript\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_resnet50_fp16.torchscript\">rvm_resnet50_fp16.torchscript\u003C\u002Fa>\n            \u003C\u002Ftd>\n            \u003Ctd>\n                For mobile inference, consider exporting int8 quantized models yourself. 
\u003Ca href=\"documentation\u002Finference.md#torchscript\">Doc\u003C\u002Fa>\n            \u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>ONNX\u003C\u002Ftd>\n            \u003Ctd>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_fp32.onnx\">rvm_mobilenetv3_fp32.onnx\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_fp16.onnx\">rvm_mobilenetv3_fp16.onnx\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_resnet50_fp32.onnx\">rvm_resnet50_fp32.onnx\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_resnet50_fp16.onnx\">rvm_resnet50_fp16.onnx\u003C\u002Fa>\n            \u003C\u002Ftd>\n            \u003Ctd>\n                Tested on ONNX Runtime with CPU and CUDA backends. Provided models use opset 12. 
\u003Ca href=\"documentation\u002Finference.md#onnx\">Doc\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Ftree\u002Fonnx\">Exporter\u003C\u002Fa>.\n            \u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>TensorFlow\u003C\u002Ftd>\n            \u003Ctd>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_tf.zip\">rvm_mobilenetv3_tf.zip\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_resnet50_tf.zip\">rvm_resnet50_tf.zip\u003C\u002Fa>\n            \u003C\u002Ftd>\n            \u003Ctd>\n                TensorFlow 2 SavedModel. \u003Ca href=\"documentation\u002Finference.md#tensorflow\">Doc\u003C\u002Fa>\n            \u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>TensorFlow.js\u003C\u002Ftd>\n            \u003Ctd>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_tfjs_int8.zip\">rvm_mobilenetv3_tfjs_int8.zip\u003C\u002Fa>\u003Cbr>\n            \u003C\u002Ftd>\n            \u003Ctd>\n                Run the model on the web. 
\u003Ca href=\"https:\u002F\u002Fpeterl1n.github.io\u002FRobustVideoMatting\u002F#\u002Fdemo\">Demo\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Ftree\u002Ftfjs\">Starter Code\u003C\u002Fa>\n            \u003C\u002Ftd>\n        \u003C\u002Ftr>\n        \u003Ctr>\n            \u003Ctd>CoreML\u003C\u002Ftd>\n            \u003Ctd>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel\">rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_1280x720_s0.375_int8.mlmodel\">rvm_mobilenetv3_1280x720_s0.375_int8.mlmodel\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel\">rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel\u003C\u002Fa>\u003Cbr>\n                \u003Ca  href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Freleases\u002Fdownload\u002Fv1.0.0\u002Frvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel\">rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel\u003C\u002Fa>\u003Cbr>\n            \u003C\u002Ftd>\n            \u003Ctd>\n                CoreML does not support dynamic resolution; you can export other resolutions yourself. Models require iOS 13+. \u003Ccode>s\u003C\u002Fcode> denotes \u003Ccode>downsample_ratio\u003C\u002Fcode>. 
\u003Ca href=\"documentation\u002Finference.md#coreml\">Doc\u003C\u002Fa>, \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FPeterL1n\u002FRobustVideoMatting\u002Ftree\u002Fcoreml\">Exporter\u003C\u002Fa>\n            \u003C\u002Ftd>\n        \u003C\u002Ftr>\n    \u003C\u002Ftbody>\n\u003C\u002Ftable>\n\nAll models are available on [Google Drive](https:\u002F\u002Fdrive.google.com\u002Fdrive\u002Ffolders\u002F1pBsG-SCTatv-95SnEuxmnvvlRx208VKj?usp=sharing) and [Baidu Pan](https:\u002F\u002Fpan.baidu.com\u002Fs\u002F1puPSxQqgBFOVpW4W7AolkA) (code: gym7).\n\n\u003Cbr>\n\n## PyTorch Example\n\n1. Install dependencies:\n```sh\npip install -r requirements_inference.txt\n```\n\n2. Load the model:\n\n```python\nimport torch\nfrom model import MattingNetwork\n\nmodel = MattingNetwork('mobilenetv3').eval().cuda()  # or \"resnet50\" (then load rvm_resnet50.pth)\nmodel.load_state_dict(torch.load('rvm_mobilenetv3.pth'))\n```\n\n3. To convert videos, we provide a simple conversion API:\n\n```python\nfrom inference import convert_video\n\nconvert_video(\n    model,                           # The model; can be on any device (CPU or CUDA).\n    input_source='input.mp4',        # A video file or an image sequence directory.\n    output_type='video',             # Choose \"video\" or \"png_sequence\".\n    output_composition='com.mp4',    # File path if video; directory path if png sequence.\n    output_alpha=\"pha.mp4\",          # [Optional] Output the raw alpha prediction.\n    output_foreground=\"fgr.mp4\",     # [Optional] Output the raw foreground prediction.\n    output_video_mbps=4,             # Output video bitrate in Mbps. Not needed for png sequence.\n    downsample_ratio=None,           # A hyperparameter to adjust, or None for automatic selection.\n    seq_chunk=12,                    # Process n frames at once for better parallelism.\n)\n```\n\n4. 
Or write your own inference code:\n```python\nimport torch\nfrom torch.utils.data import DataLoader\nfrom torchvision.transforms import ToTensor\nfrom inference_utils import VideoReader, VideoWriter\n\n# `model` is the network loaded in step 2.\nreader = VideoReader('input.mp4', transform=ToTensor())\nwriter = VideoWriter('output.mp4', frame_rate=30)\n\nbgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # Green background.\nrec = [None] * 4                                       # Initial recurrent states.\ndownsample_ratio = 0.25                                # Adjust based on your video.\n\nwith torch.no_grad():\n    for src in DataLoader(reader):                     # RGB tensor normalized to 0 ~ 1.\n        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)  # Cycle the recurrent states.\n        com = fgr * pha + bgr * (1 - pha)              # Composite onto the green background.\n        writer.write(com)                              # Write frame.\n\nwriter.close()                                         # Flush remaining frames and finalize the file.\n```\n\n5. The models and converter API are also available through TorchHub.\n\n```python\n# Load the model.\nmodel = torch.hub.load(\"PeterL1n\u002FRobustVideoMatting\", \"mobilenetv3\") # or \"resnet50\"\n\n# Converter API.\nconvert_video = torch.hub.load(\"PeterL1n\u002FRobustVideoMatting\", \"converter\")\n```\n\nPlease see the [inference documentation](documentation\u002Finference.md) for details on the `downsample_ratio` hyperparameter, more converter arguments, and more advanced usage.\n\n\u003Cbr>\n\n## Training and Evaluation\n\nPlease refer to the [training documentation](documentation\u002Ftraining.md) to train and evaluate your own model.\n\n\u003Cbr>\n\n## Speed\n\nSpeed is measured with `inference_speed_test.py` for reference.\n\n| GPU            | dtype | HD (1920x1080) | 4K (3840x2160) |\n| -------------- | ----- | -------------- | -------------- |\n| RTX 3090       | FP16  | 172 FPS        | 154 FPS        |\n| RTX 2060 Super | FP16  | 134 FPS        | 108 FPS        |\n| GTX 1080 Ti    | FP32  | 104 FPS        | 74 FPS         |\n\n* Note 1: HD uses 
`downsample_ratio=0.25`, 4K uses `downsample_ratio=0.125`. All tests use batch size 1 and frame chunk 1.\n* Note 2: GPUs before the Turing architecture do not support fast FP16 inference, so the GTX 1080 Ti uses FP32.\n* Note 3: We only measure tensor throughput. The video conversion script provided in this repo is expected to be much slower, because it does not use hardware video encoding\u002Fdecoding and does not perform tensor transfers on parallel threads. If you are interested in implementing hardware video encoding\u002Fdecoding in Python, please refer to [PyNvCodec](https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FVideoProcessingFramework).\n\n\u003Cbr>\n\n## Project Members\n* [Shanchuan Lin](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fshanchuanlin\u002F)\n* [Linjie Yang](https:\u002F\u002Fsites.google.com\u002Fsite\u002Flinjieyang89\u002F)\n* [Imran Saleemi](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fimran-saleemi\u002F)\n* [Soumyadip Sengupta](https:\u002F\u002Fhomes.cs.washington.edu\u002F~soumya91\u002F)\n\n\u003Cbr>\n\n## Third-Party Projects\n\n* [NCNN C++ Android](https:\u002F\u002Fgithub.com\u002FFeiGeChuanShu\u002Fncnn_Android_RobustVideoMatting) ([@FeiGeChuanShu](https:\u002F\u002Fgithub.com\u002FFeiGeChuanShu))\n* [lite.ai.toolkit](https:\u002F\u002Fgithub.com\u002FDefTruth\u002FRobustVideoMatting.lite.ai.toolkit) ([@DefTruth](https:\u002F\u002Fgithub.com\u002FDefTruth))\n* [Gradio Web Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fakhaliq\u002FRobust-Video-Matting) ([@AK391](https:\u002F\u002Fgithub.com\u002FAK391))\n* [Unity Engine demo with NatML](https:\u002F\u002Fhub.natml.ai\u002F@natsuite\u002Frobust-video-matting) ([@natsuite](https:\u002F\u002Fgithub.com\u002Fnatsuite))\n* [MNN C++ Demo](https:\u002F\u002Fgithub.com\u002FDefTruth\u002Flite.ai.toolkit\u002Fblob\u002Fmain\u002Flite\u002Fmnn\u002Fcv\u002Fmnn_rvm.cpp) ([@DefTruth](https:\u002F\u002Fgithub.com\u002FDefTruth))\n* [TNN C++ 
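Demo](https:\u002F\u002Fgithub.com\u002FDefTruth\u002Flite.ai.toolkit\u002Fblob\u002Fmain\u002Flite\u002Ftnn\u002Fcv\u002Ftnn_rvm.cpp) ([@DefTruth](https:\u002F\u002Fgithub.com\u002FDefTruth))\n\n","RobustVideoMatting is a deep learning project for video matting that supports multiple frameworks: PyTorch, TensorFlow, TensorFlow.js, ONNX, and CoreML. Its core feature is a recurrent neural network that exploits temporal information in video to perform real-time matting of high-resolution footage. It works on any video without additional inputs and reaches 76 FPS at 4K and 104 FPS at HD on an Nvidia GTX 1080 Ti GPU. RVM is particularly suited to applications that require high-quality human video matting, such as film and video post-production, virtual studios, and live streaming.",2,"2026-06-11 03:35:24","high_star"]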