[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-85177":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":16,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":21,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":23,"readmeContent":24,"aiSummary":10,"trendingCount":16,"starSnapshotCount":16,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},85177,"SwiftVR","H-oliday\u002FSwiftVR","H-oliday","Repo for SwiftVR: Real-Time One-Step Generative Video Restoration","",null,"Python",57,8,52,4,0,38.86,"Apache License 2.0",false,"main",true,[],"2026-06-15 10:05:16","\u003Ch1 align=\"center\">SwiftVR: Real-Time One-Step Generative Video Restoration\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\u003Cimg src=\"assets\u002Fteaser.avif\" width=\"100%\" alt=\"SwiftVR teaser\">\u003C\u002Fp>\n\n> **SwiftVR** is the first generative video restoration model to reach **real-time 1080p streaming on a consumer-grade GPU** (≈26 FPS on a single RTX 5090), sustains **31 FPS at QHD (2560×1440)** and **14 FPS at 4K (3840×2160)** on a single H100, and streams at resolutions where every compared diffusion-based VR baseline runs out of memory.\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2606.09516\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2606.09516-b31b1b.svg?style=flat-square\" alt=\"arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fh-oliday.github.io\u002FSwiftVR\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-1f8acb.svg?style=flat-square\" alt=\"Project Page\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002FH-oliday\u002FSwiftVR\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20HuggingFace-Model-ffce00.svg?style=flat-square\" alt=\"HuggingFace\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FH-oliday\u002FSwiftVR\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-green.svg?style=flat-square\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n## Updates\n\n* [2026\u002F06] Released the inference code and pretrained weights 🎉\n\n## ✨ Highlights\n\n* **Mask-free shifted-window self-attention (MFSWA).** Each spatial window is **pre-gathered into a dense tensor**, so every attention call reduces to a single standard scaled dot-product attention (SDPA) call: *no attention mask, cyclic shift, or padding ever enters the graph*. This gives a **1.62× throughput gain over its full-attention teacher** at essentially identical quality, with **no dedicated sparse kernel**.\n* **Restoration-aware Autoencoder (ReAE).** A lightweight encoder–decoder jointly fine-tuned with the DiT in pixel space removes the heavy-3D-VAE \u002F tiled-decoding bottleneck.\n* **Causal chunk-wise streaming.** A minimal causal protocol (no rolling KV cache, no overlapped DiT inference) bounds the temporal axis, confining the residual $\\mathcal{O}(N^2)$ cost to the spatial axes.\n\n\n## 📊 Results\n\n### Efficiency at 2560×1440\n\nSingle H100, causal streaming, 24 frames.\n\n| Metric | DOVE (tile) | SeedVR2-3B (tile) | FlashVSR-Tiny | **SwiftVR (Ours)** |\n|---|:---:|:---:|:---:|:---:|\n| Avg. Time (s) ↓ | 27.615 | 17.320 | 2.493 | 0.766 |\n| FPS ↑ | 0.85 | 1.39 | 9.61 | 31.32 |\n| Peak Mem. 
(GB) ↓ | 59.24 | 35.35 | 34.35 | 38.01 |\n\n> At **3840×2160**, every compared diffusion-based VR baseline **OOMs** on a single H100; SwiftVR sustains **14 FPS**.\n\n### Qualitative comparison\n\n\u003Cimg src=\"assets\u002Fqualitative.png\" width=\"100%\" alt=\"SwiftVR qualitative comparison\">\n\n## 🛠 Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FH-oliday\u002FSwiftVR.git\ncd SwiftVR\n\nconda create -n swiftvr python=3.10 -y\nconda activate swiftvr\n\n# Install PyTorch matching your CUDA toolkit first, e.g. CUDA 12.4:\npip install torch==2.10.0 torchvision==0.25.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n\n# Install SwiftVR (editable) and its dependencies:\npip install -e .\n```\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Hardware notes\u003C\u002Fb>\u003C\u002Fsummary>\n\n* **Server:** single H100-80G reproduces the QHD\u002F4K numbers above.\n* **Consumer:** single RTX 5090 reaches ≈26 FPS at 1080p with the *same checkpoint* (default PyTorch SDPA path, bfloat16, causal chunk protocol).\n* No hardware-specific retraining or kernel rewrite is required on any platform.\n\n\u003C\u002Fdetails>\n\n## 🗂 Model Zoo\n\n| Model Name | Date    | Backbone       | Link                                                  |\n| ---------- | ------- | -------------- | ----------------------------------------------------- |\n| SwiftVR        | 2026.06 | Wan2.2-TI2V-5B | [🤗 HuggingFace](https:\u002F\u002Fhuggingface.co\u002FH-oliday\u002FSwiftVR) |\n\n```bash\nhuggingface-cli download H-oliday\u002FSwiftVR --local-dir checkpoints\u002F\n```\n\nExpected checkpoint layout, where `checkpoints\u002F` is the directory passed to `from_pretrained`:\n\n```text\ncheckpoints\u002F\n├── reae.safetensors             # Restoration-aware Autoencoder weights\n├── prompt_embedding.safetensors # precomputed empty-prompt text embedding, key: \"prompt_emb\"\n└── transformer\u002F                 # diffusers-format DiT\n    ├── config.json\n    └── 
diffusion_pytorch_model.safetensors\n```\n\n## 🚀 Quick Start\n\n### Python API\n\n```python\nfrom swiftvr import SwiftVRPipeline\n\npipe = SwiftVRPipeline.from_pretrained(\"checkpoints\u002F\").to(\"cuda\", dtype=\"bfloat16\")\n\npipe.restore_video(\"low_quality.mp4\", \"restored.mp4\", upscale=4)\n```\n\n`restore_video` also accepts an image folder as input and can write a PNG sequence with `png_save=True`.\n\nTunable knobs include:\n\n* `clip_len`: middle chunk size, multiple of 4\n* `dit_overlap`: overlap for DiT inference\n* `fps`: output video frame rate\n* `quality`: 0–100, mapped to x265 CRF\n* `queue_size`: pipeline queue size\n\n### Streaming\n\nCausal, chunk-by-chunk restoration without future frames.\n\n```python\nsession = pipe.stream(clip_len=24, resolution=(1920, 1080))\n\nfor lq_chunk in read_chunks(\"low_quality.mp4\", n=24):   # lq_chunk: [T, H, W, 3] uint8\n    hq = session.step(lq_chunk)                         # [1, T', 3, H', W'] in [0, 1], or None if buffered\n    if hq is not None:\n        write(hq)\n\ntail = session.flush()                                  # flush the final buffered frames\n```\n\n### Command line\n\n```bash\npython scripts\u002Finference.py \\\n  --input low_quality.mp4 \\\n  --output restored.mp4 \\\n  --checkpoint checkpoints\u002F \\\n  --upscale 4 \\\n  --clip-len 24 \\\n  --dtype bfloat16\n```\n\nUse `--png` to write a PNG sequence.\n\n## 📁 Repository Structure\n\n```text\nSwiftVR\u002F\n├── README.md\n├── LICENSE\n├── requirements.txt\n├── setup.py\n├── scripts\u002F\n│   └── inference.py              # CLI entry point, thin wrapper over SwiftVRPipeline\n└── swiftvr\u002F\n    ├── __init__.py               # exports SwiftVRPipeline\n    ├── pipeline.py               # SwiftVRPipeline: from_pretrained \u002F to \u002F restore_video \u002F stream\n    ├── runner.py                 # four-stage pipelined runner: reader → H2D → GPU → writer\n    ├── io.py                     # frame reading, GPU preprocessing, mp4 \u002F 
PNG writing\n    ├── models\u002F\n    │   ├── reae.py               # Restoration-aware Autoencoder\n    │   └── transformer.py        # DiT + mask-free shifted-window self-attention\n    └── streaming\u002F\n        ├── chunk.py              # fixed-size causal chunk protocol\n        ├── tae.py                # streaming autoencoder with causal boundary state\n        └── dit.py                # one-step streaming DiT with fixed timestep and RoPE offsets\n```\n\n## 🙏 Acknowledgements\n\nSwiftVR builds on [Wan2.2-TI2V-5B](https:\u002F\u002Fgithub.com\u002FWan-Video), the lightweight autoencoder [TAEHV](https:\u002F\u002Fgithub.com\u002Fmadebyollin\u002Ftaehv), and the [RealBasicVSR](https:\u002F\u002Fgithub.com\u002Fckkelvinchan\u002FRealBasicVSR) degradation pipeline.\n\nWe thank the authors of [DOVE](https:\u002F\u002Fgithub.com\u002Fzhengchen1999\u002FDOVE), [SeedVR2](https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FSeedVR), and [FlashVSR](https:\u002F\u002Fgithub.com\u002FOpenImagingLab\u002FFlashVSR) for releasing strong baselines, and the [UltraVideo](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FAPRIL-AIGC\u002FUltraVideo) team for the training corpus.\n\n## 📜 License\n\nSwiftVR is released under the **Apache License 2.0**.\n\nCopyright 2026 SwiftVR Authors.\n\nLicensed under the Apache License, Version 2.0. You may obtain a copy of the License at:\n\nhttps:\u002F\u002Fwww.apache.org\u002Flicenses\u002FLICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, this project is distributed on an **\"AS IS\" BASIS**, without warranties or conditions of any kind, either express or implied. 
See the [LICENSE](.\u002FLICENSE) file for the full license text.\n\n\n\n## Contact\n\nIf you have any questions, feel free to reach out:\n\n* Email: [kakibluee@gmail.com](mailto:kakibluee@gmail.com)\n\n\u003Cdiv align=\"center\">\n\u003Csub>If SwiftVR is useful to your research or product, please consider giving it a ⭐.\u003C\u002Fsub>\n\u003C\u002Fdiv>\n",2,"2026-06-15 02:30:14","CREATED_QUERY"]