[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76057":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":14,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":16,"starSnapshotCount":16,"syncStatus":14,"lastSyncTime":27,"discoverSource":28},76057,"Warp-as-History","yyfz\u002FWarp-as-History","yyfz","Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video","https:\u002F\u002Fyyfz.github.io\u002Fwarp-as-history\u002F",null,"Python",212,9,2,1,0,4,75,52.5,"Apache License 2.0",false,"main",[],"2026-06-12 04:01:20","\u003Cdiv align=\"center\">\n\u003Ch1>\n  Warp-as-History:\n  Generalizable Camera-Controlled Video Generation \n  from \u003Cstrong>One\u003C\u002Fstrong> Training Video\n\u003C\u002Fh1>\n\u003Cp class=\"eyebrow\">Video History is More Than Context.\u003C\u002Fp>\n\u003Cp class=\"authors\">\u003Ca href=\"https:\u002F\u002Fyyfz.github.io\u002F\">Yifan Wang\u003C\u002Fa>\u003Csup>1,2\u003C\u002Fsup> and \u003Ca href=\"https:\u002F\u002Ftonghe90.github.io\u002F\">Tong He\u003C\u002Fa>\u003Csup>2,3\u003C\u002Fsup>\u003C\u002Fp>\n\u003Cp class=\"affiliations\">\n  \u003Cspan>\u003Csup>1\u003C\u002Fsup> Shanghai Jiao Tong University\u003C\u002Fspan>\n  \u003Cspan>\u003Csup>2\u003C\u002Fsup> Shanghai AI Laboratory\u003C\u002Fspan>\n  \u003Cspan>\u003Csup>3\u003C\u002Fsup> Shanghai Innovation Institute\u003C\u002Fspan>\n\u003C\u002Fp>\n\u003Cimg src=\"assets\u002Fgithub_teaser.jpg\" alt=\"Warp-as-History teaser\" width=\"100%\">\n\u003Cp>\n  
\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15182\">\n    \u003Cimg src=\"assets\u002Fpaper_button.svg\" alt=\"Paper\" height=\"44\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fyyfz.github.io\u002Fwarp-as-history\">\n    \u003Cimg src=\"assets\u002Fdemo_button.svg\" alt=\"See More Demo\" height=\"44\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\nThis repository provides the official implementation of Warp-as-History. Our method enables interactive camera trajectory following and viewpoint manipulation, similar to HappyOyster and Genie 3, using only a single camera-annotated training example.\n\u003C\u002Fdiv>\n\n\n## Installation\n\n```bash\ngit clone --recurse-submodules https:\u002F\u002Fgithub.com\u002Fyyfz\u002FWarp-as-History.git\ncd Warp-as-History\n\nconda create -n warp-as-history python=3.10 -y\nconda activate warp-as-history\npython -m pip install --upgrade pip setuptools wheel\n```\n\nInstall PyTorch for your own CUDA\u002Fdriver setup. For example, CUDA 12.4:\n\n```bash\npip install torch==2.5.1 torchvision==0.20.1 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\n```\n\nThen install the project dependencies:\n\n```bash\npip install -r requirements.txt\npip install -e .\npip install -e third_party\u002FPi3\n```\n\n`third_party\u002FPi3` is a git submodule. If you cloned without submodules, run\n`git submodule update --init --recursive`.\n\n`xformers` and `flash-attn` are optional. The default code path uses PyTorch\nnative attention. 
In our CUDA 12.4 \u002F PyTorch 2.5.1 setup, this FlashAttention\nversion works:\n\n```bash\npip install \"flash-attn==2.7.4.post1\" --no-build-isolation\n```\n\nFor other CUDA\u002FPyTorch setups, install a `flash-attn` version compatible with\nyour environment.\n\n## Models\n\n- Helios-Distilled (default): [`BestWishYsh\u002FHelios-Distilled`](https:\u002F\u002Fhuggingface.co\u002FBestWishYsh\u002FHelios-Distilled\u002Ftree\u002Fmain)\n- Pi3X: [`yyfz233\u002FPi3X`](https:\u002F\u002Fhuggingface.co\u002Fyyfz233\u002FPi3X)\n- Warp-as-History LoRA (default): [`yyfz233\u002Fwarp-as-history`](https:\u002F\u002Fhuggingface.co\u002Fyyfz233\u002Fwarp-as-history)\n- Helios-Mid (optional, training only): [`BestWishYsh\u002FHelios-Mid`](https:\u002F\u002Fhuggingface.co\u002FBestWishYsh\u002FHelios-Mid)\n\nDownload the required models once before inference or training:\n\n```bash\nhuggingface-cli download BestWishYsh\u002FHelios-Distilled \\\n  --local-dir checkpoints\u002Fhelios-distilled\n\nhuggingface-cli download yyfz233\u002FPi3X model.safetensors \\\n  --local-dir checkpoints\u002Fpi3x\n\nhuggingface-cli download yyfz233\u002Fwarp-as-history visible_lora_state_step1000.safetensors \\\n  --local-dir checkpoints\u002Fwarp-as-history\n\n# only for training\nhuggingface-cli download BestWishYsh\u002FHelios-Mid \\\n  --local-dir checkpoints\u002Fhelios-mid\n```\n\nModel check:\n\n```bash\npython scripts\u002Fcheck_models.py\n```\n\nMissing Helios-Mid is reported as a warning unless you plan to train with it.\n\n## Inference\n\nThe demo CSV files under `data\u002Fdemo` contain one input image path, prompt, and\neither `camera_poses_path` or a pre-rendered `warp_video_path`. Run a minimal\nend-to-end inference with:\n\n```bash\npython scripts\u002Finfer_warp_as_history.py data\u002Fdemo\u002Fangel.csv \\\n  --output runs\u002Fangel.mp4\n```\n\nBy default, inference loads\n`checkpoints\u002Fwarp-as-history\u002Fvisible_lora_state_step1000.safetensors`. 
Pass\n`--no_lora` only for ablations.\n\nPass `--warp_debug_dir runs\u002Fangel_warp_debug` to also save the warp\nconditioning video as `runs\u002Fangel_warp_debug\u002Fwarp.mp4`.\n\nEach demo CSV has these columns:\n\n```csv\nfirst_frame_path,prompt,camera_poses_path,warp_video_path,warp_visibility_mask_path\n```\n\n`camera_poses_path` should point to an `.npz` file whose `camera_poses` entry\ncontains OpenCV `c2w` poses with shape `[T, 4, 4]`.\n\nWhen both `warp_video_path` and `camera_poses_path` are provided, inference uses\nthe pre-rendered warp video. Without `--output`, the script writes\n`runs\u002F\u003Ccsv_stem>.mp4`. By default it uses the warp video frame count, or all\nframes in `camera_poses.npz`; pass `--num_frames 33` only when you want a short\nsmoke test.\n\n```python\nfrom warp_as_history import WarpAsHistoryPipeline\n\npipe = WarpAsHistoryPipeline.from_pretrained(\n    \"checkpoints\u002Fhelios-distilled\",\n).to(\"cuda\")\n\nvideo = pipe(\n    prompt=\"a car driving through a roundabout\",\n    image=first_frame,\n    camera_poses=camera_poses,\n    camera_control_translation_scale=0.1,\n)\n```\n\n`camera_control_translation_scale` controls the online warp translation scale\nand defaults to `0.1`. Warp-as-History conditioning loads the default LoRA\nfrom `checkpoints\u002Fwarp-as-history\u002Fvisible_lora_state_step1000.safetensors`\nunless you pass `lora_path=None` or another disabled value such as `\"off\"`.\n\nIf neither `camera_poses` nor `warp_video` is provided,\n`WarpAsHistoryPipeline` falls back to the original Helios pipeline. 
This path\ndoes not load or apply Warp-as-History LoRA weights, prompt triggers, warp\nlatents, or visible-token masking:\n\n```python\nvideo = pipe(\n    prompt=\"a car driving through a roundabout\",\n    image=first_frame,\n    num_frames=33,\n)\n```\n\nPassing an explicit `lora_path` without `camera_poses` or `warp_video` raises\nan error, because WAH LoRA weights are only defined for Warp-as-History\nconditioning. Original Helios keyword arguments, such as `guidance_scale` and\n`num_inference_steps`, are passed through on this fallback path.\n\nTo save the warp conditioning used by a Warp-as-History run, pass\n`warp_debug_dir`. The pipeline writes only `warp.mp4` under that directory:\n\n```python\nvideo = pipe(\n    prompt=prompt,\n    image=first_frame,\n    camera_poses=camera_poses,\n    warp_debug_dir=\"runs\u002Fangel_warp_debug\",\n)\n```\n\nUse `return_warp_debug=True` when you also want the returned object to include\nthe CPU `warp_video` tensor. Warp debug is only available when `camera_poses` or\n`warp_video` is provided.\n\nFor online\u002Fautoregressive generation, initialize a state once and feed one\ncamera or warp chunk at a time:\n\n```python\nstate = pipe.init_autoregressive_state(\n    prompt=prompt,\n    image=first_frame,\n    conditioning_type=\"camera\",\n    num_frames=99,\n    height=384,\n    width=640,\n    generator=generator,\n)\n\nwindow = state[\"window_num_frames\"]  # 33 with the default WAH recipe\nfor chunk_index in range(state[\"num_warp_chunks\"]):\n    start = chunk_index * window\n    camera_chunk = camera_poses[start : start + window]\n    chunk_video, state = pipe.generate_next_chunk(\n        state,\n        camera_poses=camera_chunk,\n    )\n\nvideo = pipe.finalize_autoregressive_state(state)\n```\n\n`generate_next_chunk` returns the newly finalized video frames plus the next\nstate. For camera control, the first chunk should provide `window` poses. 
Later\nchunks may either provide `window` new poses, in which case the pipeline\nprepends the cached previous boundary pose, or provide `window + 1` poses\nincluding that boundary pose explicitly. For pre-rendered warp conditioning,\ninitialize with `conditioning_type=\"warp\"` and pass exactly `window` warp frames\nper call via `warp_video` and optionally `warp_visibility_mask`.\n\nAn interactive browser UI is available for prompt-and-button camera control:\n\n```bash\npython scripts\u002Fweb_control.py \\\n  --host 0.0.0.0 \\\n  --port 7860\n```\n\nOpen the printed URL, upload a first frame, enter a prompt, select translation\nand rotation buttons, then click Generate. The server keeps the autoregressive\nstate alive between Generate clicks. Generated mp4 files are written under `runs\u002Fweb_control` by default. \n\n\u003Ca href=\"assets\u002Fwebcontrol_demo.mp4\">\n  \u003Cimg src=\"assets\u002Fwebcontrol_demo.gif\" alt=\"WebControl demo\" width=\"100%\">\n\u003C\u002Fa>\n\n## Training\n\nPreview sampled training batches:\n\n```bash\npython scripts\u002Fdryrun_online_warp_batch.py\n```\n\nTrain:\n\n```bash\npython scripts\u002Ftrain_warp_as_history_lora.py \\\n  --prompt_csv data\u002Ftraining\u002Ftraining_data.csv \\\n  --data_root data\u002Ftraining \\\n  --output_dir runs\u002Fwarp_as_history_lora \\\n  --max_steps 1000 \\\n  --save_every 1000 \\\n  --log_every 10 \\\n  --overwrite\n```\n\nThe training script writes `train_config.json`, `train_loss.json`,\n`visible_lora_state.pt`, and step checkpoints when `--save_every` is enabled.\n\n## GPU memory\n\nThe numbers below were measured on a clean single GPU with Helios-Distilled,\nBF16, `384x640`, 33 frames, and no CPU\u002Foffload mode unless noted.\n\n| Run | Peak VRAM |\n| --- | ---: |\n| Original Helios I2V | 46.1 GB |\n| Warp-as-History with pre-rendered `warp_video_path` | 46.1 GB |\n| Warp-as-History with online `camera_poses_path` | 53.6 GB |\n| Helios-Mid LoRA training, 1 step | 48.7 GB 
|\n\nPre-rendered warp inference has essentially the same memory footprint as the\noriginal Helios pipeline. Online camera inference is higher because Pi3X and\nthe camera-warp renderer stay resident together with Helios. Helios' low-VRAM\ngroup-offloading mode is a different configuration and is not included in this\ntable.\n\n## Citation\n\nIf you find this work useful, please cite:\n\n```bibtex\n@misc{wang2026warpashistorygeneralizablecameracontrolledvideo,\n      title={Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video}, \n      author={Yifan Wang and Tong He},\n      year={2026},\n      eprint={2605.15182},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.15182}, \n}\n```\n\n## Acknowledgements\n\nWe sincerely thank the authors of\n[Helios](https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios) for releasing such an\nexcellent open-source video generation model. Warp-as-History is built directly\non top of Helios, and this work would not be possible without their model,\ncodebase, and open research contribution.\n\n## License\n\n- Helios code and weights follow the upstream Helios license:\n  https:\u002F\u002Fgithub.com\u002FPKU-YuanGroup\u002FHelios\n- Pi3X code and weights follow the upstream Pi3 license:\n  https:\u002F\u002Fgithub.com\u002Fyyfz\u002FPi3\n- Warp-as-History code authored in this repository is licensed under\n  Apache-2.0; see [LICENSE](LICENSE).\n- LoRA weights are released under CC BY-NC 4.0 and are strictly\n  non-commercial.\n- Some training\u002Finference examples are derived from one publicly available video\n  sequence from the DAVIS Challenge dataset. The original DAVIS data is not\n  covered by this repository license and should be obtained from the official\n  DAVIS website: https:\u002F\u002Fdavischallenge.org\u002F. 
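Please follow the DAVIS dataset\n  terms and cite the corresponding DAVIS papers when using DAVIS-derived data.\n","Warp-as-History is a project that generates camera-controlled video from a single training video. Its core capability is interactive camera trajectory following and viewpoint manipulation, producing new video content after training on one camera-annotated video. The project is written in Python, is built on PyTorch, and can optionally use acceleration libraries such as FlashAttention. It suits applications that need new viewpoints or camera paths from existing footage, such as virtual reality, augmented reality, and video editing.","2026-06-11 03:54:19","CREATED_QUERY"]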