[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72478":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":13,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":13,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72478,"ttt-video-dit","test-time-training\u002Fttt-video-dit","test-time-training","Official PyTorch implementation of One-Minute Video Generation with Test-Time Training","https:\u002F\u002Ftest-time-training.github.io\u002Fvideo-dit",null,"Python",2425,8,26,0,5,6,15,68.66,"MIT License",false,"main",true,[],"2026-06-12 04:01:06","# TTT-Video\n\u003Cimg src=\".\u002Fdocs\u002Ffigures\u002Fhero.png\" alt=\"Hero\" style=\"width:100%;\"\u002F>\nTTT-Video is a repository for finetuning diffusion transformers for style transfer and context extension. We use Test-Time Training (TTT) layers to handle long-range relationships across the global context, while reusing the original pretrained model's attention layers for local attention on each three second segment.  \u003Cbr> \u003Cbr>\nIn this repository, we include training and inference code for 63 second video generation. We finetune our model first at the original pretrained 3 second video length for style transfer and incorporating TTT layers. 
Then, we train in stages at video lengths of 9 sec, 18 sec, 30 sec, and 63 sec for context extension.\n\n## Model Architecture\n![Architecture](.\u002Fdocs\u002Ffigures\u002Fintegration.jpg)\n\nOur architecture adapts the [CogVideoX](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogVideo) 5B model, a diffusion transformer for text-to-video generation, by incorporating TTT layers. The original pretrained attention layers are retained for local attention on each 3-second segment and its corresponding text. In addition, TTT layers are inserted to process the global sequence as well as its reversed version, with their outputs gated through a residual connection.\n\nTo extend context beyond the pretrained 3-second segments, each segment is interleaved with text and video embeddings.\n\nFor a more detailed explanation of our architecture, please refer to our [paper](https:\u002F\u002Ftest-time-training.github.io\u002Fvideo-dit\u002Fassets\u002Fttt_cvpr_2025.pdf).\n\n\n## Setup\n\n### Dependencies\nYou can install dependencies needed for this project with conda (recommended) or a virtual environment.\n\n#### Conda\n```bash\nconda env create -f environment.yaml\nconda activate ttt-video\n```\n\n#### Pip\n```bash\npip install -e .\n```\n\n### Kernel Installation\nAfter installing the dependencies, you must install the TTT-MLP kernel.\n\n```bash\ngit submodule update --init --recursive\n(cd ttt-tk && python setup.py install)\n```\n\n> **Note**: You must have the CUDA Toolkit (12.3+) and GCC 11+ installed to build the TTT-MLP kernel. We only support training on H100s for TTT-MLP. You can install the CUDA Toolkit [here](https:\u002F\u002Fanaconda.org\u002Fnvidia\u002Fcuda-toolkit).\n\n### CogVideoX Pretrained Model\nPlease follow the instructions [here](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FCogVideo\u002Fblob\u002Fmain\u002Fsat\u002FREADME.md) to get the VAE and T5 encoder. 
To get the pretrained weights, download the `diffusion_pytorch_model-00001-of-00002.safetensors` and `diffusion_pytorch_model-00002-of-00002.safetensors` files from [HuggingFace](https:\u002F\u002Fhuggingface.co\u002FTHUDM\u002FCogVideoX-5b\u002Ftree\u002Fmain\u002Ftransformer).\n\n> **Note**: We only use the 5B weights, not the 2B weights. Make sure you are downloading the correct model.\n\n> **Note**: These are the pretrained model weights, not the final model weights used to generate our demonstration videos.\n\n## Other Documentation\n- [Dataset](.\u002Fdocs\u002Fdataset.md)\n- [Training](.\u002Fdocs\u002Ftraining.md)\n- [Sampling](.\u002Fdocs\u002Fsampling.md)\n","TTT-Video is a PyTorch-based project for video style transfer and context extension through fine-tuning diffusion transformers. It uses Test-Time Training (TTT) layers to handle long-range relationships across the global context, while retaining the pretrained model's attention layers for local attention on each three-second segment. Its core features include 63-second video generation, style transfer, and context extension; technically, it builds on the CogVideoX 5B model and modifies the original architecture by inserting TTT layers. It suits applications that need to generate high-quality video content in a short time, such as creative video production or dynamic ad generation.",2,"2026-06-11 03:42:13","high_star"]