[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2705":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},2705,"FramePack","lllyasviel\u002FFramePack","lllyasviel","Lets make video diffusion practical!","",null,"Python",17010,1694,137,454,0,6,44,210,46,44.69,"Apache License 2.0",false,"main",[],"2026-06-12 02:00:43","\u003Cp align=\"center\">\n    \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F2cc030b4-87e1-40a0-b5bf-1b7d6b62820b\" width=\"300\">\n\u003C\u002Fp>\n\n# FramePack\n\nOfficial implementation and desktop software for [\"Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models\"](https:\u002F\u002Flllyasviel.github.io\u002Fframe_pack_gitpage\u002F).\n\nLinks: [**Paper**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2504.12626), [**Project Page**](https:\u002F\u002Flllyasviel.github.io\u002Fframe_pack_gitpage\u002F)\n\nFramePack is a next-frame (next-frame-section) prediction neural network structure that generates videos progressively. 
\n\nFramePack compresses input contexts to a constant length so that the generation workload is invariant to video length.\n\nFramePack can process a very large number of frames with 13B models even on laptop GPUs.\n\nFramePack can be trained with a much larger batch size, similar to the batch size for image diffusion training.\n\n**Video diffusion, but feels like image diffusion.**\n\n# News\n\n**2025 July 14:** Some pure text2video anti-drifting stress-test results of FramePack-P1 are uploaded [here,](https:\u002F\u002Flllyasviel.github.io\u002Fframe_pack_gitpage\u002Fp1\u002F#text-to-video-stress-tests) using common prompts without any reference images.\n\n**2025 June 26:** Some results of FramePack-P1 are uploaded [here.](https:\u002F\u002Flllyasviel.github.io\u002Fframe_pack_gitpage\u002Fp1) The FramePack-P1 will be the next version of FramePack with two designs: Planned Anti-Drifting and History Discretization.\n\n**2025 May 03:** The FramePack-F1 is released. [Try it here.](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FFramePack\u002Fdiscussions\u002F459)\n\nNote that this GitHub repository is the only official FramePack website. We do not have any web services. All other websites are spam and fake, including but not limited to `framepack.co`, `frame_pack.co`, `framepack.net`, `frame_pack.net`, `framepack.ai`, `frame_pack.ai`, `framepack.pro`, `frame_pack.pro`, `framepack.cc`, `frame_pack.cc`, `framepackai.co`, `frame_pack_ai.co`, `framepackai.net`, `frame_pack_ai.net`, `framepackai.pro`, `frame_pack_ai.pro`, `framepackai.cc`, `frame_pack_ai.cc`, and so on. Again, they are all spam and fake. **Do not pay money or download files from any of those websites.**\n\n# Requirements\n\nNote that this repo is functional desktop software with a minimal standalone high-quality sampling system and memory management.\n\n**Start with this repo before you try anything else!**\n\nRequirements:\n\n* Nvidia GPU in RTX 30XX, 40XX, 50XX series that supports fp16 and bf16. 
The GTX 10XX\u002F20XX are not tested.\n* Linux or Windows operating system.\n* At least 6GB GPU memory.\n\nTo generate a 1-minute video (60 seconds) at 30fps (1800 frames) using the 13B model, the minimum required GPU memory is 6GB. (Yes, 6 GB, not a typo. Laptop GPUs are okay.)\n\nAbout speed, on my RTX 4090 desktop it generates at a speed of 2.5 seconds\u002Fframe (unoptimized) or 1.5 seconds\u002Fframe (teacache). On my laptops, such as a 3070ti laptop or a 3060 laptop, it is about 4x to 8x slower. [Troubleshoot if your speed is much slower than this.](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FFramePack\u002Fissues\u002F151#issuecomment-2817054649)\n\nIn any case, you will directly see the generated frames since it is next-frame(-section) prediction. So you will get lots of visual feedback before the entire video is generated.\n\n# Installation\n\n**Windows**:\n\n[>>> Click Here to Download One-Click Package (CUDA 12.6 + PyTorch 2.6) \u003C\u003C\u003C](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FFramePack\u002Freleases\u002Fdownload\u002Fwindows\u002Fframepack_cu126_torch26.7z)\n\nAfter downloading, uncompress the package, use `update.bat` to update, and use `run.bat` to run.\n\nNote that running `update.bat` is important; otherwise you may be using a previous version with potential bugs unfixed.\n\n![image](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002Fstable-diffusion-webui-forge\u002Fassets\u002F19834515\u002Fc49bd60d-82bd-4086-9859-88d472582b94)\n\nNote that the models will be downloaded automatically. 
You will download more than 30GB from HuggingFace.\n\n**Linux**:\n\nWe recommend having an independent Python 3.10.\n\n    pip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu126\n    pip install -r requirements.txt\n\nTo start the GUI, run:\n\n    python demo_gradio.py\n\nNote that it supports `--share`, `--port`, `--server`, and so on.\n\nThe software supports PyTorch attention, xformers, flash-attn, sage-attention. By default, it will just use PyTorch attention. You can install those attention kernels if you know how. \n\nFor example, to install sage-attention (linux):\n\n    pip install sageattention==1.0.6\n\nHowever, you are highly recommended to first try without sage-attention since it will influence results, though the influence is minimal.\n\n# GUI\n\n![ui](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8c5cdbb1-b80c-4b7e-ac27-83834ac24cc4)\n\nOn the left you upload an image and write a prompt.\n\nOn the right are the generated videos and latent previews.\n\nBecause this is a next-frame-section prediction model, videos will be generated longer and longer.\n\nYou will see the progress bar for each section and the latent preview for the next section.\n\nNote that the initial progress may be slower than later diffusion as the device may need some warmup.\n\n# Sanity Check\n\nBefore trying your own inputs, we highly recommend going through the sanity check to find out if any hardware or software went wrong. \n\nNext-frame-section prediction models are very sensitive to subtle differences in noise and hardware. Usually, people will get slightly different results on different devices, but the results should look overall similar. 
In some cases, if possible, you'll get exactly the same results.\n\n## Image-to-5-seconds\n\nDownload this image:\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff3bc35cf-656a-4c9c-a83a-bbab24858b09\" width=\"150\">\n\nCopy this prompt:\n\n`The man dances energetically, leaping mid-air with fluid arm swings and quick footwork.`\n\nSet like this:\n\n(all default parameters, with teacache turned off)\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F0071fbb6-600c-4e0f-adc9-31980d540e9d)\n\nThe result will be:\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fbc74f039-2b14-4260-a30b-ceacf611a185\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n**Important Note:**\n\nAgain, this is a next-frame-section prediction model. This means you will generate videos frame-by-frame or section-by-section.\n\n**If you get a much shorter video in the UI, like a video with only 1 second, then it is totally expected.** You just need to wait. 
More sections will be generated to complete the video.\n\n## Know the influence of TeaCache and Quantization\n\nDownload this image:\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F42293e30-bdd4-456d-895c-8fedff71be04\" width=\"150\">\n\nCopy this prompt:\n\n`The girl dances gracefully, with clear movements, full of charm.`\n\nSet like this:\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4274207d-5180-4824-a552-d0d801933435)\n\nTurn off teacache:\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F53b309fb-667b-4aa8-96a1-f129c7a09ca6)\n\nYou will get this:\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F04ab527b-6da1-4726-9210-a8853dda5577\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\nNow turn on teacache:\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F16ad047b-fbcc-4091-83dc-d46bea40708c)\n\nAbout 30% of users will get this (the other 70% will get other random results depending on their hardware):\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F149fb486-9ccc-4a48-b1f0-326253051e9b\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>A typical worse result.\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\nSo you can see that teacache is not really lossless and sometimes can 
influence the result a lot.\n\nWe recommend using teacache to try ideas and then using the full diffusion process to get high-quality results.\n\nThis recommendation also applies to sage-attention, bnb quant, gguf, etc., etc.\n\n## Image-to-1-minute\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F820af6ca-3c2e-4bbc-afe8-9a9be1994ff5\" width=\"150\">\n\n`The girl dances gracefully, with clear movements, full of charm.`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8c34fcb2-288a-44b3-a33d-9d2324e30cbd)\n\nSet video length to 60 seconds:\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5595a7ea-f74e-445e-ad5f-3fb5b4b21bee)\n\nIf everything is in order you will get some result like this eventually.\n\n60s version:\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc3be4bde-2e33-4fd4-b76d-289a036d3a47\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n6s version:\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F37fe2c33-cb03-41e8-acca-920ab3e34861\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n# More Examples\n\nMany more examples are in [**Project 
Page**](https:\u002F\u002Flllyasviel.github.io\u002Fframe_pack_gitpage\u002F).\n\nBelow are some more examples that you may be interested in reproducing.\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F99f4d281-28ad-44f5-8700-aa7a4e5638fa\" width=\"150\">\n\n`The girl dances gracefully, with clear movements, full of charm.`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F0e98bfca-1d91-4b1d-b30f-4236b517c35e)\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fcebe178a-09ce-4b7a-8f3c-060332f4dab1\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F853f4f40-2956-472f-aa7a-fa50da03ed92\" width=\"150\">\n\n`The girl suddenly took out a sign that said “cute” using right hand`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fd51180e4-5537-4e25-a6c6-faecae28648a)\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F116069d2-7499-4f38-ada7-8f85517d1fbb\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F6d87c53f-81b2-4108-a704-697164ae2e81\" 
width=\"150\">\n\n`The girl skateboarding, repeating the endless spinning and dancing and jumping on a skateboard, with clear movements, full of charm.`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc2cfa835-b8e6-4c28-97f8-88f42da1ffdf)\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fd9e3534a-eb17-4af2-a8ed-8e692e9993d2\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F6e95d1a5-9674-4c9a-97a9-ddf704159b79\" width=\"150\">\n\n`The girl dances gracefully, with clear movements, full of charm.`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F7412802a-ce44-4188-b1a4-cfe19f9c9118)\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe1b3279e-e30d-4d32-b55f-2fb1d37c81d2\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F90fc6d7e-8f6b-4f8c-a5df-ee5b1c8b63c9\" width=\"150\">\n\n`The man dances flamboyantly, swinging his hips and striking bold poses with dramatic 
flair.`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F1dcf10a3-9747-4e77-a269-03a9379dd9af)\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Faaa4481b-7bf8-4c64-bc32-909659767115\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F62ecf987-ec0c-401d-b3c9-be9ffe84ee5b\" width=\"150\">\n\n`The woman dances elegantly among the blossoms, spinning slowly with flowing sleeves and graceful hand movements.`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F396f06bc-e399-4ac3-9766-8a42d4f8d383)\n\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo \n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Ff23f2f37-c9b8-45d5-a1be-7c87bd4b41cf\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F4f740c1a-2d2f-40a6-9613-d6fe64c428aa\" width=\"150\">\n\n`The young man writes intensely, flipping papers and adjusting his glasses with swift, focused movements.`\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc4513c4b-997a-429b-b092-bb275a37b719)\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"300\">\n      \u003Cvideo 
\n        src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F62e9910e-aea6-4b2b-9333-2e727bccfc64\" \n        controls \n        style=\"max-width:100%;\">\n      \u003C\u002Fvideo>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\">\n      \u003Cem>Video may be compressed by GitHub\u003C\u002Fem>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n# Prompting Guideline\n\nMany people ask how to write better prompts. \n\nBelow is a ChatGPT template that I personally often use to get prompts:\n\n    You are an assistant that writes short, motion-focused prompts for animating images.\n\n    When the user sends an image, respond with a single, concise prompt describing visual motion (such as human activity, moving objects, or camera movements). Focus only on how the scene could come alive and become dynamic using brief phrases.\n\n    Larger and more dynamic motions (like dancing, jumping, running, etc.) are preferred over smaller or more subtle ones (like standing still, sitting, etc.).\n\n    Describe subject, then motion, then other things. For example: \"The girl dances gracefully, with clear movements, full of charm.\"\n\n    If there is something that can dance (like a man, girl, robot, etc.), then prefer to describe it as dancing.\n\n    Stay in a loop: one image in, one motion prompt out. Do not explain, ask questions, or generate multiple options.\n\nPaste this instruction into ChatGPT and then feed it an image to get a prompt like this:\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F586c53b9-0b8c-4c94-b1d3-d7e7c1a705c3)\n\n*The man dances powerfully, striking sharp poses and gliding smoothly across the reflective floor.*\n\nUsually this will give you a prompt that works well. \n\nYou can also write prompts yourself. 
Concise prompts are usually preferred, for example:\n\n*The girl dances gracefully, with clear movements, full of charm.*\n\n*The man dances powerfully, with clear movements, full of energy.*\n\nand so on.\n\n# Cite\n\n    @inproceedings{zhang2025framepack,\n        title={Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models},\n        author={Lvmin Zhang and Shengqu Cai and Muyang Li and Gordon Wetzstein and Maneesh Agrawala},\n        booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},\n        year={2025},\n    }\n\n    @article{zhang2025framepackv1,\n        title={Packing Input Frame Contexts in Next-Frame Prediction Models for Video Generation},\n        author={Lvmin Zhang and Maneesh Agrawala},\n        journal={Arxiv},\n        year={2025}\n    }\n","FramePack is a next-frame prediction neural network structure for video diffusion models that aims to make video generation more practical. It compresses input contexts to a fixed length so that the generation workload stays independent of video length, which lets 13B-scale models process a large number of frames on laptop GPUs. FramePack also supports training with larger batch sizes, similar to image diffusion training, so that video diffusion feels close to image diffusion. The project suits applications that need efficient, high-quality video generation, such as creative content production and visual effects. It requires an Nvidia RTX 30XX\u002F40XX\u002F50XX series GPU (at least 6GB of GPU memory) and a Linux or Windows operating system.",2,"2026-06-11 02:50:59","top_language"]