[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72439":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},72439,"ComfyUI-HunyuanVideoWrapper","kijai\u002FComfyUI-HunyuanVideoWrapper","kijai",null,"Python",2594,204,31,294,0,1,6,3,56.04,false,"main",true,[],"2026-06-12 04:01:05","# ComfyUI wrapper nodes for [HunyuanVideo](https:\u002F\u002Fgithub.com\u002FTencent\u002FHunyuanVideo)\n\n# Update 5\n\nSo I know I said I'd stop working on this, but with all the new stuff out I wanted to work on those and have included the official I2V, its \"fixed\" version 2 and the [LoRAs](https:\u002F\u002Fhuggingface.co\u002FKijai\u002FHunyuanVideo_comfy\u002Fblob\u002Fmain\u002Fhyvid_I2V_lora_embrace.safetensors) they included in the release\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F8ce4b1ee-fb63-49a2-83b4-ba8ef1a8b842\n\n\n\n\nand the [dashtoon keyframe LoRA](https:\u002F\u002Fgithub.com\u002Fdashtoon\u002Fhunyuan-video-keyframe-control-lora).\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F2b6e32e4-470f-4feb-b299-5a453e2b4fa1\n\nAlso, because there's been so much trouble in using the transformer model for text encoding, I figured out a way to use the text embeds from native ComfyUI text encoding, like 
this:\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F80b23087-a66d-4937-bb2c-d15d5a20304b)\n\nNote that this gives somewhat different results, and using these nodes this way can no longer be considered a wrapper of the original implementation.\n\n# Update 4, the non-update:\n\n\nAs the native implementation exists, and has support for most features by now, I will mostly stop working on these nodes for anything but its main purpose: early access and testing of potential new features that are difficult (at least for me) to implement natively.\n\n## Some resources for native workflows:\n\nFlowedit and enhance-a-video can be found from these nodes: https:\u002F\u002Fgithub.com\u002Flogtd\u002FComfyUI-HunyuanLoom\n\nThe TeaCache equivalent, FirstBlockCache, as well as torch.compile with LoRA support: https:\u002F\u002Fgithub.com\u002Fchengzeyi\u002FComfy-WaveSpeed\n\nSageattention can be enabled with the `--use-sage-attention` startup argument for ComfyUI, or with a patcher node found in [KJNodes](https:\u002F\u002Fgithub.com\u002Fkijai\u002FComfyUI-KJNodes) as well as some other node packs.\n\nLeapfusion I2V can be used with my patcher node found in KJNodes as well, example workflow: https:\u002F\u002Fgithub.com\u002Fkijai\u002FComfyUI-KJNodes\u002Fblob\u002Fmain\u002Fexample_workflows\u002Fleapfusion_hunyuuanvideo_i2v_native_testing.json\n\nWhat remains missing from the native implementation currently:\n- context windowing\n- direct image embed support through IP2V\n- manual memory management\n\n# Update 3:\n\nIt's been a hectic couple of weeks with this model, I've lost track of what has happened since the start, but I'll try to present some of the more important updates:\n\n## Official scaled fp8 weights were released:\n\nhttps:\u002F\u002Fhuggingface.co\u002Ftencent\u002FHunyuanVideo\u002Fblob\u002Fmain\u002Fhunyuan-video-t2v-720p\u002Ftransformers\u002Fmp_rank_00_model_states_fp8.pt\n\nEven if this file is .pt it's completely safe and it is loaded 
with weights_only; the scale map is included with the nodes. To use this model you have to use the `fp8_scaled` quantization option in the model loader.\nThe quality of these weights is much closer to the original bf16; the downside is that they do not currently support fp8 fast mode, or LoRAs.\n\n## Almost free quality increase with [Enhance-A-Video](https:\u002F\u002Fgithub.com\u002FNUS-HPC-AI-Lab\u002FEnhance-A-Video):\n\nThis has a very slight hit on inference speed and zero hit on memory use, initial tests indicate it's absolutely worth using.\n\n![image](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F68f0b5eb-aa23-49e1-a48f-fd3c4b1108ed)\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fe19b30e1-5f67-4e75-9c73-716d4569c319\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F083353a2-e9aa-43e9-a916-ff3af1d581c1\n\n\n\n# Update 2: Experimental IP2V - Image Prompting to Video via VLM by @Dango233\n## WORK IN PROGRESS - But it should work now!\n\nNow you can feed an image to the VLM as a condition for generation! This is different from image2video, where the image becomes the first frame of the video. IP2V uses the image as part of the prompt, to extract the concept and style of the image.\nSo - very much like IPAdapter - but VLM will do the heavy lifting for you!\n\nThis is a tuning-free approach, but with further task-specific tuning we can expand the use cases.\n\n## Guide to Using `xtuner\u002Fllava-llama-3-8b-v1_1-transformers` for Image-Text Tasks\n\n## Step 1: Model Selection\nUse the original `xtuner\u002Fllava-llama-3-8b-v1_1-transformers` model which includes the vision tower. 
You have two options:\n- Download the model and place it in the `models\u002FLLM` folder.\n- Rely on the auto-download mechanism.\n\n**Note:** It's recommended to offload the text encoder since the vision tower requires additional VRAM.\n\n## Step 2: Load and Connect Image\n- Use the comfy native node to load the image.\n- Connect the loaded image to the `Hunyuan TextImageEncode` node.\n  - You can connect up to 2 images to this node.\n\n## Step 3: Prompting with Images\n- Reference the image in your prompt by including `\u003Cimage>`.\n- The number of `\u003Cimage>` tags should match the number of images provided to the sampler.\n  - Example prompt: `Describe this \u003Cimage> in great detail.`\n\nYou can also choose to give CLIP a prompt that does not reference the image separately.\n\n## Step 4: Advanced Configuration - `image_token_selection_expression`\nThis expression is for advanced users and serves as a boolean mask to select which part of the image hidden state will be used for conditioning. Here are some details and recommendations:\n\n- The hidden state sequence length (or number of tokens) per image in llava-llama-3 is 576.\n- The default setting is `::4`, meaning every four tokens, one token goes into conditioning, interleaved, resulting in 144 tokens per image.\n- Generally, more tokens lean more towards the conditional image.\n- However, too many tokens (especially if the overall token count exceeds 256) will degrade generation quality. 
It's recommended not to use more than half the tokens (`::2`).\n- Interleaved tokens generally perform better, but you might also want to try the following expressions:\n  - `:128` - First 128 tokens.\n  - `-128:` - Last 128 tokens.\n  - `:128, -128:` - First 128 tokens and last 128 tokens.\n- With a proper prompting strategy, even not passing in any image tokens (leaving the expression blank) can yield decent effects.\n\n# Update\n\nScaled dot product attention (sdpa) should now be working (only tested on Windows, torch 2.5.1+cu124 on 4090), sageattention is still recommended for speed, but should not be necessary anymore making installation much easier.\n\nVid2vid test:\n[source video](https:\u002F\u002Fwww.pexels.com\u002Fvideo\u002Fa-4x4-vehicle-speeding-on-a-dirt-road-during-a-competition-15604814\u002F)\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F12940721-4168-4e2b-8a71-31b4b0432314\n\n\ntext2vid (old test):\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3750da65-9753-4bd2-aae2-a688d2b86115\n\n\nTransformer and VAE (single files, no autodownload):\n\nhttps:\u002F\u002Fhuggingface.co\u002FKijai\u002FHunyuanVideo_comfy\u002Ftree\u002Fmain\n\nGo to the usual ComfyUI folders (diffusion_models and vae)\n\nLLM text encoder (has autodownload):\n\nhttps:\u002F\u002Fhuggingface.co\u002FKijai\u002Fllava-llama-3-8b-text-encoder-tokenizer\n\nFiles go to `ComfyUI\u002Fmodels\u002FLLM\u002Fllava-llama-3-8b-text-encoder-tokenizer`\n\nClip text encoder (has autodownload)\n\nEither use any Clip_L model supported by ComfyUI by disabling the clip_model in the text encoder loader and plugging in ClipLoader to the text encoder node, or \nallow the autodownloader to fetch the original clip model from:\n\nhttps:\u002F\u002Fhuggingface.co\u002Fopenai\u002Fclip-vit-large-patch14, (only need the .safetensor from the weights, and all the config files) to:\n\n`ComfyUI\u002Fmodels\u002Fclip\u002Fclip-vit-large-patch14`\n\nMemory use is 
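entirely dependent on resolution and frame count, so don't expect to be able to go very high even on 24GB. \n\nThe good news is that the model can do functional videos even at really low resolutions.\n","The ComfyUI-HunyuanVideoWrapper project provides ComfyUI wrapper nodes for Tencent's HunyuanVideo. Its main features include integration of the official I2V model, support for several LoRA models, and a way to use ComfyUI's native text encoding, which together extend its video processing capabilities. Technically, it lets users access and test new HunyuanVideo features more conveniently through the ComfyUI interface, and it offers optimizations such as SageAttention support and compatibility with Leapfusion I2V. It is suited to researchers and developers who need to explore or use the latest HunyuanVideo developments, particularly for early access to new features or complex video editing tasks.",2,"2026-06-11 03:42:03","high_star"]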