[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81025":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":12,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},81025,"ANIMA_BOOSTER","BlackSnowSkill\u002FANIMA_BOOSTER","BlackSnowSkill","High-performance optimization suite for Anima DiT 2B model in ComfyUI",null,"Python",47,1,29,0,4,15,18,12,0.9,false,"main",true,[],"2026-06-12 02:04:09","# ANIMA_BOOSTER (BSS)\n\n🇷🇺 [Читать на русском языке](README_RU.md)\n\n**ANIMA_BOOSTER (BSS)** is a high-performance optimization suite for the **Anima DiT 2B** model in ComfyUI. It is designed to deliver maximum performance, reduce VRAM usage, and accelerate generation speeds on NVIDIA graphics cards.\n\n> [!IMPORTANT]\n> **Author and Developer:** **blacksnowskill (BSS)**\n> **© 2026 blacksnowskill (BSS). All rights reserved.**\n> This project is protected by copyright. Any unauthorized copying, modification without attribution, or representing this code as your own product is strictly prohibited.\n\nThis package allows you to achieve a **total acceleration of 3.5× to 5.0×** compared to the default Anima workflow in ComfyUI, with no noticeable loss in visual quality.\n\n> [!TIP]\n> **Ultimate Quality & Detail Recovery:**  \n> While extreme optimization can sometimes lead to a slight loss in micro-details, we have a perfect solution for that. 
We highly recommend pairing **ANIMA_BOOSTER** with our companion node **[FLSampler (BSS)](https:\u002F\u002Fgithub.com\u002FBlackSnowSkill\u002FComfyUI-BSS_FLSampler)**. FLS perfectly restores any lost details without sacrificing your speed gains, producing even sharper and more coherent details than the original unoptimized model!\n\n---\n\n## ⚡ Key Features\n\n* **Integrated JIT Compilation (`torch.compile`):** Safe, one-click compilation of DiT blocks built directly into the loaders. Runs on the stable `inductor` backend without CUDA Graphs, ensuring 100% stability and a speed boost of **20% to 40%**.\n* **SageAttention:** Built-in support for ultra-fast 8-bit attention tailored for DiT models, significantly accelerating computations while reducing memory consumption.\n* **Adaptive TeaCache:** Intelligent latent state caching. Dynamically adjusts the caching threshold: automatically lowering it in early steps to preserve geometry, and raising it in later steps for maximum acceleration.\n* **BSS Premium UI:** An integrated, high-contrast dark theme for the package's nodes. Features a fully redesigned full-width slider control, adaptive visibility for inactive preset inputs, and complete suppression of intrusive tooltips for a distraction-free experience.\n\n---\n\n## ⚡ Performance Quick Overview\n\n| Configuration | Average Acceleration | Description |\n|---|---|---|\n| fp16 Only | **1.4×** | Baseline precision optimization |\n| fp16 + **SageAttention** | **1.8–2.5×** | Ultra-fast 8-bit attention for DiT |\n| fp16 + **Adaptive TeaCache** | **1.5–2.0×** | Intelligent step caching |\n| **fp16 + SageAttn + TeaCache** | 🚀 **2.5–3.5×** | Perfect balance of speed and quality |\n| **+ Integrated torch.compile** | 💎 **3.5–5.0×** | Maximum performance boost (after a 2-3 step warm-up phase) |\n\n---\n\n## 📥 Installation\n\n### Method 1: Via ComfyUI Manager (Recommended)\n1. Open ComfyUI and click on the **Manager** button.\n2. Click **Install via Git URL**.\n3. 
Paste this repository URL: `https:\u002F\u002Fgithub.com\u002FBlackSnowSkill\u002FANIMA_BOOSTER`\n4. Click **Install**, wait for the process to complete, and restart ComfyUI.\n\n### Method 2: Manual Installation (Git Clone)\n1. Open your terminal and navigate to your ComfyUI custom nodes directory:\n   ```bash\n   cd ComfyUI\u002Fcustom_nodes\n   ```\n2. Clone this repository:\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002FBlackSnowSkill\u002FANIMA_BOOSTER.git\n   ```\n3. Restart ComfyUI.\n\n---\n\n## 🧩 `BSS\u002FAnimaBooster` Node List\n\nAll nodes are registered under the `BSS\u002FAnimaBooster` category:\n\n1. 📥 **Anima Booster Loader (BSS)** (class `AnimaBoosterLoader`)\n   - Loads the Anima DiT model in the optimized fp16 format.\n   - **SageAttention**: Automatically applies accelerated 8-bit attention if installed in the system. If unavailable, it seamlessly falls back to built-in PyTorch SDPA.\n   - **Torch Compile**: An integrated toggle for safe JIT compilation of individual transformer blocks.\n2. 🎛️ **Anima TeaCache (BSS)** (class `AnimaTeaCache`)\n   - Implements adaptive latent state caching based on denoising steps (TeaCache).\n   - **Version Selector (teacache_version)**: Allows you to choose between two modes:\n     * `v1 (Legacy Fast)` (Default): Restores the highly requested aggressive caching behavior with a fixed timestep normalizer. Delivers an instant 2.0× speedup out-of-the-box on SDE samplers (such as `er_sde`, `sde gpu`), though it might introduce minor artifacts on Euler A.\n     * `v2 (Standard Precise)`: Mathematically precise, dynamic timestep normalization that adapts to any sampler and scheduler. Fully protects early structural steps and guarantees perfect image quality.\n3. 🖼️ **Anima Latent Image (BSS)** (class `AnimaLatentImage`)\n   - A utility for generating empty latents with automatic size alignment to the Anima DiT patch grid (2x2), preventing tensor dimension mismatch errors. 
Provides predefined aspect ratio presets.\n\n---\n\n## 📈 Adaptive TeaCache Threshold\n\nUnlike standard TeaCache implementations with a fixed threshold, the **BSS** version uses a **dynamic adaptive threshold** that evolves during the denoising process:\n- **In early steps** (high noise, image structure formation), the threshold is automatically lowered to ensure maximum rendering precision and geometric accuracy.\n- **In later steps** (details have stabilized, micro-texturing takes place), the threshold is raised, allowing up to 80% of block computations to be safely skipped without quality loss.\n\n> [!TIP]\n> **Choosing the Timestep Normalization Mode (in v1.3.0):**\n> * If you want **uncompromised extreme speed** out-of-the-box on SDE samplers and enjoy experimenting, select `v1 (Legacy Fast)`.\n> * If you are running Euler A, require **maximum geometric precision**, or want to fine-tune quality using `early_steps_factor` and `late_steps_factor`, select `v2 (Standard Precise)`.\n\n---\n\n## 📋 Recommended Workflow\n\nTo achieve the best results, connect the nodes in the following sequence:\n\n```\n[ Anima Booster Loader (BSS) ] ── (Enable sage_attention: auto and torch_compile: True)\n             ↓ (MODEL)\n[ Anima TeaCache (BSS) ] (Recommended: threshold: 0.15, adaptive: ON)\n             ↓ (MODEL)\n        [ KSampler ]\n```\n\n---\n\n## 🛠️ Installation & Dependencies on Windows\n\nAll resource-heavy optimization libraries (**SageAttention** and JIT **Triton**) are **entirely optional**. 
The package is designed with *Graceful Degradation* in mind: if the libraries are not installed, the nodes will automatically disable patches or compile features, transitioning seamlessly to standard PyTorch mechanisms and guaranteeing crash-free execution in ComfyUI.\n\n> [!IMPORTANT]\n> **Triton is required for both SageAttention and JIT Compilation (`torch.compile`)**!\n> If you plan to enable `torch_compile` (which yields up to a 40% speed boost) or use SageAttention on Windows, you **must** install `triton-windows`. Without Triton, `torch.compile` will be safely disabled with a warning in the console to avoid crashes.\n\n### 📦 Installing Triton and SageAttention on Windows (for Portable Builds)\n\nPortable ComfyUI builds (which use an isolated `python_embeded` environment on Python 3.12 or 3.13) do not have C++ compilation tools (MSVC \u002F Build Tools) installed. As a result, the standard `pip install sageattention` command will fail with a compilation error.\n\nTo install it successfully, use precompiled binary packages (`.whl` wheels):\n\n1. Open a command prompt (CMD or PowerShell) in your main ComfyUI folder.\n2. Install Triton for Windows:\n   ```bash\n   .\\python_embeded\\python.exe -m pip install triton-windows\n   ```\n3. Download the precompiled `.whl` file for SageAttention that matches your Python version (e.g., `cp312` or `cp313`) and CUDA version (e.g., `cu124` \u002F `cu128`):\n   - Precompiled builds can be found in the releases of this repository: **[sdbds\u002FSageAttention-for-windows](https:\u002F\u002Fgithub.com\u002Fsdbds\u002FSageAttention-for-windows)**.\n   - Precompiled wheels are also published in the project: **[wildminder\u002FAI-windows-whl](https:\u002F\u002Fgithub.com\u002Fwildminder\u002FAI-windows-whl)**.\n4. 
Install the downloaded file into the portable environment:\n   ```bash\n   .\\python_embeded\\python.exe -m pip install \u003Cpath_to_downloaded_file.whl>\n   ```\n\n---\n\n## 🐧 Installation & Dependencies on Linux (Ubuntu)\n\nOn Linux\u002FUbuntu systems, installing dependencies is much more straightforward than on Windows since compilation tools and build pipelines are natively supported.\n\n### 📦 Prerequisites\n\nEnsure you have the CUDA Toolkit installed and available in your environment (`nvcc --version`).\n\n### 📦 Installing SageAttention 2.x\n\nSageAttention 2.x is highly recommended for newer GPUs (Ampere, Lovelace, Blackwell, e.g., RTX 30xx, 40xx, 50xx).\n\n1. Activate your ComfyUI virtual environment:\n   ```bash\n   source \u002Fpath\u002Fto\u002FComfyUI\u002Fvenv\u002Fbin\u002Factivate\n   ```\n2. Install SageAttention directly via PyPI with the `--no-build-isolation` flag to prevent dependency mismatches with PyTorch:\n   ```bash\n   pip install sageattention --no-build-isolation\n   ```\n   *If PyPI fails, you can compile from source:*\n   ```bash\n   git clone https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FSageAttention.git\n   cd SageAttention\n   pip install \"setuptools\u003C=75.8.2\"\n   pip install --no-build-isolation -e .\n   ```\n\n---\n\n## 🔍 Troubleshooting: Triton JIT & `PassManager::run failed` Errors\n\nIf you enable `torch_compile` in the loader node and encounter a runtime crash in KSampler with Triton raising `RuntimeError: PassManager::run failed` (specifically under `_attn_fwd` during Triton compilation on Ubuntu 24.04), this is a known compatibility issue between Triton's code generator, LLVM, and local system compiler tools.\n\nTo resolve or bypass this error, try the following solutions:\n\n### 1. 
Disable `torch_compile` (Recommended)\nThe easiest and most robust workaround is to simply set **`torch_compile` to `False`** in the **Anima Booster Loader (BSS)** or **Anima Checkpoint Loader (BSS)** node.\n* *Why:* The nodes are fully optimized, and you will still get massive acceleration (**2.5× to 3.5×**) using just **SageAttention** and **TeaCache** without invoking Triton's JIT compiler.\n\n### 2. Disable Triton JIT Optimizations\nYou can bypass the failing compilation pass by running ComfyUI with the Triton optimization disable flag. Run ComfyUI like this:\n```bash\nexport TRITON_JIT_DISABLE_OPT=1\npython main.py\n```\n\n### 3. Clear Triton Compiler Cache\nSometimes Triton's cached kernels get corrupted or miscompiled. Clear the cache directory:\n```bash\nrm -rf ~\u002F.triton\u002Fcache\nrm -rf ~\u002F.cache\u002Ftriton\nrm -rf \u002Ftmp\u002Ftorchinductor_*\n```\n\n### 4. Reinstall compatible Triton\nEnsure your Triton version is strictly compatible with PyTorch:\n```bash\npip install --upgrade --force-reinstall triton\n```\n\n\n## 🧑‍💻 Technical Implementation Details for Developers\n\n- **Model Base**: Anima DiT is based on the `MiniTrainDIT` architecture with the `LLMAdapter` wrapper.\n- **SageAttention Integration Point**: The patch is applied via the standard `transformer_options[\"optimized_attention_override\"]` parameter dictionary.\n- **Torch Compile Integration Point**: Invokes the built-in `set_torch_compile_wrapper` function from `comfy_api.torch_helpers` at the level of individual transformer blocks (ensuring LoRA compatibility and reducing compilation overhead).\n- **Isolation**: All graph and weight modifications are performed strictly on a cloned copy of the model (`model.clone()`). 
This prevents conflicts and leaves the original model in the ComfyUI cache untouched.\n\n---\n\n## ☕ Support & Development\n\nIf you love my work and want to support the development of future optimizations, nodes, and custom models, please consider supporting me:\n- **Boosty**: [Support & Exclusive Models](https:\u002F\u002Fboosty.to\u002Fblacksnowskill)\n\n---\n\n## 📄 License & Usage\n\n© 2026 blacksnowskill (BSS). All rights reserved.\n\nThis software is an experimental release. Feedback is highly welcome.\n**Notice:** This project is protected by copyright. Any unauthorized copying, distribution, merging with other projects, or hosting on other repositories\u002Fwebsites without the explicit written permission of the author is strictly prohibited.\n","ANIMA_BOOSTER (BSS) is a high-performance optimization suite for the Anima DiT 2B model in ComfyUI. By integrating JIT compilation, SageAttention, and Adaptive TeaCache, the project significantly increases model speed, reduces VRAM usage, and accelerates generation on NVIDIA graphics cards. Core features include safe one-click compilation of DiT blocks, ultra-fast 8-bit attention tailored for DiT models, and an intelligent latent-state caching strategy, which together deliver an overall speedup of 3.5× to 5.0× while maintaining good visual quality. The project also provides a high-contrast dark-theme UI that improves the user experience. The tool is well suited to scenarios that need higher rendering efficiency without sacrificing image detail, such as game development, animation production, and high-quality image generation.",2,"2026-06-11 04:03:15","CREATED_QUERY"]