[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75531":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},75531,"10S-Comfy-nodes","TenStrip\u002F10S-Comfy-nodes","TenStrip","Vibe coded nodes for LTX2.3",null,"Python",145,19,5,6,0,12,24,111,36,3.9,false,"main",true,[],"2026-06-12 02:03:34","# 10S Nodes for ComfyUI\n\nA collection of custom nodes for ComfyUI focused on identity preservation, latent-space stabilization, and **upscale-pass quality** for the **LTX2 video diffusion model** by Lightricks. Nodes operate via PyTorch forward hooks on the DiT backbone — no model retraining required.\n\n> **Compatibility note:** these nodes are specifically tuned for LTX2\u002FLTX-AV (the dual-stream video+audio DiT). They will not work as-is on other diffusion models — the hooks rely on LTX2's specific block structure (`BasicAVTransformerBlock`).\n\n---\n\n## Installation\n\n```bash\ncd ComfyUI\u002Fcustom_nodes\ngit clone https:\u002F\u002Fgithub.com\u002FTenStrip\u002F10S-Comfy-nodes.git 10S_Nodes\n```\n\nRestart ComfyUI. The new nodes appear under `10S Nodes\u002F` in the node-add menu.\n\nTo update later:\n```bash\ncd ComfyUI\u002Fcustom_nodes\u002F10S_Nodes\ngit pull\n```\n\nOptional dependency: **MediaPipe** is used by the Latent Face Detector for face bbox detection. Without it, the detector falls back to OpenCV's Haar cascades (less accurate). 
Install with:\n\n```bash\npip install mediapipe\n```\n\nAll other dependencies come from ComfyUI's existing environment (PyTorch, comfy package).\n\n---\n\n## Headline Nodes\n\nThese two solve the most common LTX2 quality issues directly. If you're only going to use one or two nodes from this package, these are the ones.\n\n### 🎲 LTX Tiled Sampler — *the upscale-pass fix*\n\n**Category:** `10S Nodes\u002FSampling`\n\nSpatially-tiled drop-in replacement for `SamplerCustomAdvanced`. Solves the **broad hue shift \u002F conditioning drift problem** that occurs when running a second sampling pass on an upscaled latent.\n\n**The problem this solves:** when the upscale pass runs on a 2× upscaled latent (4× as many spatial tokens), each video token receives ~1\u002F4 the per-token text-conditioning influence the model was trained with. The model also operates outside its trained spatial-token-count range. This manifests as broad hue shifts, color drift, and prompt adherence loss in the upscale-pass output. After exhaustive investigation it became clear: this is induced by the *sampler*, not the upsampler or VAE.\n\n**The mechanism:** split the latent spatially along its longer axis with overlap, sample each tile through the full pipeline at training-distribution token count, then blend the tiles across the overlap with a Hann (raised-cosine) window. Each tile stays within the model's \"comfortable\" extent. 
Optionally, one chosen tile can carry the audio for the pass, preserving video-audio cross-attention for lipsync.\n\n**Key parameters:**\n- `tile_axis` — `auto` (default; splits the longer dimension), `H`, or `W`\n- `n_tiles` — number of tiles along the chosen axis (default 2; sufficient for most aspect ratios)\n- `tile_overlap` — overlap in latent tokens (default 8; hides seams reliably)\n- `max_size_for_no_tile` — auto-skip tiling if both dims are at or below this (default 24)\n- `audio_pass` — `passthrough` (audio unchanged) or `tile_carrying` (audio sampled through the carrier tile)\n- `audio_carrier_tile` — `first` (top tile, ideal for vertical talking-face content), `middle` (centered for very large outputs), `last`\n\n**Recommended config for vertical talking-face upscale:**\n```\ntile_axis           = auto\nn_tiles             = 2\ntile_overlap        = 8\naudio_pass          = tile_carrying\naudio_carrier_tile  = first\n```\n\n**For large landscape outputs (4K\u002F8K):**\n```\ntile_axis           = auto\nn_tiles             = 2 to 4\ntile_overlap        = 8\naudio_pass          = tile_carrying\naudio_carrier_tile  = middle\n```\n\n**Compatibility note:** the upscale-pass workflow this targets uses light denoising (typical sigmas like `[0.85, 0.7, 0.4, 0]`) with `euler_ancestral_cfg_pp` and distilled CFG=1. The node is not suited for heavy-denoise-from-pure-noise generation, where per-tile divergence would produce visible seams.\n\n---\n\n### 🎯 LTX Latent Anchor Aware — *content-aware identity stabilization*\n\n**Category:** `10S Nodes\u002FIdentity`\n\nInference-time regularizer that improves prompt + image conditioning adherence, scene composition consistency, and physical sensibility across long sampling chains. Adds optional spatial weighting from an external reference image.\n\n**How it works:** snapshots the model's representation of the anchor frame at a chosen sampling step, then pulls all subsequent computation toward that cached state. 
The reference image (when connected) provides per-position energy weighting — high-energy regions of the reference get more anchor pull, low-energy regions get less.\n\n**Key parameters:**\n- `sigmas` (SIGMAS input) — connect for predictable cache timing\n- `strength` — pull magnitude (typical 0.05-0.15)\n- `cache_at_step` — sampling step at which to lock the anchor (typical 3-9)\n- `similarity_threshold` — cosine sim cutoff (default 0.50)\n- `decay_with_distance` — per-frame strength decay (default 0.0)\n- `reference_image` (IMAGE) — optional external reference for spatial energy\n- `vae` (VAE) — required if `reference_image` is connected\n- `energy_threshold` — gating cutoff (default 0.30; 0 = uniform, 0.50 = above-median energy only)\n\n**Recommended starting config:**\n```\nsigmas               = [connected]\nstrength             = 0.10\ncache_at_step        = 6\nsimilarity_threshold = 0.50\nenergy_threshold     = 0.30\nreference_image      = [face crop or composition reference]\nvae                  = [LTX VAE]\n```\n\n**Simple\u002Fadvanced mode:** the simple-mode default exposes essential knobs. Toggle `advanced_mode=True` for `cache_mode`, `forwards_per_step`, `depth_curve`, `block_index_filter`, and other research parameters.\n\n---\n\n## Likeness Suite\n\nThese nodes work together to provide face-region identity preservation for prompts the model already partially knows. **For unique faces unknown to the base model, a subject-specific LoRA trained via LTX-Video-Trainer (~30 images, ~15 minutes) provides significantly better quality than any inference-time method in this package.** The Likeness suite is best used as a complement to LoRA workflows for stabilization during difficult prompts (rapid expression changes, hard motion), or for faces the model partially recognizes.\n\n### 🎯 LTX Likeness Guide\n\n**Category:** `10S Nodes\u002FIdentity`\n\nEncodes a reference face into the conditioning pipeline. 
Auto-detects the face bbox via MediaPipe (with OpenCV Haar fallback), produces `reference_info` metadata that downstream nodes (LikenessAnchor, LikenessSemanticClamp) read from.\n\n**Key parameters:**\n- `image` (IMAGE) — reference face image\n- `vae` (VAE) — LTX VAE for encoding\n- `positive`, `negative` (CONDITIONING) — passed through with attention metadata\n- `emit_latent` — `passthrough` (default, recommended) or `extend_latent` (legacy)\n- `face_detect` — `auto` (MediaPipe→Haar fallback), `manual`, or `disabled`\n- `manual_face_bbox` — `\"x1,y1,x2,y2\"` normalized 0-1 when `face_detect=manual`\n- `reference_mask_mode` — `bbox_softfade` (default), `bbox_hard`, or `uniform`\n\nThe `emit_latent=passthrough` default is critical: extending the latent triggers learned end-keyframe behavior in the model regardless of conditioning placement. Pass the guide's effects through `reference_info` instead.\n\n---\n\n### 🪪 LTX Likeness Anchor\n\n**Category:** `10S Nodes\u002FIdentity`\n\nPer-block attn1 hook that pulls face-bbox video tokens toward reference identity features. 
Reads bbox from `reference_info` (LikenessGuide) or directly from `latent_frame_0`.\n\n**Key parameters:**\n- `model` (MODEL) — chain after LikenessGuide\n- `strength` — pull magnitude (typical 0.10-0.30)\n- `pull_mode` — `directional` (default, magnitude-preserving) or `additive` (legacy)\n- `reference_source` — `auto`, `guide`, or `latent_frame_0`\n- `sim_threshold` — cosine threshold for token matching (default 0.50)\n- `late_block_falloff` — strength reduction on last 12 blocks (default 0.0; raise to 0.3-0.4 if face appears rigid)\n- `depth_curve` — `flat` (default), `middle`, `late_focus`, `ramp_up`, `ramp_down`\n- `bypass` (BOOLEAN) — when True, properly removes prior hooks (no leak across runs)\n\n**Recommended config:**\n```\nstrength            = 0.10-0.18\npull_mode           = directional\nreference_source    = auto\nlate_block_falloff  = 0.4\ndepth_curve         = flat\n```\n\n**Important:** if chaining with Latent Anchor Aware (which also hooks attn1), the strengths *compound additively*. Each node's pull adds to the same attn1 residual. If using both, reduce individual strengths (e.g., AwareAnchor 0.08, LikenessAnchor 0.10 = combined effective ~0.18). At very high combined values, token-distribution variance can narrow visibly (color desaturation) — back off strength.\n\n---\n\n### 🪪 LTX Likeness Crop\n\n**Category:** `10S Nodes\u002FIdentity`\n\nStandalone face bbox cropper. Outputs cropped IMAGE of the detected face region plus the normalized bbox as STRING. Useful for previewing detection results or feeding face-only crops to other workflows.\n\n---\n\n### 🔎 LTX Latent Face Detector\n\n**Category:** `10S Nodes\u002FDiagnostic`\n\nStandalone face detection node. Returns the normalized bbox as STRING (`\"x1,y1,x2,y2\"` format) for manual wiring into LikenessGuide's `manual_face_bbox` or LikenessAnchor's `override_face_bbox`. 
MediaPipe with OpenCV Haar fallback.\n\n---\n\n### 🧠 LTX Likeness Semantic Clamp ⚠ *experimental*\n\n**Category:** `10S Nodes\u002FIdentity`\n\nSemantic-aware text-token suppression for face-modifier vocabulary. Identifies which positive-prompt tokens are face-modifier-like (smiling, frowning, expressions) via embedding-space correspondence to a vocabulary, then selectively suppresses those tokens' attention contribution to face-bbox video tokens.\n\n**Status:** Experimental. The mechanism is sound but the LTX2 text encoder's contextual blending produces flatter token similarities than typical CLIP-style encoders, making correspondence search noisier than ideal. Works best with `auto_threshold=p95` adaptive thresholding. Effects are subtle on most prompts; may be more pronounced on prompts with explicit expression directives.\n\n**Key parameters:**\n- `clip` (CLIP) — same encoder that produced your positive conditioning\n- `positive` (CONDITIONING) — analyzed for face-modifier tokens\n- `reference_info` (REFERENCE_INFO) — wire from LikenessGuide for bbox\n- `suppression_strength` — 0.3-0.8 (default 0.5)\n- `face_modifier_text` — comma-separated modifier vocabulary (default works for common prompts)\n- `auto_threshold` — `p95` (default, recommended), `p98`, `p99`, or `disabled`\n- `suppression_floor` — hard cutoff below which weights become 0 (default 0.3)\n- `top_k` — confirming matches required (default 3)\n\nDiagnostic output shows the per-token suppression distribution; a healthy run shows ~3-8% of tokens getting strong suppression. If diagnostic shows >40% suppressed, the threshold is too low for your encoder; if \u003C1%, too high.\n\n---\n\n### 💨 LTX Action Amplifier ⚠ *experimental*\n\n**Category:** `10S Nodes\u002FConditioning`\n\nSelectively amplifies action \u002F motion verb tokens in the positive prompt to make i2v output more responsive to verb-driven motion. 
Symmetric inverse of LikenessSemanticClamp — same correspondence-search backbone, but boosts matched tokens rather than suppressing them.\n\n**Status:** Experimental. Replaces the deprecated blanket Text Amplifier approach with token-selective scaling. Capped boost ceiling (default `scale_ceiling=0.30`, so max +30% K\u002FV scaling per matched token) keeps modifications controlled. Effect is subtle by design.\n\n**Key parameters:**\n- `clip` (CLIP), `positive` (CONDITIONING) — same as Semantic Clamp\n- `amplification_strength` — 0.0-1.0 (default 0.3)\n- `scale_ceiling` — maximum K\u002FV scale factor delta (default 0.30 = max +30%)\n- `auto_threshold` — `p95` default\n- `action_vocabulary_text` — comma-separated verb vocabulary\n\nUnlike Semantic Clamp, this applies uniformly across all video tokens (no bbox) — actions affect the whole frame.\n\n---\n\n## Supporting Nodes\n\n### 🎯 LTX Latent Anchor\n\n**Category:** `10S Nodes\u002FIdentity`\n\nContent-blind variant of Latent Anchor Aware — same caching mechanism without the reference-image energy weighting. Use when you want the anchoring effect without supplying an external reference. The Aware variant supersedes this for most use cases.\n\n---\n\n### 🔍 LTX Latent Upsampler (Tiled)\n\n**Category:** `10S Nodes\u002FLatent`\n\nDrop-in replacement for ComfyUI's stock `LTXVLatentUpsampler`. Adds spatial tiling with cosine-windowed overlap blending to address upscale-model failure modes at extreme aspect ratios. Auto-detects upscale ratio (works with x1.5, x2, etc.).\n\n**Key parameters:**\n- `tile_size` — spatial tile size in latent tokens (default 24, ~768 pixels)\n- `overlap` — overlap between adjacent tiles (default 8)\n- `max_size_for_no_tile` — skip tiling if both spatial dims ≤ this (default 32)\n- `rotate_for_landscape` — transpose H\u002FW before upscaling, then rotate back (experimental)\n\nFor most cases the defaults work. 
Lower `max_size_for_no_tile` to force tiling on smaller inputs for testing.\n\n---\n\n### 🔊 LTX Text Attention Amplifier ⚠ *deprecated*\n\n**Category:** `10S Nodes\u002FIdentity`\n\nOriginal blanket text cross-attention amplifier. Multiplies attn2 output uniformly. **Deprecated** in favor of the token-selective Action Amplifier — blanket amplification was found to produce noise at meaningful strength values. Kept for backward compatibility with existing workflows.\n\n---\n\n### 🔍 LTX Model Inspector\n\n**Category:** `10S Nodes\u002FDiagnostic`\n\nDiagnostic node for inspecting LTX2 model structure — useful when developing new hook-based nodes. Lists transformer modules, prints parameter counts, traces tensor shapes through the forward chain.\n\n---\n\n## Recommended Workflow Patterns\n\n### Two-pass I2V with full upscale-pass quality recovery\n\n```\nFirst-pass model\n        ↓\n   KSampler (first pass at native resolution)\n        ↓\n  Upsample (LTX Latent Upsampler Tiled)\n        ↓\n  Conditioning re-application\n        ↓\n  LTX Tiled Sampler (light refinement, audio_pass=tile_carrying)\n        ↓\n  VAE Decode\n```\n\nThis is the workflow that motivated most of this package's development. 
The Tiled Sampler at the second-pass position is what converts a previously broken upscale pass into clean, lipsync-preserved output.\n\n### Single-pass I2V with identity preservation\n\n```\nLoadModel → LTX Latent Anchor Aware → KSampler → VAE Decode\n                  ↓\n            sigmas, reference_image, vae\n```\n\nFor face-targeted preservation, chain the Likeness suite:\n\n```\nLikenessGuide(image) ──→ pos', neg', reference_info\n                              ↓\nLikenessAnchor(model, reference_info, strength=0.10-0.18, pull_mode=directional)\n                              ↓\nKSampler → VAE Decode\n```\n\n### Maximum identity preservation: LoRA + inference-time stabilization\n\nFor unique faces (subjects unknown to the base model), the right approach is to train a subject-specific LoRA via Lightricks' [LTX-Video-Trainer](https:\u002F\u002Fgithub.com\u002FLightricks\u002FLTX-Video) (typically 30 images, ~15 minutes). Then use this package's nodes as a stabilization layer on top:\n\n```\nBase model + Subject LoRA\n        ↓\nLikenessGuide → LikenessAnchor → KSampler → VAE Decode\n```\n\nLoRA provides the identity; LikenessAnchor stabilizes against drift during difficult prompts. This combination produces better results than either approach alone.\n\n### Combined identity + scene stabilization (chained)\n\n```\nModel → LTX Latent Anchor Aware → LTX Likeness Anchor → KSampler\n```\n\nBoth nodes hook `attn1` with different sentinel attributes — they coexist. Important: their pull strengths *compound additively*, so reduce individual strengths when chained (try AwareAnchor 0.08 + LikenessAnchor 0.10).\n\n---\n\n## Architecture Notes\n\nAll identity nodes operate via PyTorch `register_forward_hook` on `transformer_blocks[i].attn1` (or `attn2` for the experimental token-selective nodes). Hooks return modified output tensors that flow forward to the next block's input.\n\n**Why hook intervention works:** LTX2's DiT is content-blind to its own intermediate states. 
Modifying attention output adds a residual that the rest of the block's computation (cross-attention, FFN) integrates naturally. Strength values that look small numerically (0.10-0.20) compound across 48 blocks per step into substantive effects.\n\n**The inference-time ceiling:** activation-level interventions can stabilize identity that the model already knows, but cannot impart new identity knowledge. For unique faces, a short LoRA training run (~15 minutes) imparts the identity at the weight level and produces dramatically better preservation. This package complements LoRA workflows rather than replacing them.\n\n**Per-frame centered cosine similarity** is used throughout for matching. Raw cosine similarity is dominated by common-mode features (positional encoding, scaffold features) that all tokens share. Subtracting the per-frame mean leaves identity-specific deviations that discriminate cleanly.\n\n**Cache-and-broadcast** (used by Latent Anchor Aware) snapshots the model's representation at peak conditioning alignment (mid-sampling) and uses it as a stable pull target. Conceptually adjacent to self-distillation and self-conditioning in diffusion, but operationalized as inference-time intervention rather than training-time signal.\n\n**Tiled Sampler architecture** does not use MultiDiffusion-style per-step coordination. Instead, each tile runs the full sampling pipeline as a standalone clip at training-distribution token count, then results blend with cosine windows. This works for light refinement passes (low denoise, few steps) where per-tile divergence is bounded. The carrier tile uses a single wrapper sampling pass for video and audio together — the model's video-audio cross-attention runs naturally during that pass, and we extract both modalities from the flattened combined output.\n\n**Bypass-safe hook management:** all hook-based nodes store PyTorch handle references and actively remove prior hooks when bypassed or re-applied. 
This prevents hooks from prior runs leaking into subsequent runs when `model.clone()` shares the underlying transformer blocks by reference.\n\n---\n\n## Compatibility & Limitations\n\n- **LTX2-specific.** Hooks rely on `LTXAVModel.transformer_blocks` and `BasicAVTransformerBlock` structure. Will not work on other DiT-based video models without adaptation.\n- **Distilled CFG=1 setup tested most extensively.** Standard CFG works but compounds hook calls per step (cond + uncond passes), changing effective strength. Set `forwards_per_step=2` in advanced mode.\n- **Tiled Sampler is for light refinement only.** Heavy-denoise-from-noise generation produces tile divergence that the cosine blend can't reconcile. Use the full sampler for first-pass; tiled for upscale refinement only.\n- **Inference-time ceiling for unique faces.** Activation-level interventions stabilize known identity; they don't teach new identity. For unique faces unknown to the base model, train a LoRA. The Likeness suite is best as a stabilization complement to LoRA workflows.\n\n---\n\n## Version History\n\n**v1.6.0** — Likeness suite + experimental conditioning nodes\n- Added LTX Likeness Guide \u002F Anchor \u002F Crop (face-region identity preservation, complement to LoRA)\n- Added LTX Likeness Semantic Clamp (experimental, token-selective text-side suppression)\n- Added LTX Action Amplifier (experimental, token-selective verb amplification)\n- Added LTX Latent Face Detector (MediaPipe + OpenCV bbox detection)\n- Removed LTX Face Attention Anchor (replaced by the Likeness Anchor architecture)\n- Marked LTX Text Attention Amplifier as deprecated (blanket-scaling approach superseded by Action Amplifier's token-selective approach)\n- Bypass-safe hook management across all identity nodes (stored handles, active cleanup)\n\n**v1.2.0** — Audio-aware tiled sampling release\n- Added LTX Tiled Sampler v2.0 (whole-pipeline spatial tiling with carrier-tile audio capture for lipsync preservation)\n- Removed 
`LTXLatentColorRestore` and `LTXLatentOutlierSuppress` — these were investigative attempts at the upscale-pass color drift problem; the Tiled Sampler addresses the root cause directly so they're no longer needed\n- Added LTX Text Attention Amplifier v1.1 (alternative approach to upscale-pass dilution)\n- Latent Upsampler Tiled v1.1 (auto-detect upscale ratio, fix window-fade math for partial overlaps)\n\n**v1.0.0** — Initial release\n- Face Attention Anchor (v4.0)\n- Latent Anchor (v1.4)\n- Latent Anchor Aware (v2.3)\n- Model Inspector\n\n---\n\n## License\n\nMIT\n\n## Author\n\nTenStrip · [github.com\u002FTenStrip](https:\u002F\u002Fgithub.com\u002FTenStrip)\n","10S-Comfy-nodes is a collection of custom nodes designed for the LTX2.3 video diffusion model, aimed at improving identity preservation, latent-space stability, and upscale-pass quality. The project works through PyTorch forward hooks on the DiT backbone, with no model retraining required. Core nodes such as the LTX Tiled Sampler use tiled sampling to resolve the color shift and conditioning drift that appear during upscaling, keeping video and audio in sync. These nodes are particularly suited to video generation scenarios that need high-quality output, especially dual-stream video+audio processing based on LTX2\u002FLTX-AV. Users can integrate the package into ComfyUI with a few simple installation steps and use MediaPipe for more accurate face detection to improve results.",2,"2026-06-11 03:53:01","CREATED_QUERY"]