Last updated: 2026-06-05
Note
Experimental repo — it moves fast and changes often. Treat these as experimental nodes: fork it, customize it, or point your agent of choice at the codebase and take the patterns you like (and skip the ones you don't).
Custom ComfyUI nodes for full-length music video generation with LTX 2.3.
Drives loop timing from integer-latent counts, freezes audio via
noise_mask=0, and pre-encodes prompts once outside the loop. I originally built this repo as a few helper nodes for experimenting with
kijai's LTX 2.3 long-loop extension. Thanks to Kijai for all his work, and for giving me some fun ideas to explore.
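As a rough picture of what freezing audio via `noise_mask=0` means, the sketch below applies the usual masked-denoising blend: where the mask is 0 the original latent is kept, where it is 1 the sampler's output is used. It assumes that blend semantics and uses plain Python lists instead of tensors; `apply_noise_mask` and the toy values are hypothetical and are not this repo's code.

```python
def apply_noise_mask(original, denoised, mask):
    """Blend per element: mask=1 takes the denoised value, mask=0 keeps the original."""
    return [m * d + (1.0 - m) * o for o, d, m in zip(original, denoised, mask)]


# Toy latents: the first two slots stand in for audio, the last two for video.
original = [0.5, -0.25, 1.0, 2.0]
denoised = [9.0, 9.0, 3.0, 4.0]
mask = [0.0, 0.0, 1.0, 1.0]  # 0 = frozen (audio), 1 = free to change (video)

result = apply_noise_mask(original, denoised, mask)
print(result)
```

The audio slots come back unchanged no matter what the sampler proposes, which is the property the loop relies on.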
Two cuts of the same 1000-step experiment, released as
LTX-2.3-22b-IC-LoRA-Audio-Only-Context
— an IC-LoRA where the in-context reference is only audio (no image, no
video). Load either with the audio IC-LoRA nodes via the single-pass workflow
(see the workflow table below); background + node
mechanics: docs/audio_iclora/index.md.
| Adapter | What it's supposed to do (in theory) | Suggested start |
|---|---|---|
| cross_modal_step_01000 | Adapts the audio stack and the cross-modal bridges, giving the audio reference a direct path into the video stream. Default pick if you want the audio to move the picture (mannerisms, energy, expression timing). | Strength ~0.5 (working band ~0.3–0.75, reference-dependent). With the per-stream loader, push bridge_strength above audio_strength to amplify the audio→video coupling. |
| audio_only_step_01000 | Adapts the audio stack only: the reference shapes the generated audio; the video follows via the frozen base coupling. Subtler, more emergent video effect; perturbs the base video path least. | Strength ~0.5 (same band). No bridge keys, so bridge_strength is inert. |
Both are early proof-of-concepts: many variables influence the end result (reference level/quality/content, prompt, seed), each cut has its own strengths, weaknesses, and tradeoffs, and both may behave differently when retrained on a better dataset. Reproduce or extend: data recipe · training config · LTX-2 train fork (audio-only IC-LoRA strategy).
First proof-of-concept run (the pitch "Helium" probe): LTX-2.3-22b-IC-LoRA-Helium.
Click play — sound on. The audio is frozen and drives the picture; the motion is generated against the waveform, not added after.
heartbeat_30s_github.mp4
30s of one continuous 2:53 render — a painted heart pulsing to a drum
loop. Workflow: audio_reactive_loop.json ·
writeup
▶ Audio-reactive, second take
dino_20s_web.mp4
Same drumbeat as the heart, different init — the stomps land on the beat.
▶ One 5-second audio clip, four takes — a sketch line drives the whole scene
sketch_take_a_github.mp4
sketch_take_b_github.mp4
sketch_take_c_github.mp4
sketch_take_d_github.mp4
Four generations from the same ~5s comedy-sketch audio + one init image — the audio alone sets the performance: timing, delivery, gesture.
▶ Full-length music video — the default workflow
bodyremembers_open30s_github.mp4
Opening 30s of one continuous 2:51 render from
audio-loop-music-video_latent.json —
one song + one init image + a timestamp prompt schedule.
▶ One image, three songs — the audio reshapes everything
audiocompare_a_20s_github.mp4
audiocompare_b_20s_github.mp4
astronaut_open30s_github.mp4
Three full renders through the default workflow from the same single init image; the first two share essentially the same prompt (the astronaut run differs). The audio is the change — and it redraws pacing, motion, and mood.
Three ways in:
- "Just show me." Demos above; more variants in the workflow table below. The model-card examples cover the audio-steering (IC-LoRA) side. To run one yourself: open the default workflow (Quick start), drop a song + an image, run.
- "I want to use it." Quick start below, then the docs hub: docs/README.md, the task-first index ("I want to do X, which doc?"). Power-user repo; assumes you know ComfyUI.
- "I want to verify, reproduce, or extend it." Architecture walkthrough: docs/architecture_overview.md, then per-node docstrings + docs/reference/. The audio IC-LoRA training story: docs/audio_iclora/index.md + the trained adapters above (data recipe + config + train fork). Invariants are enforced as code: the pytest suite and the workflow-topology audit (scripts/audit_workflows.py) run in CI.
Open example_workflows/audio-loop-music-video_latent.json in ComfyUI.
The workflow documents itself via group titles, node titles, and Note nodes.
Four things to set:
- LoadAudio: drop your song.
- LoadImage: drop the init image (any size; auto-resized adaptively; matches the first scene visually).
- start_seed: any int.
- TimestampPromptScheduleBatchEncode: paste the schedule. The initial-render prompt is read from the 0:00 entry.
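To make the "initial-render prompt is read from the 0:00 entry" rule concrete, here is a minimal sketch of a timestamp schedule lookup. The line shape `M:SS: prompt` is an assumption made for illustration and may not match what TimestampPromptScheduleBatchEncode actually accepts; `parse_schedule` and `prompt_at` are hypothetical helpers, not part of this repo.

```python
import re

# Assumed line shape: "M:SS: prompt text". Hypothetical, for illustration only.
ENTRY = re.compile(r"^(\d+):(\d{2}):\s*(.+)$")


def parse_schedule(text):
    """Return a list of (seconds, prompt) tuples sorted by time."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            minutes, seconds, prompt = match.groups()
            entries.append((int(minutes) * 60 + int(seconds), prompt))
    return sorted(entries)


def prompt_at(entries, t_seconds):
    """Pick the latest entry at or before t_seconds (assumes at least one entry)."""
    active = entries[0][1]
    for start, prompt in entries:
        if start <= t_seconds:
            active = prompt
    return active


schedule = parse_schedule(
    "0:00: a painted heart at rest\n"
    "0:30: the heart pulses harder\n"
    "1:15: the colors bleed outward\n"
)
initial_prompt = prompt_at(schedule, 0)
```

Under these assumptions the 0:00 entry doubles as the prompt for the initial render, and every later timestamp takes over once the song reaches it.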
Generate the schedule from your song:

    uv sync --group analysis
    uv run --group analysis python scripts/analyze_audio_features.py your_song.wav \
      --subject "your scene description" --trim 5

Run. LoRAs + IC-LoRA scaffolding ship bypassed by default; un-bypass when you
need them. Knobs (like first_frame_guide_strength: 1.0 = max identity lock,
lower for expressivity) are annotated in the workflow itself. For the rest:
- Prompt authoring (verb choice, token budget, continuation framing): docs/guides/prompt_creation_guide.md
- Audio analysis (all flags, scene-diversity tiers, JSON export): docs/guides/audio_analysis_guide.md
- End-to-end LLM schedule workflow (init image → VLM → schedule): docs/guides/prompt_workflow_end_to_end.md
Required custom nodes:
| Repo | Provides |
|---|---|
| ComfyUI-LTXVideo | LTX 2.3 nodes (LTXVAddLatentGuide, LTXVCropGuides, LTXVPreprocess, IC-LoRA) |
| ComfyUI-NativeLooping_testing | TensorLoopOpen / TensorLoopClose |
| ComfyUI-KJNodes | Set/Get nodes, LTX2_NAG, LTXVImgToVideoInplaceKJ, ImageResizeKJv2, and more |
| ComfyUI-VideoHelperSuite | VHS_LoadVideo, VHS_VideoCombine |
Optional: ComfyUI-MelBandRoFormer for vocal separation (bypassed by default). Companion repo: fblissjr/comfy-workbench (shared tooling + conventions across my ComfyUI work).
The sister fork fblissjr/SageAttention-ada
is tested and optimized for RTX 4090 / Ada architectures. You don't need
it — unless you use our sage node: the shipped workflows wire
AudioLoopHelperSageAttention (auto mode), which expects this build. No
build, or different hardware? Bypass the node (mode=4) or swap in KJNodes
sage — everything else works without it. Deep dive:
docs/reference/sage_attention.md.
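If you would rather bypass the sage node in a workflow file than in the UI, a small script can do it. This sketch assumes that `mode=4` refers to the bypass value ComfyUI stores on each node in its UI-format workflow JSON, and that the node's `type` string equals the class name; `bypass_nodes` is a hypothetical helper, and the inline JSON is a tiny stand-in for a real workflow.

```python
import json

BYPASS_MODE = 4  # assumed: the value ComfyUI's UI-format JSON uses for bypassed nodes


def bypass_nodes(workflow, node_type):
    """Set every node of the given type to bypass; return how many were changed."""
    changed = 0
    for node in workflow.get("nodes", []):
        if node.get("type") == node_type and node.get("mode") != BYPASS_MODE:
            node["mode"] = BYPASS_MODE
            changed += 1
    return changed


# Tiny stand-in for a workflow file; a real one has many more fields per node.
workflow = json.loads(
    '{"nodes": [{"id": 1, "type": "AudioLoopHelperSageAttention", "mode": 0},'
    ' {"id": 2, "type": "LoadAudio", "mode": 0}]}'
)
count = bypass_nodes(workflow, "AudioLoopHelperSageAttention")
```

For a real file you would `json.load` it, call the helper, and `json.dump` the result to a copy, leaving the shipped workflow untouched.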
Shipped at top-level example_workflows/:
| File | What it does | Detail |
|---|---|---|
| audio-loop-music-video_latent.json | Default, start here. Full-length music video: i2v init + your full audio track frozen; loops overlapping windows so the video tracks the song end-to-end. | docs/architecture_overview.md |
| audio-loop-music-video_latent_av_inversion.json | Video → audio. Dialogue replacement / voice-clone dub over held footage. | docs/guides/dialogue_replacement_guide.md |
| audio-loop-music-video_latent_keyframe.json | Per-section keyframe re-anchoring: combats drift on long renders; scene changes synced to song structure. | example_workflows/working_docs/keyframe_iter_anchor_design.md |
| audio-loop-music-video_retake.json | Regenerate one section: re-roll a [start, end] window, rest held as fixed context. | docs/guides/retake_guide.md |
| audio_reactive_loop.json | Audio-driven motion: init image animated so its motion tracks the (frozen) audio. | docs/experimental/audio_reactive_workflows.md |
| audio-ic-lora_single-pass.json | Audio-reference IC-LoRA (single pass): steer a render from a reference audio clip, using the trained adapters above. | docs/audio_iclora/index.md |
| decode-latent-to-video.json | Crash recovery: loop workflows save the assembled latent before the final decode; if a render dies there, this decodes the saved .latent to the finished video in temporal chunks (bounded RAM at any song length; copy it into ComfyUI's input dir first). | docs/reference/benchmarking_memory_pressure.md |
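The default workflow's overlapping windows, counted in whole latent frames, can be pictured with a small sketch. It is an illustration only, not the repo's loop logic: `window_starts` and `latents_to_pixel_frames` are hypothetical helpers, the window and overlap sizes are made up, and the 8x temporal mapping is an assumption about LTX-style video VAEs.

```python
def window_starts(total_latents, window, overlap):
    """Start indices of overlapping windows that cover total_latents frames."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    stride = window - overlap
    starts = [0]
    # Keep adding windows until the last one reaches the end of the track.
    while starts[-1] + window < total_latents:
        starts.append(starts[-1] + stride)
    return starts


def latents_to_pixel_frames(latent_frames, temporal_factor=8):
    """Assumed mapping: the first latent is one frame, each later one is 8 frames."""
    return (latent_frames - 1) * temporal_factor + 1


starts = window_starts(total_latents=100, window=25, overlap=5)
frames_per_window = latents_to_pixel_frames(25)
```

With these made-up numbers the windows start at latents 0, 20, 40, 60, and 80, and the last one would be clipped at the end of the track. Working in integer latent counts keeps every window boundary on a latent frame, so no window ever starts partway through one.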
More variants in example_workflows/experimental/ (paired with run logs in
docs/experiments/; inventory in docs/experimental/README.md);
retired ones in example_workflows/archive/. Design notes for the shipped
variants live in example_workflows/working_docs/.

    # topology checks + generic invariants across shipped workflows
    uv run --group dev python scripts/audit_workflows.py

/diagnose-workflow is the canonical first-pass when something won't run.
Tooling reference: docs/reference/debug_tools.md.
Symptom-first quality troubleshooting: docs/guides/debugging_guide.md.
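For a feel of what a workflow-topology check can look like, the sketch below flags links that point at node ids missing from the graph. It is not the logic of scripts/audit_workflows.py; `dangling_links` is a hypothetical helper, and the link layout is the one assumed for ComfyUI's UI-format workflow JSON.

```python
def dangling_links(workflow):
    """Return ids of links whose origin or target node id is not in the graph."""
    node_ids = {node["id"] for node in workflow.get("nodes", [])}
    bad = []
    for link in workflow.get("links", []):
        # Assumed layout: [link_id, origin_id, origin_slot, target_id, target_slot, type]
        link_id, origin_id, _, target_id = link[:4]
        if origin_id not in node_ids or target_id not in node_ids:
            bad.append(link_id)
    return bad


workflow = {
    "nodes": [{"id": 1, "type": "LoadAudio"}, {"id": 2, "type": "VHS_VideoCombine"}],
    "links": [
        [10, 1, 0, 2, 0, "AUDIO"],  # fine: both ends exist
        [11, 1, 0, 7, 0, "AUDIO"],  # dangling: node 7 is missing
    ],
}
problems = dangling_links(workflow)
```

A check like this catches a broken graph before ComfyUI loads it, which is the kind of invariant an audit script can enforce in CI.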
Opt-in, env-var-gated JSONL instruments for your own performance debugging
(attention trace, exec log, offline summarizer). Local files only.
Details: docs/reference/telemetry_and_tracing.md.
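The opt-in pattern described here, where an environment variable switches a local JSONL log on, can be sketched as follows. The variable name `EXAMPLE_TRACE_PATH`, the `trace` helper, and the record fields are all hypothetical; the repo's actual variables and formats are in the telemetry doc.

```python
import json
import os
import tempfile


def trace(event, env_var="EXAMPLE_TRACE_PATH"):
    """Append one JSON line to the file named by env_var; do nothing if unset."""
    path = os.environ.get(env_var)
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(event) + "\n")
    return True


# Off by default: nothing is written when the variable is unset.
os.environ.pop("EXAMPLE_TRACE_PATH", None)
wrote_when_unset = trace({"step": 0})

# Opt in by pointing the variable at a local file.
trace_path = os.path.join(tempfile.mkdtemp(), "trace.jsonl")
os.environ["EXAMPLE_TRACE_PATH"] = trace_path
trace({"step": 1, "attention": "sage"})
trace({"step": 2, "attention": "sdpa"})

with open(trace_path, encoding="utf-8") as handle:
    records = [json.loads(line) for line in handle]
```

One JSON object per line keeps the log appendable during a long render and easy to summarize offline afterwards.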

    nodes*.py           runtime nodes (entry: comfy_entrypoint() in nodes.py)
    scripts/            apply scripts + audit + analysis utilities
    docs/               public docs, task-first nav at docs/README.md
    example_workflows/  shipped workflow variants (+ working_docs/ design notes)
    internal/           gitignored design + analysis + experiment notes
    .claude/            shared Claude Code harness (subagents, skills, hooks)

Per-node API + wiring: each runtime class's docstring + docs/reference/ltx23_model_reference.md.
Project conventions for editing this repo: CLAUDE.md.
See LICENSE.
