fblissjr/ComfyUI-AudioLoopHelper


ComfyUI-AudioLoopHelper


Last updated: 2026-06-05

Note

Experimental repo — it moves fast and changes often. Treat these as experimental nodes: fork it, customize it, or point your agent of choice at the codebase and take the patterns you like (and skip the ones you don't).

Custom ComfyUI nodes for full-length music video generation with LTX 2.3. The nodes drive loop timing from integer-latent counts, freeze audio via noise_mask=0, and pre-encode prompts once outside the loop. I originally built this repo as a few helper nodes for experimenting with Kijai's LTX 2.3 long-loop extension; thanks to Kijai for all his work, and for giving me some fun ideas to explore.
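To illustrate the noise_mask=0 idea, here is a minimal sketch of mask-weighted blending as samplers commonly apply it: where the mask is 0 the original latent value is kept, and where it is 1 the newly sampled value is used. The function name `apply_noise_mask` and the plain-list latents are hypothetical stand-ins for illustration, not this repo's node code.

```python
def apply_noise_mask(original, sampled, mask):
    """Blend per element: mask=1 takes the sampled value, mask=0 keeps the original."""
    if not (len(original) == len(sampled) == len(mask)):
        raise ValueError("original, sampled, and mask must have the same length")
    return [m * s + (1.0 - m) * o for o, s, m in zip(original, sampled, mask)]


# Hypothetical audio latent (to be kept frozen) and what a sampler step would propose.
audio_latent = [0.25, -1.0, 0.5]
proposed = [9.0, 9.0, 9.0]

# An all-zero mask leaves the audio untouched; an all-one mask replaces it entirely.
frozen = apply_noise_mask(audio_latent, proposed, mask=[0.0, 0.0, 0.0])
free = apply_noise_mask(audio_latent, proposed, mask=[1.0, 1.0, 1.0])
```

With the mask at zero everywhere, `frozen` equals the input audio latent, which is the sense in which the audio is "frozen" while the video side is still free to change.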

Trained adapters (audio-reference IC-LoRA)

Two cuts of the same 1000-step experiment, released as LTX-2.3-22b-IC-LoRA-Audio-Only-Context — an IC-LoRA where the in-context reference is only audio (no image, no video). Load either with the audio IC-LoRA nodes via the single-pass workflow (see the workflow table below); background + node mechanics: docs/audio_iclora/index.md.

| Adapter | What it's supposed to do (in theory) | Suggested start |
| --- | --- | --- |
| cross_modal_step_01000 | Adapts the audio stack and the cross-modal bridges, giving the audio reference a direct path into the video stream. Default pick if you want the audio to move the picture (mannerisms, energy, expression timing). | Strength ~0.5 (working band ~0.3–0.75, reference-dependent). With the per-stream loader, push bridge_strength above audio_strength to amplify the audio→video coupling. |
| audio_only_step_01000 | Adapts the audio stack only: the reference shapes the generated audio; the video follows via the frozen base coupling. Subtler, more emergent video effect; perturbs the base video path least. | Strength ~0.5 (same band). No bridge keys, so bridge_strength is inert. |
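The per-stream strengths in the table can be pictured with a small sketch. It assumes adapter keys can be sorted into "audio" and "bridge" groups by substring; the key names, the substring markers, and the helper functions are all hypothetical, and the real loader may classify keys differently.

```python
def stream_of(key):
    """Classify a LoRA key into a stream. The substring markers are hypothetical."""
    if "audio_to_video" in key or "video_to_audio" in key:
        return "bridge"
    if "audio" in key:
        return "audio"
    return "other"


def per_key_strength(keys, audio_strength, bridge_strength, default_strength=1.0):
    """Map each adapter key to the strength its stream would be loaded with."""
    table = {"audio": audio_strength, "bridge": bridge_strength, "other": default_strength}
    return {key: table[stream_of(key)] for key in keys}


# Hypothetical key sets for the two adapter cuts.
cross_modal_keys = ["blocks.0.audio_attn.lora_A", "blocks.0.audio_to_video_attn.lora_A"]
audio_only_keys = ["blocks.0.audio_attn.lora_A"]

# Cross-modal cut: bridge keys exist, so bridge_strength has something to scale.
cross = per_key_strength(cross_modal_keys, audio_strength=0.5, bridge_strength=0.75)

# Audio-only cut: no bridge keys, so any bridge_strength gives the same result.
solo_low = per_key_strength(audio_only_keys, audio_strength=0.5, bridge_strength=0.0)
solo_high = per_key_strength(audio_only_keys, audio_strength=0.5, bridge_strength=1.0)
```

Under these assumptions `solo_low` and `solo_high` are identical, which is what "bridge_strength is inert" means for the audio-only cut.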

Both are early proof-of-concepts: many variables influence the end result (reference level/quality/content, prompt, seed), each cut has its own strengths, weaknesses, and tradeoffs, and both may behave differently when retrained on a better dataset. Reproduce or extend: data recipe · training config · LTX-2 train fork (audio-only IC-LoRA strategy).

First proof-of-concept run (the pitch "Helium" probe): LTX-2.3-22b-IC-LoRA-Helium.

Demos

Click play — sound on. The audio is frozen and drives the picture; the motion is generated against the waveform, not added after.

heartbeat_30s_github.mp4

30s of one continuous 2:53 render — a painted heart pulsing to a drum loop. Workflow: audio_reactive_loop.json · writeup

▶ Audio-reactive, second take
dino_20s_web.mp4

Same drumbeat as the heart, different init — the stomps land on the beat.

▶ One 5-second audio clip, four takes — a sketch line drives the whole scene
sketch_take_a_github.mp4
sketch_take_b_github.mp4
sketch_take_c_github.mp4
sketch_take_d_github.mp4

Four generations from the same ~5s comedy-sketch audio + one init image — the audio alone sets the performance: timing, delivery, gesture.

▶ Full-length music video — the default workflow
bodyremembers_open30s_github.mp4

Opening 30s of one continuous 2:51 render from audio-loop-music-video_latent.json — one song + one init image + a timestamp prompt schedule.

▶ One image, three songs — the audio reshapes everything
audiocompare_a_20s_github.mp4
audiocompare_b_20s_github.mp4
astronaut_open30s_github.mp4

Three full renders through the default workflow from the same single init image; the first two share essentially the same prompt (the astronaut run differs). The audio is the change — and it redraws pacing, motion, and mood.

Three ways in:

  • "Just show me." Demos above; more variants in the workflow table below. The model-card examples cover the audio-steering (IC-LoRA) side. To run one yourself: open the default workflow (Quick start), drop a song + an image, run.
  • "I want to use it." Quick start below, then the docs hub: docs/README.md — the task-first index ("I want to do X, which doc?"). Power-user repo; assumes you know ComfyUI.
  • "I want to verify, reproduce, or extend it." Architecture walkthrough: docs/architecture_overview.md, then per-node docstrings + docs/reference/. The audio IC-LoRA training story: docs/audio_iclora/index.md + the trained adapters above (data recipe + config + train fork). Invariants are enforced as code — the pytest suite and the workflow-topology audit (scripts/audit_workflows.py) run in CI.

Quick start

Open example_workflows/audio-loop-music-video_latent.json in ComfyUI. The workflow documents itself via group titles, node titles, and Note nodes. Four things to set:

  1. LoadAudio — drop your song.
  2. LoadImage — drop the init image (any size; auto-resized adaptively; matches the first scene visually).
  3. start_seed — any int.
  4. TimestampPromptScheduleBatchEncode — paste the schedule. The initial-render prompt is read from the 0:00 entry.
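As a rough picture of step 4, the sketch below parses a timestamp schedule and picks out the 0:00 entry. The line format `M:SS: prompt` is an assumption made for this example (check the Note nodes in the workflow for the format the node actually expects), and the helper names are hypothetical.

```python
import re

# Assumed line format: "M:SS: prompt text". This is an illustration only.
ENTRY = re.compile(r"^\s*(\d+):(\d{2}):\s*(.+?)\s*$")


def parse_schedule(text):
    """Parse lines of the assumed form 'M:SS: prompt' into sorted (seconds, prompt) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognized schedule line: {line!r}")
        minutes, seconds, prompt = match.groups()
        entries.append((int(minutes) * 60 + int(seconds), prompt))
    return sorted(entries)


def initial_prompt(entries):
    """Return the prompt of the 0:00 entry, which seeds the initial render."""
    for seconds, prompt in entries:
        if seconds == 0:
            return prompt
    raise ValueError("schedule has no 0:00 entry")


def prompt_at(entries, t_seconds):
    """Return the latest entry starting at or before t (assumed hold-until-next behavior)."""
    active = None
    for seconds, prompt in entries:
        if seconds <= t_seconds:
            active = prompt
    return active


schedule = parse_schedule("""
0:00: a painted heart at rest
0:45: the heart pulses with the drums
1:30: the heart glows at full intensity
""")
```

Here `initial_prompt(schedule)` returns the 0:00 text, and `prompt_at(schedule, 60)` returns the 0:45 entry, assuming each prompt stays active until the next timestamp.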

Generate the schedule from your song:

uv sync --group analysis
uv run --group analysis python scripts/analyze_audio_features.py your_song.wav \
  --subject "your scene description" --trim 5

Run. LoRAs + IC-LoRA scaffolding ship bypassed-by-default — un-bypass when you need them; knobs (like first_frame_guide_strength: 1.0 = max identity lock, lower for expressivity) are annotated in the workflow itself. For the rest:

Dependencies

Required custom nodes:

| Repo | Provides |
| --- | --- |
| ComfyUI-LTXVideo | LTX 2.3 nodes (LTXVAddLatentGuide, LTXVCropGuides, LTXVPreprocess, IC-LoRA) |
| ComfyUI-NativeLooping_testing | TensorLoopOpen / TensorLoopClose |
| ComfyUI-KJNodes | Set/Get nodes, LTX2_NAG, LTXVImgToVideoInplaceKJ, ImageResizeKJv2, and more |
| ComfyUI-VideoHelperSuite | VHS_LoadVideo, VHS_VideoCombine |

Optional: ComfyUI-MelBandRoFormer for vocal separation (bypassed by default). Companion repo: fblissjr/comfy-workbench (shared tooling + conventions across my ComfyUI work).

SageAttention fork (optional)

The sister fork fblissjr/SageAttention-ada is tested and optimized for RTX 4090 / Ada architectures. You don't need it — unless you use our sage node: the shipped workflows wire AudioLoopHelperSageAttention (auto mode), which expects this build. No build, or different hardware? Bypass the node (mode=4) or swap in KJNodes sage — everything else works without it. Deep dive: docs/reference/sage_attention.md.

Workflow variants

Shipped at top-level example_workflows/:

| File | What it does | Detail |
| --- | --- | --- |
| audio-loop-music-video_latent.json | Default (start here). Full-length music video: i2v init + your full audio track frozen; loops overlapping windows so the video tracks the song end-to-end. | docs/architecture_overview.md |
| audio-loop-music-video_latent_av_inversion.json | Video → audio. Dialogue replacement / voice-clone dub over held footage. | docs/guides/dialogue_replacement_guide.md |
| audio-loop-music-video_latent_keyframe.json | Per-section keyframe re-anchoring: combats drift on long renders; scene changes synced to song structure. | example_workflows/working_docs/keyframe_iter_anchor_design.md |
| audio-loop-music-video_retake.json | Regenerate one section: re-roll a [start, end] window, rest held as fixed context. | docs/guides/retake_guide.md |
| audio_reactive_loop.json | Audio-driven motion: init image animated so its motion tracks the (frozen) audio. | docs/experimental/audio_reactive_workflows.md |
| audio-ic-lora_single-pass.json | Audio-reference IC-LoRA (single pass): steer a render from a reference audio clip, using the trained adapters above. | docs/audio_iclora/index.md |
| decode-latent-to-video.json | Crash recovery: loop workflows save the assembled latent before the final decode; if a render dies there, this decodes the saved .latent to the finished video in temporal chunks (bounded RAM at any song length; copy it into ComfyUI's input dir first). | docs/reference/benchmarking_memory_pressure.md |
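The "temporal chunks" idea behind the crash-recovery workflow can be sketched as follows. This is a minimal illustration, not the workflow's actual nodes: `decode_fn` is a hypothetical stand-in for a VAE decode call, and the chunk size is an arbitrary example value.

```python
def iter_chunks(num_latent_frames, chunk_size):
    """Yield (start, stop) index pairs covering the latent's time axis in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, num_latent_frames, chunk_size):
        yield start, min(start + chunk_size, num_latent_frames)


def decode_in_chunks(latent_frames, decode_fn, chunk_size):
    """Decode one temporal slice at a time, yielding frames as they are produced.

    decode_fn is a hypothetical stand-in for a VAE decode call; a real decoder may
    also need overlap between chunks to hide seams, which this sketch leaves out.
    """
    for start, stop in iter_chunks(len(latent_frames), chunk_size):
        yield from decode_fn(latent_frames[start:stop])


# Toy decoder: each latent frame "decodes" to one labelled output frame.
def toy_decode(chunk):
    return [f"frame<{value}>" for value in chunk]


chunks = list(iter_chunks(10, 4))
decoded = list(decode_in_chunks(list(range(10)), toy_decode, chunk_size=4))
```

Because the decoder only ever sees `chunk_size` latent frames at once, peak decode memory depends on the chunk size rather than the song length. A real consumer would stream the yielded frames to a video writer instead of collecting them in a list as the toy example does.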

More variants in example_workflows/experimental/ (paired with run logs in docs/experiments/; inventory in docs/experimental/README.md); retired ones in example_workflows/archive/. Design notes for the shipped variants live in example_workflows/working_docs/.
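The default workflow's overlapping windows, timed in integer latent counts, can be pictured with a short planning sketch. The function name and the window and overlap values are hypothetical examples; the real workflow's numbers and loop structure may differ.

```python
def plan_windows(total_latents, window, overlap):
    """Return (start, stop) windows in integer latent frames that cover the whole track.

    Each window after the first re-uses `overlap` latents from the previous one, so
    consecutive windows advance by `window - overlap`. The last window is clipped.
    """
    if total_latents < 1:
        raise ValueError("total_latents must be at least 1")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    windows = []
    start = 0
    while True:
        stop = min(start + window, total_latents)
        windows.append((start, stop))
        if stop >= total_latents:
            return windows
        start += stride


# Example values only: 50 latent frames, 20-latent windows, 5 latents of overlap.
plan = plan_windows(total_latents=50, window=20, overlap=5)
```

Keeping every boundary as an integer latent index avoids the rounding drift that can build up when window edges are computed in floating-point seconds and converted back on each iteration.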

Validation + debugging

# topology checks + generic invariants across shipped workflows
uv run --group dev python scripts/audit_workflows.py
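To give a feel for what a topology check can look like, here is a minimal sketch over a ComfyUI-style workflow dict. It is not the repo's scripts/audit_workflows.py; it assumes the saved-workflow layout of a `nodes` list plus a `links` list of six-element arrays, and the `Example*` node types are hypothetical.

```python
import json


def audit_workflow(workflow):
    """Return a list of problems found in a ComfyUI-style workflow dict."""
    problems = []
    node_ids = {node["id"] for node in workflow.get("nodes", [])}
    for link in workflow.get("links", []):
        # Assumed link layout: [link_id, from_node, from_slot, to_node, to_slot, type]
        link_id, from_node, _from_slot, to_node, _to_slot, _type = link
        for end in (from_node, to_node):
            if end not in node_ids:
                problems.append(f"link {link_id} references missing node {end}")
    return problems


def bypassed_types(workflow):
    """List the types of nodes saved with mode 4 (bypassed)."""
    return sorted(n["type"] for n in workflow.get("nodes", []) if n.get("mode") == 4)


example = json.loads("""
{
  "nodes": [
    {"id": 1, "type": "LoadAudio", "mode": 0},
    {"id": 2, "type": "LoadImage", "mode": 0},
    {"id": 3, "type": "ExampleSampler", "mode": 0},
    {"id": 4, "type": "ExampleLoRA", "mode": 4}
  ],
  "links": [
    [10, 1, 0, 3, 0, "AUDIO"],
    [11, 2, 0, 9, 0, "IMAGE"]
  ]
}
""")

problems = audit_workflow(example)
bypassed = bypassed_types(example)
```

In this toy workflow the second link points at a node that does not exist, so `problems` reports it, and `bypassed` lists the one node saved with mode 4.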

/diagnose-workflow is the canonical first-pass when something won't run. Tooling reference: docs/reference/debug_tools.md. Symptom-first quality troubleshooting: docs/guides/debugging_guide.md.

Local profiling (off by default)

Opt-in, env-var-gated JSONL instruments for your own performance debugging (attention trace, exec log, offline summarizer). Local files only. Details: docs/reference/telemetry_and_tracing.md.
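The opt-in pattern described here can be sketched in a few lines: nothing is written unless an environment variable points at a file, and each event is one JSON object per line. The variable name `EXAMPLE_TRACE_PATH` and both helper functions are hypothetical; see the telemetry doc for the repo's actual variables and tools.

```python
import json
import os
import time

TRACE_ENV_VAR = "EXAMPLE_TRACE_PATH"  # hypothetical name, not the repo's actual variable


def trace_event(name, **fields):
    """Append one JSON line to the file named by TRACE_ENV_VAR; do nothing if unset."""
    path = os.environ.get(TRACE_ENV_VAR)
    if not path:
        return False  # tracing is off by default
    record = {"ts": time.time(), "event": name, **fields}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return True


def summarize(path):
    """Offline summary: count how many times each event name appears in a JSONL file."""
    counts = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            event = json.loads(line)["event"]
            counts[event] = counts.get(event, 0) + 1
    return counts
```

With the variable unset, `trace_event` returns immediately, so the instrumentation costs little when profiling is off. Setting it to a local path turns on append-only logging that `summarize` can read back later.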

Layout

nodes*.py             runtime nodes (entry: comfy_entrypoint() in nodes.py)
scripts/              apply scripts + audit + analysis utilities
docs/                 public docs — task-first nav at docs/README.md
example_workflows/    shipped workflow variants (+ working_docs/ design notes)
internal/             gitignored design + analysis + experiment notes
.claude/              shared Claude Code harness (subagents, skills, hooks)

Per-node API + wiring: each runtime class's docstring + docs/reference/ltx23_model_reference.md. Project conventions for editing this repo: CLAUDE.md.

License

See LICENSE.
