Fix LTX2 connector token/register layout (regression from #13564) #13931

Open
Boffee wants to merge 2 commits into huggingface:main from Boffee:fix-ltx2-connector-register-layout

@Boffee Boffee commented Jun 12, 2026

What does this PR do?

Fixes #13930.

The problem

LTX2ConnectorTransformer1d.forward replaces the padding slots of the (left-padded) text sequence with learnable registers and then front-aligns the embeddings with torch.flip(hidden_states, dims=[1]). The flip does move the embeddings to the front, but it also reverses the token order and the register tile. The connector blocks apply RoPE, so this layout is part of what the LTX models were trained on — the original implementation (ltx_core _replace_padded_with_learnable_registers, matched by ComfyUI) front-aligns the valid tokens in their original order and fills the tail with the tiled registers by absolute position.

Toy example — 8 slots, 3 valid tokens t1 t2 t3 (left-padded), register tile R0 R1 R2 R3:

reference (ltx_core / ComfyUI):  [t1 t2 t3 | R3 R0 R1 R2 R3]
main (flip):                     [t3 t2 t1 | R0 R3 R2 R1 R0]
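
The two layouts in the toy example can be reproduced with plain Python lists. This is only an illustrative sketch: the helper names (`tiled_registers`, `reference_layout`, `flip_layout`) are hypothetical and do not appear in diffusers or ltx_core, and the string labels stand in for embedding vectors.

```python
def tiled_registers(seq_len, num_registers):
    # Register assigned to each absolute position: R0 R1 R2 R3 R0 R1 ...
    return [f"R{pos % num_registers}" for pos in range(seq_len)]


def reference_layout(tokens, seq_len, num_registers):
    # Valid tokens first, in original order; the tail is filled with the
    # register that belongs to each absolute position.
    registers = tiled_registers(seq_len, num_registers)
    return tokens + registers[len(tokens):]


def flip_layout(tokens, seq_len, num_registers):
    # Behaviour described for main: write registers into the left-padding
    # slots, then reverse the whole sequence.
    registers = tiled_registers(seq_len, num_registers)
    num_pad = seq_len - len(tokens)
    left_padded = registers[:num_pad] + tokens
    return left_padded[::-1]


tokens = ["t1", "t2", "t3"]
reference = reference_layout(tokens, seq_len=8, num_registers=4)
flipped = flip_layout(tokens, seq_len=8, num_registers=4)
print(" ".join(reference))  # t1 t2 t3 R3 R0 R1 R2 R3
print(" ".join(flipped))    # t3 t2 t1 R0 R3 R2 R1 R0
```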

Even a full-length prompt with no padding comes out reversed. Short prompts (typically the negative prompt, whose 1024-slot context is mostly registers) are distorted the worst, so CFG quality is hit hardest. Measured with the real diffusers/LTX-2.3-Diffusers connector weights, main's output correlates with the reference layout's output at only 0.11–0.34 in the token region; with this fix the output matches ComfyUI's independent implementation of the same checkpoint at correlation 1.000.

Where it was introduced

#12915 ported this correctly (per-row boolean-mask gather). #13564 (ebaa1871) replaced the gather — which forces a GPU→CPU sync due to data-dependent indexing — with the vectorized masked-write + flip, unintentionally changing the semantics. The regression is on main only (v0.38.0 predates it), so fixing it now keeps it out of the next release.

The fix

Replace the masked-write + flip with a stable argsort of the inverted mask + torch.gather + torch.where: valid tokens move to the front in original order (stable sort preserves relative order), registers fill the tail by absolute position, computed per batch row (the pipelines batch negative+positive prompts of different lengths). All ops are fixed-shape device ops, so #13564's sync-elimination goal is preserved — no data-dependent indexing, no host sync.
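
A minimal PyTorch sketch of the stable-argsort + gather + where approach described above. It is not the code from this PR: the function name `front_align_with_registers`, its signature, and the tensor shapes are assumptions made for illustration, and it assumes the sequence length is a multiple of the number of registers.

```python
import torch


def front_align_with_registers(hidden_states, attention_mask, registers):
    """Hypothetical helper, for illustration only.

    hidden_states:  (batch, seq_len, dim) text embeddings
    attention_mask: (batch, seq_len), nonzero where a token is valid
    registers:      (num_registers, dim); seq_len % num_registers == 0 assumed
    """
    _, seq_len, dim = hidden_states.shape
    valid = attention_mask.bool()

    # Stable sort of the inverted mask: valid slots (0) come first and keep
    # their original relative order; padding slots (1) follow.
    order = torch.argsort((~valid).to(torch.int64), dim=1, stable=True)
    gathered = torch.gather(hidden_states, 1, order.unsqueeze(-1).expand(-1, -1, dim))

    # Per-row token count stays on the device, so there is no host sync.
    num_valid = valid.sum(dim=1, keepdim=True)
    positions = torch.arange(seq_len, device=hidden_states.device).unsqueeze(0)
    is_token = (positions < num_valid).unsqueeze(-1)

    # Register tile indexed by absolute position: R0 R1 ... R0 R1 ...
    tiled = registers.repeat(seq_len // registers.shape[0], 1).unsqueeze(0)
    return torch.where(is_token, gathered, tiled.to(hidden_states.dtype))


# Toy batch: row 0 has 3 valid tokens (left-padded), row 1 is fully valid.
hidden = torch.tensor(
    [[0, 0, 0, 0, 0, 1, 2, 3], [11, 12, 13, 14, 15, 16, 17, 18]],
    dtype=torch.float32,
).unsqueeze(-1)
mask = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1]])
registers = torch.tensor([100.0, 101.0, 102.0, 103.0]).unsqueeze(-1)

out = front_align_with_registers(hidden, mask, registers).squeeze(-1)
print(out[0].tolist())  # [1.0, 2.0, 3.0, 103.0, 100.0, 101.0, 102.0, 103.0]
```

Every tensor here has a shape that depends only on the input shapes, which is what keeps the operation free of data-dependent indexing.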

Tests

tests/pipelines/ltx2/test_ltx2_connectors.py checks the exact layout semantics through the module forward (num_layers=0 reduces forward to layout + final RMSNorm) for left-padded, mixed-length batch, fully-valid, and single-token inputs. The tests fail on main and pass with the fix:

$ python -m pytest tests/pipelines/ltx2/test_ltx2_connectors.py -q   # on main
4 failed in 0.40s

$ python -m pytest tests/pipelines/ltx2/test_ltx2_connectors.py -q   # with this PR
4 passed in 0.57s

$ python -m pytest tests/pipelines/ltx2/ -q                          # with this PR
169 passed, 5 skipped, 34 warnings in 519.62s (0:08:39)

Before submitting

This investigation and fix were developed with AI assistance (Claude Code) and verified end-to-end as described above and in #13930.

Who can review?

@dg845 @sayakpaul

🤖 Generated with Claude Code

Boffee and others added 2 commits June 12, 2026 14:39
…tation

The connector replaced left-padding positions with the tiled registers and
then flipped the whole sequence, which put the prompt tokens at the front in
reversed order and the register tile reversed within each block. The original
LTX implementation (ltx-core _replace_padded_with_learnable_registers, also
matched by ComfyUI) front-aligns the valid tokens in their original order and
fills the tail with registers indexed by absolute position.

Since the connector blocks apply RoPE, the reversed layout produces
off-distribution embeddings; short prompts (e.g. negative prompts, whose
context is mostly registers) are hit hardest, which manifests as overblown
CFG: at cfg > 1 (or CFG++ samplers at cfg 1) the unconditional branch is
computed from a mostly-register context with scrambled positions.

Replace the fill+flip with a stable-argsort gather (valid tokens to the
front, order preserved, per batch row) and fill the tail with the
absolute-position register tile.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

LTX2 text connectors pass reversed prompt tokens and misplaced registers to the transformer (regression from #13564)
