Fix LTX2 connector token/register layout (regression from #13564) by Boffee · Pull Request #13931 · huggingface/diffusers

Boffee · 2026-06-12T19:52:08Z

What does this PR do?

The problem

LTX2ConnectorTransformer1d.forward replaces the padding slots of the left-padded text sequence with learnable registers and then front-aligns the embeddings with torch.flip(hidden_states, dims=[1]). The flip does move the embeddings to the front, but it also reverses both the token order and the register tile. Because the connector blocks apply RoPE, this layout is part of what the LTX models were trained on. The original implementation (ltx_core _replace_padded_with_learnable_registers, matched by ComfyUI) front-aligns the valid tokens in their original order and fills the tail with the tiled registers by absolute position.

Toy example — 8 slots, 3 valid tokens t1 t2 t3 (left-padded), register tile R0 R1 R2 R3:

reference (ltx_core / ComfyUI):  [t1 t2 t3 | R3 R0 R1 R2 R3]
main (flip):                     [t3 t2 t1 | R0 R3 R2 R1 R0]
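The two layouts in the toy example can be reproduced with a small pure-Python sketch. String labels stand in for embeddings, and the function names are hypothetical: they model only the ordering, not the diffusers or ltx_core code.

```python
def reference_layout(tokens, num_slots, tile):
    # Valid tokens first, in prompt order; each tail slot gets the register
    # whose index is its absolute position modulo the tile length.
    out = list(tokens)
    for pos in range(len(tokens), num_slots):
        out.append(tile[pos % len(tile)])
    return out


def flip_layout(tokens, num_slots, tile):
    # main's behavior: fill the left-padding slots with registers by absolute
    # position, keep the valid tokens on the right, then reverse everything.
    pad = num_slots - len(tokens)
    filled = [tile[pos % len(tile)] for pos in range(pad)] + list(tokens)
    return filled[::-1]


tile = ["R0", "R1", "R2", "R3"]
print(reference_layout(["t1", "t2", "t3"], 8, tile))
print(flip_layout(["t1", "t2", "t3"], 8, tile))
```

With no padding at all, reference_layout returns the tokens unchanged while flip_layout still returns them reversed, which is the full-length-prompt case mentioned below.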

Even a full-length prompt with no padding comes out reversed. Short prompts (typically the negative prompt, whose 1024-slot context is mostly registers) are distorted the worst, so CFG quality is hit hardest. Measured with the real diffusers/LTX-2.3-Diffusers connector weights, main's output correlates with the reference layout's output at only 0.11–0.34 in the token region; with this fix the output matches ComfyUI's independent implementation of the same checkpoint at correlation 1.000.

Where it was introduced

#12915 ported this correctly (per-row boolean-mask gather). #13564 (ebaa1871) replaced the gather, which forces a GPU→CPU sync because of data-dependent indexing, with the vectorized masked-write + flip, unintentionally changing the semantics. The regression is on main only (v0.38.0 predates it), so fixing it now keeps it out of the next release.

The fix

Replace the masked-write + flip with a stable argsort of the inverted mask + torch.gather + torch.where: valid tokens move to the front in their original order (the stable sort preserves relative order), and registers fill the tail by absolute position. This is computed per batch row, since the pipelines batch negative and positive prompts of different lengths. All ops are fixed-shape device ops, so #13564's sync-elimination goal is preserved: there is no data-dependent indexing and no host sync.
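A minimal PyTorch sketch of the argsort + gather + where layout described above. The function name front_align_with_registers and its argument conventions (hidden_states of shape [B, L, D], a boolean mask of shape [B, L] with True for valid tokens, registers of shape [R, D]) are hypothetical and not the actual diffusers code; the stable argsort is taken from torch.sort(..., stable=True).

```python
import torch


def front_align_with_registers(hidden_states, mask, registers):
    # Hypothetical helper: hidden_states [B, L, D], mask [B, L] (True = valid),
    # registers [R, D] learnable register tile.
    batch, seq_len, dim = hidden_states.shape
    num_regs = registers.shape[0]

    # Stable sort of the inverted mask: valid slots (0) sort before padding (1),
    # and the stable sort keeps valid tokens in their original relative order.
    order = torch.sort((~mask).to(torch.int64), dim=1, stable=True).indices
    gathered = torch.gather(hidden_states, 1, order.unsqueeze(-1).expand(-1, -1, dim))

    # Register tile indexed by absolute position: slot p gets registers[p % R].
    positions = torch.arange(seq_len, device=hidden_states.device)
    tiled = registers[positions % num_regs].to(hidden_states.dtype)  # [L, D]

    # Keep the gathered tokens for the first num_valid slots of each row,
    # registers everywhere after that.
    num_valid = mask.sum(dim=1, keepdim=True)  # [B, 1]
    keep = (positions.unsqueeze(0) < num_valid).unsqueeze(-1)  # [B, L, 1]
    return torch.where(keep, gathered, tiled.unsqueeze(0))
```

Every intermediate has a shape fixed by (B, L, D), so nothing in the sketch needs the mask values on the host. A batch mixing a left-padded row with a fully valid row shows the per-row behavior: the padded row becomes [t1 t2 t3 | R3 R0 R1 R2 R3] and the full row is returned unchanged.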

Tests

tests/pipelines/ltx2/test_ltx2_connectors.py checks the exact layout semantics through the module forward (num_layers=0 reduces forward to layout + final RMSNorm) for left-padded, mixed-length batch, fully-valid, and single-token inputs. The tests fail on main and pass with the fix:

$ python -m pytest tests/pipelines/ltx2/test_ltx2_connectors.py -q   # on main
4 failed in 0.40s

$ python -m pytest tests/pipelines/ltx2/test_ltx2_connectors.py -q   # with this PR
4 passed in 0.57s

$ python -m pytest tests/pipelines/ltx2/ -q                          # with this PR
169 passed, 5 skipped, 34 warnings in 519.62s (0:08:39)

Before submitting

Did you read the contributor guideline?
Was this discussed via a GitHub issue? #13930: "LTX2 text connectors pass reversed prompt tokens and misplaced registers to the transformer (regression from #13564)"
Did you write any new necessary tests?

This investigation and fix were developed with AI assistance (Claude Code) and verified end-to-end as described above and in #13930.

Who can review?

@dg845 @sayakpaul

🤖 Generated with Claude Code

…tation

The connector replaced left-padding positions with the tiled registers and then flipped the whole sequence, which put the prompt tokens at the front in reversed order and the register tile reversed within each block. The original LTX implementation (ltx-core _replace_padded_with_learnable_registers, also matched by ComfyUI) front-aligns the valid tokens in their original order and fills the tail with registers indexed by absolute position.

Since the connector blocks apply RoPE, the reversed layout produces off-distribution embeddings. Short prompts (e.g. negative prompts, whose context is mostly registers) are hit hardest, which manifests as overblown CFG: at cfg > 1 (or CFG++ samplers at cfg 1) the unconditional branch is computed from a mostly-register context with scrambled positions.

Replace the fill+flip with a stable-argsort gather (valid tokens to the front, order preserved, per batch row) and fill the tail with the absolute-position register tile.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Boffee and others added 2 commits June 12, 2026 14:39

Add register-layout regression tests for the LTX2 text connectors

4c061aa

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions Bot added tests size/M PR with diff < 200 LOC pipelines fixes-issue labels Jun 12, 2026

Boffee wants to merge 2 commits into huggingface:main from Boffee:fix-ltx2-connector-register-layout