Fix LTX2 connector token/register layout (regression from #13564)#13931
Open
Boffee wants to merge 2 commits into
Open
Fix LTX2 connector token/register layout (regression from #13564)#13931Boffee wants to merge 2 commits into
Boffee wants to merge 2 commits into
Conversation
…tation The connector replaced left-padding positions with the tiled registers and then flipped the whole sequence, which put the prompt tokens at the front in reversed order and the register tile reversed within each block. The original LTX implementation (ltx-core _replace_padded_with_learnable_registers, also matched by ComfyUI) front-aligns the valid tokens in their original order and fills the tail with registers indexed by absolute position. Since the connector blocks apply RoPE, the reversed layout produces off-distribution embeddings; short prompts (e.g. negative prompts, whose context is mostly registers) are hit hardest, which manifests as overblown CFG: at cfg > 1 (or CFG++ samplers at cfg 1) the unconditional branch is computed from a mostly-register context with scrambled positions. Replace the fill+flip with a stable-argsort gather (valid tokens to the front, order preserved, per batch row) and fill the tail with the absolute-position register tile. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Fixes #13930.
The problem
LTX2ConnectorTransformer1d.forwardreplaces the padding slots of the (left-padded) text sequence with learnable registers and then front-aligns the embeddings withtorch.flip(hidden_states, dims=[1]). The flip does move the embeddings to the front, but it also reverses the token order and the register tile. The connector blocks apply RoPE, so this layout is part of what the LTX models were trained on — the original implementation (ltx_core_replace_padded_with_learnable_registers, matched by ComfyUI) front-aligns the valid tokens in their original order and fills the tail with the tiled registers by absolute position.Toy example — 8 slots, 3 valid tokens
t1 t2 t3(left-padded), register tileR0 R1 R2 R3:Even a full-length prompt with no padding comes out reversed. Short prompts (typically the negative prompt, whose 1024-slot context is mostly registers) are distorted the worst, so CFG quality is hit hardest. Measured with the real
diffusers/LTX-2.3-Diffusersconnector weights,main's output correlates with the reference layout's output at only 0.11–0.34 in the token region; with this fix the output matches ComfyUI's independent implementation of the same checkpoint at correlation 1.000.Where it was introduced
#12915 ported this correctly (per-row boolean-mask gather). #13564 (
ebaa1871) replaced the gather — which forces a GPU→CPU sync due to data-dependent indexing — with the vectorized masked-write + flip, unintentionally changing the semantics. The regression is onmainonly (v0.38.0 predates it), so fixing it now keeps it out of the next release.The fix
Replace the masked-write + flip with a stable argsort of the inverted mask +
torch.gather+torch.where: valid tokens move to the front in original order (stable sort preserves relative order), registers fill the tail by absolute position, computed per batch row (the pipelines batch negative+positive prompts of different lengths). All ops are fixed-shape device ops, so #13564's sync-elimination goal is preserved — no data-dependent indexing, no host sync.Tests
tests/pipelines/ltx2/test_ltx2_connectors.pychecks the exact layout semantics through the module forward (num_layers=0reduces forward to layout + final RMSNorm) for left-padded, mixed-length batch, fully-valid, and single-token inputs. The tests fail onmainand pass with the fix:Before submitting
This investigation and fix were developed with AI assistance (Claude Code) and verified end-to-end as described above and in #13930.
Who can review?
@dg845 @sayakpaul
🤖 Generated with Claude Code