[New Pipeline/Model] Add JoyAI-Echo multi-shot audio-video generation pipeline #13909

### Model/Pipeline/Scheduler description

**We are the JoyAI Team (JD.com)**, and we propose integrating JoyAI-Echo into Diffusers.

**JoyAI-Echo** is a unified framework for long-form audio-visual generation, designed to support minute-level video creation with synchronized audio, strong temporal consistency, and real-time interaction.

Key innovations:
- **Cross-modal audio-visual memory bank**: preserves character appearance and voice timbre across long, minute-scale sequences
- **DMD-distilled few-step inference** (Distribution Matching Distillation): ~7.5× faster than the baseline while improving alignment and visual quality
- **Joint audio-video generation**: a single pipeline produces synchronized video and audio
- **Multi-shot story generation**: generates a coherent sequence of shots from a list of prompts
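To picture the few-step inference listed above, here is a minimal sketch of a generic multi-step distilled sampler of the kind used with DMD-style generators: at each step the generator predicts a clean sample, which is then re-noised to the next, lower noise level. This is not the JoyAI-Echo sampler. The rectified-flow noising rule, the four-step timestep list, and the names `few_step_sample` and `toy_generator` are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in for a distilled generator: maps a noisy sample x_t at
# noise level t to a prediction of the clean sample x0. In a real pipeline this
# would be the (LTX-2-based) transformer; here it is any callable.
Generator = Callable[[List[float], float], List[float]]


def few_step_sample(generator: Generator, timesteps: Sequence[float],
                    dim: int, seed: int = 0) -> List[float]:
    """Generic few-step sampling: predict x0, then re-noise to the next level.

    Assumes x_t = (1 - t) * x0 + t * noise, with `timesteps` strictly
    decreasing from 1.0 (pure noise) toward 0. Illustrative only.
    """
    if not timesteps:
        raise ValueError("need at least one timestep")
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # pure noise at t = 1.0
    x0: List[float] = x
    for i, t in enumerate(timesteps):
        x0 = generator(x, t)  # exactly one generator call per step
        if i + 1 < len(timesteps):
            t_next = timesteps[i + 1]
            noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            # Re-noise the clean prediction to the next (lower) noise level.
            x = [(1.0 - t_next) * a + t_next * n for a, n in zip(x0, noise)]
    return x0


# Usage with a toy generator that records the noise levels it is called at.
calls = []


def toy_generator(x_t, t):
    calls.append(t)
    return [0.5 for _ in x_t]  # always "predicts" the same clean sample


out = few_step_sample(toy_generator, [1.0, 0.75, 0.5, 0.25], dim=3)
```

The cost is one generator call per timestep, so a short schedule like this one directly bounds the number of network evaluations per sample.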

The architecture builds on LTX-2 and adds the JoyAI-Echo DMD denoising schedule plus a paired audio-video memory bank for cross-shot consistency.
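To make the cross-shot memory idea concrete, here is a toy sketch of a paired audio-video memory bank driving a multi-shot loop. It only illustrates the bookkeeping: visual and audio entries are written together, and each shot is conditioned on earlier shots. The names `PairedMemoryBank`, `generate_story`, and `dummy_shot`, the FIFO eviction policy, and the use of plain float lists instead of LTX-2 latents are hypothetical and do not describe the JoyAI-Echo implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical representation: each shot is summarized by one visual and one
# audio feature vector. A real pipeline would operate on model latents.
Vector = List[float]


@dataclass
class PairedMemoryBank:
    """Toy paired audio-video memory (hypothetical, not the JoyAI-Echo code).

    Entries are always written in pairs, so video[i] and audio[i] come from
    the same shot.
    """
    capacity: int = 4
    video: List[Vector] = field(default_factory=list)
    audio: List[Vector] = field(default_factory=list)

    def write(self, video_feat: Vector, audio_feat: Vector) -> None:
        self.video.append(video_feat)
        self.audio.append(audio_feat)
        # Keep only the most recent `capacity` pairs (FIFO eviction).
        if len(self.video) > self.capacity:
            self.video.pop(0)
            self.audio.pop(0)

    def read(self) -> Tuple[List[Vector], List[Vector]]:
        # Return shallow copies so callers cannot resize the stored memory.
        return list(self.video), list(self.audio)


# Hypothetical per-shot generator: (prompt, video_mem, audio_mem) -> features.
ShotFn = Callable[[str, List[Vector], List[Vector]], Tuple[Vector, Vector]]


def generate_story(prompts: List[str], generate_shot: ShotFn,
                   bank: PairedMemoryBank) -> List[Tuple[Vector, Vector]]:
    """Generate one shot per prompt, conditioning each on the paired memory."""
    shots = []
    for prompt in prompts:
        video_mem, audio_mem = bank.read()
        video_feat, audio_feat = generate_shot(prompt, video_mem, audio_mem)
        bank.write(video_feat, audio_feat)
        shots.append((video_feat, audio_feat))
    return shots


def dummy_shot(prompt, video_mem, audio_mem):
    # Stand-in for a real shot generator: encodes the prompt length and the
    # number of remembered shots, so the memory's effect is visible.
    return [float(len(prompt)), float(len(video_mem))], [float(len(audio_mem))]


bank = PairedMemoryBank(capacity=2)
shots = generate_story(["a", "bb", "ccc"], dummy_shot, bank)
```

Writing both modalities in a single call keeps the i-th video and audio entries aligned to the same shot, which is presumably what allows a paired bank to tie a character's appearance to its voice across shots.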

### Open source status

- [x] The model implementation is available.
- [x] The model weights are available (Only relevant if addition is not a scheduler).

### Provide useful links for the implementation

- GitHub Repository: https://github.com/jd-opensource/JoyAI-Echo
- HuggingFace Weights: https://huggingface.co/jdopensource/JoyAI-Echo
- Diffusers implementation PR: (will link after PR is created)

### Additional context

We (the JoyAI Team) previously contributed JoyAI-Image-Edit to Diffusers (PR #13444, merged). This proposal follows the same pattern: the official team will provide a complete, tested implementation.
