Fix client-disconnect session leaks in PyTorch MP engine by grimoire · Pull Request #4655 · InternLM/lmdeploy

grimoire · 2026-06-06T16:56:02Z

Summary

Fix session leaks that occur when API clients disconnect, or are interrupted with Ctrl+C, during PyTorch MP serving.

This patch makes MP stream startup cancellation-safe and makes serve-side terminal session cleanup idempotent, so dropped requests no longer leave backend sessions or API session mappings alive forever.

Bug to fix

client Ctrl+C
  -> HTTP streaming task cancelled
  -> safe_run starts cleanup
  -> async_end waits for MP stream init event
  -> stream init event may never be set if cancellation happened before stream id reply
  -> END_SESSION is not sent
  -> backend session remains alive
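
The hang in the chain above can be reproduced in miniature with plain asyncio. This is only an illustrative sketch, not lmdeploy code: `start_stream`, `end_session`, and `reproduce_leak` are hypothetical names, and the timeout exists only so the demonstration terminates (the failure described above has no such timeout).

```python
import asyncio


async def start_stream(init_event: asyncio.Event) -> None:
    """Hypothetical stream startup: the event is set only after the backend replies."""
    await asyncio.sleep(10)  # stands in for waiting on the stream id reply
    init_event.set()


async def end_session(init_event: asyncio.Event, timeout: float) -> bool:
    """Hypothetical cleanup: returns True only if END_SESSION could be sent."""
    try:
        await asyncio.wait_for(init_event.wait(), timeout)
    except asyncio.TimeoutError:
        return False  # without the timeout this wait would never return
    return True


async def reproduce_leak() -> bool:
    init_event = asyncio.Event()
    task = asyncio.create_task(start_stream(init_event))
    await asyncio.sleep(0)  # let startup begin
    task.cancel()  # the client disconnects before the stream id reply
    await asyncio.gather(task, return_exceptions=True)
    # Cleanup now waits on an event that the cancelled task can no longer set.
    return await end_session(init_event, timeout=0.05)
```

Because the cancelled startup task is the only code that sets the event, cleanup can never proceed, which matches the "END_SESSION is not sent" step.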

Changes

Add MP stream startup barrier via SessionState.init_done.
Ensure async_end() waits until backend ADD_MESSAGE has been enqueued before sending END_SESSION.
Make ZMQ and Ray MP streaming startup cancellation-safe with shielded startup tasks.
Add abandoned stream cleanup for ZMQ/Ray stream tasks.
Make serve/API session cleanup idempotent on normal completion, cancellation, prompt errors, and client disconnect.
Add ZMQ backend-death handling so pending RPC futures do not hang forever.
Keep expected client disconnect logs at INFO; real exception paths still log tracebacks.
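
The startup barrier and shielded startup from the list above might look roughly like the following. It is a minimal sketch under stated assumptions: only the names `SessionState.init_done`, `async_end`, `ADD_MESSAGE`, and `END_SESSION` come from this PR; `_startup`, `stream`, `barrier_demo`, and the `events` list are hypothetical, and the real ZMQ/Ray code paths are certainly more involved.

```python
import asyncio
from typing import List, Optional


class SessionState:
    """Hypothetical stand-in; only the attribute name init_done comes from the PR."""

    def __init__(self) -> None:
        self.init_done = asyncio.Event()
        self.startup_task: Optional[asyncio.Future] = None
        self.events: List[str] = []


async def _startup(state: SessionState) -> None:
    try:
        await asyncio.sleep(0.01)  # stands in for enqueueing ADD_MESSAGE
        state.events.append("ADD_MESSAGE")
    finally:
        state.init_done.set()  # release the barrier on success or failure


async def stream(state: SessionState) -> None:
    # Keep a reference to the startup task so it outlives a cancelled caller.
    state.startup_task = asyncio.ensure_future(_startup(state))
    # shield(): cancelling this coroutine does not cancel the startup task.
    await asyncio.shield(state.startup_task)
    await asyncio.sleep(10)  # stands in for consuming stream output


async def async_end(state: SessionState) -> None:
    await state.init_done.wait()  # wait until startup has finished
    state.events.append("END_SESSION")


async def barrier_demo() -> List[str]:
    state = SessionState()
    task = asyncio.create_task(stream(state))
    await asyncio.sleep(0)  # let the stream task start
    task.cancel()  # the client disconnects during startup
    await asyncio.gather(task, return_exceptions=True)
    await async_end(state)
    return state.events
```

The design point is that the barrier is released in a `finally` block by a task the caller cannot cancel, so cleanup always gets to run after startup has settled.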

Requirement

Fix qwen3.5 mtp #4652

Copilot

Pull request overview

This PR targets leaked backend sessions and stale session mappings when API clients disconnect (or cancel) during PyTorch MP serving, by making MP streaming startup cancellation-safe and making serve-side cleanup idempotent.

Changes:

Add a stream-startup barrier (SessionState.init_done) and make MP streaming startup robust to cancellation (ZMQ/Ray), including abandoned stream cleanup and backend-death wakeups.
Make session cleanup idempotent across engine/generator/API wrapper paths, and tighten AsyncEngine.generate() session removal behavior under cancellations/errors.
Fix TP-local Q/KV head metadata usage for FlashAttention/FlashMLA and correct KV metadata handling for last-chunk spec-decode input rewriting; add regression tests.
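
One way to picture idempotent, stale-safe session removal is a registry that only removes a mapping if it still belongs to the request doing the cleanup. The sketch below is an assumption-laden illustration, not the `SessionManager` implementation: `SessionMap`, the per-request token, and `backend_end_calls` are all hypothetical.

```python
from typing import Dict


class SessionMap:
    """Hypothetical session registry; not lmdeploy's SessionManager."""

    def __init__(self) -> None:
        self._sessions: Dict[int, object] = {}
        self.backend_end_calls = 0  # counts how often the backend would be told to end

    def add(self, session_id: int) -> object:
        """Register a session and return a token that identifies this request."""
        token = object()
        self._sessions[session_id] = token
        return token

    def remove(self, session_id: int, token: object) -> bool:
        """Remove the mapping once; repeated or stale calls are no-ops."""
        if self._sessions.get(session_id) is not token:
            return False  # already removed, or the id was reused by a newer request
        del self._sessions[session_id]
        self.backend_end_calls += 1
        return True

    def __len__(self) -> int:
        return len(self._sessions)
```

With this shape, normal completion, cancellation, and disconnect handlers can all call `remove()` without coordinating with each other, and a late cleanup from an old request cannot delete a newer request's mapping.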

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/test_lmdeploy/serve/test_session_cleanup.py	New regression tests for idempotent session-map cleanup across normal exit/cancel paths.
tests/pytorch/spec_decode/test_spec_agent.py	Update regression test to ensure long-context KV metadata stays aligned after input rewriting.
tests/pytorch/paging/test_state_manager.py	New tests for reserved state-cache row and `num_state_caches=None` behavior.
tests/pytorch/engine/test_zmq_rpc.py	Add tests for stream-start barrier, cancel safety, backend-death wakeups, and idempotent stream output handling.
tests/pytorch/engine/test_ray_mp_engine.py	Add tests for Ray stream startup cancellation and idempotent result retrieval after drop.
tests/pytorch/config/test_model_config.py	New tests for TP-local head count computation and dist_config preservation.
lmdeploy/serve/openai/api_server.py	Add streaming/non-streaming request wrapper that ensures generator/session cleanup on disconnect/cancel.
lmdeploy/serve/managers/session_manager.py	Add request-exit-driven session removal and make `SessionManager.remove()` idempotent/stale-safe.
lmdeploy/serve/core/async_engine.py	Make safe_run cancellation-safe; ensure session removal happens consistently on prompt/cancel/error paths.
lmdeploy/pytorch/spec_decode/spec_agent.py	Keep aggregate KV metadata unchanged for last-chunk input rewriting.
lmdeploy/pytorch/paging/state_manager.py	Ensure reserved state row is excluded from allocatable IDs; handle `num_state_caches=None`.
lmdeploy/pytorch/models/utils/cudagraph.py	Use TP-local head counts for FlashAttention/FlashMLA metadata.
lmdeploy/pytorch/envs.py	Add `LMDEPLOY_FAKE_CUDA_GRAPH_CAPTURE` env flag.
lmdeploy/pytorch/engine/mp_engine/zmq_rpc.py	Add backend-death handling, cancellation-safe stream startup, abandoned stream drop, and streaming startup barrier plumbing.
lmdeploy/pytorch/engine/mp_engine/zmq_engine.py	Wire backend liveness callbacks/sentinel; make port-wait robust to early backend exit.
lmdeploy/pytorch/engine/mp_engine/ray_engine.py	Add cancellation-safe Ray stream startup and abandoned-stream drop support.
lmdeploy/pytorch/engine/mp_engine/base.py	Replace `is_exists` with `init_done` barrier; make `async_end()` wait for startup completion.
lmdeploy/pytorch/engine/mp_engine/base_worker.py	Add `EngineOutputGather.discard()` to drop abandoned stream buffers.
lmdeploy/pytorch/config.py	Add `dist_config` to `ModelConfig`; implement `get_num_qkv_head_by_tp()` and preserve dist_config in `from_hf_config()`.
lmdeploy/pytorch/backends/cuda/op_backend.py	Use TP-local head counts when building FlashAttention/FlashMLA metadata.
lmdeploy/pytorch/backends/cuda/graph_runner.py	Add “fake capture” path to bypass actual CUDA graph capture for debugging/padding behavior.
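
The backend-death handling listed for zmq_rpc.py amounts to failing every pending RPC future when the backend is known to be gone. A rough sketch of that idea, assuming a hypothetical `PendingCalls` helper and a liveness callback named `on_backend_death` (neither taken from the PR's code):

```python
import asyncio
from typing import Dict, Optional


class PendingCalls:
    """Hypothetical RPC bookkeeping; not the zmq_rpc.py implementation."""

    def __init__(self) -> None:
        self._pending: Dict[int, asyncio.Future] = {}
        self._dead_reason: Optional[str] = None

    def register(self, request_id: int) -> asyncio.Future:
        """Create a future for a request; fail fast if the backend is already dead."""
        fut = asyncio.get_running_loop().create_future()
        if self._dead_reason is not None:
            fut.set_exception(RuntimeError(self._dead_reason))
        else:
            self._pending[request_id] = fut
        return fut

    def on_backend_death(self, reason: str) -> None:
        """Wake every waiter with an error instead of leaving it pending forever."""
        self._dead_reason = reason
        pending, self._pending = self._pending, {}
        for fut in pending.values():
            if not fut.done():
                fut.set_exception(RuntimeError(reason))


async def backend_death_demo() -> str:
    calls = PendingCalls()
    fut = calls.register(1)
    # Simulate a liveness monitor noticing that the backend process has exited.
    asyncio.get_running_loop().call_later(0.01, calls.on_backend_death, "backend exited")
    try:
        await fut
    except RuntimeError as exc:
        return str(exc)
    return "no error"
```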

lvhan028 · 2026-06-08T08:57:20Z

+            except (asyncio.CancelledError, GeneratorExit):
+                remove_session_once()
+                raise


Can this be merged into the `except Exception` branch, for the sake of metrics_processor?

Good point. I kept the CancelledError/GeneratorExit branch separate from the generic Exception branch so cancellation still propagates correctly instead of being converted into an error GenOut.

But the metrics concern is valid: generate() has already incremented num_total_reqs, so prompt-processing cancellation should also update failed metrics. I added metrics_processor.increase_failed_requests('cancel') before cleanup/re-raise, and added a unit test to verify num_total_reqs, num_cancelled_reqs, and num_uncompleted_reqs stay balanced after prompt cancellation.
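
The pattern discussed in this thread can be sketched as follows. Since Python 3.8, `asyncio.CancelledError` derives from `BaseException` (as `GeneratorExit` does), so an `except Exception` branch would not catch either one. Everything other than `remove_session_once`, `increase_failed_requests('cancel')`, and the counter names is a hypothetical stand-in, and the real `generate()` is an async generator rather than this simplified coroutine.

```python
import asyncio
from typing import Set, Tuple


class Metrics:
    """Hypothetical counters; only increase_failed_requests('cancel') is named in the thread."""

    def __init__(self) -> None:
        self.num_total_reqs = 0
        self.num_cancelled_reqs = 0

    def increase_failed_requests(self, reason: str) -> None:
        if reason == "cancel":
            self.num_cancelled_reqs += 1


async def generate(metrics: Metrics, sessions: Set[int], session_id: int) -> None:
    metrics.num_total_reqs += 1
    sessions.add(session_id)
    removed = False

    def remove_session_once() -> None:
        nonlocal removed
        if not removed:  # safe to call from several exit paths
            removed = True
            sessions.discard(session_id)

    try:
        await asyncio.sleep(10)  # stands in for prompt processing
    except (asyncio.CancelledError, GeneratorExit):
        metrics.increase_failed_requests("cancel")
        remove_session_once()
        raise  # cancellation still propagates to the caller
    remove_session_once()


async def cancel_demo() -> Tuple[int, int, int, bool]:
    metrics, sessions = Metrics(), set()
    task = asyncio.create_task(generate(metrics, sessions, 7))
    await asyncio.sleep(0)  # let prompt processing start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return metrics.num_total_reqs, metrics.num_cancelled_reqs, len(sessions), task.cancelled()
```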

Powered by codex

Copilot AI review requested due to automatic review settings June 6, 2026 16:56

Copilot started reviewing on behalf of grimoire June 6, 2026 16:56

grimoire force-pushed the fix-session-end branch from ec46479 to 4b9017c Compare June 6, 2026 16:57

Copilot AI reviewed Jun 6, 2026

grimoire force-pushed the fix-session-end branch 2 times, most recently from 9de66db to 38c9bc7 Compare June 8, 2026 08:45

lvhan028 added the Bug:P0 label Jun 8, 2026

lvhan028 reviewed Jun 8, 2026

fix client disconnect session cleanup

2a32747

grimoire force-pushed the fix-session-end branch from 38c9bc7 to 2a32747 Compare June 8, 2026 09:20

RunningLeon reviewed Jun 8, 2026

Comment thread lmdeploy/serve/openai/api_server.py

RunningLeon reviewed Jun 8, 2026

Comment thread lmdeploy/serve/core/async_engine.py

grimoire added 2 commits June 8, 2026 20:59

fix

90de652

Merge remote-tracking branch 'upstream/main' into fix-session-end

cf7a691

# Conflicts: # lmdeploy/serve/anthropic/endpoints/messages.py # tests/test_lmdeploy/serve/anthropic/test_endpoints.py

lvhan028 approved these changes Jun 9, 2026

grimoire wants to merge 3 commits into InternLM:main from grimoire:fix-session-end

4 participants
