Optimize XML tool parsers with incremental streaming and fast-path buffering by lvhan028 · Pull Request #4664 · InternLM/lmdeploy

lvhan028 · 2026-06-09T14:41:41Z

Motivation

XML tool parsers (GLM47, Qwen3Coder) run on the API server after each streamed decode step. The previous implementation rescanned the entire accumulated payload on every chunk and re-coerced all parameters, giving O(n²) CPU cost as tool argument values grow or arrive token-by-token.

In real streaming workloads, especially long parameter values split across hundreds or thousands of small chunks, this added unnecessary CPU overhead on the API server hot path. The goal is to keep parsing semantics identical while reducing the total parsing cost to roughly O(n) and skipping work entirely while plain value text is still being buffered.
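
The quadratic cost can be illustrated with a simple count of characters examined. This is only a model, not the PR's benchmark: the function names are hypothetical, and it assumes the legacy path touches every accumulated character once per chunk while an incremental path touches each character once overall. It says nothing about wall-clock speedup, which also depends on work that is not scanning.

```python
def chars_scanned_rescan(chunk_sizes):
    """Model of the legacy path: each chunk triggers a scan of everything so far."""
    total = 0
    accumulated = 0
    for size in chunk_sizes:
        accumulated += size
        total += accumulated  # whole payload is examined again
    return total


def chars_scanned_incremental(chunk_sizes):
    """Model of an incremental path: each character is examined once."""
    return sum(chunk_sizes)


# A 2048-character value delivered in 4-character chunks (512 chunks).
chunk_sizes = [4] * 512
rescan = chars_scanned_rescan(chunk_sizes)
incremental = chars_scanned_incremental(chunk_sizes)
print(rescan, incremental, rescan / incremental)  # 525312 2048 256.5
```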

Modification

xml_tool_parser.py: Added incremental streaming infrastructure:
- Payload buffering via _payload_parts with deferred join
- Fast path for in-progress parameter values that contain no XML structural characters (<, >, /)
- Coercion cache (_coerced_args) so only newly completed parameters are coerced
- Cheaper string coercion: skip json.loads unless the value looks like a JSON string
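
The four ideas above can be sketched in isolation. This is a minimal illustration, not the code in xml_tool_parser.py: `PayloadBuffer`, `CoercionCache`, `coerce_value`, and the schema format (a plain dict of parameter name to type name) are all hypothetical names chosen for the example.

```python
import json

STRUCTURAL_CHARS = frozenset('<>/')


class PayloadBuffer:
    """Hypothetical helper: collects streamed chunks and joins them lazily."""

    def __init__(self):
        self._parts = []      # chunks received since the last join
        self._joined = ''     # text materialized so far
        self.full_parses = 0  # how many chunks required the slow path

    def feed(self, chunk, in_open_value):
        """Buffer a chunk; return True if the caller must run a real parse."""
        self._parts.append(chunk)
        # Fast path: inside an open parameter value, a chunk without any
        # structural character cannot close the value or start a tag.
        if in_open_value and STRUCTURAL_CHARS.isdisjoint(chunk):
            return False
        self.full_parses += 1
        return True

    def text(self):
        """Join buffered parts only when the text is actually needed."""
        if self._parts:
            self._joined += ''.join(self._parts)
            self._parts.clear()
        return self._joined


def coerce_value(raw, schema_type):
    """Coerce raw text; strings skip json.loads unless they look JSON-quoted."""
    if schema_type == 'string':
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            try:
                decoded = json.loads(raw)
            except json.JSONDecodeError:
                return raw
            return decoded if isinstance(decoded, str) else raw
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


class CoercionCache:
    """Hypothetical cache: each completed parameter is coerced exactly once."""

    def __init__(self, schema):
        self._schema = schema
        self._coerced = {}
        self.coerce_calls = 0

    def update(self, completed):
        for name, raw in completed.items():
            if name not in self._coerced:
                schema_type = self._schema.get(name, 'string')
                self._coerced[name] = coerce_value(raw, schema_type)
                self.coerce_calls += 1
        return dict(self._coerced)
```

In this sketch, feeding `'<parameter=code>'`, `'print('`, `'1)'`, `'</parameter>'` runs the slow path only for the first and last chunk, and calling `update` repeatedly with a growing dict of completed parameters coerces each name once.
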
glm47_tool_parser.py / qwen3coder_tool_parser.py:
- Incremental parse state (function name, open param/value, scan cursor)
- Complete open parameters only when closing tags arrive
- Fix scan cursor advancing past incomplete parameter headers during split-chunk streaming
- Qwen3Coder: strip outer <tool_call> tag once instead of on every chunk
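
The scan-cursor behavior described above can be sketched as follows. This is an assumption-laden illustration rather than the parsers' actual code: the `<parameter=name>value</parameter>` syntax and the `IncrementalParamScanner` class are used only for the example, and plain string concatenation is kept for brevity instead of part buffering.

```python
PARAM_OPEN = '<parameter='
PARAM_CLOSE = '</parameter>'


class IncrementalParamScanner:
    """Hypothetical scanner that never advances past an incomplete header."""

    def __init__(self):
        self.text = ''
        self.cursor = 0         # text before this index is fully consumed
        self.open_param = None  # parameter whose value is still streaming
        self.value_start = 0
        self.completed = {}

    def feed(self, chunk):
        self.text += chunk
        while True:
            if self.open_param is None:
                start = self.text.find(PARAM_OPEN, self.cursor)
                if start < 0:
                    # Leave room for an opening tag split across chunks.
                    tail = len(self.text) - len(PARAM_OPEN) + 1
                    self.cursor = max(self.cursor, tail)
                    break
                end = self.text.find('>', start)
                if end < 0:
                    # Header is incomplete: park the cursor at its start so
                    # the next chunk re-reads it instead of skipping it.
                    self.cursor = start
                    break
                self.open_param = self.text[start + len(PARAM_OPEN):end]
                self.value_start = end + 1
                self.cursor = end + 1
            else:
                close = self.text.find(PARAM_CLOSE, self.cursor)
                if close < 0:
                    # Skip already-scanned value text, but back up enough to
                    # catch a closing tag split across chunks.
                    tail = len(self.text) - len(PARAM_CLOSE) + 1
                    self.cursor = max(self.cursor, tail)
                    break
                # The parameter is completed only when its closing tag arrives.
                self.completed[self.open_param] = self.text[self.value_start:close]
                self.open_param = None
                self.cursor = close + len(PARAM_CLOSE)
        return dict(self.completed)
```

Feeding the chunks `'<function=f><param'`, `'eter=a'`, `'>hel'`, `'lo</para'`, `'meter><parameter=b>1</parameter>'` yields no completed parameters until the last chunk, at which point both `a` and `b` are reported.
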
benchmark/benchmark_xml_tool_parser.py: Added benchmark comparing legacy (main-branch) vs optimized parsers on long-value and tokenized streaming scenarios (~3.5x average speedup on 2048-char values).

Copilot

Pull request overview

This PR optimizes the XML-like tool parsers used during streamed decoding by adding incremental parsing state, buffering plain value text to avoid repeated full rescans, and caching schema coercion so only newly completed parameters are coerced.

Changes:

- Added shared incremental streaming infrastructure to XmlToolParser (payload part buffering, “in-progress value” fast-path, and coerced-args caching).
- Reworked Glm47ToolParser and Qwen3CoderToolParser to maintain incremental parse state and only finalize parameters when closing tags arrive.
- Added a benchmark script to compare legacy vs optimized implementations under long-value and tokenized streaming scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| lmdeploy/serve/parsers/tool_parser/xml_tool_parser.py | Adds shared buffering / incremental parsing hooks and coercion caching for streamed XML tool parsing. |
| lmdeploy/serve/parsers/tool_parser/glm47_tool_parser.py | Implements incremental state tracking for GLM-4.7 XML tool payloads. |
| lmdeploy/serve/parsers/tool_parser/qwen3coder_tool_parser.py | Implements incremental state tracking and outer-tag stripping changes for Qwen3Coder XML tool payloads. |
| benchmark/benchmark_xml_tool_parser.py | Adds a local benchmark harness comparing legacy vs optimized streaming behavior and performance. |

+    def _payload_text(self) -> str:
+        if not self._payload_parts:
+            return self._tool_payload
+        return ''.join(self._payload_parts)
+


+        search_idx = 0
+        while True:


+    def _strip_outer_open_tag_once(self, payload: str) -> str:
+        if self._qwen_open_tag_stripped:
+            return payload
+        open_tag = self.get_tool_open_tag()
+        if open_tag and payload.startswith(open_tag):
+            self._qwen_open_tag_stripped = True
+            return payload[len(open_tag):]
+        return payload
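
To show how a strip-once flag behaves across a stream, here is a small standalone sketch. It is not the Qwen3Coder parser: `OuterTagStripper` and its counter are hypothetical, and the example assumes the opening tag arrives whole at the start of the first payload rather than split across chunks.

```python
class OuterTagStripper:
    """Hypothetical wrapper around the strip-once pattern."""

    def __init__(self, open_tag='<tool_call>'):
        self._open_tag = open_tag
        self._stripped = False
        self.startswith_checks = 0  # counts how often the prefix test runs

    def strip_once(self, payload):
        if self._stripped:
            # Tag already removed earlier in the stream: no further work.
            return payload
        self.startswith_checks += 1
        if self._open_tag and payload.startswith(self._open_tag):
            self._stripped = True
            return payload[len(self._open_tag):]
        return payload


stripper = OuterTagStripper()
chunks = ['<tool_call><function=f>', '<parameter=a>', '1</parameter>']
body = ''.join(stripper.strip_once(chunk) for chunk in chunks)
print(body)  # <function=f><parameter=a>1</parameter>
```

After the first chunk sets the flag, later chunks return immediately, so the prefix test runs once for this stream.
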


Commit d26c3af: Optimize XML tool parsers with incremental streaming and fast-path buffering

Copilot AI review requested due to automatic review settings June 9, 2026 14:41

lvhan028 marked this pull request as draft June 9, 2026 14:41

Copilot started reviewing on behalf of lvhan028 June 9, 2026 14:41

Copilot AI reviewed Jun 9, 2026

lvhan028 wants to merge 1 commit into InternLM:main from lvhan028:optimize-xml-tool-parser
