docs: BYO OpenAI-compatible gateway (Bifrost/LiteLLM/vLLM) + required vLLM tool-calling flags#2003
Conversation
Signed-off-by: AmitChaubey <amit.katyayana@gmail.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new user-guide flow for running kagent through an OpenAI-compatible gateway (e.g., Bifrost/LiteLLM) to a self-hosted vLLM backend, including copy/paste manifests.
Changes:
- Added an end-to-end guide documenting gateway ↔ vLLM configuration and troubleshooting.
- Added an example Kubernetes manifest for a gateway-backed
ModelConfigand anAgent. - Added a user-guide index README linking to the new guide.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/modelconfig-openai-gateway-vllm.yaml | Provides a ready-to-apply Secret + ModelConfig + Agent example for gateway→vLLM. |
| docs/user-guide/openai-compatible-gateway-vllm.md | Documents setup steps, required vLLM flags for tool calling, and troubleshooting. |
| docs/user-guide/README.md | Adds a user-guide landing page linking to the new gateway→vLLM guide. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | Guide | Description | | ||
| | --- | --- | | ||
| | [OpenAI-compatible gateway with vLLM](openai-compatible-gateway-vllm.md) | Wire kagent to Bifrost, LiteLLM, or another OpenAI-compatible gateway fronting self-hosted vLLM | |
| | Field | What to set | | ||
| | --- | --- | | ||
| | `provider` | Always `OpenAI` for OpenAI-compatible gateways | | ||
| | `model` | Gateway routing identifier (for example `vllm/Qwen/...`) | | ||
| | `openAI.baseUrl` | Gateway OpenAI API base URL | | ||
| | `apiKeySecret` / `apiKeySecretKey` | Secret holding the gateway API key | |
There was a problem hiding this comment.
Checked the raw source — both tables use standard single-pipe syntax (| Guide | Description |); there's no || in the files. This looks like a diff-rendering artifact, and the tables render correctly on GitHub. No change needed.
| stringData: | ||
| key: sk-gateway-example # replace with your gateway API key |
| stringData: | ||
| key: sk-gateway-example # replace with your gateway key, or any placeholder if auth is disabled |
There was a problem hiding this comment.
Good catch — replaced sk-gateway-example with REPLACE_WITH_GATEWAY_API_KEY in both the example manifest and the guide to avoid secret-scanner false positives. Fixed in the latest commit.
Signed-off-by: AmitChaubey <amit.katyayana@gmail.com>
Summary
The BYO OpenAI-compatible guide covers hosted providers (Cohere) but not the common
self-hosted pattern: kagent → OpenAI-compatible gateway (Bifrost, LiteLLM) → vLLM.
Two things bite users that aren't documented anywhere:
vLLM must be launched with
--enable-auto-tool-choice --tool-call-parser <parser>.kagent's declarative runtime unconditionally registers the built-in
ask_usertool(
python/packages/kagent-adk/src/kagent/adk/types.py), so every chat completionrequest carries a non-empty
toolsarray withtool_choice: "auto"— even for agentswith zero user-configured tools. Without the flags, every agent turn fails with a
generic
provider API error (status 400)that gives no hint of the cause.ModelConfig.spec.modelmust be the gateway's routing identifier (oftenprovider-prefixed, e.g.
vllm/Qwen/...), which differs from the bare HF namevLLM serves under.
This PR adds:
docs/user-guide/openai-compatible-gateway-vllm.md— end-to-end guide with atroubleshooting section mapping the 400 to the missing parser flags
docs/user-guide/README.md— user-guide indexexamples/modelconfig-openai-gateway-vllm.yaml— copy-paste Secret + ModelConfig +Agent manifest (validated against the v1alpha2 types on main)
Verified against a live setup: Bifrost gateway fronting vLLM serving
Qwen/Qwen2.5-7B-Instruct with
--tool-call-parser hermes.Test plan