implement agent reliability features from ADK by peterj · Pull Request #2001 · kagent-dev/kagent

peterj · 2026-06-11T23:12:15Z

Adds reliability features to agents and ModelConfig.

The Agent CRD gains a reliability field with the following settings:

toolRetries (1–10): reflect-and-retry on failed tool calls — the runtime injects structured reflection guidance into the model context so the agent can self-correct instead of repeating the same failing call
maxLLMCalls (≥1): cost safety rail capping model calls per request; the run stops with a clear error instead of looping (runtime default 500)
debugLogging: verbose logging of every LLM request/response and tool call to the agent pod logs (off by default)
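Here is a minimal Python sketch of the reflect-and-retry behavior described for toolRetries. It is not the plugin in `_reflect_retry_plugin.py`. The class name `ReflectRetryTracker`, the guidance wording, and the per-tool counting policy are assumptions made for illustration. The helper `is_tool_failure` treats an MCP tool result carrying `isError: true` as a failure, which mirrors the handling the file summary describes.

```python
from dataclasses import dataclass, field
from typing import Dict


def is_tool_failure(result: object) -> bool:
    # MCP tools can report failure in-band with "isError": true instead of raising.
    return isinstance(result, dict) and result.get("isError") is True


@dataclass
class ReflectRetryTracker:
    # Hypothetical tracker mirroring the toolRetries knob (allowed range 1-10).
    tool_retries: int
    failures: Dict[str, int] = field(default_factory=dict)

    def on_tool_error(self, tool_name: str, error: str) -> str:
        """Record a failure and return guidance text to inject into the model context."""
        count = self.failures.get(tool_name, 0) + 1
        self.failures[tool_name] = count
        if count > self.tool_retries:
            # Budget exhausted: steer the model away from the failing tool.
            return (
                f"Tool '{tool_name}' has failed {count} times. "
                "Do not call it again; explain the failure to the user instead."
            )
        return (
            f"Tool '{tool_name}' failed (retry {count} of {self.tool_retries}): {error}. "
            "Reflect on why the call failed and change the arguments or approach "
            "before retrying; do not repeat the identical call."
        )

    def on_tool_success(self, tool_name: str) -> None:
        # A successful call resets that tool's failure count.
        self.failures.pop(tool_name, None)
```

A real plugin would hook into the runtime's tool callbacks. In this sketch, the caller invokes `on_tool_error` and `on_tool_success` itself and appends the returned text to the model context.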

The ModelConfig CRD gains a retry field:

attempts (0–20): automatic retries of failed LLM HTTP requests (429/408/5xx) with exponential backoff via the provider SDK. Supported for OpenAI, Azure OpenAI, Anthropic, and Gemini; other providers ignore the setting and log a warning.
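The provider SDKs implement the retry loop themselves, and kagent only passes the attempt count through. For readers unfamiliar with the pattern, here is a generic Python sketch of retrying with exponential backoff on the listed status codes. The base delay, cap, and jitter strategy are arbitrary assumptions, not the values any particular SDK uses. `send` is a hypothetical callable that stands in for the HTTP request.

```python
import random
import time
from typing import Callable, Tuple

# (HTTP status, body): a stand-in for a real SDK response object.
Response = Tuple[int, str]


def is_retryable(status: int) -> bool:
    # The statuses the PR description lists: 429, 408, and any 5xx.
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  jitter: bool = True) -> float:
    """Delay in seconds before retry number `attempt` (0-based)."""
    delay = min(cap, base * (2 ** attempt))
    # "Full jitter": pick a random delay up to the exponential bound.
    return random.uniform(0, delay) if jitter else delay


def call_with_retries(send: Callable[[], Response], attempts: int,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """One initial request plus up to `attempts` retries on retryable statuses."""
    status, body = send()
    for attempt in range(attempts):
        if not is_retryable(status):
            break
        sleep(backoff_delay(attempt))
        status, body = send()
    return status, body
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. For example, passing `sleep=delays.append` records the delays instead of waiting.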

The UI is updated to show these settings in the Advanced section.

Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>

chromatic-com · 2026-06-11T23:13:12Z

Warning

Testing paused

Monthly snapshot limit reached. Update your plan for additional snapshots and to resume testing.

Copilot

Pull request overview

This PR adds end-to-end “reliability” and “retry” configuration across CRDs, translators, runtimes (Python + Go), and UI so operators can control tool-call self-healing, model-call caps, debug logging, and provider SDK HTTP retry behavior.
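The generated CRD schemas listed in the file summary presumably enforce the ranges from the description: toolRetries 1-10, maxLLMCalls at least 1, and retry attempts 0-20. The sketch below restates them as a plain Python check for illustration. `ReliabilitySpec` and the error strings are hypothetical, and an unset field (None) is treated as "use the runtime default".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReliabilitySpec:
    # Hypothetical mirror of spec.declarative.reliability; None means "not set".
    tool_retries: Optional[int] = None
    max_llm_calls: Optional[int] = None
    debug_logging: bool = False  # off by default, per the PR description


def validate_reliability(spec: ReliabilitySpec) -> List[str]:
    errors = []
    if spec.tool_retries is not None and not 1 <= spec.tool_retries <= 10:
        errors.append("toolRetries must be between 1 and 10")
    if spec.max_llm_calls is not None and spec.max_llm_calls < 1:
        errors.append("maxLLMCalls must be at least 1")
    return errors


def validate_retry_attempts(attempts: Optional[int]) -> List[str]:
    # ModelConfig spec.retry.attempts, allowed range 0-20.
    if attempts is not None and not 0 <= attempts <= 20:
        return ["retry.attempts must be between 0 and 20"]
    return []
```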

Changes:

Adds spec.declarative.reliability to Agent/SandboxAgent (tool retries, max model calls, debug logging) and wires it through translators into runtime config.
Adds spec.retry.attempts to ModelConfig and maps it to max_retries in generated runtime config, including provider-specific wiring and warnings for unsupported providers.
Updates UI forms to expose these settings under an Advanced section; adds unit/golden test coverage for the new plumbing.
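The ModelConfig half of the change is a two-step hand-off. First, the Go translator writes `max_retries` into the generated runtime config. Then the runtime either turns it into a client option or logs a warning for unsupported providers. The Python sketch below models both steps under assumed names: `to_runtime_model`, `sdk_client_kwargs`, and the provider type strings are all hypothetical. The `max_retries` keyword matches what the OpenAI and Anthropic Python clients accept. Other SDKs may expect retry settings in a different shape, and this sketch ignores that.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("retry-sketch")

# Provider type strings are assumptions; the PR lists OpenAI, Azure OpenAI,
# Anthropic, and Gemini as the providers with SDK-level retry support.
SDK_RETRY_PROVIDERS = {"openai", "azure_openai", "anthropic", "gemini"}


def to_runtime_model(provider: str, model: str,
                     retry_attempts: Optional[int]) -> str:
    """Serialize a model entry; max_retries is emitted only when retry is set."""
    entry: Dict[str, Any] = {"type": provider, "model": model}
    if retry_attempts is not None:  # 0 is a valid, explicit value
        entry["max_retries"] = retry_attempts
    return json.dumps(entry)


def sdk_client_kwargs(runtime_json: str) -> Dict[str, Any]:
    """Client keyword arguments a runtime might pass to the provider SDK."""
    entry = json.loads(runtime_json)
    max_retries = entry.get("max_retries")
    if max_retries is None:
        return {}  # leave the SDK's own default in place
    if entry["type"] not in SDK_RETRY_PROVIDERS:
        logger.warning("retry config ignored for provider %r", entry["type"])
        return {}
    return {"max_retries": max_retries}
```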

Reviewed changes

Copilot reviewed 39 out of 40 changed files in this pull request and generated 6 comments.


File	Description
ui/src/types/index.ts	Adds UI type definitions for `retry` and agent `reliability`.
ui/src/components/AgentsProvider.tsx	Extends agent form data shape with reliability fields.
ui/src/app/models/new/page.tsx	Adds Advanced UI for model retry attempts and payload wiring.
ui/src/app/agents/new/page.tsx	Adds Advanced UI for agent reliability settings and payload wiring.
ui/src/app/actions/agents.ts	Maps agent form reliability fields into Agent/SandboxAgent specs.
python/packages/kagent-adk/tests/unittests/test_reliability_config.py	Adds unit tests for reliability parsing and reflect-and-retry behavior.
python/packages/kagent-adk/tests/unittests/test_model_retry.py	Adds unit tests for model retry/max_retries wiring into provider SDK clients.
python/packages/kagent-adk/tests/unittests/models/test_sap_ai_core.py	Removes a stray blank line in test file.
python/packages/kagent-adk/src/kagent/adk/types.py	Adds `ReliabilityConfig`, `max_retries`, and wires retry into provider LLM constructors.
python/packages/kagent-adk/src/kagent/adk/models/_openai.py	Plumbs `max_retries` into OpenAI/Azure OpenAI SDK client construction.
python/packages/kagent-adk/src/kagent/adk/models/_anthropic.py	Plumbs `max_retries` into Anthropic SDK client construction.
python/packages/kagent-adk/src/kagent/adk/converters/request_converter.py	Adds `max_llm_calls` to ADK `RunConfig` construction.
python/packages/kagent-adk/src/kagent/adk/cli.py	Enables debug logging + reflect-and-retry plugins based on reliability config.
python/packages/kagent-adk/src/kagent/adk/_reflect_retry_plugin.py	Adds MCP `isError: true` handling to reflect-and-retry tool plugin.
python/packages/kagent-adk/src/kagent/adk/_agent_executor.py	Surfaces clearer user error when max LLM calls limit is exceeded.
python/packages/kagent-adk/src/kagent/adk/_a2a.py	Passes reliability max LLM call cap into executor config.
helm/kagent-crds/templates/kagent.dev_sandboxagents.yaml	Adds CRD schema for `spec.declarative.reliability` (SandboxAgent).
helm/kagent-crds/templates/kagent.dev_modelconfigs.yaml	Adds CRD schema for `spec.retry.attempts` (ModelConfig).
helm/kagent-crds/templates/kagent.dev_agents.yaml	Adds CRD schema for `spec.declarative.reliability` (Agent).
go/core/internal/controller/translator/agent/testdata/outputs/modelconfig_with_retry.json	Adds golden output validating ModelConfig retry translation.
go/core/internal/controller/translator/agent/testdata/outputs/agent_with_reliability.json	Adds golden output validating Agent reliability translation.
go/core/internal/controller/translator/agent/testdata/inputs/modelconfig_with_retry.yaml	Adds translator test input covering ModelConfig retry.
go/core/internal/controller/translator/agent/testdata/inputs/agent_with_reliability.yaml	Adds translator test input covering Agent reliability.
go/core/internal/controller/translator/agent/compiler.go	Translates agent reliability config into ADK config output.
go/core/internal/controller/translator/agent/adk_api_translator.go	Renames TLS helper to populate shared BaseModel fields and adds retry translation.
go/api/v1alpha2/zz_generated.deepcopy.go	Adds deepcopy support for new API fields.
go/api/v1alpha2/modelconfig_types.go	Adds `Retry *ModelRetryConfig` to ModelConfigSpec.
go/api/v1alpha2/agent_types.go	Adds `Reliability *ReliabilityConfig` to DeclarativeAgentSpec.
go/api/config/crd/bases/kagent.dev_sandboxagents.yaml	Adds generated base CRD schema for sandbox agent reliability.
go/api/config/crd/bases/kagent.dev_modelconfigs.yaml	Adds generated base CRD schema for model retry.
go/api/config/crd/bases/kagent.dev_agents.yaml	Adds generated base CRD schema for agent reliability.
go/api/adk/types.go	Adds `max_retries` on BaseModel and `reliability` in AgentConfig.
go/adk/pkg/runner/maxllmcalls.go	Implements Go-runtime max LLM calls limiter plugin.
go/adk/pkg/runner/adapter.go	Builds Go ADK plugins for reliability config (logging, retry-and-reflect, max calls).
go/adk/pkg/runner/adapter_test.go	Adds tests for reliability plugin building and max call limiter behavior.
go/adk/pkg/models/openai.go	Wires max retries into OpenAI/Azure OpenAI Go SDK options.
go/adk/pkg/models/base.go	Adds `MaxRetries` to transport config shape.
go/adk/pkg/models/anthropic.go	Wires max retries into Anthropic Go SDK options.
go/adk/pkg/agent/agent.go	Logs when retry config is ignored for unsupported providers; passes max retries in transport config.
docs/architecture/crds-and-types.md	Updates architecture doc to include new CRD/type layers and Go runtime parity note.
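Both runtimes enforce the max-calls cap. According to the file summary, the Python runtime passes `max_llm_calls` into ADK's `RunConfig`, and the Go runtime adds its own limiter plugin in `maxllmcalls.go`. A generic limiter of that kind can be sketched as follows. `LLMCallLimiter`, `MaxLLMCallsExceeded`, and the hook name `before_model_call` are hypothetical; only the default of 500 comes from the PR description.

```python
from typing import Optional


class MaxLLMCallsExceeded(RuntimeError):
    """Raised when a request would exceed its model-call budget."""


class LLMCallLimiter:
    # Hypothetical per-request limiter; create one instance per request.
    DEFAULT_MAX_LLM_CALLS = 500  # runtime default quoted in the PR description

    def __init__(self, max_llm_calls: Optional[int] = None) -> None:
        if max_llm_calls is None:
            max_llm_calls = self.DEFAULT_MAX_LLM_CALLS
        self.max_llm_calls = max_llm_calls
        self.calls = 0

    def before_model_call(self) -> None:
        # Count the call, then fail fast once the budget is exceeded so the
        # run ends with a clear error instead of looping.
        self.calls += 1
        if self.calls > self.max_llm_calls:
            raise MaxLLMCallsExceeded(
                f"max LLM calls limit ({self.max_llm_calls}) exceeded"
            )
```

The counter lives on a per-request object to match the "per request" wording in the description. A counter shared across requests would let one request consume another's budget.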

Files not reviewed (1)

go/api/v1alpha2/zz_generated.deepcopy.go: Generated file



implement agent reliability features from ADK

318933d

Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>

Copilot AI review requested due to automatic review settings June 11, 2026 23:12

peterj requested review from EItanya, ilackarms, iplay88keys, jmhbh, supreme-gg-gg and yuval-k as code owners June 11, 2026 23:12

Copilot started reviewing on behalf of peterj June 11, 2026 23:12

Copilot AI reviewed Jun 11, 2026


fix pr feedback

cffa7fa

Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>

peterj mentioned this pull request Jun 11, 2026

add reliability docs kagent-dev/website#384 (Open)


peterj wants to merge 2 commits into main from peterj/addreliabilityfeatures
