diff --git a/content/manuals/ai/docker-agent/model-providers.md b/content/manuals/ai/docker-agent/model-providers.md index 6ebe19f5ea83..82865c10190f 100644 --- a/content/manuals/ai/docker-agent/model-providers.md +++ b/content/manuals/ai/docker-agent/model-providers.md @@ -1,7 +1,7 @@ --- title: Model providers description: Get API keys and configure cloud model providers for Docker Agent -keywords: [docker agent, model providers, api keys, anthropic, openai, google, gemini] +keywords: [docker agent, model providers, api keys, anthropic, openai, google, gemini, groq, deepseek, cerebras, fireworks, together, moonshot, openrouter, ovhcloud, baseten, vercel, cloudflare] weight: 10 --- @@ -16,9 +16,21 @@ models with Docker Model Runner](local-models.md). Docker Agent supports these cloud model providers: -- Anthropic - Claude models -- OpenAI - GPT models -- Google - Gemini models +- Anthropic — Claude models +- Baseten — open-weight models via Baseten +- Cerebras — fast inference for open-weight models +- Cloudflare AI Gateway — multi-provider gateway with caching and observability +- Cloudflare Workers AI — open-weight models on the Cloudflare edge +- DeepSeek — DeepSeek chat and reasoning models +- Fireworks AI — fast inference for open-weight models +- Google — Gemini models +- Groq — ultra-low-latency open-weight models +- Moonshot AI — Kimi models +- OpenAI — GPT models +- OpenRouter — unified gateway to hundreds of models +- OVHcloud — EU-hosted open-weight models +- Together AI — large catalog of open-weight models +- Vercel AI Gateway — unified gateway to OpenAI, Anthropic, Google, and more ## Anthropic @@ -54,37 +66,193 @@ Available models include: - `anthropic/claude-opus-4-5` - `anthropic/claude-haiku-4-5` -## OpenAI +## Baseten -OpenAI provides the GPT family of models, including GPT-5 and GPT-5 mini. +[Baseten](https://www.baseten.co/) provides AI models through an +OpenAI-compatible API. 
It is a good choice for deploying your own models or +accessing hosted open-weight models. -To get an API key: +1. Get an API key from [Baseten](https://www.baseten.co/). +2. Set the environment variable: -1. Go to [platform.openai.com/api-keys](https://platform.openai.com/api-keys). -2. Sign up or sign in to your account. -3. Navigate to the API Keys section. -4. Create a new API key. -5. Copy the key. +```console +$ export BASETEN_API_KEY=your_key_here +``` -Set your API key as an environment variable: +Use Baseten models in your agent configuration: + +```yaml +agents: + root: + model: baseten/deepseek-ai/DeepSeek-V3.1 + instruction: You are a helpful assistant +``` + +Or with a named model for more control: + +```yaml +models: + baseten_model: + provider: baseten + model: deepseek-ai/DeepSeek-V3.1 + max_tokens: 8192 + +agents: + root: + model: baseten_model + instruction: You are a helpful assistant +``` + +## Cerebras + +[Cerebras](https://www.cerebras.ai/) serves open-weight models (including +GPT-OSS and GLM) on its wafer-scale hardware through an OpenAI-compatible API, +delivering some of the highest inference speeds available. + +1. Create an API key from the [Cerebras Cloud console](https://cloud.cerebras.ai/). +2. Set the environment variable: ```console -$ export OPENAI_API_KEY=your_key_here +$ export CEREBRAS_API_KEY=your_key_here ``` -Use OpenAI models in your agent configuration: +Use Cerebras models in your agent configuration: ```yaml agents: root: - model: openai/gpt-5 + model: cerebras/gpt-oss-120b + instruction: You are a helpful assistant +``` + +Available models include: + +- `cerebras/gpt-oss-120b` +- `cerebras/zai-glm-4.7` + +## Cloudflare AI Gateway + +[Cloudflare AI Gateway](https://developers.cloudflare.com/ai-gateway/) is a +single OpenAI-compatible endpoint that routes to models from OpenAI, Anthropic, +Workers AI, and more, with caching, rate limiting, and observability. + +The gateway endpoint is account- and gateway-scoped. 
Three environment variables +are required: + +```console +$ export CLOUDFLARE_ACCOUNT_ID=your_account_id +$ export CLOUDFLARE_GATEWAY_ID=your_gateway_id +$ export CLOUDFLARE_API_TOKEN=your_api_token +``` + +Use Cloudflare AI Gateway in your agent configuration: + +```yaml +agents: + root: + model: cloudflare-ai-gateway/workers-ai/@cf/meta/llama-3.1-8b-instruct + instruction: You are a helpful assistant +``` + +Or with a named model: + +```yaml +models: + cf_gateway_model: + provider: cloudflare-ai-gateway + model: "workers-ai/@cf/meta/llama-3.1-8b-instruct" + +agents: + root: + model: cf_gateway_model + instruction: You are a helpful assistant +``` + +> [!NOTE] +> The alias sends your token in the standard `Authorization: Bearer` header, +> which works for unauthenticated gateways (the default) routing to Workers AI +> models. Gateways with authentication enabled require the `cf-aig-authorization` +> header, which is not supported by this alias. For that setup, use a [custom +> provider](reference/config.md#models) instead. + +## Cloudflare Workers AI + +[Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/) runs +open-weight models (Llama, Mistral, Qwen, Gemma, and more) on Cloudflare's +global edge network. + +Workers AI is account-scoped, so two environment variables are required: + +```console +$ export CLOUDFLARE_ACCOUNT_ID=your_account_id +$ export CLOUDFLARE_API_TOKEN=your_api_token +``` + +Use Cloudflare Workers AI in your agent configuration: + +```yaml +agents: + root: + model: cloudflare-workers-ai/@cf/meta/llama-3.1-8b-instruct + instruction: You are a helpful assistant +``` + +Available models include `@cf/meta/llama-3.1-8b-instruct`, +`@cf/mistralai/mistral-small-3.1-24b-instruct`, and more. See the +[Workers AI models catalog](https://developers.cloudflare.com/workers-ai/models/) +for the full list. 
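The Workers AI section shows only the inline `provider/model` form. The Baseten and Cloudflare AI Gateway sections also show a named-model form, and a Workers AI equivalent would likely follow the same pattern. This is a sketch only. The provider name `cloudflare-workers-ai` is assumed from the inline prefix and has not been confirmed against the config reference. `cf_workers_model` is a hypothetical name.

```yaml
models:
  cf_workers_model:
    # Assumed provider name, inferred from the inline `cloudflare-workers-ai/` prefix.
    provider: cloudflare-workers-ai
    # Quoted because a plain YAML scalar cannot start with the reserved `@` indicator.
    model: "@cf/mistralai/mistral-small-3.1-24b-instruct"

agents:
  root:
    model: cf_workers_model
    instruction: You are a helpful assistant
```

The quotes matter here. Without them, a YAML parser rejects a value that begins with `@`.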
+ +## DeepSeek + +[DeepSeek](https://www.deepseek.com/) serves its chat and reasoning +models through an OpenAI-compatible API, with strong price/performance on coding +and reasoning tasks. + +1. Create an API key from the [DeepSeek Platform](https://platform.deepseek.com/api_keys). +2. Set the environment variable: + +```console +$ export DEEPSEEK_API_KEY=your_key_here +``` + +Use DeepSeek models in your agent configuration: + +```yaml +agents: + root: + model: deepseek/deepseek-chat instruction: You are a helpful coding assistant ``` Available models include: -- `openai/gpt-5` -- `openai/gpt-5-mini` +- `deepseek/deepseek-chat` — DeepSeek-V3 series in non-thinking mode, for general-purpose chat and tool calling +- `deepseek/deepseek-reasoner` — the same model series in thinking mode, for extended reasoning + +## Fireworks AI + +[Fireworks AI](https://fireworks.ai/) is a fast inference host for open-weight +models, serving Kimi K2, Llama, Qwen, DeepSeek, GLM, and others through an +OpenAI-compatible API. + +1. Create an API key from the [Fireworks dashboard](https://fireworks.ai/account/api-keys). +2. Set the environment variable: + +```console +$ export FIREWORKS_API_KEY=your_key_here +``` + +Use Fireworks AI models in your agent configuration: + +```yaml +agents: + root: + model: fireworks/accounts/fireworks/models/kimi-k2-instruct + instruction: You are a helpful assistant +``` + +Fireworks model IDs take the form `accounts/fireworks/models/` followed by the +model name. See the +[Fireworks model library](https://fireworks.ai/models) for current IDs. ## Google Gemini @@ -117,6 +285,198 @@ Available models include: - `google/gemini-2.5-flash` - `google/gemini-2.5-pro` +## Groq + +[Groq](https://groq.com/) serves open-weight models on its LPU inference engine +through an OpenAI-compatible API, with a focus on very low latency. + +1. Create an API key from the [Groq Console](https://console.groq.com/keys). +2.
Set the environment variable: + +```console +$ export GROQ_API_KEY=your_key_here +``` + +Use Groq models in your agent configuration: + +```yaml +agents: + root: + model: groq/llama-3.3-70b-versatile + instruction: You are a helpful assistant +``` + +Available models include `llama-3.3-70b-versatile`, `llama-3.1-8b-instant`, and +more. See the [Groq models documentation](https://console.groq.com/docs/models) +for current model IDs. + +## Moonshot AI + +[Moonshot AI](https://www.moonshot.ai/) serves its Kimi model family through an +OpenAI-compatible API. Kimi K2 models are well-suited for coding and agentic +tasks. + +1. Create an API key from the [Moonshot AI console](https://platform.moonshot.ai/console/api-keys). +2. Set the environment variable: + +```console +$ export MOONSHOT_API_KEY=your_key_here +``` + +Use Moonshot AI models in your agent configuration: + +```yaml +agents: + root: + model: moonshot/kimi-k2-0905-preview + instruction: You are a helpful assistant +``` + +Available models include: + +- `moonshot/kimi-k2-0905-preview` +- `moonshot/kimi-k2-turbo-preview` +- `moonshot/kimi-k2-thinking` + +## OpenAI + +OpenAI provides the GPT family of models, including GPT-5 and GPT-5 mini. + +To get an API key: + +1. Go to [platform.openai.com/api-keys](https://platform.openai.com/api-keys). +2. Sign up or sign in to your account. +3. Navigate to the API Keys section. +4. Create a new API key. +5. Copy the key. + +Set your API key as an environment variable: + +```console +$ export OPENAI_API_KEY=your_key_here +``` + +Use OpenAI models in your agent configuration: + +```yaml +agents: + root: + model: openai/gpt-5 + instruction: You are a helpful coding assistant +``` + +Available models include: + +- `openai/gpt-5` +- `openai/gpt-5-mini` + +## OpenRouter + +[OpenRouter](https://openrouter.ai/) provides access to hundreds of models from +many providers through a single OpenAI-compatible API, with automatic failover +and unified billing. + +1. 
Get an API key from [OpenRouter](https://openrouter.ai/settings/keys). +2. Set the environment variable: + +```console +$ export OPENROUTER_API_KEY=your_key_here +``` + +Use OpenRouter in your agent configuration: + +```yaml +agents: + root: + model: openrouter/meta-llama/llama-3.3-70b-instruct + instruction: You are a helpful assistant +``` + +OpenRouter model IDs include the upstream provider name (for example +`anthropic/claude-sonnet-4-5` or `meta-llama/llama-3.3-70b-instruct`). Docker +Agent preserves the full upstream model ID after the first slash. See the +[OpenRouter models list](https://openrouter.ai/models) for available models. + +## OVHcloud + +[OVHcloud AI Endpoints](https://endpoints.ai.cloud.ovh.net/) serves open-weight +models through an OpenAI-compatible API, hosted in the EU. Several models are +available on a rate-limited free tier with no billing setup required. + +1. Create an access token from the + [OVHcloud AI Endpoints portal](https://endpoints.ai.cloud.ovh.net/). +2. Set the environment variable: + +```console +$ export OVH_AI_ENDPOINTS_ACCESS_TOKEN=your_token_here +``` + +Use OVHcloud models in your agent configuration: + +```yaml +agents: + root: + model: ovhcloud/Qwen3.5-397B-A17B + instruction: You are a helpful assistant +``` + +Available models include `Qwen3.5-397B-A17B`, `Qwen3-32B`, +`Meta-Llama-3_3-70B-Instruct`, and more. See the +[AI Endpoints catalogue](https://endpoints.ai.cloud.ovh.net/) for current model +IDs. + +## Together AI + +[Together AI](https://www.together.ai/) is one of the largest hosts of +open-weight models, serving Llama, Qwen, DeepSeek, Kimi, GLM, and others +through an OpenAI-compatible API. + +1. Create an API key from the [Together AI settings](https://api.together.ai/settings/api-keys). +2. 
Set the environment variable: + +```console +$ export TOGETHER_API_KEY=your_key_here +``` + +Use Together AI models in your agent configuration: + +```yaml +agents: + root: + model: together/meta-llama/Llama-3.3-70B-Instruct-Turbo + instruction: You are a helpful assistant +``` + +See the [Together AI model library](https://docs.together.ai/docs/serverless-models) +for current model IDs. + +## Vercel AI Gateway + +[Vercel AI Gateway](https://vercel.com/docs/ai-gateway) is a unified +OpenAI-compatible endpoint that routes to models from OpenAI, Anthropic, Google, +xAI, and more at list price with no markup. + +1. Create an API key in the Vercel dashboard. For details, see the [Vercel AI Gateway documentation](https://vercel.com/docs/ai-gateway). +2. Set the environment variable: + +```console +$ export AI_GATEWAY_API_KEY=your_key_here +``` + +Use Vercel AI Gateway in your agent configuration: + +```yaml +agents: + root: + model: vercel/openai/gpt-5 + instruction: You are a helpful assistant +``` + +Vercel AI Gateway model IDs use the `creator/model` form (for example +`openai/gpt-5` or `anthropic/claude-sonnet-4.5`). See the +[Vercel AI Gateway documentation](https://vercel.com/docs/ai-gateway) for the +current model list. + ## OpenAI-compatible providers You can use the `openai` provider type to connect to any model or provider that diff --git a/content/manuals/ai/docker-agent/reference/config.md b/content/manuals/ai/docker-agent/reference/config.md index a5447b86e5a9..70e5ea56ecca 100644 --- a/content/manuals/ai/docker-agent/reference/config.md +++ b/content/manuals/ai/docker-agent/reference/config.md @@ -34,7 +34,7 @@ rag: # Optional - RAG sources docs: [./documents] strategies: [...]
-metadata: # Optional - author, license, readme +metadata: # Optional - author, license, readme, version, tags author: Your Name ``` @@ -171,8 +171,10 @@ structured_output: | `parallel_tool_calls` | boolean | Enable parallel tool execution (default: true) | No | | `token_key` | string | Authentication token key | No | | `track_usage` | boolean | Track token usage | No | -| `thinking_budget` | mixed | Reasoning effort (provider-specific) | No | -| `provider_opts` | object | Provider-specific options | No | +| `thinking_budget` | mixed | Reasoning effort (provider-specific) | No | +| `compaction_model` | string | Model to use for session compaction (summary generation); defaults to the agent's own model | No | +| `bypass_models_gateway` | boolean | Connect directly to the provider, bypassing any configured models gateway | No | +| `provider_opts` | object | Provider-specific options | No | ### Alloy models @@ -182,6 +184,74 @@ Use multiple models in rotation by separating names with commas: model: anthropic/claude-sonnet-4-5,openai/gpt-5 ``` +### Compaction model + +By default, when a session is compacted (via `/compact`, the proactive +threshold trigger, or post-overflow recovery), Docker Agent uses the agent's +own model to summarize the conversation. That summary call ingests the entire +conversation, so it is typically the slowest and most expensive call in a session. + +You can point `compaction_model` at a smaller, faster model to make compaction +cheaper without changing the model that runs the conversation: + +```yaml +models: + primary: + provider: anthropic + model: claude-sonnet-4-5 + compaction_model: fast # use the cheaper model for compaction + fast: + provider: anthropic + model: claude-haiku-4-5 + +agents: + root: + model: primary + instruction: You are a helpful assistant. +``` + +The value can be a model name from the `models` section or an inline +`provider/model` spec (for example `openai/gpt-5-mini`). When omitted, +compaction reuses the agent's own model.
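The inline form described above lets you skip a second entry in `models`. The sketch below uses `openai/gpt-5-mini`, the example value from the text, as the compaction model for an OpenAI primary model. The model name `primary` is illustrative.

```yaml
models:
  primary:
    provider: openai
    model: gpt-5
    # Inline provider/model spec instead of a name defined in the models section.
    compaction_model: openai/gpt-5-mini

agents:
  root:
    model: primary
    instruction: You are a helpful assistant.
```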
+ +> [!NOTE] +> If the compaction model has a smaller context window than the primary model, +> Docker Agent triggers compaction against the smaller window, so the summary +> call can always ingest the full conversation. To keep the proactive trigger +> aligned with the primary model's window, choose a compaction model whose +> context window is at least as large as the primary's. + +### Bypass models gateway + +When a models gateway is configured (via `--models-gateway` or +`CAGENT_MODELS_GATEWAY`), Docker Agent routes all model requests through it by +default. Set `bypass_models_gateway: true` on a specific model to make it +connect directly to its provider instead: + +```yaml +models: + # Routed through the gateway when one is configured. + gateway-model: + provider: openai + model: gpt-5 + + # Always connects directly to Anthropic, even when a gateway is configured. + # Requires ANTHROPIC_API_KEY to be set. + direct-model: + provider: anthropic + model: claude-sonnet-4-5 + bypass_models_gateway: true + +agents: + root: + model: direct-model + instruction: You are a helpful assistant. +``` + +A bypassed model authenticates with its own provider credentials (the +provider's standard environment variable, such as `OPENAI_API_KEY` or +`ANTHROPIC_API_KEY`, or the key configured with `token_key`) rather than the +gateway's token. + ### Thinking budget Controls reasoning depth.
Configuration varies by provider: @@ -534,16 +604,20 @@ Results: Documentation and sharing information: -| Property | Type | Description | -| --------- | ------ | ------------------------------- | -| `author` | string | Author name | -| `license` | string | License (e.g., MIT, Apache-2.0) | -| `readme` | string | Usage documentation | +| Property | Type | Description | +| --------- | -------- | ----------------------------------------- | +| `author` | string | Author name | +| `license` | string | License (e.g., MIT, Apache-2.0) | +| `readme` | string | Usage documentation | +| `version` | string | Semantic version string | +| `tags` | []string | Tags for categorization and discovery | ```yaml metadata: author: Your Name license: MIT + version: "1.0.0" + tags: [coding, review] readme: | Description and usage instructions ```