diff --git a/cli/azd/extensions/azure.ai.agents/README.md b/cli/azd/extensions/azure.ai.agents/README.md index 0573957e075..9031eac6efc 100644 --- a/cli/azd/extensions/azure.ai.agents/README.md +++ b/cli/azd/extensions/azure.ai.agents/README.md @@ -13,6 +13,14 @@ Use `--no-inspector` to run only the local agent process: azd ai agent run --no-inspector ``` +## Private networking for `host: microsoft.foundry` + +Foundry services can be provisioned as network-secured, VNet-bound accounts by +adding a `network:` block to `azure.yaml`. See +[Private networking for `host: microsoft.foundry`](docs/private-networking.md) +for the schema reference, BYO-image requirements, and VNet deployment +cheatsheet. + ## Local Development ### Prerequisites diff --git a/cli/azd/extensions/azure.ai.agents/docs/private-networking.md b/cli/azd/extensions/azure.ai.agents/docs/private-networking.md new file mode 100644 index 00000000000..7992ccdf992 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/docs/private-networking.md @@ -0,0 +1,249 @@ +# Private networking for `host: azure.ai.agent` + +A Foundry service can be provisioned as a **network-secured (VNet-bound)** +account by adding a `network:` block to the service body in `azure.yaml`. When +`network:` is omitted the account uses public networking (unchanged behavior). + +When `network:` is present, azd always provisions an **account private +endpoint** and disables public network access — the data plane is never left +public. Dependent stores (Cosmos DB, AI Search, Storage) stay platform-managed. + +The block models two orthogonal axes: + +- **Egress** (agent runtime network) — set `agentSubnet` to inject the agent + into your subnet (BYO VNet), or omit it to use the Microsoft-managed network. + `isolationMode` tunes the managed network's outbound posture and is valid only + when `agentSubnet` is omitted. 
+- **Ingress** (account data plane) — `peSubnet` is **required** and always + yields an account private endpoint, so callers (`azd deploy`, + `azd ai agent invoke`) must reach the account from inside the VNet, a peered + VNet, or VPN. + +```yaml +services: + my-project: + host: azure.ai.agent + network: + # ----- Egress: agent runtime network (pick ONE) ----- + # + # (a) Managed egress (shown live below): omit agentSubnet so the agent + # runs in the Microsoft-managed network. isolationMode is valid only + # in this mode. + isolationMode: AllowOnlyApprovedOutbound # or AllowInternetOutbound (default) + # + # (b) BYO egress: inject the agent into your subnet instead. Replace the + # isolationMode line above with an agentSubnet block (same VNet as + # peSubnet in v1): + # agentSubnet: + # vnet: ${AZURE_VNET_ID} + # name: agent-subnet + # prefix: 192.168.10.0/24 # omit prefix to reference an existing subnet + + # ----- Ingress: account private endpoint (REQUIRED) ----- + peSubnet: + vnet: ${AZURE_VNET_ID} # ARM id of the VNet (must already exist) + name: pe-subnet + prefix: 192.168.11.0/24 # omit prefix to reference an existing subnet + + # ----- Private DNS (optional) ----- + dns: + resourceGroup: rg-private-dns # omit to let azd create + link the zones + subscription: ${AZURE_DNS_SUBSCRIPTION_ID} # optional; defaults to the deployment subscription + agents: + - name: my-agent + kind: hosted + project: src/my-agent + image: myprivacr.azurecr.io/agents/my-agent:v1 # BYO image required +``` + +> You do not hand-author the `agents:` entry above. Run +> `azd ai agent init --no-prompt --agent-name my-agent --image ` +> to scaffold it (it writes `agent.yaml`); then add the `network:` block to the +> generated service. + +> The example above uses **managed egress** so every field — including +> `isolationMode` — is shown as valid YAML. 
For **BYO egress**, swap the +> `isolationMode` line for an `agentSubnet` block (see comment `(b)` and the BYO +> cheatsheet below); `isolationMode` is then invalid and must be removed. + +### Field reference + +| Field | Rule | +| --- | --- | +| `agentSubnet` | Optional. Present: the agent is injected into this customer subnet (BYO egress). Absent: the agent uses the Microsoft-managed network (managed egress). | +| `peSubnet` | **Required.** Subnet for the account private endpoint. Establishes the private data plane (public access disabled). | +| `isolationMode` | Optional. `AllowInternetOutbound` or `AllowOnlyApprovedOutbound`. Valid **only** when `agentSubnet` is omitted (managed egress). | +| subnet `vnet` | Required. ARM id of the VNet that holds (or will hold) the subnet. Supports `${VAR}`. When `agentSubnet` is present, it must reference the same VNet as `peSubnet`. | +| subnet `name` | Required. Subnet name. | +| subnet `prefix` | Optional. Omit to reference an existing subnet; set to create the subnet with that CIDR. | +| `dns.resourceGroup` | Omitted: azd creates and links the AI private DNS zones. Set: azd references existing zones in that resource group. Requires `peSubnet`. | +| `dns.subscription` | Optional. Defaults to the deployment subscription. Accepts a bare GUID or `${VAR}`. | + +### Environment variables + +Network fields support `${VAR}` references resolved client-side from the azd +environment (run `azd env set <NAME> <value>`). The variable names are +user-chosen; the example above uses: + +| Variable | Format | Used by | +| --- | --- | --- | +| `AZURE_VNET_ID` | ARM resource id of an existing `Microsoft.Network/virtualNetworks` | subnet `vnet` | +| `AZURE_DNS_SUBSCRIPTION_ID` | bare GUID or `/subscriptions/<guid>` | `network.dns.subscription` | + +### Requirements and limits + +- **`peSubnet` is mandatory.** A network-bound account always gets a private + endpoint; there is no public data-plane fallback. 
Run `azd deploy` / + `azd ai agent invoke` from inside the VNet, a peered VNet, or VPN. +- **Single VNet (v1).** When `agentSubnet` is present it must live in the same + VNet as `peSubnet`. +- **BYO container image required.** Secured agents must reference a pre-built + image via `agents[].image` (`registry/image:tag`); the developer owns the + registry's SKU, private endpoint, DNS, and firewall. Local build into a + private ACR is not supported in v1. +- **Brownfield (`endpoint:`) ignores `network:`.** When `endpoint:` is set the + account's network posture is fixed by whoever created it; azd warns and does + not reconcile `network:`. + +### Known limitations + +- **BYO egress is single-VNet (v1).** When `agentSubnet` is set it must + reference the same VNet as `peSubnet`; azd errors otherwise. Cross-VNet + topologies (agent injected in one VNet, account private endpoint in another) + are deferred: they require customer-managed VNet **peering** between the two + VNets — so the agent can route to the account private endpoint — plus private + DNS zone links to *both* VNets. azd does not provision or validate that + peering, so the data path would silently fail. Managed egress is unaffected: + the agent reaches the account over Microsoft-managed connectivity and never + the customer ingress VNet, so it needs only the single `peSubnet` VNet. + +- **One default-DNS account per VNet.** By default (no `dns:` block) azd + creates the three `privatelink.*` AI zones and **links them to your VNet**. + Azure allows a VNet to be linked to only one zone per namespace, so a second + Foundry account that also owns its DNS cannot reuse the same VNet — the link + fails with `A virtual network cannot be linked to multiple zones with + overlapping namespaces`. 
If the VNet is already linked to those zones (a + second account, or a brownfield hub that pre-links the AI privatelink zones), + set the `dns:` block to **reference** the existing zones; reference mode binds + the account private endpoint to them and skips creating a new VNet link. + +### Cheatsheet: managed-egress account (private data plane) + +Omit `agentSubnet` so the hosted-agent runtime uses a Microsoft-managed network +instead of your VNet. `peSubnet` is still required: the account data plane stays +private behind an account private endpoint in your VNet, reachable from inside +the VNet / VPN. + +Scaffold the agent with a pre-built (BYO) image (writes `azure.yaml` and +`agent.yaml`): + +```bash +azd ai agent init --no-prompt --agent-name my-agent \ + --image myprivacr.azurecr.io/agents/my-agent:v1 +``` + +Then add a `network:` block to the generated service in `azure.yaml` (omit +`agentSubnet` for managed egress; `isolationMode` is valid only in this mode): + +```yaml +name: my-agent +infra: + provider: microsoft.foundry + +services: + my-agent: + host: azure.ai.agent + deployments: [] + network: + isolationMode: AllowInternetOutbound # managed-egress outbound posture + peSubnet: + vnet: ${AZURE_VNET_ID} + name: pe-subnet + prefix: 192.168.11.0/24 +``` + +`azd ai agent init --image` already created and selected an azd environment and +set `AZD_AGENT_SKIP_ACR=true` (BYO image → no ACR build). 
Set the deployment +inputs on that environment and provision: + +```bash +azd env set AZURE_SUBSCRIPTION_ID "" +azd env set AZURE_LOCATION westus +azd env set AZURE_RESOURCE_GROUP "" +azd env set AZURE_VNET_ID "" +azd provision --no-prompt +``` + +Grant the Foundry project MI ACR pull permission, then run deploy/invoke from a +host that can reach the account private endpoint: + +```bash +azd deploy --no-prompt +azd ai agent invoke --new-session "hello" +``` + +> **`isolationMode` note.** When set, azd provisions the account's V2 +> managed network (`managednetworks/default`) with the chosen isolation mode. +> `AllowOnlyApprovedOutbound` additionally requires approved outbound rules for +> the agent to reach dependent resources; for the platform-managed stores used +> here those are managed by the Foundry platform. + +### Cheatsheet: BYO image + VNet hosted agent (BYO egress) + +ACR requirements: + +- The BYO image must be pullable by the Foundry **project managed identity**. +- For ABAC-enabled ACR, grant the project MI `Container Registry Repository Reader`. +- For private-only ACR, use Premium SKU, an ACR private endpoint, and a + `privatelink.azurecr.io` DNS zone linked to the VNet. Disable public access + only after the image is pushed. 
+ +Scaffold the agent with a pre-built (BYO) image — this writes `azure.yaml` and +`agent.yaml` for you, so there is no hand-edited manifest to keep in sync: + +```bash +azd ai agent init --no-prompt --agent-name my-agent \ + --image myprivacr.azurecr.io/agents/my-agent:v1 +``` + +Then add a `network:` block to the generated service in `azure.yaml`: + +```yaml +services: + my-agent: + host: azure.ai.agent + network: + agentSubnet: # omit the whole block for managed egress + vnet: ${AZURE_VNET_ID} + name: agent-subnet + prefix: 192.168.10.0/24 # omit prefix to reference an existing subnet + peSubnet: # required: makes the data plane private + vnet: ${AZURE_VNET_ID} + name: pe-subnet + prefix: 192.168.11.0/24 +``` + +Configure and provision (`init --image` already created/selected the env and set +`AZD_AGENT_SKIP_ACR=true`): + +```bash +azd env set AZURE_SUBSCRIPTION_ID "" +azd env set AZURE_LOCATION westus +azd env set AZURE_RESOURCE_GROUP "" +azd env set AZURE_VNET_ID "" +azd provision --no-prompt +``` + +Deploy and invoke from a host that can reach the Foundry private endpoint: + +```bash +azd deploy --no-prompt +azd ai agent invoke --new-session "hello" +``` + +Common failures: + +- `403 Public access is disabled`: the data plane is private in every + network-bound mode — run deploy/invoke from inside the VNet, a peered VNet, or + VPN. +- `ImageError: registry authentication failed`: grant ACR pull permission to the Foundry project MI. 
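For reference, the `${VAR}` substitution described under "Environment variables" can be sketched in Go. This is a minimal illustration under stated assumptions, not the extension's actual code: `expandVarRefs` is a hypothetical name, the regular expression mirrors `varRefPattern` in `synthesizer.go`, and the lookup order (azd environment map first, then the process environment, then an error) follows the `Input.Env` comment. Treating a variable that is set to an empty string as resolved is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// varRef matches a ${VAR} reference (same shape as varRefPattern in the diff).
var varRef = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expandVarRefs is a hypothetical helper: it replaces every ${VAR} in s with
// the value from env, falling back to the process environment, and returns an
// error naming any variable that neither source defines.
func expandVarRefs(s string, env map[string]string) (string, error) {
	var missing []string
	out := varRef.ReplaceAllStringFunc(s, func(ref string) string {
		name := varRef.FindStringSubmatch(ref)[1]
		if v, ok := env[name]; ok { // a nil map is safe to read
			return v
		}
		if v, ok := os.LookupEnv(name); ok {
			return v
		}
		missing = append(missing, name)
		return ref
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("unresolved variable(s): %v", missing)
	}
	return out, nil
}

func main() {
	// Placeholder values for illustration only.
	env := map[string]string{"AZURE_VNET_ID": "/subscriptions/example/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/vnet"}

	resolved, err := expandVarRefs("${AZURE_VNET_ID}", env)
	fmt.Println(resolved, err)

	_, err = expandVarRefs("${EXAMPLE_VAR_NOT_SET_ANYWHERE}", env)
	fmt.Println(err)
}
```

Failing on an unresolved variable, rather than leaving the placeholder in place, matches the provision-path behavior described for `PreserveVarRefs: false`; the eject path would skip this step and keep the reference verbatim.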
diff --git a/cli/azd/extensions/azure.ai.agents/go.mod b/cli/azd/extensions/azure.ai.agents/go.mod index 9d95076fb6e..c380b965d97 100644 --- a/cli/azd/extensions/azure.ai.agents/go.mod +++ b/cli/azd/extensions/azure.ai.agents/go.mod @@ -35,7 +35,11 @@ require ( require github.com/denormal/go-gitignore v0.0.0-20180930084346-ae8ad1d07817 -require golang.org/x/term v0.41.0 +require ( + go.opentelemetry.io/otel v1.43.0 + go.opentelemetry.io/otel/trace v1.43.0 + golang.org/x/term v0.41.0 +) require ( dario.cat/mergo v1.0.2 // indirect @@ -107,10 +111,8 @@ require ( github.com/yuin/goldmark v1.7.16 // indirect github.com/yuin/goldmark-emoji v1.0.6 // indirect go.opentelemetry.io/auto/sdk v1.2.1 // indirect - go.opentelemetry.io/otel v1.43.0 // indirect go.opentelemetry.io/otel/metric v1.43.0 // indirect go.opentelemetry.io/otel/sdk v1.43.0 // indirect - go.opentelemetry.io/otel/trace v1.43.0 // indirect go.uber.org/atomic v1.11.0 // indirect go.uber.org/multierr v1.11.0 // indirect golang.org/x/crypto v0.49.0 // indirect diff --git a/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra.go b/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra.go index 37f8328d326..c20030b02c9 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra.go +++ b/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra.go @@ -93,6 +93,10 @@ func ejectInfra(projectRoot string) error { RawAzureYAML: rawYAML, ServiceName: svcName, AcceptedHosts: project.FoundryServiceHosts, + // Eject writes a static infra/ tree. Keep ${VAR} references verbatim so + // the ejected main.parameters.json stays environment-portable; the + // on-disk provision flow resolves them from the azd environment. 
+ PreserveVarRefs: true, }) if err != nil { // Reuse the provider's vocabulary so eject and provision report diff --git a/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra_test.go b/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra_test.go index c2251f06ff3..0d8b4214b7d 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra_test.go +++ b/cli/azd/extensions/azure.ai.agents/internal/cmd/init_infra_test.go @@ -194,6 +194,9 @@ func TestEjectInfra_HappyPath_WritesExpectedFiles(t *testing.T) { filepath.Join("infra", "main.bicep"), filepath.Join("infra", "abbreviations.json"), filepath.Join("infra", "modules", "acr.bicep"), + filepath.Join("infra", "modules", "network.bicep"), + filepath.Join("infra", "modules", "subnet.bicep"), + filepath.Join("infra", "modules", "private-endpoint-dns.bicep"), filepath.Join("infra", "main.parameters.json"), } for _, rel := range expected { @@ -308,6 +311,47 @@ services: assert.Equal(t, false, doc.Parameters["includeAcr"].Value) } +func TestEjectInfra_PreservesNetworkVarRefs(t *testing.T) { + // See TestEjectInfra_HappyPath_WritesExpectedFiles for why this is not parallel. + // Eject must keep ${VAR} references verbatim in main.parameters.json so the + // ejected tree stays environment-portable; the on-disk provision flow + // resolves them from the azd environment at provision time. 
+ dir := t.TempDir() + mustWriteFile(t, filepath.Join(dir, "azure.yaml"), `name: my-project +services: + my-foundry: + host: azure.ai.agent + network: + peSubnet: {vnet: "${AZURE_VNET_ID}", name: pe-subnet} + dns: + resourceGroup: rg-dns + subscription: "${AZURE_DNS_SUBSCRIPTION_ID}" + deployments: [] + agents: + - name: my-agent + image: registry.io/myorg/myagent:latest +`) + + withCapturedStdout(t, func() { + require.NoError(t, ejectInfra(dir)) + }) + + raw, err := os.ReadFile(filepath.Join(dir, "infra", "main.parameters.json")) //nolint:gosec // G304: test file path from t.TempDir() + require.NoError(t, err) + var doc struct { + Parameters map[string]struct { + Value any `json:"value"` + } `json:"parameters"` + } + require.NoError(t, json.Unmarshal(raw, &doc)) + + assert.Equal(t, "${AZURE_VNET_ID}", doc.Parameters["vnetId"].Value, + "vnet id ${VAR} must be preserved for provision-time resolution") + assert.Equal(t, "${AZURE_DNS_SUBSCRIPTION_ID}", doc.Parameters["dnsZonesSubscription"].Value, + "dns subscription ${VAR} must be preserved for provision-time resolution") + assert.Equal(t, true, doc.Parameters["enableNetworkIsolation"].Value) +} + func TestEjectInfra_RefusesWhenInfraIsAFile(t *testing.T) { t.Parallel() // Pre-existing `infra` as a regular file (not a directory) hits the diff --git a/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider.go b/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider.go index 4cda4411750..525a7661dd6 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider.go +++ b/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider.go @@ -28,6 +28,8 @@ import ( "github.com/azure/azure-dev/cli/azd/pkg/grpcbroker" "github.com/azure/azure-dev/cli/azd/pkg/input" "github.com/azure/azure-dev/cli/azd/pkg/tools/bicep" + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/trace" "go.yaml.in/yaml/v3" ) @@ -135,9 
+137,12 @@ func (p *FoundryProvisioningProvider) Initialize( RawAzureYAML: rawYAML, ServiceName: svcName, AcceptedHosts: FoundryServiceHosts, + Env: p.networkEnvMap(ctx), }) switch { case errors.Is(err, synthesis.ErrEndpointBrownfield): + // network: has no effect in brownfield mode; warn if both are present. + warnNetworkIgnoredInBrownfield(rawYAML, svcName) return exterrors.Validation( exterrors.CodeBrownfieldNotSupported, "endpoint: is set on the foundry service; existing-project (brownfield) "+ @@ -158,6 +163,7 @@ func (p *FoundryProvisioningProvider) Initialize( ) } p.synthResult = res + log.Printf("[debug] foundry provider: network mode = %q", res.NetworkMode) tmplBytes, err := synthesis.ARMTemplate() if err != nil { @@ -178,7 +184,59 @@ func (p *FoundryProvisioningProvider) Initialize( return p.resolveEnv(ctx) } -// onDiskTemplatePresent returns true when either infra/main.bicepparam +// networkEnvMap returns a best-effort name -> value map of the azd environment +// for ${VAR} substitution in network fields during synthesis. It does not +// require resolveEnv to have run; on any failure it returns nil and the +// synthesizer falls back to the process environment. 
+func (p *FoundryProvisioningProvider) networkEnvMap(ctx context.Context) map[string]string { + if p.azdClient == nil { + return nil + } + envClient := p.azdClient.Environment() + if envClient == nil { + return nil + } + curr, err := envClient.GetCurrent(ctx, &azdext.EmptyRequest{}) + if err != nil || curr.GetEnvironment() == nil { + return nil + } + resp, err := envClient.GetValues(ctx, &azdext.GetEnvironmentRequest{Name: curr.GetEnvironment().GetName()}) + if err != nil { + log.Printf("[debug] foundry provider: GetValues failed (%s); network ${VAR} uses process env only", err) + return nil + } + out := make(map[string]string, len(resp.GetKeyValues())) + for _, kv := range resp.GetKeyValues() { + if kv != nil { + out[kv.Key] = kv.Value + } + } + return out +} + +// warnNetworkIgnoredInBrownfield logs a warning when a service declares both +// endpoint: (brownfield) and network:. The account's network posture is fixed +// by whoever created it, so the network: block has no effect. +func warnNetworkIgnoredInBrownfield(rawYAML []byte, svcName string) { + type svc struct { + Endpoint string `yaml:"endpoint,omitempty"` + Network yaml.Node `yaml:"network,omitempty"` + } + type root struct { + Services map[string]svc `yaml:"services"` + } + var r root + if err := yaml.Unmarshal(rawYAML, &r); err != nil { + return + } + s := r.Services[svcName] + if s.Endpoint != "" && !s.Network.IsZero() { + log.Printf("[warn] foundry provider: service %q sets both endpoint: and network:; "+ + "network: is ignored in brownfield mode (the account's network posture is fixed)", svcName) + } +} + +// onDiskTemplatePresent returns true when either infra/main.bicepparam // or infra/main.bicep exists under p.projectPath. Stat-only. 
func (p *FoundryProvisioningProvider) onDiskTemplatePresent() bool { infraDir := filepath.Join(p.projectPath, onDiskInfraDir) @@ -356,6 +413,15 @@ func (p *FoundryProvisioningProvider) Deploy( ) (*azdext.ProvisioningDeployResult, error) { progress("Preparing Foundry provisioning template...") + // provision.network_mode telemetry: none | byo | managed. Lets us measure + // secured-agent adoption and the BYO-vs-managed split. + networkMode := synthesis.NetworkModeNone + if p.synthResult != nil && p.synthResult.NetworkMode != "" { + networkMode = p.synthResult.NetworkMode + } + trace.SpanFromContext(ctx).SetAttributes( + attribute.String("provision.network_mode", networkMode)) + src, err := p.resolveTemplate(ctx, progress) if err != nil { return nil, err @@ -900,6 +966,8 @@ var canonicalOutputNames = []string{ "AZURE_CONTAINER_REGISTRY_ENDPOINT", "AZURE_CONTAINER_REGISTRY_RESOURCE_ID", "AZURE_AI_PROJECT_ACR_CONNECTION_NAME", + "AZURE_FOUNDRY_NETWORK_MODE", + "AZURE_FOUNDRY_MANAGED_ISOLATION_MODE", } // --- helpers --- diff --git a/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider_test.go b/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider_test.go index 9ad4a5c19b3..42fd31b4633 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider_test.go +++ b/cli/azd/extensions/azure.ai.agents/internal/project/foundry_provisioning_provider_test.go @@ -191,6 +191,16 @@ func TestArmOutputsToProto_RepairsMangledKeyCase(t *testing.T) { inKey: "foundrY_PROJECT_ENDPOINT", wantKey: "FOUNDRY_PROJECT_ENDPOINT", }, + { + name: "ARM-mangled AZURE_FOUNDRY_NETWORK_MODE -> canonical", + inKey: "azurE_FOUNDRY_NETWORK_MODE", + wantKey: "AZURE_FOUNDRY_NETWORK_MODE", + }, + { + name: "ARM-mangled AZURE_FOUNDRY_MANAGED_ISOLATION_MODE -> canonical", + inKey: "azurE_FOUNDRY_MANAGED_ISOLATION_MODE", + wantKey: "AZURE_FOUNDRY_MANAGED_ISOLATION_MODE", + }, { name: "already-canonical key passes through 
unchanged", inKey: "AZURE_AI_ACCOUNT_NAME", diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/schema_test.go b/cli/azd/extensions/azure.ai.agents/internal/synthesis/schema_test.go new file mode 100644 index 00000000000..dc9bca97d24 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/schema_test.go @@ -0,0 +1,106 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +package synthesis + +import ( + "bytes" + "encoding/json" + "os" + "os/exec" + "path/filepath" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +// schemaPath is the editor-tooling JSON schema for the Foundry service body, +// resolved from this package directory. +const schemaPath = "../../schemas/microsoft.foundry.json" + +// TestSchema_NetworkStructuralInvariants guards the network surface of the +// hand-maintained JSON schema against drift from the synthesizer's contract: +// peSubnet is mandatory, the old mode/byo/managed shape is gone, and every +// subnet requires an explicit vnet + name. 
+func TestSchema_NetworkStructuralInvariants(t *testing.T) { + raw, err := os.ReadFile(schemaPath) + require.NoError(t, err) + + var doc struct { + Properties struct { + Network struct { + Required []string `json:"required"` + Properties map[string]json.RawMessage `json:"properties"` + } `json:"network"` + } `json:"properties"` + Definitions struct { + Subnet struct { + Required []string `json:"required"` + Properties map[string]json.RawMessage `json:"properties"` + } `json:"subnet"` + } `json:"definitions"` + } + require.NoError(t, json.Unmarshal(raw, &doc), "schema must be valid JSON") + + net := doc.Properties.Network + assert.Contains(t, net.Required, "peSubnet", + "network must require peSubnet (no public data-plane fallback)") + assert.Contains(t, net.Properties, "agentSubnet", "network must expose agentSubnet") + assert.Contains(t, net.Properties, "isolationMode", "network must expose isolationMode") + assert.Contains(t, net.Properties, "peSubnet", "network must expose peSubnet") + + // The retired mode-enum shape must not reappear. + assert.NotContains(t, net.Properties, "mode", "network.mode was removed") + assert.NotContains(t, net.Properties, "byo", "network.byo was removed") + assert.NotContains(t, net.Properties, "managed", "network.managed was removed") + + sub := doc.Definitions.Subnet + assert.ElementsMatch(t, []string{"vnet", "name"}, sub.Required, + "a subnet must require exactly vnet + name") + assert.Contains(t, sub.Properties, "prefix", "subnet must expose prefix (create vs reference)") +} + +// TestARMTemplate_MatchesBicepBuild fails if templates/main.arm.json is stale +// relative to main.bicep. AGENTS guidance forbids hand-editing the ARM JSON; +// this catches a forgotten `bicep build`. Skipped when the bicep CLI is not on +// PATH (e.g. minimal CI images) so it never produces a phantom failure. 
+func TestARMTemplate_MatchesBicepBuild(t *testing.T) { + bicep := lookupBicep() + if bicep == "" { + t.Skip("bicep CLI not found on PATH; skipping ARM drift check") + } + + templatesDir := "templates" + committed, err := os.ReadFile(filepath.Join(templatesDir, "main.arm.json")) + require.NoError(t, err) + + out := filepath.Join(t.TempDir(), "main.arm.json") + cmd := exec.CommandContext(t.Context(), bicep, "build", + filepath.Join(templatesDir, "main.bicep"), "--outfile", out) + var stderr bytes.Buffer + cmd.Stderr = &stderr + require.NoErrorf(t, cmd.Run(), "bicep build failed: %s", stderr.String()) + + rebuilt, err := os.ReadFile(out) + require.NoError(t, err) + + assert.True(t, bytes.Equal(committed, rebuilt), + "templates/main.arm.json is stale; regenerate with `bicep build main.bicep "+ + "--outfile main.arm.json` from the templates directory") +} + +// lookupBicep returns a usable bicep binary path, preferring PATH and falling +// back to the az-bundled location. +func lookupBicep() string { + if p, err := exec.LookPath("bicep"); err == nil { + return p + } + if home, err := os.UserHomeDir(); err == nil { + azBicep := filepath.Join(home, ".azure", "bin", "bicep") + if _, err := os.Stat(azBicep); err == nil { + return azBicep + } + } + return "" +} diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer.go b/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer.go index fea490956cc..cbcecbc53c1 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer.go +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer.go @@ -15,7 +15,11 @@ package synthesis import ( "errors" "fmt" + "net" + "os" + "regexp" "slices" + "strings" "go.yaml.in/yaml/v3" ) @@ -45,6 +49,19 @@ type Input struct { // caller treats as a Foundry service. If empty, the service's host // value is not checked (only existence and endpoint: are). AcceptedHosts []string + + // Env maps azd environment variable names to values. 
Used to resolve + // ${VAR} references in network fields (subnet vnet ids, dns.subscription). + // When a referenced variable is absent here, the synthesizer falls back + // to the process environment before failing. May be nil. + Env map[string]string + + // PreserveVarRefs keeps ${VAR} references verbatim instead of resolving + // them. Used by the eject path, where the synthesized main.parameters.json + // must stay environment-portable: the on-disk provision flow resolves + // ${VAR} from the azd environment at provision time. When false (the + // provision path), ${VAR} is resolved here and a missing variable fails. + PreserveVarRefs bool } // Result bundles the bicep sources and the parameter values derived @@ -55,6 +72,10 @@ type Result struct { // Parameters maps bicep param names to plain Go values. Callers wrap // these in ARM's {"value": ...} envelope when serializing. Parameters map[string]any + + // NetworkMode is "none", "byo", or "managed" — derived from the + // network: block (or its absence). Exposed for telemetry. + NetworkMode string } // Deployment mirrors the deploymentType in main.bicep. @@ -94,10 +115,45 @@ type agentBlock struct { // reads. Unknown fields (connections, tools, agents[].tools, etc.) are // intentionally ignored: they are reconciled in azd deploy, not provision. type foundryService struct { - Host string `yaml:"host"` - Endpoint string `yaml:"endpoint,omitempty"` - Deployments []Deployment `yaml:"deployments,omitempty"` - Agents []agentBlock `yaml:"agents,omitempty"` + Host string `yaml:"host"` + Endpoint string `yaml:"endpoint,omitempty"` + Deployments []Deployment `yaml:"deployments,omitempty"` + Agents []agentBlock `yaml:"agents,omitempty"` + Network *networkBlock `yaml:"network,omitempty"` +} + +// networkBlock mirrors the network: sub-tree on the service body. 
+// +// The block models two orthogonal axes: +// +// - Egress (agent runtime network): agentSubnet present injects the agent into +// that customer subnet; agentSubnet absent uses the Microsoft-managed +// network. isolationMode tunes the managed network's outbound posture and is +// valid only when agentSubnet is absent. +// - Ingress (account data plane): peSubnet is required and always yields an +// account private endpoint, so a network-bound account is never public. +type networkBlock struct { + AgentSubnet *subnetSpec `yaml:"agentSubnet,omitempty"` + IsolationMode string `yaml:"isolationMode,omitempty"` + PESubnet *subnetSpec `yaml:"peSubnet,omitempty"` + DNS *dnsBlock `yaml:"dns,omitempty"` +} + +// subnetSpec is a self-contained subnet descriptor: vnet + name identify the +// subnet, and the optional prefix toggles create-vs-reference. +// +// vnet + name -> reference the existing subnet +// vnet + name + prefix -> create the subnet with that CIDR +type subnetSpec struct { + VNet string `yaml:"vnet,omitempty"` + Name string `yaml:"name,omitempty"` + Prefix string `yaml:"prefix,omitempty"` +} + +// dnsBlock mirrors network.dns (private DNS zone references). +type dnsBlock struct { + ResourceGroup string `yaml:"resourceGroup,omitempty"` + Subscription string `yaml:"subscription,omitempty"` } // projectFile is the root of azure.yaml as we care about it: only services. 
@@ -150,10 +206,286 @@ func Synthesize(in Input) (*Result, error) { deployments = []Deployment{} } + netParams, netMode, err := synthesizeNetwork(svc.Network, in.ServiceName, in.Env, !in.PreserveVarRefs) + if err != nil { + return nil, err + } + + params := map[string]any{ + "deployments": deployments, + "includeAcr": includeAcr, + } + for k, v := range netParams { + params[k] = v + } + return &Result{ - Parameters: map[string]any{ - "deployments": deployments, - "includeAcr": includeAcr, - }, + Parameters: params, + NetworkMode: netMode, }, nil } + +// Network mode values surfaced for telemetry and emitted as bicep params. +const ( + NetworkModeNone = "none" + NetworkModeByo = "byo" + NetworkModeManaged = "managed" +) + +// Default subnet names emitted as bicep params when the subnet is not configured. +const ( + defaultAgentSubnetName = "agent-subnet" + defaultPESubnetName = "pe-subnet" +) + +// vnetIDPattern matches a Microsoft.Network/virtualNetworks ARM resource id. +var vnetIDPattern = regexp.MustCompile( + `(?i)^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.Network/virtualNetworks/[^/]+$`, +) + +// guidPattern matches a bare GUID. +var guidPattern = regexp.MustCompile( + `(?i)^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`, +) + +// rgNamePattern matches a valid Azure resource group name. +var rgNamePattern = regexp.MustCompile(`^[-\w._()]{1,90}$`) + +// varRefPattern matches a ${VAR} reference. +var varRefPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`) + +// synthesizeNetwork validates the network: block and returns the bicep +// parameter set plus the telemetry mode. When net is nil the returned +// params disable network isolation and the output is byte-identical to the +// pre-network behavior. +// +// When resolve is true, ${VAR} references in subnet vnet / dns.subscription +// are expanded from env (provision path) and an unresolved variable fails. 
+// When resolve is false (eject path), ${VAR} references are kept verbatim so +// the synthesized parameters file stays environment-portable; the format +// checks that cannot run against an unexpanded placeholder are skipped. +func synthesizeNetwork( + net *networkBlock, + svcName string, + env map[string]string, + resolve bool, +) (map[string]any, string, error) { + // Public account: every network param defaults off. + params := map[string]any{ + "enableNetworkIsolation": false, + "useManagedEgress": false, + "vnetId": "", + "agentSubnetName": defaultAgentSubnetName, + "agentSubnetPrefix": "", + "createAgentSubnet": false, + "peSubnetName": defaultPESubnetName, + "peSubnetPrefix": "", + "createPESubnet": false, + "managedIsolationMode": "", + "dnsZonesResourceGroup": "", + "dnsZonesSubscription": "", + } + if net == nil { + return params, NetworkModeNone, nil + } + + fp := func(suffix string) string { + return fmt.Sprintf("services.%s.network%s", svcName, suffix) + } + + // Ingress: a network-bound account always gets an account private endpoint, + // so peSubnet is mandatory. There is no public data-plane fallback. + if net.PESubnet == nil { + return nil, "", fmt.Errorf("%s: private networking requires peSubnet", fp("")) + } + + // Egress: agentSubnet present injects the agent into the customer subnet; + // absent uses the Microsoft-managed network. + useManagedEgress := net.AgentSubnet == nil + + // isolationMode governs the Microsoft-managed network only. + isoMode := strings.TrimSpace(net.IsolationMode) + if isoMode != "" { + if !useManagedEgress { + return nil, "", fmt.Errorf( + "%s.isolationMode: only valid for managed egress (omit agentSubnet)", fp("")) + } + if isoMode != "AllowInternetOutbound" && isoMode != "AllowOnlyApprovedOutbound" { + return nil, "", fmt.Errorf( + "%s.isolationMode: %q is not one of AllowInternetOutbound, AllowOnlyApprovedOutbound", + fp(""), isoMode) + } + } + + // Ingress subnet (account private endpoint). 
+ peVnet, peName, pePrefix, createPE, err := resolveSubnet(net.PESubnet, fp(".peSubnet"), env, resolve) + if err != nil { + return nil, "", err + } + vnetID := peVnet + + // Egress subnet (byo only). v1 keeps both subnets in one VNet so a single + // vnetId drives injection, the PE, and DNS linking. + if !useManagedEgress { + agentVnet, agentName, agentPrefix, createAgent, aerr := resolveSubnet( + net.AgentSubnet, fp(".agentSubnet"), env, resolve) + if aerr != nil { + return nil, "", aerr + } + if !sameVNet(agentVnet, peVnet) { + return nil, "", fmt.Errorf( + "%s: agentSubnet.vnet and peSubnet.vnet must reference the same virtual network", fp("")) + } + params["agentSubnetName"] = agentName + params["agentSubnetPrefix"] = agentPrefix + params["createAgentSubnet"] = createAgent + vnetID = agentVnet + } + + params["enableNetworkIsolation"] = true + params["useManagedEgress"] = useManagedEgress + params["vnetId"] = vnetID + params["peSubnetName"] = peName + params["peSubnetPrefix"] = pePrefix + params["createPESubnet"] = createPE + params["managedIsolationMode"] = isoMode + + if net.DNS != nil { + if rg := strings.TrimSpace(net.DNS.ResourceGroup); rg != "" { + if !rgNamePattern.MatchString(rg) { + return nil, "", fmt.Errorf("%s.dns.resourceGroup: %q is not a valid resource group name", fp(""), rg) + } + params["dnsZonesResourceGroup"] = rg + } + if sub := strings.TrimSpace(net.DNS.Subscription); sub != "" { + if resolve { + resolved, err := resolveVars(sub, env) + if err != nil { + return nil, "", fmt.Errorf("%s.dns.subscription: %w", fp(""), err) + } + sub = resolved + } + // Normalize to a bare GUID only when concrete; an unexpanded ${VAR} + // (eject path) is normalized at provision time. 
+ if containsVarRef(sub) { + params["dnsZonesSubscription"] = sub + } else { + guid, err := normalizeSubscription(sub) + if err != nil { + return nil, "", fmt.Errorf("%s.dns.subscription: %w", fp(""), err) + } + params["dnsZonesSubscription"] = guid + } + } + } + + mode := NetworkModeByo + if useManagedEgress { + mode = NetworkModeManaged + } + return params, mode, nil +} + +// resolveSubnet validates a subnet descriptor and returns the VNet id, subnet +// name, prefix, and whether azd should create the subnet. +// +// vnet + name -> reference existing subnet (create=false) +// vnet + name + prefix -> create subnet with that CIDR (create=true) +// +// vnet and name are required; ${VAR} references in vnet are expanded when +// resolve is true and validated as a Microsoft.Network/virtualNetworks id only +// when fully concrete. +func resolveSubnet( + s *subnetSpec, fieldPath string, env map[string]string, resolve bool, +) (vnetID, name, prefix string, create bool, err error) { + if s == nil { + return "", "", "", false, fmt.Errorf("%s: required", fieldPath) + } + vnetID = strings.TrimSpace(s.VNet) + name = strings.TrimSpace(s.Name) + prefix = strings.TrimSpace(s.Prefix) + + if vnetID == "" { + return "", "", "", false, fmt.Errorf("%s.vnet: required", fieldPath) + } + if name == "" { + return "", "", "", false, fmt.Errorf("%s.name: required", fieldPath) + } + if resolve { + resolved, rerr := resolveVars(vnetID, env) + if rerr != nil { + return "", "", "", false, fmt.Errorf("%s.vnet: %w", fieldPath, rerr) + } + vnetID = resolved + } + // Validate the ARM id shape only when fully concrete; an unexpanded ${VAR} + // (eject path) is validated at provision time. 
+ if !containsVarRef(vnetID) && !vnetIDPattern.MatchString(vnetID) { + return "", "", "", false, fmt.Errorf( + "%s.vnet: %q is not a well-formed Microsoft.Network/virtualNetworks id", fieldPath, vnetID) + } + if prefix != "" { + if _, _, perr := net.ParseCIDR(prefix); perr != nil { + return "", "", "", false, fmt.Errorf("%s.prefix: %q is not a valid CIDR", fieldPath, prefix) + } + create = true + } + return vnetID, name, prefix, create, nil +} + +// sameVNet reports whether two VNet references point at the same VNet. Concrete +// ids compare case-insensitively (ARM ids are case-insensitive); unresolved +// ${VAR} references compare verbatim. +func sameVNet(a, b string) bool { + a = strings.TrimSpace(a) + b = strings.TrimSpace(b) + if containsVarRef(a) || containsVarRef(b) { + return a == b + } + return strings.EqualFold(a, b) +} + +// containsVarRef reports whether s still contains a ${VAR} reference. +func containsVarRef(s string) bool { + return varRefPattern.MatchString(s) +} + +// resolveVars expands ${VAR} references in s using env first, then the +// process environment. An unresolved reference is an error naming the +// variable. +func resolveVars(s string, env map[string]string) (string, error) { + var unresolved string + out := varRefPattern.ReplaceAllStringFunc(s, func(match string) string { + name := varRefPattern.FindStringSubmatch(match)[1] + if v, ok := env[name]; ok { + return v + } + if v, ok := os.LookupEnv(name); ok { + return v + } + if unresolved == "" { + unresolved = name + } + return match + }) + if unresolved != "" { + return "", fmt.Errorf("unresolved environment variable ${%s}", unresolved) + } + return out, nil +} + +// normalizeSubscription accepts a bare GUID or a /subscriptions/[/...] +// path and returns the bare GUID. 
+func normalizeSubscription(s string) (string, error) { + s = strings.TrimSpace(s) + if guidPattern.MatchString(s) { + return s, nil + } + if strings.HasPrefix(strings.ToLower(s), "/subscriptions/") { + parts := strings.Split(strings.Trim(s, "/"), "/") + if len(parts) >= 2 && guidPattern.MatchString(parts[1]) { + return parts[1], nil + } + } + return "", fmt.Errorf("%q is not a subscription GUID or /subscriptions/ id", s) +} diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer_test.go b/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer_test.go index 5510e226ba6..b891a76ddaa 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer_test.go +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/synthesizer_test.go @@ -194,6 +194,19 @@ services: - name: gpt-4.1-mini model: {format: OpenAI, name: gpt-4.1-mini, version: "2025-04-14"} sku: {capacity: 10, name: GlobalStandard} +`, + serviceName: "my-project", + wantErr: ErrEndpointBrownfield, + }, + { + name: "brownfield: endpoint + network => network ignored, still ErrEndpointBrownfield", + yaml: ` +services: + my-project: + host: azure.ai.agent + endpoint: https://existing.services.ai.azure.com/api/projects/p1 + network: + peSubnet: {vnet: /subscriptions/s/resourceGroups/rg/providers/Microsoft.Network/virtualNetworks/v, name: pe} `, serviceName: "my-project", wantErr: ErrEndpointBrownfield, @@ -253,6 +266,53 @@ services: } } +func TestSynthesize_NetworkPreserveVarRefs(t *testing.T) { + // Eject path: ${VAR} references must pass through verbatim (and skip the + // format checks that cannot run on an unexpanded placeholder), so the + // ejected main.parameters.json stays environment-portable. 
+ yaml := ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: "${AZURE_VNET_ID}", name: pe-subnet} + dns: + resourceGroup: rg-dns + subscription: "${AZURE_DNS_SUBSCRIPTION_ID}" +` + res, err := Synthesize(Input{ + RawAzureYAML: []byte(yaml), + ServiceName: "my-project", + AcceptedHosts: []string{"azure.ai.agent"}, + PreserveVarRefs: true, + }) + require.NoError(t, err, "unset ${VAR} must not fail on the eject path") + require.NotNil(t, res) + assert.Equal(t, "${AZURE_VNET_ID}", res.Parameters["vnetId"]) + assert.Equal(t, "${AZURE_DNS_SUBSCRIPTION_ID}", res.Parameters["dnsZonesSubscription"]) + assert.Equal(t, "rg-dns", res.Parameters["dnsZonesResourceGroup"]) +} + +func TestSynthesize_NetworkPreserveVarRefs_StillValidatesConcrete(t *testing.T) { + // PreserveVarRefs only skips checks for unexpanded placeholders; a + // concrete-but-malformed value still fails on the eject path. + yaml := ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: not-an-arm-id, name: pe-subnet} +` + _, err := Synthesize(Input{ + RawAzureYAML: []byte(yaml), + ServiceName: "my-project", + AcceptedHosts: []string{"azure.ai.agent"}, + PreserveVarRefs: true, + }) + require.Error(t, err) + assert.Contains(t, err.Error(), "not a well-formed") +} + func TestSynthesize_InputValidation(t *testing.T) { tests := []struct { name string @@ -295,6 +355,9 @@ func TestTemplatesFS_Embedded(t *testing.T) { "templates/main.arm.json", "templates/abbreviations.json", "templates/modules/acr.bicep", + "templates/modules/network.bicep", + "templates/modules/subnet.bicep", + "templates/modules/private-endpoint-dns.bicep", } for _, p := range wantFiles { t.Run(p, func(t *testing.T) { @@ -329,4 +392,357 @@ func TestARMTemplate_IsValidJSONWithExpectedShape(t *testing.T) { params, ok := arm["parameters"].(map[string]any) require.True(t, ok, "parameters must be an object") assert.Contains(t, params, "resourceGroupName") + + // Network isolation parameters must 
exist so the synthesizer's network + // param set is accepted by ARM (extra params would fail the deployment). + for _, p := range []string{ + "enableNetworkIsolation", "useManagedEgress", "vnetId", + "agentSubnetName", "agentSubnetPrefix", "createAgentSubnet", + "peSubnetName", "peSubnetPrefix", "createPESubnet", + "managedIsolationMode", "dnsZonesResourceGroup", "dnsZonesSubscription", + } { + assert.Contains(t, params, p, "network param %q must be declared in the ARM template", p) + } + + // The old mode-enum param must be gone; egress is driven by useManagedEgress. + assert.NotContains(t, params, "networkMode", + "networkMode param was replaced by useManagedEgress") + + // Secure-by-default lock: the account data plane must be private whenever + // network isolation is on. The compiled template must gate public access on + // enableNetworkIsolation (not on egress mode), so a network-bound account is + // never left public. This is the regression guard for the data-plane fix. + text := string(data) + wantDisable := `"disablePublicDataPlaneAccess": "[parameters('enableNetworkIsolation')]"` + wantPublic := `"publicNetworkAccess": "[if(variables('disablePublicDataPlaneAccess'), 'Disabled', 'Enabled')]"` + assert.Contains(t, text, wantDisable, + "public data-plane access must be disabled for every network-isolated account") + assert.Contains(t, text, wantPublic, + "account publicNetworkAccess must follow disablePublicDataPlaneAccess") + + // Egress injection shape: byo injects into the customer subnet + // (useMicrosoftManagedNetwork=false), managed uses the Microsoft-managed + // network (useMicrosoftManagedNetwork=true). Both branches must survive + // compilation so the account gets the right networkInjections per mode. 
+ assert.Contains(t, text, "'useMicrosoftManagedNetwork', false()", + "byo egress must inject the agent subnet (useMicrosoftManagedNetwork=false)") + assert.Contains(t, text, "'useMicrosoftManagedNetwork', true()", + "managed egress must use the Microsoft-managed network (useMicrosoftManagedNetwork=true)") + assert.Contains(t, text, `"networkInjections": "[variables('agentNetworkInjections')]"`, + "account must carry the computed networkInjections") + + // isolationMode must be wired to the V2 managed network child resource + // (regression guard: it was previously a no-op echoed only to output). + assert.Contains(t, text, `"type": "Microsoft.CognitiveServices/accounts/managedNetworks"`, + "managed isolationMode must provision a managedNetworks child resource") + assert.Contains(t, text, `"isolationMode": "[parameters('managedIsolationMode')]"`, + "managedNetworks isolationMode must come from the managedIsolationMode param") +} + +func TestSynthesize_Network(t *testing.T) { + t.Setenv("AZURE_VNET_ID", + "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg/"+ + "providers/Microsoft.Network/virtualNetworks/my-vnet") + + const validVNet = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/rg/" + + "providers/Microsoft.Network/virtualNetworks/my-vnet" + + tests := []struct { + name string + yaml string + wantMode string + check func(t *testing.T, p map[string]any) + }{ + { + name: "no network block => public account, isolation off", + yaml: ` +services: + my-project: + host: azure.ai.agent + deployments: + - name: gpt-4.1-mini + model: {format: OpenAI, name: gpt-4.1-mini, version: "2025-04-14"} + sku: {capacity: 10, name: GlobalStandard} +`, + wantMode: NetworkModeNone, + check: func(t *testing.T, p map[string]any) { + assert.Equal(t, false, p["enableNetworkIsolation"]) + assert.Equal(t, false, p["useManagedEgress"]) + }, + }, + { + name: "byo egress (agentSubnet present) with explicit subnets => create both", + yaml: ` +services: + 
my-project: + host: azure.ai.agent + network: + agentSubnet: {vnet: ` + validVNet + `, name: agent-subnet, prefix: 192.168.0.0/24} + peSubnet: {vnet: ` + validVNet + `, name: pe-subnet, prefix: 192.168.1.0/24} + dns: + resourceGroup: rg-private-dns + subscription: 22222222-2222-2222-2222-222222222222 +`, + wantMode: NetworkModeByo, + check: func(t *testing.T, p map[string]any) { + assert.Equal(t, true, p["enableNetworkIsolation"]) + assert.Equal(t, false, p["useManagedEgress"]) + assert.Equal(t, validVNet, p["vnetId"]) + assert.Equal(t, "agent-subnet", p["agentSubnetName"]) + assert.Equal(t, "192.168.0.0/24", p["agentSubnetPrefix"]) + assert.Equal(t, true, p["createAgentSubnet"]) + assert.Equal(t, true, p["createPESubnet"]) + assert.Equal(t, "rg-private-dns", p["dnsZonesResourceGroup"]) + assert.Equal(t, "22222222-2222-2222-2222-222222222222", p["dnsZonesSubscription"]) + }, + }, + { + name: "subnet without prefix => reference (create=false)", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + agentSubnet: {vnet: ` + validVNet + `, name: existing-agent} + peSubnet: {vnet: ` + validVNet + `, name: pe-subnet, prefix: 192.168.1.0/24} +`, + wantMode: NetworkModeByo, + check: func(t *testing.T, p map[string]any) { + assert.Equal(t, "existing-agent", p["agentSubnetName"]) + assert.Equal(t, false, p["createAgentSubnet"]) + assert.Equal(t, "pe-subnet", p["peSubnetName"]) + assert.Equal(t, true, p["createPESubnet"]) + }, + }, + { + name: "subnet vnet from ${VAR}", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: "${AZURE_VNET_ID}", name: pe-subnet} +`, + wantMode: NetworkModeManaged, + check: func(t *testing.T, p map[string]any) { + assert.Contains(t, p["vnetId"], "/virtualNetworks/my-vnet") + }, + }, + { + name: "managed egress (agentSubnet absent) with isolation", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + isolationMode: AllowOnlyApprovedOutbound + peSubnet: {vnet: ` + validVNet + `, 
name: pe-subnet, prefix: 192.168.1.0/24} +`, + wantMode: NetworkModeManaged, + check: func(t *testing.T, p map[string]any) { + assert.Equal(t, true, p["enableNetworkIsolation"]) + assert.Equal(t, true, p["useManagedEgress"]) + assert.Equal(t, false, p["createAgentSubnet"]) + assert.Equal(t, "AllowOnlyApprovedOutbound", p["managedIsolationMode"]) + }, + }, + { + name: "dns subscription normalized from /subscriptions/", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: ` + validVNet + `, name: pe-subnet} + dns: + resourceGroup: rg-dns + subscription: /subscriptions/33333333-3333-3333-3333-333333333333 +`, + wantMode: NetworkModeManaged, + check: func(t *testing.T, p map[string]any) { + assert.Equal(t, "33333333-3333-3333-3333-333333333333", p["dnsZonesSubscription"]) + }, + }, + { + name: "managed egress, isolationMode unset => empty managedIsolationMode", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: ` + validVNet + `, name: pe-subnet, prefix: 192.168.1.0/24} +`, + wantMode: NetworkModeManaged, + check: func(t *testing.T, p map[string]any) { + assert.Equal(t, true, p["useManagedEgress"]) + assert.Equal(t, "", p["managedIsolationMode"]) + assert.Equal(t, true, p["createPESubnet"]) + }, + }, + { + name: "managed egress, AllowInternetOutbound with referenced peSubnet", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + isolationMode: AllowInternetOutbound + peSubnet: {vnet: ` + validVNet + `, name: existing-pe} +`, + wantMode: NetworkModeManaged, + check: func(t *testing.T, p map[string]any) { + assert.Equal(t, true, p["useManagedEgress"]) + assert.Equal(t, "AllowInternetOutbound", p["managedIsolationMode"]) + assert.Equal(t, "existing-pe", p["peSubnetName"]) + assert.Equal(t, false, p["createPESubnet"]) + }, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + res, err := Synthesize(Input{ + RawAzureYAML: []byte(tt.yaml), + ServiceName: 
"my-project", + AcceptedHosts: []string{"azure.ai.agent"}, + }) + require.NoError(t, err) + require.NotNil(t, res) + assert.Equal(t, tt.wantMode, res.NetworkMode) + if tt.check != nil { + tt.check(t, res.Parameters) + } + }) + } +} + +func TestSynthesize_NetworkValidationErrors(t *testing.T) { + const validVNet = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/rg/" + + "providers/Microsoft.Network/virtualNetworks/my-vnet" + const validVNet2 = "/subscriptions/11111111-1111-1111-1111-111111111111/resourceGroups/rg/" + + "providers/Microsoft.Network/virtualNetworks/other-vnet" + + tests := []struct { + name string + yaml string + wantSub string + }{ + { + name: "network present but peSubnet missing", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + isolationMode: AllowInternetOutbound +`, + wantSub: "private networking requires peSubnet", + }, + { + name: "isolationMode with agentSubnet present", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + isolationMode: AllowInternetOutbound + agentSubnet: {vnet: ` + validVNet + `, name: a, prefix: 192.168.0.0/24} + peSubnet: {vnet: ` + validVNet + `, name: pe, prefix: 192.168.1.0/24} +`, + wantSub: "only valid for managed egress", + }, + { + name: "agentSubnet and peSubnet in different vnets", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + agentSubnet: {vnet: ` + validVNet + `, name: a, prefix: 192.168.0.0/24} + peSubnet: {vnet: ` + validVNet2 + `, name: pe, prefix: 192.168.1.0/24} +`, + wantSub: "same virtual network", + }, + { + name: "subnet missing vnet", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {name: pe} +`, + wantSub: "peSubnet.vnet: required", + }, + { + name: "subnet missing name", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: ` + validVNet + `} +`, + wantSub: "peSubnet.name: required", + }, + { + name: "malformed vnet id", + yaml: ` +services: + 
my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: not-an-arm-id, name: pe} +`, + wantSub: "not a well-formed", + }, + { + name: "subnet invalid cidr", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: ` + validVNet + `, name: pe, prefix: not-a-cidr} +`, + wantSub: "not a valid CIDR", + }, + { + name: "unresolved var", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + peSubnet: {vnet: "${DEFINITELY_NOT_SET_VAR_XYZ}", name: pe} +`, + wantSub: "unresolved environment variable", + }, + { + name: "bad managed isolation mode", + yaml: ` +services: + my-project: + host: azure.ai.agent + network: + isolationMode: Wide + peSubnet: {vnet: ` + validVNet + `, name: pe} +`, + wantSub: "isolationMode", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + _, err := Synthesize(Input{ + RawAzureYAML: []byte(tt.yaml), + ServiceName: "my-project", + AcceptedHosts: []string{"azure.ai.agent"}, + }) + require.Error(t, err) + assert.Contains(t, err.Error(), tt.wantSub) + // Errors carry the service-scoped field path. + assert.Contains(t, err.Error(), "services.my-project.network") + }) + } } diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.arm.json b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.arm.json index 0f0dd210464..bb33d48a719 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.arm.json +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.arm.json @@ -5,8 +5,8 @@ "metadata": { "_generator": { "name": "bicep", - "version": "0.39.26.7824", - "templateHash": "10248453623902523275" + "version": "0.44.1.10279", + "templateHash": "15130149524391891581" } }, "definitions": { @@ -120,6 +120,90 @@ "metadata": { "description": "Principal type used in the developer role assignment." 
} + }, + "enableNetworkIsolation": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "Master switch: when true the account is VNet-bound (private)." + } + }, + "useManagedEgress": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "When true (and isolation on), the agent runtime uses the Microsoft-managed network instead of injecting into a customer subnet." + } + }, + "vnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "ARM id of the existing customer VNet (byo mode)." + } + }, + "agentSubnetName": { + "type": "string", + "defaultValue": "agent-subnet", + "metadata": { + "description": "Agent (delegated) subnet name." + } + }, + "agentSubnetPrefix": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Agent subnet CIDR. Empty derives a /24 from the VNet space." + } + }, + "createAgentSubnet": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "When true, create the agent subnet; when false, reference it." + } + }, + "peSubnetName": { + "type": "string", + "defaultValue": "pe-subnet", + "metadata": { + "description": "Private-endpoint subnet name." + } + }, + "peSubnetPrefix": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Private-endpoint subnet CIDR. Empty derives a /24 from the VNet space." + } + }, + "createPESubnet": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "When true, create the PE subnet; when false, reference it." + } + }, + "managedIsolationMode": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Managed-network isolation mode (managed mode)." + } + }, + "dnsZonesResourceGroup": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource group holding existing private DNS zones. Empty creates new zones." 
+ } + }, + "dnsZonesSubscription": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Subscription holding existing private DNS zones. Empty defaults to this subscription." + } } }, "resources": { @@ -164,6 +248,42 @@ }, "principalType": { "value": "[parameters('principalType')]" + }, + "enableNetworkIsolation": { + "value": "[parameters('enableNetworkIsolation')]" + }, + "useManagedEgress": { + "value": "[parameters('useManagedEgress')]" + }, + "vnetId": { + "value": "[parameters('vnetId')]" + }, + "agentSubnetName": { + "value": "[parameters('agentSubnetName')]" + }, + "agentSubnetPrefix": { + "value": "[parameters('agentSubnetPrefix')]" + }, + "createAgentSubnet": { + "value": "[parameters('createAgentSubnet')]" + }, + "peSubnetName": { + "value": "[parameters('peSubnetName')]" + }, + "peSubnetPrefix": { + "value": "[parameters('peSubnetPrefix')]" + }, + "createPESubnet": { + "value": "[parameters('createPESubnet')]" + }, + "managedIsolationMode": { + "value": "[parameters('managedIsolationMode')]" + }, + "dnsZonesResourceGroup": { + "value": "[parameters('dnsZonesResourceGroup')]" + }, + "dnsZonesSubscription": { + "value": "[parameters('dnsZonesSubscription')]" } }, "template": { @@ -173,8 +293,8 @@ "metadata": { "_generator": { "name": "bicep", - "version": "0.39.26.7824", - "templateHash": "2049009886480371322" + "version": "0.44.1.10279", + "templateHash": "769018913626752462" } }, "definitions": { @@ -281,6 +401,90 @@ "metadata": { "description": "Principal type used in the developer role assignment." } + }, + "enableNetworkIsolation": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "Master switch: when true the account is VNet-bound (private)." + } + }, + "useManagedEgress": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "When true (and isolation on), the agent runtime uses the Microsoft-managed network instead of injecting into a customer subnet." 
+ } + }, + "vnetId": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "ARM id of the existing customer VNet (byo mode)." + } + }, + "agentSubnetName": { + "type": "string", + "defaultValue": "agent-subnet", + "metadata": { + "description": "Agent (delegated) subnet name." + } + }, + "agentSubnetPrefix": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Agent subnet CIDR. Empty derives a /24 from the VNet space." + } + }, + "createAgentSubnet": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "When true, create the agent subnet; when false, reference it." + } + }, + "peSubnetName": { + "type": "string", + "defaultValue": "pe-subnet", + "metadata": { + "description": "Private-endpoint subnet name." + } + }, + "peSubnetPrefix": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Private-endpoint subnet CIDR. Empty derives a /24 from the VNet space." + } + }, + "createPESubnet": { + "type": "bool", + "defaultValue": false, + "metadata": { + "description": "When true, create the PE subnet; when false, reference it." + } + }, + "managedIsolationMode": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Managed-network isolation mode (managed mode). AllowInternetOutbound | AllowOnlyApprovedOutbound." + } + }, + "dnsZonesResourceGroup": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource group holding existing private DNS zones. Empty creates and links new zones." + } + }, + "dnsZonesSubscription": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Subscription holding existing private DNS zones. Empty defaults to this subscription." 
+ } } }, "variables": { @@ -291,7 +495,12 @@ "resourceToken": "[if(empty(parameters('resourceTokenSalt')), uniqueString(subscription().id, resourceGroup().id, parameters('location')), uniqueString(subscription().id, resourceGroup().id, parameters('location'), parameters('resourceTokenSalt')))]", "abbrs": "[variables('$fxv#0')]", "foundryAccountName": "[format('{0}{1}', variables('abbrs').cognitiveServicesAccounts, variables('resourceToken'))]", - "cognitiveServicesUserRoleId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'a97b65f3-24c7-4388-baec-2e87135dc908')]" + "useByoNetwork": "[and(parameters('enableNetworkIsolation'), not(parameters('useManagedEgress')))]", + "useManagedNetwork": "[and(parameters('enableNetworkIsolation'), parameters('useManagedEgress'))]", + "disablePublicDataPlaneAccess": "[parameters('enableNetworkIsolation')]", + "cognitiveServicesUserRoleId": "[subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'a97b65f3-24c7-4388-baec-2e87135dc908')]", + "agentSubnetArmId": "[format('{0}/subnets/{1}', parameters('vnetId'), parameters('agentSubnetName'))]", + "agentNetworkInjections": "[if(variables('useByoNetwork'), createArray(createObject('scenario', 'agent', 'subnetArmId', variables('agentSubnetArmId'), 'useMicrosoftManagedNetwork', false())), if(variables('useManagedNetwork'), createArray(createObject('scenario', 'agent', 'useMicrosoftManagedNetwork', true())), null()))]" }, "resources": { "foundryAccount::modelDeployments": { @@ -345,20 +554,39 @@ "properties": { "allowProjectManagement": true, "customSubDomainName": "[variables('foundryAccountName')]", - "publicNetworkAccess": "Enabled", + "publicNetworkAccess": "[if(variables('disablePublicDataPlaneAccess'), 'Disabled', 'Enabled')]", "disableLocalAuth": true, "networkAcls": { - "defaultAction": "Allow", + "defaultAction": "[if(variables('disablePublicDataPlaneAccess'), 'Deny', 'Allow')]", + "bypass": "[if(variables('disablePublicDataPlaneAccess'), 
'AzureServices', null())]", "virtualNetworkRules": [], "ipRules": [] + }, + "networkInjections": "[variables('agentNetworkInjections')]" + }, + "dependsOn": [ + "network" + ] + }, + "foundryManagedNetwork": { + "condition": "[and(variables('useManagedNetwork'), not(empty(parameters('managedIsolationMode'))))]", + "type": "Microsoft.CognitiveServices/accounts/managedNetworks", + "apiVersion": "2025-10-01-preview", + "name": "[format('{0}/{1}', variables('foundryAccountName'), 'default')]", + "properties": { + "managedNetwork": { + "isolationMode": "[parameters('managedIsolationMode')]" } - } + }, + "dependsOn": [ + "foundryAccount" + ] }, "developerCognitiveServicesUser": { "condition": "[not(empty(parameters('principalId')))]", "type": "Microsoft.Authorization/roleAssignments", "apiVersion": "2022-04-01", - "scope": "[format('Microsoft.CognitiveServices/accounts/{0}/projects/{1}', variables('foundryAccountName'), parameters('foundryProjectName'))]", + "scope": "[resourceId('Microsoft.CognitiveServices/accounts/projects', variables('foundryAccountName'), parameters('foundryProjectName'))]", "name": "[guid(resourceId('Microsoft.CognitiveServices/accounts/projects', variables('foundryAccountName'), parameters('foundryProjectName')), parameters('principalId'), variables('cognitiveServicesUserRoleId'))]", "properties": { "principalId": "[parameters('principalId')]", @@ -369,6 +597,316 @@ "foundryAccount::project" ] }, + "network": { + "condition": "[parameters('enableNetworkIsolation')]", + "type": "Microsoft.Resources/deployments", + "apiVersion": "2025-04-01", + "name": "foundry-network", + "properties": { + "expressionEvaluationOptions": { + "scope": "inner" + }, + "mode": "Incremental", + "parameters": { + "vnetId": { + "value": "[parameters('vnetId')]" + }, + "agentSubnetName": { + "value": "[parameters('agentSubnetName')]" + }, + "agentSubnetPrefix": { + "value": "[parameters('agentSubnetPrefix')]" + }, + "createAgentSubnet": { + "value": 
"[parameters('createAgentSubnet')]" + }, + "peSubnetName": { + "value": "[parameters('peSubnetName')]" + }, + "peSubnetPrefix": { + "value": "[parameters('peSubnetPrefix')]" + }, + "createPESubnet": { + "value": "[parameters('createPESubnet')]" + } + }, + "template": { + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": { + "name": "bicep", + "version": "0.44.1.10279", + "templateHash": "11653429655583605398" + } + }, + "parameters": { + "vnetId": { + "type": "string", + "metadata": { + "description": "ARM resource id of the existing customer VNet." + } + }, + "agentSubnetName": { + "type": "string", + "metadata": { + "description": "Name of the agent (delegated) subnet." + } + }, + "agentSubnetPrefix": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "CIDR for the agent subnet. Empty derives a /24 from the VNet space." + } + }, + "createAgentSubnet": { + "type": "bool", + "metadata": { + "description": "When true, create the agent subnet; when false, reference it." + } + }, + "peSubnetName": { + "type": "string", + "metadata": { + "description": "Name of the private-endpoint subnet." + } + }, + "peSubnetPrefix": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "CIDR for the private-endpoint subnet. Empty derives a /24 from the VNet space." + } + }, + "createPESubnet": { + "type": "bool", + "metadata": { + "description": "When true, create the PE subnet; when false, reference it." 
+ } + } + }, + "variables": { + "vnetParts": "[split(parameters('vnetId'), '/')]", + "vnetSubscriptionId": "[variables('vnetParts')[2]]", + "vnetResourceGroupName": "[variables('vnetParts')[4]]", + "vnetName": "[last(variables('vnetParts'))]" + }, + "resources": [ + { + "condition": "[parameters('createAgentSubnet')]", + "type": "Microsoft.Resources/deployments", + "apiVersion": "2025-04-01", + "name": "[format('agent-subnet-{0}', uniqueString(deployment().name, parameters('agentSubnetName')))]", + "subscriptionId": "[variables('vnetSubscriptionId')]", + "resourceGroup": "[variables('vnetResourceGroupName')]", + "properties": { + "expressionEvaluationOptions": { + "scope": "inner" + }, + "mode": "Incremental", + "parameters": { + "vnetName": { + "value": "[variables('vnetName')]" + }, + "subnetName": { + "value": "[parameters('agentSubnetName')]" + }, + "addressPrefix": "[if(empty(parameters('agentSubnetPrefix')), createObject('value', cidrSubnet(reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('vnetSubscriptionId'), variables('vnetResourceGroupName')), 'Microsoft.Network/virtualNetworks', variables('vnetName')), '2024-05-01').addressSpace.addressPrefixes[0], 24, 0)), createObject('value', parameters('agentSubnetPrefix')))]", + "delegations": { + "value": [ + { + "name": "Microsoft.App/environments", + "properties": { + "serviceName": "Microsoft.App/environments" + } + } + ] + } + }, + "template": { + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": { + "name": "bicep", + "version": "0.44.1.10279", + "templateHash": "9706203844896299767" + } + }, + "parameters": { + "vnetName": { + "type": "string", + "metadata": { + "description": "Name of the virtual network the subnet belongs to." + } + }, + "subnetName": { + "type": "string", + "metadata": { + "description": "Name of the subnet to create." 
+ } + }, + "addressPrefix": { + "type": "string", + "metadata": { + "description": "CIDR for the subnet." + } + }, + "delegations": { + "type": "array", + "defaultValue": [], + "metadata": { + "description": "Subnet delegations (e.g. Microsoft.App/environments for the agent subnet)." + } + } + }, + "resources": [ + { + "type": "Microsoft.Network/virtualNetworks/subnets", + "apiVersion": "2024-05-01", + "name": "[format('{0}/{1}', parameters('vnetName'), parameters('subnetName'))]", + "properties": { + "addressPrefix": "[parameters('addressPrefix')]", + "delegations": "[parameters('delegations')]" + } + } + ], + "outputs": { + "subnetId": { + "type": "string", + "value": "[resourceId('Microsoft.Network/virtualNetworks/subnets', split(format('{0}/{1}', parameters('vnetName'), parameters('subnetName')), '/')[0], split(format('{0}/{1}', parameters('vnetName'), parameters('subnetName')), '/')[1])]" + }, + "subnetName": { + "type": "string", + "value": "[parameters('subnetName')]" + } + } + } + } + }, + { + "condition": "[parameters('createPESubnet')]", + "type": "Microsoft.Resources/deployments", + "apiVersion": "2025-04-01", + "name": "[format('pe-subnet-{0}', uniqueString(deployment().name, parameters('peSubnetName')))]", + "subscriptionId": "[variables('vnetSubscriptionId')]", + "resourceGroup": "[variables('vnetResourceGroupName')]", + "properties": { + "expressionEvaluationOptions": { + "scope": "inner" + }, + "mode": "Incremental", + "parameters": { + "vnetName": { + "value": "[variables('vnetName')]" + }, + "subnetName": { + "value": "[parameters('peSubnetName')]" + }, + "addressPrefix": "[if(empty(parameters('peSubnetPrefix')), createObject('value', cidrSubnet(reference(extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('vnetSubscriptionId'), variables('vnetResourceGroupName')), 'Microsoft.Network/virtualNetworks', variables('vnetName')), '2024-05-01').addressSpace.addressPrefixes[0], 24, 1)), createObject('value', 
parameters('peSubnetPrefix')))]", + "delegations": { + "value": [] + } + }, + "template": { + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": { + "name": "bicep", + "version": "0.44.1.10279", + "templateHash": "9706203844896299767" + } + }, + "parameters": { + "vnetName": { + "type": "string", + "metadata": { + "description": "Name of the virtual network the subnet belongs to." + } + }, + "subnetName": { + "type": "string", + "metadata": { + "description": "Name of the subnet to create." + } + }, + "addressPrefix": { + "type": "string", + "metadata": { + "description": "CIDR for the subnet." + } + }, + "delegations": { + "type": "array", + "defaultValue": [], + "metadata": { + "description": "Subnet delegations (e.g. Microsoft.App/environments for the agent subnet)." + } + } + }, + "resources": [ + { + "type": "Microsoft.Network/virtualNetworks/subnets", + "apiVersion": "2024-05-01", + "name": "[format('{0}/{1}', parameters('vnetName'), parameters('subnetName'))]", + "properties": { + "addressPrefix": "[parameters('addressPrefix')]", + "delegations": "[parameters('delegations')]" + } + } + ], + "outputs": { + "subnetId": { + "type": "string", + "value": "[resourceId('Microsoft.Network/virtualNetworks/subnets', split(format('{0}/{1}', parameters('vnetName'), parameters('subnetName')), '/')[0], split(format('{0}/{1}', parameters('vnetName'), parameters('subnetName')), '/')[1])]" + }, + "subnetName": { + "type": "string", + "value": "[parameters('subnetName')]" + } + } + } + }, + "dependsOn": [ + "[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('vnetSubscriptionId'), variables('vnetResourceGroupName')), 'Microsoft.Resources/deployments', format('agent-subnet-{0}', uniqueString(deployment().name, parameters('agentSubnetName'))))]" + ] + } + ], + "outputs": { + "vnetId": { + "type": "string", + "value": 
"[extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('vnetSubscriptionId'), variables('vnetResourceGroupName')), 'Microsoft.Network/virtualNetworks', variables('vnetName'))]" + }, + "vnetName": { + "type": "string", + "value": "[variables('vnetName')]" + }, + "vnetSubscriptionId": { + "type": "string", + "value": "[variables('vnetSubscriptionId')]" + }, + "vnetResourceGroupName": { + "type": "string", + "value": "[variables('vnetResourceGroupName')]" + }, + "agentSubnetId": { + "type": "string", + "value": "[format('{0}/subnets/{1}', extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('vnetSubscriptionId'), variables('vnetResourceGroupName')), 'Microsoft.Network/virtualNetworks', variables('vnetName')), parameters('agentSubnetName'))]" + }, + "peSubnetId": { + "type": "string", + "value": "[format('{0}/subnets/{1}', extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('vnetSubscriptionId'), variables('vnetResourceGroupName')), 'Microsoft.Network/virtualNetworks', variables('vnetName')), parameters('peSubnetName'))]" + }, + "peSubnetName": { + "type": "string", + "value": "[parameters('peSubnetName')]" + } + } + } + } + }, "acr": { "condition": "[parameters('includeAcr')]", "type": "Microsoft.Resources/deployments", @@ -405,8 +943,8 @@ "metadata": { "_generator": { "name": "bicep", - "version": "0.39.26.7824", - "templateHash": "1861506930511297752" + "version": "0.44.1.10279", + "templateHash": "5205255504807716842" } }, "parameters": { @@ -497,7 +1035,7 @@ { "type": "Microsoft.Authorization/roleAssignments", "apiVersion": "2022-04-01", - "scope": "[format('Microsoft.ContainerRegistry/registries/{0}', parameters('name'))]", + "scope": "[resourceId('Microsoft.ContainerRegistry/registries', parameters('name'))]", "name": "[guid(resourceId('Microsoft.ContainerRegistry/registries', parameters('name')), parameters('foundryProjectPrincipalId'), variables('acrPullRoleId'))]", "properties": { 
"principalId": "[parameters('foundryProjectPrincipalId')]", @@ -529,6 +1067,232 @@ "foundryAccount", "foundryAccount::project" ] + }, + "privateEndpointDns": { + "condition": "[parameters('enableNetworkIsolation')]", + "type": "Microsoft.Resources/deployments", + "apiVersion": "2025-04-01", + "name": "foundry-private-endpoint-dns", + "properties": { + "expressionEvaluationOptions": { + "scope": "inner" + }, + "mode": "Incremental", + "parameters": { + "aiAccountName": { + "value": "[variables('foundryAccountName')]" + }, + "vnetId": { + "value": "[reference('network').outputs.vnetId.value]" + }, + "peSubnetId": { + "value": "[reference('network').outputs.peSubnetId.value]" + }, + "suffix": { + "value": "[variables('resourceToken')]" + }, + "dnsZonesResourceGroup": { + "value": "[parameters('dnsZonesResourceGroup')]" + }, + "dnsZonesSubscription": { + "value": "[parameters('dnsZonesSubscription')]" + } + }, + "template": { + "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "metadata": { + "_generator": { + "name": "bicep", + "version": "0.44.1.10279", + "templateHash": "17066190331540845310" + } + }, + "parameters": { + "aiAccountName": { + "type": "string", + "metadata": { + "description": "Name of the Foundry (AIServices) account to bind the private endpoint to." + } + }, + "vnetId": { + "type": "string", + "metadata": { + "description": "ARM resource id of the customer VNet." + } + }, + "peSubnetId": { + "type": "string", + "metadata": { + "description": "ARM resource id of the private-endpoint subnet." + } + }, + "suffix": { + "type": "string", + "metadata": { + "description": "Suffix for unique resource/link names." + } + }, + "dnsZonesResourceGroup": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Resource group holding existing private DNS zones. Empty creates and links new zones." 
+ } + }, + "dnsZonesSubscription": { + "type": "string", + "defaultValue": "", + "metadata": { + "description": "Subscription holding existing private DNS zones. Empty defaults to this subscription." + } + } + }, + "variables": { + "aiServicesDnsZoneName": "privatelink.services.ai.azure.com", + "openAiDnsZoneName": "privatelink.openai.azure.com", + "cognitiveServicesDnsZoneName": "privatelink.cognitiveservices.azure.com", + "useExistingZones": "[not(empty(parameters('dnsZonesResourceGroup')))]", + "existingZonesSubscription": "[if(empty(parameters('dnsZonesSubscription')), subscription().subscriptionId, parameters('dnsZonesSubscription'))]", + "aiServicesZoneId": "[if(variables('useExistingZones'), extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('existingZonesSubscription'), parameters('dnsZonesResourceGroup')), 'Microsoft.Network/privateDnsZones', variables('aiServicesDnsZoneName')), resourceId('Microsoft.Network/privateDnsZones', variables('aiServicesDnsZoneName')))]", + "openAiZoneId": "[if(variables('useExistingZones'), extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('existingZonesSubscription'), parameters('dnsZonesResourceGroup')), 'Microsoft.Network/privateDnsZones', variables('openAiDnsZoneName')), resourceId('Microsoft.Network/privateDnsZones', variables('openAiDnsZoneName')))]", + "cognitiveServicesZoneId": "[if(variables('useExistingZones'), extensionResourceId(format('/subscriptions/{0}/resourceGroups/{1}', variables('existingZonesSubscription'), parameters('dnsZonesResourceGroup')), 'Microsoft.Network/privateDnsZones', variables('cognitiveServicesDnsZoneName')), resourceId('Microsoft.Network/privateDnsZones', variables('cognitiveServicesDnsZoneName')))]" + }, + "resources": [ + { + "type": "Microsoft.Network/privateEndpoints", + "apiVersion": "2024-05-01", + "name": "[format('{0}-private-endpoint', parameters('aiAccountName'))]", + "location": "[resourceGroup().location]", + "properties": { + 
"subnet": { + "id": "[parameters('peSubnetId')]" + }, + "privateLinkServiceConnections": [ + { + "name": "[format('{0}-private-link-service-connection', parameters('aiAccountName'))]", + "properties": { + "privateLinkServiceId": "[resourceId('Microsoft.CognitiveServices/accounts', parameters('aiAccountName'))]", + "groupIds": [ + "account" + ] + } + } + ] + } + }, + { + "condition": "[not(variables('useExistingZones'))]", + "type": "Microsoft.Network/privateDnsZones", + "apiVersion": "2020-06-01", + "name": "[variables('aiServicesDnsZoneName')]", + "location": "global" + }, + { + "condition": "[not(variables('useExistingZones'))]", + "type": "Microsoft.Network/privateDnsZones", + "apiVersion": "2020-06-01", + "name": "[variables('openAiDnsZoneName')]", + "location": "global" + }, + { + "condition": "[not(variables('useExistingZones'))]", + "type": "Microsoft.Network/privateDnsZones", + "apiVersion": "2020-06-01", + "name": "[variables('cognitiveServicesDnsZoneName')]", + "location": "global" + }, + { + "condition": "[not(variables('useExistingZones'))]", + "type": "Microsoft.Network/privateDnsZones/virtualNetworkLinks", + "apiVersion": "2024-06-01", + "name": "[format('{0}/{1}', variables('aiServicesDnsZoneName'), format('aiServices-{0}-link', parameters('suffix')))]", + "location": "global", + "properties": { + "virtualNetwork": { + "id": "[parameters('vnetId')]" + }, + "registrationEnabled": false + }, + "dependsOn": [ + "[resourceId('Microsoft.Network/privateDnsZones', variables('aiServicesDnsZoneName'))]" + ] + }, + { + "condition": "[not(variables('useExistingZones'))]", + "type": "Microsoft.Network/privateDnsZones/virtualNetworkLinks", + "apiVersion": "2024-06-01", + "name": "[format('{0}/{1}', variables('openAiDnsZoneName'), format('aiServicesOpenAI-{0}-link', parameters('suffix')))]", + "location": "global", + "properties": { + "virtualNetwork": { + "id": "[parameters('vnetId')]" + }, + "registrationEnabled": false + }, + "dependsOn": [ + 
"[resourceId('Microsoft.Network/privateDnsZones', variables('openAiDnsZoneName'))]" + ] + }, + { + "condition": "[not(variables('useExistingZones'))]", + "type": "Microsoft.Network/privateDnsZones/virtualNetworkLinks", + "apiVersion": "2024-06-01", + "name": "[format('{0}/{1}', variables('cognitiveServicesDnsZoneName'), format('aiServicesCognitiveServices-{0}-link', parameters('suffix')))]", + "location": "global", + "properties": { + "virtualNetwork": { + "id": "[parameters('vnetId')]" + }, + "registrationEnabled": false + }, + "dependsOn": [ + "[resourceId('Microsoft.Network/privateDnsZones', variables('cognitiveServicesDnsZoneName'))]" + ] + }, + { + "type": "Microsoft.Network/privateEndpoints/privateDnsZoneGroups", + "apiVersion": "2024-05-01", + "name": "[format('{0}/{1}', format('{0}-private-endpoint', parameters('aiAccountName')), format('{0}-dns-group', parameters('aiAccountName')))]", + "properties": { + "privateDnsZoneConfigs": [ + { + "name": "[format('{0}-dns-aiserv-config', parameters('aiAccountName'))]", + "properties": { + "privateDnsZoneId": "[variables('aiServicesZoneId')]" + } + }, + { + "name": "[format('{0}-dns-openai-config', parameters('aiAccountName'))]", + "properties": { + "privateDnsZoneId": "[variables('openAiZoneId')]" + } + }, + { + "name": "[format('{0}-dns-cogserv-config', parameters('aiAccountName'))]", + "properties": { + "privateDnsZoneId": "[variables('cognitiveServicesZoneId')]" + } + } + ] + }, + "dependsOn": [ + "[resourceId('Microsoft.Network/privateEndpoints', format('{0}-private-endpoint', parameters('aiAccountName')))]", + "[resourceId('Microsoft.Network/privateDnsZones/virtualNetworkLinks', variables('aiServicesDnsZoneName'), format('aiServices-{0}-link', parameters('suffix')))]", + "[resourceId('Microsoft.Network/privateDnsZones', variables('aiServicesDnsZoneName'))]", + "[resourceId('Microsoft.Network/privateDnsZones/virtualNetworkLinks', variables('cognitiveServicesDnsZoneName'), 
format('aiServicesCognitiveServices-{0}-link', parameters('suffix')))]", + "[resourceId('Microsoft.Network/privateDnsZones', variables('cognitiveServicesDnsZoneName'))]", + "[resourceId('Microsoft.Network/privateDnsZones/virtualNetworkLinks', variables('openAiDnsZoneName'), format('aiServicesOpenAI-{0}-link', parameters('suffix')))]", + "[resourceId('Microsoft.Network/privateDnsZones', variables('openAiDnsZoneName'))]" + ] + } + ] + } + }, + "dependsOn": [ + "foundryAccount", + "network" + ] } }, "outputs": { @@ -563,6 +1327,14 @@ "AZURE_AI_PROJECT_ACR_CONNECTION_NAME": { "type": "string", "value": "[if(parameters('includeAcr'), reference('acr').outputs.connectionName.value, '')]" + }, + "AZURE_FOUNDRY_NETWORK_MODE": { + "type": "string", + "value": "[if(not(parameters('enableNetworkIsolation')), 'none', if(parameters('useManagedEgress'), 'managed', 'byo'))]" + }, + "AZURE_FOUNDRY_MANAGED_ISOLATION_MODE": { + "type": "string", + "value": "[if(variables('useManagedNetwork'), parameters('managedIsolationMode'), '')]" } } } @@ -608,6 +1380,14 @@ "AZURE_AI_PROJECT_ACR_CONNECTION_NAME": { "type": "string", "value": "[reference('resources').outputs.AZURE_AI_PROJECT_ACR_CONNECTION_NAME.value]" + }, + "AZURE_FOUNDRY_NETWORK_MODE": { + "type": "string", + "value": "[reference('resources').outputs.AZURE_FOUNDRY_NETWORK_MODE.value]" + }, + "AZURE_FOUNDRY_MANAGED_ISOLATION_MODE": { + "type": "string", + "value": "[reference('resources').outputs.AZURE_FOUNDRY_MANAGED_ISOLATION_MODE.value]" } } } \ No newline at end of file diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.bicep b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.bicep index 21411315904..dec15004123 100644 --- a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.bicep +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/main.bicep @@ -63,6 +63,45 @@ param principalId string = '' @description('Principal type used in the 
developer role assignment.') param principalType string = 'User' +// Network isolation parameters (see modules/resources.bicep for semantics). +// All default off so an absent network: block yields a public account. + +@description('Master switch: when true the account is VNet-bound (private).') +param enableNetworkIsolation bool = false + +@description('When true (and isolation on), the agent runtime uses the Microsoft-managed network instead of injecting into a customer subnet.') +param useManagedEgress bool = false + +@description('ARM id of the existing customer VNet (byo mode).') +param vnetId string = '' + +@description('Agent (delegated) subnet name.') +param agentSubnetName string = 'agent-subnet' + +@description('Agent subnet CIDR. Empty derives a /24 from the VNet space.') +param agentSubnetPrefix string = '' + +@description('When true, create the agent subnet; when false, reference it.') +param createAgentSubnet bool = false + +@description('Private-endpoint subnet name.') +param peSubnetName string = 'pe-subnet' + +@description('Private-endpoint subnet CIDR. Empty derives a /24 from the VNet space.') +param peSubnetPrefix string = '' + +@description('When true, create the PE subnet; when false, reference it.') +param createPESubnet bool = false + +@description('Managed-network isolation mode (managed mode).') +param managedIsolationMode string = '' + +@description('Resource group holding existing private DNS zones. Empty creates new zones.') +param dnsZonesResourceGroup string = '' + +@description('Subscription holding existing private DNS zones. 
Empty defaults to this subscription.') +param dnsZonesSubscription string = '' + // Resources resource resourceGroup 'Microsoft.Resources/resourceGroups@2021-04-01' = { @@ -83,6 +122,18 @@ module resources 'modules/resources.bicep' = { includeAcr: includeAcr principalId: principalId principalType: principalType + enableNetworkIsolation: enableNetworkIsolation + useManagedEgress: useManagedEgress + vnetId: vnetId + agentSubnetName: agentSubnetName + agentSubnetPrefix: agentSubnetPrefix + createAgentSubnet: createAgentSubnet + peSubnetName: peSubnetName + peSubnetPrefix: peSubnetPrefix + createPESubnet: createPESubnet + managedIsolationMode: managedIsolationMode + dnsZonesResourceGroup: dnsZonesResourceGroup + dnsZonesSubscription: dnsZonesSubscription } } @@ -97,3 +148,5 @@ output FOUNDRY_PROJECT_ENDPOINT string = resources.outputs.FOUNDRY_PROJECT_ENDPO output AZURE_CONTAINER_REGISTRY_ENDPOINT string = resources.outputs.AZURE_CONTAINER_REGISTRY_ENDPOINT output AZURE_CONTAINER_REGISTRY_RESOURCE_ID string = resources.outputs.AZURE_CONTAINER_REGISTRY_RESOURCE_ID output AZURE_AI_PROJECT_ACR_CONNECTION_NAME string = resources.outputs.AZURE_AI_PROJECT_ACR_CONNECTION_NAME +output AZURE_FOUNDRY_NETWORK_MODE string = resources.outputs.AZURE_FOUNDRY_NETWORK_MODE +output AZURE_FOUNDRY_MANAGED_ISOLATION_MODE string = resources.outputs.AZURE_FOUNDRY_MANAGED_ISOLATION_MODE diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/network.bicep b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/network.bicep new file mode 100644 index 00000000000..68f414e6da5 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/network.bicep @@ -0,0 +1,94 @@ +// Virtual network wiring for a network-secured (VNet-injected) Foundry account. +// +// The customer VNet is always bring-your-own: it must already exist, and +// v1 references it by the ARM id supplied in azure.yaml.
Each subnet follows +// the tri-state rule from the synthesizer: +// +// create=true, prefix set -> create the subnet with that prefix +// create=true, prefix empty -> create the subnet with a derived /24 prefix +// create=false -> reference an existing subnet as-is +// +// All subnet ids are deterministic ('/subnets/'), so outputs are +// valid whether the subnet was created here or already existed. + +targetScope = 'resourceGroup' + +@description('ARM resource id of the existing customer VNet.') +param vnetId string + +@description('Name of the agent (delegated) subnet.') +param agentSubnetName string + +@description('CIDR for the agent subnet. Empty derives a /24 from the VNet space.') +param agentSubnetPrefix string = '' + +@description('When true, create the agent subnet; when false, reference it.') +param createAgentSubnet bool + +@description('Name of the private-endpoint subnet.') +param peSubnetName string + +@description('CIDR for the private-endpoint subnet. Empty derives a /24 from the VNet space.') +param peSubnetPrefix string = '' + +@description('When true, create the PE subnet; when false, reference it.') +param createPESubnet bool + +// The VNet may live in a different resource group than the deployment RG. +var vnetParts = split(vnetId, '/') +var vnetSubscriptionId = vnetParts[2] +var vnetResourceGroupName = vnetParts[4] +var vnetName = last(vnetParts) + +resource vnet 'Microsoft.Network/virtualNetworks@2024-05-01' existing = { + name: vnetName + scope: resourceGroup(vnetSubscriptionId, vnetResourceGroupName) +} + +var vnetAddressSpace = vnet.properties.addressSpace.addressPrefixes[0] +var agentPrefix = empty(agentSubnetPrefix) ? cidrSubnet(vnetAddressSpace, 24, 0) : agentSubnetPrefix +var pePrefix = empty(peSubnetPrefix) ? cidrSubnet(vnetAddressSpace, 24, 1) : peSubnetPrefix + +// Create the agent subnet, delegated to Microsoft.App/environments so the +// hosted agent's container app environment can be injected into it. 
+module agentSubnet 'subnet.bicep' = if (createAgentSubnet) { + name: 'agent-subnet-${uniqueString(deployment().name, agentSubnetName)}' + scope: resourceGroup(vnetSubscriptionId, vnetResourceGroupName) + params: { + vnetName: vnetName + subnetName: agentSubnetName + addressPrefix: agentPrefix + delegations: [ + { + name: 'Microsoft.App/environments' + properties: { + serviceName: 'Microsoft.App/environments' + } + } + ] + } +} + +// Create the private-endpoint subnet. Depends on the agent subnet so the two +// subnet PUTs against the same VNet do not race (ARM serializes subnet writes). +module peSubnet 'subnet.bicep' = if (createPESubnet) { + name: 'pe-subnet-${uniqueString(deployment().name, peSubnetName)}' + scope: resourceGroup(vnetSubscriptionId, vnetResourceGroupName) + params: { + vnetName: vnetName + subnetName: peSubnetName + addressPrefix: pePrefix + delegations: [] + } + dependsOn: [ + agentSubnet + ] +} + +output vnetId string = vnet.id +output vnetName string = vnetName +output vnetSubscriptionId string = vnetSubscriptionId +output vnetResourceGroupName string = vnetResourceGroupName +output agentSubnetId string = '${vnet.id}/subnets/${agentSubnetName}' +output peSubnetId string = '${vnet.id}/subnets/${peSubnetName}' +output peSubnetName string = peSubnetName diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/private-endpoint-dns.bicep b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/private-endpoint-dns.bicep new file mode 100644 index 00000000000..a8af939b859 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/private-endpoint-dns.bicep @@ -0,0 +1,165 @@ +// Account private endpoint + the three AI private DNS zones for a +// network-secured Foundry account. Dependent stores stay platform-managed, so +// only the account itself gets a private endpoint here (no Search / Storage / +// Cosmos endpoints). 
+// +// DNS zones are created and linked to the VNet by default. When +// dnsZonesResourceGroup is set, the zones are referenced from that resource +// group (in dnsZonesSubscription, defaulting to this subscription) instead of +// being created. + +targetScope = 'resourceGroup' + +@description('Name of the Foundry (AIServices) account to bind the private endpoint to.') +param aiAccountName string + +@description('ARM resource id of the customer VNet.') +param vnetId string + +@description('ARM resource id of the private-endpoint subnet.') +param peSubnetId string + +@description('Suffix for unique resource/link names.') +param suffix string + +@description('Resource group holding existing private DNS zones. Empty creates and links new zones.') +param dnsZonesResourceGroup string = '' + +@description('Subscription holding existing private DNS zones. Empty defaults to this subscription.') +param dnsZonesSubscription string = '' + +var aiServicesDnsZoneName = 'privatelink.services.ai.azure.com' +var openAiDnsZoneName = 'privatelink.openai.azure.com' +var cognitiveServicesDnsZoneName = 'privatelink.cognitiveservices.azure.com' + +var useExistingZones = !empty(dnsZonesResourceGroup) +var existingZonesSubscription = empty(dnsZonesSubscription) ? subscription().subscriptionId : dnsZonesSubscription + +resource aiAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' existing = { + name: aiAccountName + scope: resourceGroup() +} + +// Account private endpoint in the PE subnet, targeting the 'account' group. 
+resource aiAccountPrivateEndpoint 'Microsoft.Network/privateEndpoints@2024-05-01' = { + name: '${aiAccountName}-private-endpoint' + location: resourceGroup().location + properties: { + subnet: { + id: peSubnetId + } + privateLinkServiceConnections: [ + { + name: '${aiAccountName}-private-link-service-connection' + properties: { + privateLinkServiceId: aiAccount.id + groupIds: [ + 'account' + ] + } + } + ] + } +} + +// ---- Private DNS zones: create-and-link, or reference existing ---- + +resource aiServicesZone 'Microsoft.Network/privateDnsZones@2020-06-01' = if (!useExistingZones) { + name: aiServicesDnsZoneName + location: 'global' +} +resource existingAiServicesZone 'Microsoft.Network/privateDnsZones@2020-06-01' existing = if (useExistingZones) { + name: aiServicesDnsZoneName + scope: resourceGroup(existingZonesSubscription, dnsZonesResourceGroup) +} +var aiServicesZoneId = useExistingZones ? existingAiServicesZone.id : aiServicesZone.id + +resource openAiZone 'Microsoft.Network/privateDnsZones@2020-06-01' = if (!useExistingZones) { + name: openAiDnsZoneName + location: 'global' +} +resource existingOpenAiZone 'Microsoft.Network/privateDnsZones@2020-06-01' existing = if (useExistingZones) { + name: openAiDnsZoneName + scope: resourceGroup(existingZonesSubscription, dnsZonesResourceGroup) +} +var openAiZoneId = useExistingZones ? existingOpenAiZone.id : openAiZone.id + +resource cognitiveServicesZone 'Microsoft.Network/privateDnsZones@2020-06-01' = if (!useExistingZones) { + name: cognitiveServicesDnsZoneName + location: 'global' +} +resource existingCognitiveServicesZone 'Microsoft.Network/privateDnsZones@2020-06-01' existing = if (useExistingZones) { + name: cognitiveServicesDnsZoneName + scope: resourceGroup(existingZonesSubscription, dnsZonesResourceGroup) +} +var cognitiveServicesZoneId = useExistingZones ? 
existingCognitiveServicesZone.id : cognitiveServicesZone.id + +// ---- VNet links (only when we create the zones) ---- + +resource aiServicesLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2024-06-01' = if (!useExistingZones) { + parent: aiServicesZone + name: 'aiServices-${suffix}-link' + location: 'global' + properties: { + virtualNetwork: { + id: vnetId + } + registrationEnabled: false + } +} +resource openAiLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2024-06-01' = if (!useExistingZones) { + parent: openAiZone + name: 'aiServicesOpenAI-${suffix}-link' + location: 'global' + properties: { + virtualNetwork: { + id: vnetId + } + registrationEnabled: false + } +} +resource cognitiveServicesLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2024-06-01' = if (!useExistingZones) { + parent: cognitiveServicesZone + name: 'aiServicesCognitiveServices-${suffix}-link' + location: 'global' + properties: { + virtualNetwork: { + id: vnetId + } + registrationEnabled: false + } +} + +// ---- DNS zone group binds the three zones to the account endpoint ---- + +resource aiAccountDnsGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2024-05-01' = { + parent: aiAccountPrivateEndpoint + name: '${aiAccountName}-dns-group' + properties: { + privateDnsZoneConfigs: [ + { + name: '${aiAccountName}-dns-aiserv-config' + properties: { + privateDnsZoneId: aiServicesZoneId + } + } + { + name: '${aiAccountName}-dns-openai-config' + properties: { + privateDnsZoneId: openAiZoneId + } + } + { + name: '${aiAccountName}-dns-cogserv-config' + properties: { + privateDnsZoneId: cognitiveServicesZoneId + } + } + ] + } + dependsOn: [ + aiServicesLink + openAiLink + cognitiveServicesLink + ] +} diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/resources.bicep b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/resources.bicep index 42e1991b4ed..32985dbdfe5 100644 --- 
a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/resources.bicep +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/resources.bicep @@ -55,6 +55,45 @@ param principalId string = '' @description('Principal type used in the developer role assignment.') param principalType string = 'User' +// Network isolation parameters. All default off so an absent network: block in +// azure.yaml yields a public account identical to the pre-network template. + +@description('Master switch: when true the account is VNet-bound (private).') +param enableNetworkIsolation bool = false + +@description('When true (and isolation on), the agent runtime uses the Microsoft-managed network instead of injecting into a customer subnet.') +param useManagedEgress bool = false + +@description('ARM id of the existing customer VNet (byo mode).') +param vnetId string = '' + +@description('Agent (delegated) subnet name.') +param agentSubnetName string = 'agent-subnet' + +@description('Agent subnet CIDR. Empty derives a /24 from the VNet space.') +param agentSubnetPrefix string = '' + +@description('When true, create the agent subnet; when false, reference it.') +param createAgentSubnet bool = false + +@description('Private-endpoint subnet name.') +param peSubnetName string = 'pe-subnet' + +@description('Private-endpoint subnet CIDR. Empty derives a /24 from the VNet space.') +param peSubnetPrefix string = '' + +@description('When true, create the PE subnet; when false, reference it.') +param createPESubnet bool = false + +@description('Managed-network isolation mode (managed mode). AllowInternetOutbound | AllowOnlyApprovedOutbound.') +param managedIsolationMode string = '' + +@description('Resource group holding existing private DNS zones. Empty creates and links new zones.') +param dnsZonesResourceGroup string = '' + +@description('Subscription holding existing private DNS zones. 
Empty defaults to this subscription.') +param dnsZonesSubscription string = '' + // Variables var resourceToken = empty(resourceTokenSalt) @@ -65,6 +104,13 @@ var abbrs = loadJsonContent('../abbreviations.json') var foundryAccountName = '${abbrs.cognitiveServicesAccounts}${resourceToken}' +// Egress: byo injects the agent into a customer subnet; managed uses the +// Microsoft-managed network. Ingress: an account private endpoint is always +// provisioned when isolation is on, so the data plane is never left public. +var useByoNetwork = enableNetworkIsolation && !useManagedEgress +var useManagedNetwork = enableNetworkIsolation && useManagedEgress +var disablePublicDataPlaneAccess = enableNetworkIsolation + // Built-in role definition ids. See: https://learn.microsoft.com/azure/role-based-access-control/built-in-roles var cognitiveServicesUserRoleId = subscriptionResourceId( 'Microsoft.Authorization/roleDefinitions', @@ -73,6 +119,50 @@ var cognitiveServicesUserRoleId = subscriptionResourceId( // Resources +// Customer VNet wiring: reference the VNet and create or reference the agent +// (byo egress only) + private-endpoint subnets. Runs whenever isolation is on +// because the account private endpoint is always provisioned. +module network 'network.bicep' = if (enableNetworkIsolation) { + name: 'foundry-network' + params: { + vnetId: vnetId + agentSubnetName: agentSubnetName + agentSubnetPrefix: agentSubnetPrefix + createAgentSubnet: createAgentSubnet + peSubnetName: peSubnetName + peSubnetPrefix: peSubnetPrefix + createPESubnet: createPESubnet + } +} + +// networkInjections wires the account into the agent subnet (byo) or the +// Microsoft-managed network (managed). Null when isolation is off. +// +// subnetArmId is built as a concrete string from the (concrete) vnetId param +// rather than network!.outputs.agentSubnetId. 
The account and the network +// module deploy in the same template, so an inter-module reference() here is +// unresolved at the CognitiveServices RP preflight, which then fails to convert +// networkInjections to its typed contract (ARM what-if does not catch this). +// The deterministic id avoids the unresolved reference; an explicit dependsOn +// on the network module preserves ordering (the subnet must exist first). +var agentSubnetArmId = '${vnetId}/subnets/${agentSubnetName}' +var agentNetworkInjections = useByoNetwork + ? [ + { + scenario: 'agent' + subnetArmId: agentSubnetArmId + useMicrosoftManagedNetwork: false + } + ] + : (useManagedNetwork + ? [ + { + scenario: 'agent' + useMicrosoftManagedNetwork: true + } + ] + : null) + resource foundryAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' = { name: foundryAccountName location: location @@ -87,15 +177,22 @@ resource foundryAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' = { properties: { allowProjectManagement: true customSubDomainName: foundryAccountName - publicNetworkAccess: 'Enabled' + publicNetworkAccess: disablePublicDataPlaneAccess ? 'Disabled' : 'Enabled' disableLocalAuth: true networkAcls: { - defaultAction: 'Allow' + defaultAction: disablePublicDataPlaneAccess ? 'Deny' : 'Allow' + bypass: disablePublicDataPlaneAccess ? 'AzureServices' : null virtualNetworkRules: [] ipRules: [] } + networkInjections: agentNetworkInjections } + // The account injects into the agent subnet via a deterministic id (above), + // so Bicep cannot infer the dependency on the network module that creates + // that subnet. Declare it explicitly so the subnet exists before injection. + dependsOn: useByoNetwork ? [network] : [] + // Sequential model deployment creation; ARM throttles concurrent // deployments on the same account. @batchSize(1) @@ -128,6 +225,23 @@ resource foundryAccount 'Microsoft.CognitiveServices/accounts@2025-06-01' = { } } +// Managed-network isolation (managed egress only). 
Applies the chosen outbound +// isolation mode to the Microsoft-managed VNet that hosts the agent runtime. +// Only deployed when an explicit isolationMode is requested; otherwise the +// platform default applies. Note: AllowOnlyApprovedOutbound additionally +// requires approved outbound rules for the agent to reach dependent resources; +// for the platform-managed stores used here those are managed by the platform. +resource foundryManagedNetwork 'Microsoft.CognitiveServices/accounts/managednetworks@2025-10-01-preview' = + if (useManagedNetwork && !empty(managedIsolationMode)) { + parent: foundryAccount + name: 'default' + properties: { + managedNetwork: { + isolationMode: managedIsolationMode + } + } + } + module acr 'acr.bicep' = if (includeAcr) { name: 'acr' params: { @@ -140,6 +254,21 @@ module acr 'acr.bicep' = if (includeAcr) { } } +// Account private endpoint + AI private DNS zones. The account is always given a +// private endpoint when isolation is on (byo or managed egress); dependent +// stores stay platform-managed, so only the account gets an endpoint. +module privateEndpointDns 'private-endpoint-dns.bicep' = if (enableNetworkIsolation) { + name: 'foundry-private-endpoint-dns' + params: { + aiAccountName: foundryAccount.name + vnetId: network!.outputs.vnetId + peSubnetId: network!.outputs.peSubnetId + suffix: resourceToken + dnsZonesResourceGroup: dnsZonesResourceGroup + dnsZonesSubscription: dnsZonesSubscription + } +} + // Grant the developer Cognitive Services User on the project so they can call // the Foundry data-plane (chat/completions, agents API) from their machine. resource developerCognitiveServicesUser 'Microsoft.Authorization/roleAssignments@2022-04-01' = if (!empty(principalId)) { @@ -162,3 +291,5 @@ output FOUNDRY_PROJECT_ENDPOINT string = 'https://${foundryAccount.name}.service output AZURE_CONTAINER_REGISTRY_ENDPOINT string = includeAcr ? acr!.outputs.loginServer : '' output AZURE_CONTAINER_REGISTRY_RESOURCE_ID string = includeAcr ? 
acr!.outputs.resourceId : '' output AZURE_AI_PROJECT_ACR_CONNECTION_NAME string = includeAcr ? acr!.outputs.connectionName : '' +output AZURE_FOUNDRY_NETWORK_MODE string = !enableNetworkIsolation ? 'none' : (useManagedEgress ? 'managed' : 'byo') +output AZURE_FOUNDRY_MANAGED_ISOLATION_MODE string = useManagedNetwork ? managedIsolationMode : '' diff --git a/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/subnet.bicep b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/subnet.bicep new file mode 100644 index 00000000000..27abadcb813 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/internal/synthesis/templates/modules/subnet.bicep @@ -0,0 +1,28 @@ +// Single subnet on an existing VNet. Kept as its own module so the parent can +// place subnets in the VNet's resource group (which may differ from the +// deployment RG) and serialize subnet writes via module dependsOn. + +targetScope = 'resourceGroup' + +@description('Name of the virtual network the subnet belongs to.') +param vnetName string + +@description('Name of the subnet to create.') +param subnetName string + +@description('CIDR for the subnet.') +param addressPrefix string + +@description('Subnet delegations (e.g. 
Microsoft.App/environments for the agent subnet).') +param delegations array = [] + +resource subnet 'Microsoft.Network/virtualNetworks/subnets@2024-05-01' = { + name: '${vnetName}/${subnetName}' + properties: { + addressPrefix: addressPrefix + delegations: delegations + } +} + +output subnetId string = subnet.id +output subnetName string = subnetName diff --git a/cli/azd/extensions/azure.ai.agents/schemas/examples/complex.azure.yaml b/cli/azd/extensions/azure.ai.agents/schemas/examples/complex.azure.yaml index 63878d29ffc..89ad8b2380d 100644 --- a/cli/azd/extensions/azure.ai.agents/schemas/examples/complex.azure.yaml +++ b/cli/azd/extensions/azure.ai.agents/schemas/examples/complex.azure.yaml @@ -12,6 +12,24 @@ metadata: services: ai: host: microsoft.foundry + # Private networking: provision a VNet-bound (network-secured) account. + # Omit this block for a public account. + network: + # Egress: inject the agent into a customer subnet (BYO VNet). Omit + # agentSubnet to use the Microsoft-managed network instead. + agentSubnet: + vnet: ${AZURE_VNET_ID} + name: agent-subnet + prefix: 192.168.0.0/24 + # Ingress: the account private endpoint (required). Establishes the + # private data plane (public network access disabled). 
+ peSubnet: + vnet: ${AZURE_VNET_ID} + name: pe-subnet + prefix: 192.168.1.0/24 + dns: + resourceGroup: rg-private-dns + subscription: ${AZURE_DNS_SUBSCRIPTION_ID} deployments: - name: gpt-4o model: diff --git a/cli/azd/extensions/azure.ai.agents/schemas/microsoft.foundry.json b/cli/azd/extensions/azure.ai.agents/schemas/microsoft.foundry.json index 34a5d502a2c..d4ba81234bc 100644 --- a/cli/azd/extensions/azure.ai.agents/schemas/microsoft.foundry.json +++ b/cli/azd/extensions/azure.ai.agents/schemas/microsoft.foundry.json @@ -39,6 +39,64 @@ "type": "array", "description": "All agent definitions (hosted and prompt).", "items": { "$ref": "Agent.json" } + }, + "network": { + "type": "object", + "description": "Private networking for the Foundry account. When omitted, the account uses public networking. When present, azd always provisions an account private endpoint (the data plane is never left public) and uses platform-managed dependent stores. Ignored when 'endpoint' is set (brownfield).", + "additionalProperties": false, + "required": ["peSubnet"], + "properties": { + "agentSubnet": { + "description": "Egress: when set, the agent runtime is injected into this customer subnet (BYO VNet). When omitted, the agent uses the Microsoft-managed network.", + "$ref": "#/definitions/subnet" + }, + "isolationMode": { + "type": "string", + "description": "Outbound posture of the Microsoft-managed network. Valid only when 'agentSubnet' is omitted (managed egress).", + "enum": ["AllowInternetOutbound", "AllowOnlyApprovedOutbound"] + }, + "peSubnet": { + "description": "Ingress: subnet for the account private endpoint. Required. Establishes the private data plane (public network access disabled).", + "$ref": "#/definitions/subnet" + }, + "dns": { + "type": "object", + "description": "Private DNS zones for the account private endpoint. When omitted (or resourceGroup omitted), azd creates and links the required AI private DNS zones. 
When resourceGroup is set, azd references existing zones in that resource group.", + "additionalProperties": false, + "properties": { + "resourceGroup": { + "type": "string", + "description": "Resource group that holds existing private DNS zones to reference." + }, + "subscription": { + "type": "string", + "description": "Subscription that holds the existing private DNS zones. Defaults to the deployment subscription. Accepts a bare GUID or ${VAR}." + } + } + } + } + } + }, + "definitions": { + "subnet": { + "type": "object", + "description": "Subnet descriptor. vnet and name are required. Omit prefix to reference an existing subnet; set prefix to create the subnet with that CIDR.", + "additionalProperties": false, + "required": ["vnet", "name"], + "properties": { + "vnet": { + "type": "string", + "description": "ARM resource id of the virtual network that holds (or will hold) the subnet. Supports ${VAR} resolved from the azd environment." + }, + "name": { + "type": "string", + "description": "Subnet name." + }, + "prefix": { + "type": "string", + "description": "Subnet CIDR. When set, azd creates the subnet; when omitted, azd references the existing subnet." 
+ } + } } } } diff --git a/cli/azd/extensions/azure.ai.agents/test/e2e/network/.gitignore b/cli/azd/extensions/azure.ai.agents/test/e2e/network/.gitignore new file mode 100644 index 00000000000..86ed0e4661a --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/test/e2e/network/.gitignore @@ -0,0 +1,2 @@ +# Transient E2E run artifacts (logs, per-run output) +azd-network-e2e-*/ diff --git a/cli/azd/extensions/azure.ai.agents/test/e2e/network/README.md b/cli/azd/extensions/azure.ai.agents/test/e2e/network/README.md new file mode 100644 index 00000000000..3ded17cbd97 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/test/e2e/network/README.md @@ -0,0 +1,166 @@ +# Foundry Private Networking — E2E Harness + +Real Azure end-to-end validation for `host: azure.ai.agent` private +networking (the `network:` block: BYO VNet, create/reference subnets, own/ +reference private DNS), plus the BYO-image agent lifecycle under a VNet. + +> **Cost & creds:** This harness creates real Azure resources and incurs cost. +> Per the repo guidance, run the authenticated job from an **Azure DevOps +> pipeline** (or locally with `azd auth login`), not a public GitHub workflow. + +## What it validates + +| Scenario | Path | How it's verified | Azure cost | +|---|---|---|---| +| 1. Declarative `network:` | bicep-less (in-memory synth) | `azd provision --preview` what-if shape gate **+** the real provision in phase 3 (same code path) | none extra (what-if) | +| 2. Eject + edit | on-disk template + provision-time `${VAR}` | eject → what-if "no changes" against the live account → manual `infra/` edit delta | none extra (reuses phase 3 account) | +| 3. BYO image under VNet | `deploy → invoke` on the provisioned account | `agent invoke` (gated `RUN_DEPLOY=true`) | none extra (reuses phase 3 account) | + +The whole matrix (subnet `create`/`reference` × DNS `own`/`reference`) is covered: +the `create+own` cell is the single real provision; the other three cells are +checked with what-if only. 
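The matrix described above can be sketched as a small loop. This is an illustrative sketch only, not code from `run-network-e2e.sh`: the function names `cell_action` and `list_cells` and the printed labels are hypothetical. The only facts taken from the text are that `create+own` is the single real provision and that the other three cells are checked with what-if.

```bash
#!/usr/bin/env bash
# Illustrative sketch (not part of run-network-e2e.sh): enumerate the
# subnet x DNS matrix and report how each cell would be verified.
set -euo pipefail

# cell_action is a hypothetical helper: it prints "provision" for the single
# real-provision cell (create+own) and "what-if" for every other cell.
cell_action() {
  local subnet_mode="$1" dns_mode="$2"
  if [[ "$subnet_mode" == "create" && "$dns_mode" == "own" ]]; then
    echo "provision"
  else
    echo "what-if"
  fi
}

# list_cells prints one "<subnet>+<dns> <action>" line per matrix cell.
list_cells() {
  local subnet_mode dns_mode
  for subnet_mode in create reference; do
    for dns_mode in own reference; do
      printf '%s+%s %s\n' "$subnet_mode" "$dns_mode" \
        "$(cell_action "$subnet_mode" "$dns_mode")"
    done
  done
}

list_cells
```

Running the sketch lists four cells, of which exactly one is marked `provision`. That is the property that keeps the run cheap: only one cell pays for the network-secured account.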
+ +### `--image` is not required for phases 0–4 + +The project is **hand-authored** (an `azure.yaml` fixture with the foundry +service, `network:` block, and an agent entry using `image:`), so phases 0–4 +run against the current branch **without** the BYO-image init UX +(`azd ai agent init --image`). `image:` makes the synthesizer set +`includeAcr=false`, matching BYO image, so no ACR is created at provision. + +**Phase 5 (deploy + invoke) is gated behind `RUN_DEPLOY=true`** because it uses +the deploy-time pre-built-image short-circuit (`AZD_AGENT_SKIP_ACR` +consumption in `service_target_agent.go`). Without it, a headless `azd deploy` +defaults to "build" and fails for a BYO image. + +## Why it's cheap + +The long pole is creating a network-secured Cognitive Services account +(`publicNetworkAccess: Disabled` + private endpoint + DNS, ~8–15 min). The +harness creates that **once**. Everything else uses ARM what-if +(`azd provision --preview`), which creates nothing, and a shared BYO VNet that +is provisioned a single time and reused across cells. + +Ordering is fast-fail: local gates → cheap shared VNet → what-if matrix → the +single expensive provision → deploy/invoke → teardown. A broken +template/parameter fails in seconds, not after a 15-minute provision. + +## Prerequisites + +- `az` (logged in), `azd` with the `ai agent` extension available (for the eject + step `azd ai agent init --infra`), `jq`. +- A subscription with quota for a **westus** network-enabled Foundry account + (hard requirement). Other regions may be used if westus hits capacity for a + given resource — override `ACCOUNT_LOCATION`. +- A current `azd x` developer tool. Phase 0 refreshes the dev extension from the + **current source** (`azd x build` → `pack` → `publish` → `extension install`) + so the run tests your code, not a stale installed build. This registers the + `provisioning-provider` capability + the `microsoft.foundry` provider. 
If your + installed `azd x` is old (it silently drops the capability), rebuild it first; + otherwise `azd provision` fails with `extension does not support + provisioning-provider capability`. Set `SKIP_EXT_REFRESH=true` to reuse the + already-installed extension. +- For the gated deploy phase only (`RUN_DEPLOY=true`): use an ABAC-enabled ACR + image that is pullable by the Foundry project's managed identity. The harness + can build `~/agents/echo-dual` into an ABAC-enabled ACR with `BUILD_IMAGE=true`. + The build command intentionally uses caller authentication for source registry + access: + `az acr build ... --source-acr-auth-id [caller]`. +- The caller that queues the ACR Task must receive **`Container Registry + Repository Writer`** on the ABAC ACR so the build can push the image. The + harness grants this before running `az acr build --source-acr-auth-id [caller]`. +- The Foundry **project MI** must receive the ABAC-aware **`Container Registry + Repository Reader`** role (exact Azure role name; not the legacy `AcrPull`). + The account MI is not sufficient for hosted-agent image pull. The harness + grants this role in `grant_acr_pull`, sets `AZD_AGENT_SKIP_ACR=true` (the + BYO-image deploy signal), and writes `AZURE_TENANT_ID` for postdeploy. If the + registry requires a narrower ABAC condition, complete the grant manually and + re-run phase 5. +- Because the account is intentionally private (`publicNetworkAccess: Disabled`), + phase 5 deploy/invoke must run from a host that can resolve and reach the + private endpoint. Running from the public internet fails with `403 Public + access is disabled. Please configure private endpoint.` +- The harness captures that line-of-sight automatically (`lib-jumpbox.sh`): it + stands up a jumpbox VM and exposes it as a local SOCKS5 proxy, so `azd + deploy`/`invoke` still run on **this** host (the extension built from the + current branch, the existing azd env) with data-plane HTTPS tunneled into the + VNet. 
It prefers a VM **inside the foundry VNet** (reachability is structural: + the `dns=own` zones are already linked there); if that region has no VM + capacity it falls back to a **peered VNet** in another region (global peering + + the account FQDNs pinned to the PE IP in the VM's `/etc/hosts`). VM + creation loops over `JB_VM_SIZES` and `JB_FALLBACK_LOCATIONS` until an + allocation succeeds, and the SSH NSG opens then narrows to the client's + actual /24. `phase5-iter.sh` re-runs only phase 5 against a `KEEP=true` run. + +## Usage + +```bash +# from repo root, ensure the extension/CLI is built and on PATH first +export SUBSCRIPTION_ID= +export ACCOUNT_LOCATION=westus # hard requirement for the network account + +cli/azd/extensions/azure.ai.agents/test/e2e/network/run-network-e2e.sh +``` + +Phases 0–4 run by default (no deploy). To also run phase 5 and build the +`~/agents/echo-dual` image into an ABAC-enabled ACR: + +```bash +RUN_DEPLOY=true BUILD_IMAGE=true \ + cli/azd/extensions/azure.ai.agents/test/e2e/network/run-network-e2e.sh +``` + +For manual investigation, keep all created test resources in one RG and skip +teardown: + +```bash +RUN_DEPLOY=true BUILD_IMAGE=true KEEP=true TARGET_RG= \ + cli/azd/extensions/azure.ai.agents/test/e2e/network/run-network-e2e.sh +``` + +Useful overrides: + +| Var | Default | Purpose | +|---|---|---| +| `ACCOUNT_LOCATION` | `westus` | region of the network-enabled Foundry account | +| `RUN_DEPLOY` | `false` | `true` runs phase 5 (deploy + invoke) | +| `RUN_MANAGED_ISO` | `false` | `true` runs phase 3b (managed-egress `AllowOnlyApprovedOutbound` provision + `managedNetworks/default` assertion) | +| `MAX_PARALLEL` | `4` | concurrent what-if cells in phase 2 | +| `JB_VM_SIZES` | `D2as_v5 D2s_v5 B2s …` | jumpbox VM sizes tried in order until one allocates (phase 5) | +| `JB_FALLBACK_LOCATIONS` | `$CLIENT_LOCATION eastus2 …` | regions for a peered jumpbox when `ACCOUNT_LOCATION` has no VM capacity | +| `MAX_PHASE` | `6` | stop after 
phase N (e.g. `2` for the cheap VNet + what-if gates) | +| `SKIP_EXT_REFRESH` | `false` | `true` skips the phase-0 dev-extension rebuild/reinstall | +| `BUILD_IMAGE` | `false` | `true` builds `ECHO_DUAL_DIR` into an ABAC-enabled ACR before fixtures are generated | +| `ECHO_DUAL_DIR` | `~/agents/echo-dual` | source directory for the phase-5 agent image | +| `ACR_NAME` / `ACR_RG` | derived from `PREFIX` / VNet RG | target ACR used by `BUILD_IMAGE=true` | +| `IMAGE` | the echodual digest or built tag | BYO image (in `agent.yaml`); pulled only in phase 5 | +| `TARGET_RG` | unset | optional single RG for VNet, DNS, ACR, and the real Foundry env | +| `KEEP` | `false` | `true` skips teardown (inspect resources, then `azd down --purge` yourself) | +| `OUT_DIR` | `./azd-network-e2e-` | log directory | +| `RUN_ID` / `PREFIX` | timestamp | name uniqueness | + +## Logs + +All phases tee to `OUT_DIR/` (`00-context.txt`, `run.log`, `NN-*.log`, +`30-env-after-provision.txt`, `31-assert-resources.log`, `51-show.json`, +`52-invoke.log`). Share these for PR validation. + +## Cleanup + +Teardown runs on exit (unless `KEEP=true`): `azd down --force --purge` (purge is +required — otherwise the soft-deleted Cognitive account locks the name for ~48h) +and `az group delete` for the shared VNet/DNS resource groups. If a run is +interrupted, clean up manually: + +```bash +azd down --force --purge # from the project dir +az group delete -n -vnet-rg --yes +az group delete -n -dns-rg --yes +``` + +## Files + +- `run-network-e2e.sh` — orchestrator (phases 0–6). +- `assert-resources.sh` — live-topology `az` assertions (PE, DNS, delegation, + `publicNetworkAccess: Disabled`). +- `lib.sh` — shared logging / assertion / `azure.yaml` mutation helpers. 
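The assertion helpers mentioned above are called from `assert-resources.sh` as `assert_eq <actual> <expected> <label>`, `assert_contains <haystack> <needle> <label>`, and `assert_ge <value> <minimum> <label>`. A minimal sketch of helpers with those call shapes follows. It is an assumption about how such helpers could be written, not the definitions in `lib.sh`, which may differ (for example, they may exit rather than return). The `fail` function is hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: the real helpers live in lib.sh and may differ.
# Argument order mirrors the call sites in assert-resources.sh.
set -euo pipefail

# fail is a hypothetical helper: report the failure on stderr and return
# non-zero so that, under `set -e`, the calling script stops.
fail() { printf '[fatal] %s\n' "$*" >&2; return 1; }

# assert_eq <actual> <expected> <label>: exact string equality.
assert_eq() {
  [[ "$1" == "$2" ]] || fail "$3: expected '$2', got '$1'"
}

# assert_contains <haystack> <needle> <label>: literal substring match.
assert_contains() {
  [[ "$1" == *"$2"* ]] || fail "$3: '$2' not found in '$1'"
}

# assert_ge <value> <minimum> <label>: integer comparison.
assert_ge() {
  (( $1 >= $2 )) || fail "$3: expected >= $2, got $1"
}

# Usage with canned values shaped like the live-topology checks.
assert_eq "Disabled" "Disabled" "account publicNetworkAccess"
assert_contains "privatelink.openai.azure.com" "openai" "openai dns zone"
assert_ge 1 1 "account private endpoint count"
echo "all assertions passed"
```

Returning non-zero instead of calling `exit` keeps the helpers usable when the file is sourced; the caller's `set -e` still turns a failed assertion into a failed run.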
diff --git a/cli/azd/extensions/azure.ai.agents/test/e2e/network/assert-resources.sh b/cli/azd/extensions/azure.ai.agents/test/e2e/network/assert-resources.sh new file mode 100755 index 00000000000..266a9a75330 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/test/e2e/network/assert-resources.sh @@ -0,0 +1,92 @@ +#!/usr/bin/env bash +# assert-resources.sh : verify the live topology of a provisioned +# network-secured Foundry account matches the spec. Sourced or run with the +# provisioned azd env active (cwd = project dir) and these vars exported: +# RG resource group of the Foundry account +# ACCOUNT_NAME Cognitive Services account name +# VNET_RG resource group holding the BYO vnet +# VNET_NAME BYO vnet name +# AGENT_SUBNET agent subnet name +# PE_SUBNET private-endpoint subnet name +# EXPECT_DNS_ZONES "own" | "reference" +# +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +set -Eeuo pipefail +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +# shellcheck source=lib.sh +source "$SCRIPT_DIR/lib.sh" + +assert_account_private() { + local j + j="$(az cognitiveservices account show -g "$RG" -n "$ACCOUNT_NAME" -o json)" + assert_eq "$(jq -r '.properties.publicNetworkAccess' <<<"$j")" "Disabled" \ + "account publicNetworkAccess" + assert_eq "$(jq -r '.properties.networkAcls.defaultAction' <<<"$j")" "Deny" \ + "account networkAcls.defaultAction" + # bypass should allow Azure trusted services + assert_contains "$(jq -r '.properties.networkAcls.bypass // ""' <<<"$j")" \ + "AzureServices" "account networkAcls.bypass" +} + +assert_private_endpoint() { + local pes count peid groups + pes="$(az network private-endpoint list -g "$RG" -o json)" + count="$(jq '[.[] | select(.privateLinkServiceConnections[]?.privateLinkServiceId + | ascii_downcase | contains("/accounts/" + ($acct|ascii_downcase)))] + | length' --arg acct "$ACCOUNT_NAME" <<<"$pes")" + assert_ge "${count:-0}" 1 "account private endpoint count" + 
peid="$(jq -r '.[0].id' <<<"$pes")" + groups="$(az network private-endpoint show --ids "$peid" \ + --query 'privateLinkServiceConnections[0].groupIds' -o tsv 2>/dev/null || echo '')" + assert_contains "$groups" "account" "private endpoint groupIds" +} + +assert_subnet_delegation() { + local del + del="$(az network vnet subnet show -g "$VNET_RG" --vnet-name "$VNET_NAME" \ + -n "$AGENT_SUBNET" --query 'delegations[].serviceName' -o tsv 2>/dev/null || echo '')" + assert_contains "$del" "Microsoft.App/environments" "agent subnet delegation" +} + +assert_dns_zones() { + if [[ "${EXPECT_DNS_ZONES:-own}" == "own" ]]; then + local zones + zones="$(az network private-dns zone list -g "$RG" --query '[].name' -o tsv 2>/dev/null || echo '')" + assert_contains "$zones" "privatelink.services.ai.azure.com" "ai services dns zone" + assert_contains "$zones" "privatelink.openai.azure.com" "openai dns zone" + assert_contains "$zones" "privatelink.cognitiveservices.azure.com" "cognitive dns zone" + else + info "EXPECT_DNS_ZONES=reference: zones live in external RG; skipping in-RG check" + fi +} + +# Real BYO-egress signal read off the account resource itself (not azd's own +# echoed AZURE_FOUNDRY_NETWORK_MODE output): the account's agent network +# injection must reference the customer agent subnet, with the Microsoft-managed +# network disabled. The output-variable classification is covered separately by +# the synthesizer unit tests (wantMode cases). +assert_byo_network_injection() { + local j inj + j="$(az cognitiveservices account show -g "$RG" -n "$ACCOUNT_NAME" -o json)" + inj="$(jq -c '.properties.networkInjections[]? 
| select(.scenario=="agent")' <<<"$j")" + assert_contains "$(jq -r '.subnetArmId // ""' <<<"$inj")" \ + "/subnets/$AGENT_SUBNET" "account agent networkInjection subnet" + assert_eq "$(jq -r '.useMicrosoftManagedNetwork' <<<"$inj")" "false" \ + "account agent networkInjection useMicrosoftManagedNetwork" +} + +main() { + : "${RG:?}" "${ACCOUNT_NAME:?}" "${VNET_RG:?}" "${VNET_NAME:?}" "${AGENT_SUBNET:?}" + info "asserting live topology for account=$ACCOUNT_NAME rg=$RG" + assert_account_private + assert_private_endpoint + assert_subnet_delegation + assert_dns_zones + assert_byo_network_injection + info "ALL RESOURCE ASSERTIONS PASSED" +} + +# only run main when executed directly (not when sourced) +if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then main "$@"; fi diff --git a/cli/azd/extensions/azure.ai.agents/test/e2e/network/lib-jumpbox.sh b/cli/azd/extensions/azure.ai.agents/test/e2e/network/lib-jumpbox.sh new file mode 100644 index 00000000000..7010a7c33d8 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/test/e2e/network/lib-jumpbox.sh @@ -0,0 +1,229 @@ +# Jumpbox helpers for phase 5 (deploy + invoke) line-of-sight. +# +# The account data plane is private (publicNetworkAccess=Disabled), so a public +# host cannot reach the account FQDNs. Instead of running azd on a VM (which +# would mean replicating our extension build + managed-identity auth there), +# we stand up a jumpbox with line-of-sight and expose it as a local SOCKS5 +# proxy. azd deploy/invoke then run on THIS host -- using the extension we +# built from the current branch and the existing azd env -- while data-plane +# HTTPS to the private FQDNs is tunneled through the jumpbox (SOCKS5 remote DNS, +# so the jumpbox resolves the privatelink names to the PE IP). +# +# Reachability is captured two ways: +# - Preferred: VM inside the foundry VNet (ACCOUNT_LOCATION). The dns=own +# privatelink zones are already linked to that VNet and routing is +# intra-VNet, so remote DNS + routing work with no extra wiring. 
+# - Fallback (ACCOUNT_LOCATION has no VM capacity): VM in a peered VNet in one +# of JB_FALLBACK_LOCATIONS. We global-peer it to the foundry VNet and pin +# the account FQDNs to the PE IP in the VM's /etc/hosts (the peered VNet is +# not linked to the private DNS zones, so resolution is done via hosts). +# +# VM capacity is frequently restricted per region+size, so VM creation loops +# over JB_VM_SIZES (and, in the fallback, over JB_FALLBACK_LOCATIONS) until an +# allocation succeeds. A NIC is created once per region and reused across size +# attempts so failed allocations do not orphan public IPs/NICs. +# +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +# shellcheck shell=bash + +# State (set during the run). +JB_LOCATION="" # resolved region (ACCOUNT_LOCATION or a fallback) +JB_HOST="" # public IP +JB_SSH_KEY="" # private key path +JB_SSH_PID="" # SOCKS tunnel pid +JB_RG="" # resource group holding the VM +JB_VNET="" # vnet holding the VM (foundry vnet, or a fallback vnet) + +# jumpbox_init : resolve config defaults that depend on run-time vars (PREFIX, +# CLIENT_LOCATION). Called at the start of jumpbox_up so sourcing this file does +# not require those vars to exist yet (set -u safe). +jumpbox_init() { + JB_VM_NAME="${JB_VM_NAME:-${PREFIX}-jb}" + # Capacity-restricted SKUs are common; try a spread until one allocates. 
+ JB_VM_SIZES="${JB_VM_SIZES:-Standard_D2as_v5 Standard_D2s_v5 Standard_B2s Standard_B2ms Standard_DS1_v2 Standard_A2_v2}" + JB_SUBNET_NAME="${JB_SUBNET_NAME:-jumpbox-subnet}" + JB_SUBNET_CIDR="${JB_SUBNET_CIDR:-192.168.50.0/24}" # inside the foundry VNet + JB_FALLBACK_LOCATIONS="${JB_FALLBACK_LOCATIONS:-$CLIENT_LOCATION eastus2 westus2 westus3 centralus}" + JB_FALLBACK_VNET_CIDR="${JB_FALLBACK_VNET_CIDR:-10.60.0.0/16}" + JB_FALLBACK_SUBNET_CIDR="${JB_FALLBACK_SUBNET_CIDR:-10.60.0.0/24}" + JB_SOCKS_PORT="${JB_SOCKS_PORT:-1080}" + JB_ADMIN="${JB_ADMIN:-azureuser}" +} + +# jumpbox_ensure_net +# Idempotently ensure a regional NSG (SSH open initially -- this host's +# Azure-facing NAT IP is not reliably known up front and may differ from a +# public echo service; jumpbox_narrow_nsg tightens it once SSH is up), and the +# jumpbox subnet (associated with the NSG). Sets JB_NSG. +jumpbox_ensure_net() { + local rg="$1" loc="$2" vnet="$3" vnet_cidr="$4" subnet="$5" subnet_cidr="$6" create_vnet="$7" + JB_NSG="${JB_VM_NAME}-${loc}-nsg" + if ! az network nsg show -g "$rg" -n "$JB_NSG" >/dev/null 2>&1; then + run_retry 3 "jb-nsg-$loc" az network nsg create -g "$rg" -n "$JB_NSG" -l "$loc" + run_retry 3 "jb-nsg-ssh-$loc" az network nsg rule create -g "$rg" --nsg-name "$JB_NSG" \ + -n allow-ssh --priority 1000 --access Allow --protocol Tcp --direction Inbound \ + --destination-port-ranges 22 --source-address-prefixes Internet + fi + if [[ "$create_vnet" == true ]] && ! az network vnet show -g "$rg" -n "$vnet" >/dev/null 2>&1; then + run_retry 3 "jb-vnet-$loc" az network vnet create -g "$rg" -n "$vnet" \ + --address-prefixes "$vnet_cidr" -l "$loc" + fi + if ! 
az network vnet subnet show -g "$rg" --vnet-name "$vnet" -n "$subnet" >/dev/null 2>&1; then + run_retry 3 "jb-subnet-$loc" az network vnet subnet create -g "$rg" --vnet-name "$vnet" \ + -n "$subnet" --address-prefixes "$subnet_cidr" --network-security-group "$JB_NSG" + fi +} + +# jumpbox_vm_sizeloop : create a NIC (reused across +# attempts) and try JB_VM_SIZES until one allocates. Returns 0 on success. +jumpbox_vm_sizeloop() { + local rg="$1" loc="$2" vnet="$3" subnet="$4" + local pip="${JB_VM_NAME}-${loc}-pip" nic="${JB_VM_NAME}-${loc}-nic" size + az network public-ip show -g "$rg" -n "$pip" >/dev/null 2>&1 || \ + run_retry 3 "jb-pip-$loc" az network public-ip create -g "$rg" -n "$pip" --sku Standard -l "$loc" + az network nic show -g "$rg" -n "$nic" >/dev/null 2>&1 || \ + run_retry 3 "jb-nic-$loc" az network nic create -g "$rg" -n "$nic" -l "$loc" \ + --vnet-name "$vnet" --subnet "$subnet" --public-ip-address "$pip" + for size in $JB_VM_SIZES; do + if STEP_TIMEOUT=600 run_capture "jb-vm-${loc}-${size}" az vm create -g "$rg" -n "$JB_VM_NAME" \ + --image Ubuntu2204 --size "$size" --location "$loc" --nics "$nic" \ + --admin-username "$JB_ADMIN" --ssh-key-values "${JB_SSH_KEY}.pub"; then + JB_VM_SIZE="$size"; info "jumpbox: VM created in $loc as $size"; return 0 + fi + warn "jumpbox: $size unavailable in $loc; trying next size" + az vm delete -g "$rg" -n "$JB_VM_NAME" --yes 2>/dev/null || true # clear any partial VM + done + return 1 +} + +# jumpbox_peer_and_pin : global-peer the fallback +# VNet to the foundry VNet, then pin the account FQDNs to the PE IP in the VM's +# /etc/hosts (the peered VNet is not linked to the private DNS zones). 
+jumpbox_peer_and_pin() { + local fvnet="$1" jbvnet="$2" + run_capture "jb-peer-out" az network vnet peering create -g "$VNET_RG" \ + --vnet-name "$VNET_NAME" -n "to-jb" --remote-vnet "$jbvnet" \ + --allow-vnet-access --allow-forwarded-traffic + run_capture "jb-peer-in" az network vnet peering create -g "$JB_RG" \ + --vnet-name "$JB_VNET" -n "to-foundry" --remote-vnet "$fvnet" \ + --allow-vnet-access --allow-forwarded-traffic + + # resolve the account PE private IP and derive the three account FQDNs. The + # PE customDnsConfigs are often empty and the dns=own zones live in the + # account RG, so derive the public FQDNs from the account's custom subdomain + # (defaults to the account name) and pin them to the PE IP. + local pe_name pe_ip nic_id sub fqdns hosts_line + pe_name="$(az network private-endpoint list -g "$RG" --query "[0].name" -o tsv 2>/dev/null)" + nic_id="$(az network private-endpoint show -g "$RG" -n "$pe_name" \ + --query "networkInterfaces[0].id" -o tsv 2>/dev/null)" + pe_ip="$(az network nic show --ids "$nic_id" \ + --query "ipConfigurations[0].privateIPAddress" -o tsv 2>/dev/null)" + sub="$(az cognitiveservices account show -g "$RG" -n "$ACCOUNT_NAME" \ + --query "properties.customSubDomainName" -o tsv 2>/dev/null)" + [[ -z "$sub" ]] && sub="$ACCOUNT_NAME" + fqdns="${sub}.services.ai.azure.com ${sub}.openai.azure.com ${sub}.cognitiveservices.azure.com" + info "jumpbox: pinning PE_IP=$pe_ip for FQDNs: $fqdns" + [[ -z "$pe_ip" ]] && { warn "could not resolve PE IP; /etc/hosts pin skipped"; return 0; } + hosts_line="$pe_ip $fqdns" + jumpbox_ssh "echo '$hosts_line' | sudo tee -a /etc/hosts >/dev/null" +} + +# jumpbox_ssh : run a command on the jumpbox (capped). +jumpbox_ssh() { + timeout 60 ssh -i "$JB_SSH_KEY" -o StrictHostKeyChecking=no \ + -o UserKnownHostsFile=/dev/null -o ConnectTimeout=15 \ + "${JB_ADMIN}@${JB_HOST}" "$@" +} + +# jumpbox_wait_ssh : block until SSH answers (capped ~3 min). 
+jumpbox_wait_ssh() { + local i + for i in $(seq 1 18); do + if jumpbox_ssh true 2>/dev/null; then info "jumpbox SSH reachable"; return 0; fi + sleep 10 + done + die "jumpbox SSH not reachable at $JB_HOST after ~3m" +} + +# jumpbox_narrow_nsg : once SSH works, tighten the allow-ssh rule from Internet +# to the /24 of the client IP the jumpbox actually sees (the Azure-facing NAT +# may differ from a public echo IP, and the pool can rotate within a /24). +jumpbox_narrow_nsg() { + local realip cidr + realip="$(jumpbox_ssh 'echo $SSH_CONNECTION' 2>/dev/null | awk '{print $1}')" + [[ "$realip" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] \ + || { warn "jumpbox: could not detect client IP; SSH left open to Internet"; return 0; } + cidr="$(awk -F. '{print $1"."$2"."$3".0/24"}' <<<"$realip")" + run_capture "jb-nsg-narrow" az network nsg rule update -g "$JB_RG" --nsg-name "$JB_NSG" \ + -n allow-ssh --source-address-prefixes "$cidr" || true + info "jumpbox: narrowed SSH to $cidr (client $realip)" +} + +# jumpbox_up : stand up the jumpbox with line-of-sight. Sets JB_HOST. +jumpbox_up() { + info "### jumpbox: provisioning line-of-sight VM" + jumpbox_init + JB_SSH_KEY="$WORK_DIR/jb_id_ed25519" + [[ -f "$JB_SSH_KEY" ]] || ssh-keygen -t ed25519 -N "" -f "$JB_SSH_KEY" -q + + # Preferred: VM inside the foundry VNet (line-of-sight is structural). 
+ jumpbox_ensure_net "$VNET_RG" "$ACCOUNT_LOCATION" "$VNET_NAME" "" \ + "$JB_SUBNET_NAME" "$JB_SUBNET_CIDR" false + if jumpbox_vm_sizeloop "$VNET_RG" "$ACCOUNT_LOCATION" "$VNET_NAME" "$JB_SUBNET_NAME"; then + JB_LOCATION="$ACCOUNT_LOCATION"; JB_RG="$VNET_RG"; JB_VNET="$VNET_NAME" + else + warn "jumpbox: no VM capacity in $ACCOUNT_LOCATION; trying peered fallback regions: $JB_FALLBACK_LOCATIONS" + JB_RG="${PREFIX}-jb-rg" + local rgok=false loc jbvnet + for loc in $JB_FALLBACK_LOCATIONS; do + [[ "$loc" == "$ACCOUNT_LOCATION" ]] && continue + jbvnet="${PREFIX}-jb-${loc}-vnet" + [[ "$rgok" == false ]] && { run_capture "jb-rg" az group create -n "$JB_RG" -l "$loc"; rgok=true; } + jumpbox_ensure_net "$JB_RG" "$loc" "$jbvnet" "$JB_FALLBACK_VNET_CIDR" \ + "$JB_SUBNET_NAME" "$JB_FALLBACK_SUBNET_CIDR" true + if jumpbox_vm_sizeloop "$JB_RG" "$loc" "$jbvnet" "$JB_SUBNET_NAME"; then + JB_LOCATION="$loc"; JB_VNET="$jbvnet" + jumpbox_peer_and_pin "$VNET_ID" \ + "$(az network vnet show -g "$JB_RG" -n "$jbvnet" --query id -o tsv)" + break + fi + warn "jumpbox: no VM capacity in $loc" + done + [[ -n "${JB_LOCATION:-}" && "$JB_LOCATION" != "$ACCOUNT_LOCATION" ]] \ + || die "jumpbox: no VM capacity in $ACCOUNT_LOCATION or fallback regions ($JB_FALLBACK_LOCATIONS)" + fi + + JB_HOST="$(az vm show -d -g "$JB_RG" -n "$JB_VM_NAME" --query publicIps -o tsv)" + info "jumpbox: $JB_VM_NAME up in $JB_LOCATION at $JB_HOST" + jumpbox_wait_ssh + jumpbox_narrow_nsg +} + +# jumpbox_socks_open : open a background SSH SOCKS5 tunnel on localhost. 
+jumpbox_socks_open() { + jumpbox_socks_close + timeout 30 ssh -i "$JB_SSH_KEY" -o StrictHostKeyChecking=no \ + -o UserKnownHostsFile=/dev/null -o ConnectTimeout=15 -o ExitOnForwardFailure=yes \ + -fN -D "127.0.0.1:${JB_SOCKS_PORT}" "${JB_ADMIN}@${JB_HOST}" + JB_SSH_PID="$(pgrep -f "ssh.*-D 127.0.0.1:${JB_SOCKS_PORT}.*${JB_HOST}" | head -1)" + [[ -n "$JB_SSH_PID" ]] || die "jumpbox: SOCKS tunnel failed to start" + info "jumpbox: SOCKS5 proxy on 127.0.0.1:${JB_SOCKS_PORT} (pid $JB_SSH_PID)" +} + +jumpbox_socks_close() { + local port="${JB_SOCKS_PORT:-1080}" + # ${JB_SSH_PID:-} keeps this safe under `set -u` when no tunnel was ever opened. + [[ -n "${JB_SSH_PID:-}" ]] && kill "$JB_SSH_PID" 2>/dev/null || true + pkill -f "ssh.*-D 127.0.0.1:${port}.*${JB_HOST:-_none_}" 2>/dev/null || true + JB_SSH_PID="" +} + +# jumpbox_down : close the tunnel and delete the fallback RG (the in-VNet VM is +# removed with VNET_RG at teardown). +jumpbox_down() { + jumpbox_socks_close + if [[ -n "${JB_RG:-}" && "${JB_RG:-}" != "${VNET_RG:-}" ]]; then + run_capture "jb-del-rg" az group delete -n "$JB_RG" --yes --no-wait || true + fi +} diff --git a/cli/azd/extensions/azure.ai.agents/test/e2e/network/lib.sh b/cli/azd/extensions/azure.ai.agents/test/e2e/network/lib.sh new file mode 100644 index 00000000000..3eee459c728 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/test/e2e/network/lib.sh @@ -0,0 +1,112 @@ +# Shared helpers for the Foundry private-networking E2E harness. +# Sourced by run-network-e2e.sh and assert-resources.sh; not executed directly. +# +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +# shellcheck shell=bash + +set -Eeuo pipefail + +# --- logging ----------------------------------------------------------------- + +log() { printf '%s %s\n' "$(date -Is)" "$*" | tee -a "${RUN_LOG:-/dev/null}"; } +info() { log "[info ] $*"; } +warn() { log "[warn ] $*" >&2; } +die() { log "[fatal] $*" >&2; exit 1; } + +# run_capture <name> <cmd...> : run a command, write stdout+stderr+timing to +# $OUT_DIR/<name>.log, and still propagate failures. 
The command is time-capped +# (STEP_TIMEOUT seconds, default 1200; SIGKILL 30s after) so a silently-stuck Azure +# operation fails fast instead of hanging the run. Override per call with +# `STEP_TIMEOUT=<seconds> run_capture ...`. +run_capture() { + local name="$1"; shift + local f="$OUT_DIR/$name.log" + local t="${STEP_TIMEOUT:-1200}" + info "==> $name: $* (timeout ${t}s)" + local rc=0 + { time timeout -k 30 "$t" "$@"; } >"$f" 2>&1 || rc=$? + if (( rc == 124 || rc == 137 )); then + warn "$name TIMED OUT after ${t}s (rc=$rc; see $f)"; tail -n 40 "$f" >&2 || true; return 1 + elif (( rc != 0 )); then + warn "$name FAILED (rc=$rc; see $f)"; tail -n 40 "$f" >&2 || true; return 1 + fi + info "<== $name ok" +} + +# run_retry <attempts> <name> <cmd...> : run_capture with retries, for ARM +# eventual-consistency transients (e.g. a create racing resource-group +# propagation). Backs off 10s between attempts. +run_retry() { + local n="$1" name="$2"; shift 2 + local i + for i in $(seq 1 "$n"); do + if run_capture "$([[ $i -gt 1 ]] && echo "${name}-try$i" || echo "$name")" "$@"; then + return 0 + fi + (( i < n )) && { warn "$name attempt $i/$n failed; retrying in 10s"; sleep 10; } + done + return 1 +} + +# --- assertions -------------------------------------------------------------- + +assert_eq() { # <actual> <expected> <label> + if [[ "$1" != "$2" ]]; then die "ASSERT $3: expected [$2] got [$1]"; fi + info "ok: $3 == $2" +} + +assert_contains() { # <haystack> <needle> <label> + if [[ "$1" != *"$2"* ]]; then die "ASSERT $3: [$2] not found"; fi + info "ok: $3 contains $2" +} + +assert_ge() { # <actual> <minimum> <label> + if (( $1 < $2 )); then die "ASSERT $3: expected >= $2 got $1"; fi + info "ok: $3 ($1) >= $2" +} + +# --- preflight --------------------------------------------------------------- + +require_tools() { + local t + for t in az azd jq; do command -v "$t" >/dev/null || die "missing required tool: $t"; done + az account show >/dev/null 2>&1 || die "run 'az login' first" + azd auth login --check-status >/dev/null 2>&1 || die "run 'azd auth login' first" + # The 'ai agent' command group 
must be available (the eject step uses + # `azd ai agent init --infra`). + azd ai agent --help >/dev/null 2>&1 || die "azd 'ai agent' extension not available" +} + +# --- azure.yaml mutation ----------------------------------------------------- + +# inject_network_block <file> : insert a network: block immediately +# after the foundry service's `host: azure.ai.agent` line, using the indentation +# that azd init emits (4 spaces under the service key). The block body is read +# from stdin and re-indented to 6 spaces. +inject_network_block() { + local file="$1" tmp + tmp="$(mktemp)" + local block + block="$(sed 's/^/      /')" # 6-space indent for keys under `    network:` + awk -v blk="$block" ' + /^[[:space:]]+host:[[:space:]]+azure\.ai\.agent[[:space:]]*$/ { + print + print "    network:" + print blk + next + } + { print } + ' "$file" >"$tmp" + mv "$tmp" "$file" +} + +# --- azd what-if parsing ----------------------------------------------------- + +# preview_capture <name> : run `azd provision --preview` and capture its text +# output. azd does not emit machine JSON for preview, so we keep the text log +# and grep it; callers assert on substrings. +preview_capture() { # <name> + run_capture "$1" azd provision --preview --no-prompt +} diff --git a/cli/azd/extensions/azure.ai.agents/test/e2e/network/phase5-iter.sh b/cli/azd/extensions/azure.ai.agents/test/e2e/network/phase5-iter.sh new file mode 100644 index 00000000000..fd360171726 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/test/e2e/network/phase5-iter.sh @@ -0,0 +1,31 @@ +#!/usr/bin/env bash +# Phase-5 iteration driver: reuse the resources a KEEP=true run already +# provisioned, and exercise only the jumpbox + deploy + invoke path. Not part +# of the harness; a local debugging aid. +set -Eeuo pipefail + +export RUN_ID="${RUN_ID:?set RUN_ID of the kept run, e.g. 
live-0622-165947}" +export SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-1756abc0-3554-4341-8d6a-46674962ea19}" +export ACCOUNT_LOCATION="${ACCOUNT_LOCATION:-westus}" +export CLIENT_LOCATION="${CLIENT_LOCATION:-eastus}" +export WORK_DIR="${WORK_DIR:?set WORK_DIR of the kept run}" +export OUT_DIR="${OUT_DIR:-/tmp/azdnet-p5-$RUN_ID}" +export RUN_DEPLOY=true KEEP=true NO_COLOR=1 +mkdir -p "$OUT_DIR" + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +# shellcheck source=run-network-e2e.sh +source "$SCRIPT_DIR/run-network-e2e.sh" # functions + vars, no main (guarded) + +RUN_LOG="$OUT_DIR/run.log" +SUBSCRIPTION_ID="$(az account show --query id -o tsv)" + +# Re-bind the runtime vars a normal run sets in phases 1/3. +VNET_ID="$(az network vnet show -g "$VNET_RG" -n "$VNET_NAME" --query id -o tsv)" +REAL_DIR="$WORK_DIR/real" +RG="$(cd "$REAL_DIR" && azd env get-value AZURE_RESOURCE_GROUP)" +ACCOUNT_NAME="$(cd "$REAL_DIR" && azd env get-value AZURE_AI_ACCOUNT_NAME)" +info "phase5-iter: VNET_ID=$VNET_ID RG=$RG ACCOUNT=$ACCOUNT_NAME REAL_DIR=$REAL_DIR" + +phase5_deploy_invoke +info "phase5-iter complete; logs in $OUT_DIR" diff --git a/cli/azd/extensions/azure.ai.agents/test/e2e/network/run-network-e2e.sh b/cli/azd/extensions/azure.ai.agents/test/e2e/network/run-network-e2e.sh new file mode 100755 index 00000000000..8089e516062 --- /dev/null +++ b/cli/azd/extensions/azure.ai.agents/test/e2e/network/run-network-e2e.sh @@ -0,0 +1,566 @@ +#!/usr/bin/env bash +# run-network-e2e.sh : end-to-end validation of Foundry private networking for +# `host: azure.ai.agent`, optimized for minimal Azure resource-operation time. +# +# Strategy (see README.md for the cost rationale): +# - ONE real network account is provisioned (the create+own matrix cell). +# - The other matrix cells and the bicep-less vs eject code paths are verified +# with `azd provision --preview` (ARM what-if) which creates nothing. 
+# - A shared BYO VNet (+ optional pre-created subnets / DNS zones) is created +# once and reused across cells. +# +# Phases 0-4 validate all the *networking* code and do NOT require the BYO-image +# init UX (`azd ai agent init --image`). The project is hand-authored +# (azure.yaml fixture), so it runs against the current branch today. Phase 5 +# (deploy + invoke the BYO image under the VNet) uses the deploy-time pre-built +# image short-circuit and is gated behind RUN_DEPLOY=true. +# +# Phases: +# 0 local gates (no Azure) +# 1 shared infra create RG(s) + VNet (+ reference subnets/zones) +# 2 what-if matrix bicep-less shape for all cells (no creation) +# 3 real provision create+own cell (Scenario 1 + network topology) +# 4 eject idempotency eject -> what-if "no changes" + edit delta (Scenario 2) +# 5 deploy + invoke agent under the VNet (Scenario 3) -- RUN_DEPLOY=true +# 6 teardown azd down --purge + delete shared RG(s) +# +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +set -Eeuo pipefail +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +# shellcheck source=lib.sh +source "$SCRIPT_DIR/lib.sh" +# shellcheck source=lib-jumpbox.sh +source "$SCRIPT_DIR/lib-jumpbox.sh" + +# --- configuration (override via env) ---------------------------------------- + +# Region constraints (from the test plan): +# - The network-enabled Foundry account, its VNet, DNS zones, and ACR MUST be +# in westus: Foundry VNet injection is only supported there. Everything this +# harness creates lives in ACCOUNT_LOCATION. +# - westus is frequently out of VM capacity. The harness itself creates no VM, +# but the gated deploy+invoke phase needs VNet line-of-sight to the account +# private endpoint. If you stand up a jumpbox for that, put it in a +# capacity-available region (CLIENT_LOCATION, default eastus) in its own VNet +# and global-peer it to the westus VNet (+ link the private DNS zones to it). 
+ACCOUNT_LOCATION="${ACCOUNT_LOCATION:-westus}" +CLIENT_LOCATION="${CLIENT_LOCATION:-eastus}" + +# Bound on concurrent what-if cells in phase 2 (independent, create nothing). +MAX_PARALLEL="${MAX_PARALLEL:-4}" + +RUN_ID="${RUN_ID:-$(date +%Y%m%d-%H%M%S)}" +PREFIX="${PREFIX:-azdnet${RUN_ID//-/}}" +PREFIX="${PREFIX:0:18}" # keep within name limits + +# BYO image, written into agent.yaml. Only pulled during the gated deploy phase +# (RUN_DEPLOY=true); the Foundry project MI is granted repository read on this +# RBAC+ABAC registry then. NOTE: this digest can be garbage-collected when the +# fixture image is rebuilt (a stale digest fails deploy with [ImageError] +# "Container image tag not found"). Refresh it, or use BUILD_IMAGE=true to build +# and push a fresh image from $ECHO_DUAL_DIR. +IMAGE="${IMAGE:-1756abcawemengncus3a16acr.azurecr.io/echodual@sha256:7d5009a3008258c242a1602dd1749926875b9810a7954c9b16d86ae5fecaff8a}" + +# BUILD_IMAGE=true builds ~/agents/echo-dual into an ABAC-enabled ACR before any +# project fixture is generated, then rewrites IMAGE to the pushed tag. +BUILD_IMAGE="${BUILD_IMAGE:-false}" +ECHO_DUAL_DIR="${ECHO_DUAL_DIR:-$HOME/agents/echo-dual}" +IMAGE_REPO="${IMAGE_REPO:-network-e2e/echo-dual}" +IMAGE_TAG="${IMAGE_TAG:-$RUN_ID}" +ACR_SKU="${ACR_SKU:-Basic}" + +# Phase 5 (deploy + invoke) uses the BYO-image deploy short-circuit. Off by +# default so phases 0-4 run against the current branch today. +RUN_DEPLOY="${RUN_DEPLOY:-false}" + +# TARGET_RG lets investigation runs keep all test resources in a single RG. +# By default, keep the matrix-style split RGs for isolation/readability. +TARGET_RG="${TARGET_RG:-}" +VNET_RG="${VNET_RG:-${TARGET_RG:-${PREFIX}-vnet-rg}}" +DNS_RG="${DNS_RG:-${TARGET_RG:-${PREFIX}-dns-rg}}" # external zones for the reference cells +VNET_NAME="${VNET_NAME:-${PREFIX}-vnet}" +# Dedicated VNet for the managed-iso cell (phase 3b). 
A dns=own account links +# its VNet to the AI privatelink zones, and a VNet may hold only one link per +# zone namespace; the phase-3 account already owns the shared VNet's links, so +# phase 3b needs its own VNet. Same address space is fine (the two VNets are +# never peered). +ISO_VNET_NAME="${ISO_VNET_NAME:-${PREFIX}-iso-vnet}" +VNET_CIDR="${VNET_CIDR:-192.168.0.0/16}" +DEFAULT_ACR_NAME="$(printf '%sacr' "$PREFIX" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')" +DEFAULT_ACR_NAME="${DEFAULT_ACR_NAME:0:50}" +ACR_RG="${ACR_RG:-${TARGET_RG:-$VNET_RG}}" +ACR_NAME="${ACR_NAME:-$DEFAULT_ACR_NAME}" + +# create-mode subnets are created by the template (must NOT pre-exist); +# reference-mode subnets are pre-created here. +AGENT_SUBNET_CREATE="${AGENT_SUBNET_CREATE:-agent-subnet}" +PE_SUBNET_CREATE="${PE_SUBNET_CREATE:-pe-subnet}" +AGENT_SUBNET_REF="${AGENT_SUBNET_REF:-ref-agent-subnet}" +PE_SUBNET_REF="${PE_SUBNET_REF:-ref-pe-subnet}" + +AGENT_NAME="${AGENT_NAME:-netagent}" +WORK_DIR="${WORK_DIR:-$(mktemp -d)}" +OUT_DIR="${OUT_DIR:-$(pwd)/azd-network-e2e-$RUN_ID}" +KEEP="${KEEP:-false}" # KEEP=true skips teardown + +export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT="${DOTNET_SYSTEM_GLOBALIZATION_INVARIANT:-1}" +export NO_COLOR=1 + +mkdir -p "$OUT_DIR" +RUN_LOG="$OUT_DIR/run.log" +VNET_ID="" # set in phase 1 + +# --- helpers ----------------------------------------------------------------- + +# write a network: block for a given matrix cell into $1 (azure.yaml). +# args: [iso] +# egress=byo -> inject the agent into agentSubnet (BYO egress) +# egress=managed -> omit agentSubnet (Microsoft-managed egress); [iso] sets isolationMode +# peSubnet is always written (required: the account data plane is never public). 
+write_network_block() { + local file="$1" egress="$2" subnet_mode="$3" dns_mode="$4" iso="${5:-}" + local agent pe + if [[ "$subnet_mode" == "create" ]]; then + agent="$AGENT_SUBNET_CREATE"; pe="$PE_SUBNET_CREATE" + else + agent="$AGENT_SUBNET_REF"; pe="$PE_SUBNET_REF" + fi + { + if [[ "$egress" == "byo" ]]; then + echo "agentSubnet:" + echo " vnet: \${AZURE_VNET_ID}" + echo " name: $agent" + [[ "$subnet_mode" == "create" ]] && echo " prefix: 192.168.10.0/24" + elif [[ -n "$iso" ]]; then + echo "isolationMode: $iso" + fi + echo "peSubnet:" + echo " vnet: \${AZURE_VNET_ID}" + echo " name: $pe" + [[ "$subnet_mode" == "create" ]] && echo " prefix: 192.168.11.0/24" + if [[ "$dns_mode" == "reference" ]]; then + echo "dns:" + echo " resourceGroup: $DNS_RG" + echo " subscription: \${AZURE_DNS_SUBSCRIPTION_ID}" + fi + } | inject_network_block "$file" +} + +# write a hand-authored azure.yaml fixture for a matrix cell into a fresh +# project dir and create its azd environment. No `azd ai agent init --image`: +# phases 0-4 do not need the BYO-image init UX. The agent entry uses +# `image:` (so the synthesizer sets includeAcr=false, matching BYO image). +# args: [iso] +setup_project() { + local name="$1" egress="$2" subnet_mode="$3" dns_mode="$4" iso="${5:-}" + PROJECT_DIR="$WORK_DIR/$name" + rm -rf "${PROJECT_DIR:?}"; mkdir -p "$PROJECT_DIR" + cat >"$PROJECT_DIR/azure.yaml" <//agent.yaml; no project: => project root). + # kind: hosted + image: => BYO pre-built image (no ACR build). + cat >"$PROJECT_DIR/agent.yaml" </dev/null + azd env set AZURE_TENANT_ID "$(az account show --query tenantId -o tsv)" >/dev/null + azd env set AZURE_VNET_ID "$VNET_ID" >/dev/null + azd env set AZURE_DNS_SUBSCRIPTION_ID "$SUBSCRIPTION_ID" >/dev/null + # BYO pre-built image: skip ACR build at provision AND deploy. Without this + # the headless deploy defaults to "build" (no Dockerfile) and fails. Mirrors + # what `azd ai agent init --image` persists. 
+ azd env set AZD_AGENT_SKIP_ACR true >/dev/null + ) +} + +# --- phase 0: local gates ---------------------------------------------------- + +phase0_local_gates() { + info "### phase 0: local gates (no Azure)" + run_capture "00-azd-version" azd version + run_capture "00-go-build" bash -c "cd '$SCRIPT_DIR/../../..' && go build ./..." + # Refresh the dev extension from the CURRENT source so the run tests our code, + # not a stale installed build. build (binary) -> pack -> publish (registers + # capabilities incl. provisioning-provider + the microsoft.foundry provider) + # -> install from the local source. Requires an up-to-date `azd x` tool. + if [[ "${SKIP_EXT_REFRESH:-false}" != "true" ]]; then + ( cd "$SCRIPT_DIR/../../.." + azd extension uninstall azure.ai.agents >/dev/null 2>&1 || true + run_capture "01-ext-build" azd x build + run_capture "01-ext-pack" azd x pack + run_capture "01-ext-publish" azd x publish + run_capture "01-ext-install" azd extension install azure.ai.agents --source local + ) + else + warn "SKIP_EXT_REFRESH=true: using the already-installed azure.ai.agents extension" + fi +} + +# --- phase 1: shared infra --------------------------------------------------- + +phase1_shared_infra() { + info "### phase 1: shared BYO infra" + run_capture "10-rg-vnet" az group create -n "$VNET_RG" -l "$ACCOUNT_LOCATION" + run_capture "10-vnet" az network vnet create -g "$VNET_RG" -n "$VNET_NAME" \ + --address-prefixes "$VNET_CIDR" -l "$ACCOUNT_LOCATION" + VNET_ID="$(az network vnet show -g "$VNET_RG" -n "$VNET_NAME" --query id -o tsv)" + info "VNET_ID=$VNET_ID" + + # reference-mode subnets (pre-created so the template can reference them). 
+ run_capture "11-ref-pe-subnet" az network vnet subnet create -g "$VNET_RG" \ + --vnet-name "$VNET_NAME" -n "$PE_SUBNET_REF" --address-prefixes 192.168.20.0/24 + run_capture "11-ref-agent-subnet" az network vnet subnet create -g "$VNET_RG" \ + --vnet-name "$VNET_NAME" -n "$AGENT_SUBNET_REF" --address-prefixes 192.168.21.0/24 \ + --delegations Microsoft.App/environments + + # external DNS zones (for the dns=reference cells). + run_capture "12-dns-rg" az group create -n "$DNS_RG" -l "$ACCOUNT_LOCATION" + local z + for z in privatelink.services.ai.azure.com privatelink.openai.azure.com \ + privatelink.cognitiveservices.azure.com; do + # idempotent: private-dns zone create errors if the zone already exists. + if az network private-dns zone show -g "$DNS_RG" -n "$z" >/dev/null 2>&1; then + info "dns zone $z already exists; reusing" + else + run_capture "12-zone-${z//./_}" az network private-dns zone create -g "$DNS_RG" -n "$z" + fi + done +} + +# --- optional image build ---------------------------------------------------- + +build_byo_image() { + if [[ "$BUILD_IMAGE" != "true" ]]; then + return 0 + fi + + info "### image build: ABAC-enabled ACR + echo-dual" + if [[ ! 
-f "$ECHO_DUAL_DIR/Dockerfile" ]]; then + die "ECHO_DUAL_DIR does not contain a Dockerfile: $ECHO_DUAL_DIR" + fi + + run_capture "13-rg-acr" az group create -n "$ACR_RG" -l "$ACCOUNT_LOCATION" + if az acr show -n "$ACR_NAME" >/dev/null 2>&1; then + local mode + mode="$(az acr show -n "$ACR_NAME" --query roleAssignmentMode -o tsv 2>/dev/null || echo '')" + if [[ "$mode" != *Abac* && "$mode" != *abac* ]]; then + die "ACR $ACR_NAME exists but is not ABAC-enabled (roleAssignmentMode=$mode); choose a new ACR_NAME" + fi + info "ABAC-enabled ACR $ACR_NAME already exists; reusing" + else + run_capture "13-acr-create" az acr create -g "$ACR_RG" -n "$ACR_NAME" \ + --sku "$ACR_SKU" --location "$ACCOUNT_LOCATION" --role-assignment-mode rbac-abac + fi + + local acr_id caller_id principal_type + acr_id="$(az acr show -n "$ACR_NAME" --query id -o tsv)" + # Avoid Microsoft Graph here: some tenants block `az ad signed-in-user show` + # via Conditional Access. The ARM token contains the caller object id (`oid`). + caller_id="$(az account get-access-token --resource https://management.azure.com/ \ + --query accessToken -o tsv | python3 -c 'import base64,json,sys; p=sys.stdin.read().strip().split(".")[1]; print(json.loads(base64.urlsafe_b64decode(p + "=" * (-len(p) % 4))).get("oid", ""))')" + principal_type="$(az account show --query user.type -o tsv)" + if [[ "$principal_type" == "servicePrincipal" ]]; then + principal_type="ServicePrincipal" + else + principal_type="User" + fi + + if [[ -n "$caller_id" ]]; then + # ABAC-enabled registries require repository-scoped data-plane roles. The + # caller queues the ACR Task and needs repository write to push the built + # image. The project MI receives Repository Reader later for image pull. 
+ run_capture "13-acr-caller-writer" az role assignment create \ + --assignee-object-id "$caller_id" --assignee-principal-type "$principal_type" \ + --role "Container Registry Repository Writer" --scope "$acr_id" || \ + warn "caller repository-writer grant failed (may already exist)" + sleep 30 # role propagation before the ACR Task push + else + warn "could not resolve caller object id; ensure caller has Container Registry Repository Writer" + fi + + # ABAC-enabled repository permissions require the caller identity when ACR + # Tasks authenticates to a source registry. Keep the literal [caller] quoted + # so the shell does not interpret it as a glob. + run_capture "13-acr-build" az acr build -r "$ACR_NAME" \ + -t "$IMAGE_REPO:$IMAGE_TAG" --source-acr-auth-id "[caller]" "$ECHO_DUAL_DIR" + IMAGE="$ACR_NAME.azurecr.io/$IMAGE_REPO:$IMAGE_TAG" + printf 'IMAGE=%s\nACR_NAME=%s\nACR_RG=%s\n' "$IMAGE" "$ACR_NAME" "$ACR_RG" >"$OUT_DIR/13-image.txt" + info "IMAGE=$IMAGE" +} + +# --- phase 2: what-if matrix ------------------------------------------------- + +# the matrix cells. The first BYO create/own cell is also the real-provision +# cell. Managed-egress cells omit agentSubnet; the last two exercise both +# isolationMode values (the managedNetworks child resource). +# fields: [iso] +MATRIX=( + "byo create own" + "byo create reference" + "byo reference own" + "byo reference reference" + "managed create own" + "managed reference reference" + "managed create own AllowInternetOutbound" + "managed create own AllowOnlyApprovedOutbound" +) + +phase2_whatif_matrix() { + info "### phase 2: what-if matrix (no creation, up to ${MAX_PARALLEL} parallel)" + local cell eg sm dm iso tag + local -a pids=() tags=() + local rc=0 + for cell in "${MATRIX[@]}"; do + read -r eg sm dm iso <<<"$cell" + tag="${eg}-${sm}-${dm}${iso:+-$iso}" + # Throttle: wait for a slot when MAX_PARALLEL cells are in flight. + while (( ${#pids[@]} >= MAX_PARALLEL )); do + if ! 
wait "${pids[0]}"; then rc=1; warn "what-if[${tags[0]}] failed"; fi + pids=("${pids[@]:1}"); tags=("${tags[@]:1}") + done + # Each cell uses an isolated project dir + azd env, and what-if creates + # nothing against the shared VNet, so cells are safe to run concurrently. + ( + setup_project "wi-$tag" "$eg" "$sm" "$dm" "$iso" + cd "$PROJECT_DIR" + preview_capture "20-whatif-$tag" + info "ok: what-if[$tag] generated a valid plan" + ) & + pids+=("$!"); tags+=("$tag") + done + # Drain the rest. + local i + for i in "${!pids[@]}"; do + if ! wait "${pids[$i]}"; then rc=1; warn "what-if[${tags[$i]}] failed"; fi + done + (( rc == 0 )) || die "phase 2: one or more what-if cells failed" +} + +# --- phase 3: real provision (create/own) ------------------------------------ + +phase3_real_provision() { + info "### phase 3: real provision (create+own)" + setup_project "real" byo create own + REAL_DIR="$PROJECT_DIR" + ( cd "$REAL_DIR" + STEP_TIMEOUT=1800 run_capture "30-provision" azd provision --no-prompt + azd env get-values >"$OUT_DIR/30-env-after-provision.txt" 2>&1 || true + ) + + # resolve account for the live-topology assertions. + RG="$(cd "$REAL_DIR" && azd env get-value AZURE_RESOURCE_GROUP)" + ACCOUNT_NAME="$(cd "$REAL_DIR" && azd env get-value AZURE_AI_ACCOUNT_NAME)" + + # live topology assertions + ( cd "$REAL_DIR" + RG="$RG" ACCOUNT_NAME="$ACCOUNT_NAME" VNET_RG="$VNET_RG" VNET_NAME="$VNET_NAME" \ + AGENT_SUBNET="$AGENT_SUBNET_CREATE" PE_SUBNET="$PE_SUBNET_CREATE" \ + EXPECT_DNS_ZONES=own \ + bash "$SCRIPT_DIR/assert-resources.sh" + ) 2>&1 | tee "$OUT_DIR/31-assert-resources.log" +} + +# --- phase 3b: managed-egress isolationMode (gated) -------------------------- + +# Provisions the managed-egress AllowOnlyApprovedOutbound cell and asserts the +# managedNetworks/default child resource was accepted with that isolationMode. +# This is the one scenario `azd provision --preview` cannot confirm (the V2 +# managed network is created, not just planned). 
Gated behind RUN_MANAGED_ISO +# because it provisions a second real account. Cleans up its own RG inline. +phase3b_managed_iso() { + [[ "${RUN_MANAGED_ISO:-false}" == "true" ]] || { info "### phase 3b: managed-iso (skipped; set RUN_MANAGED_ISO=true)"; return 0; } + info "### phase 3b: managed-egress AllowOnlyApprovedOutbound (real provision)" + # Dedicated VNet (see ISO_VNET_NAME): a dns=own account links its VNet to the + # AI privatelink zones, and a VNet may hold only one link per zone namespace. + # The phase-3 account already linked the shared VNet, so the managed cell + # provisions into its own VNet to create+link its zones without colliding. + # (Brownfield/multi-account callers use dns: reference mode, which skips the + # link entirely.) + run_capture "31-iso-vnet" az network vnet create -g "$VNET_RG" -n "$ISO_VNET_NAME" \ + --address-prefixes "$VNET_CIDR" -l "$ACCOUNT_LOCATION" + local iso_vnet_id + iso_vnet_id="$(az network vnet show -g "$VNET_RG" -n "$ISO_VNET_NAME" --query id -o tsv)" + setup_project "iso" managed create own AllowOnlyApprovedOutbound + local iso_dir="$PROJECT_DIR" iso_rg iso_acct iso_mode + ( cd "$iso_dir" + azd env set AZURE_VNET_ID "$iso_vnet_id" >/dev/null # override the shared VNet + STEP_TIMEOUT=1800 run_capture "32-provision-iso" azd provision --no-prompt + ) + iso_rg="$(cd "$iso_dir" && azd env get-value AZURE_RESOURCE_GROUP)" + iso_acct="$(cd "$iso_dir" && azd env get-value AZURE_AI_ACCOUNT_NAME)" + iso_mode="$(az resource show \ + --ids "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$iso_rg/providers/Microsoft.CognitiveServices/accounts/$iso_acct/managedNetworks/default" \ + --api-version 2025-10-01-preview --query 'properties.managedNetwork.isolationMode' -o tsv 2>/dev/null || echo '')" + # Always clean up the second account RG, then fail if the assertion missed. 
+ run_capture "33-del-iso-rg" az group delete -n "$iso_rg" --yes --no-wait || true + if [[ "$iso_mode" == "AllowOnlyApprovedOutbound" ]]; then + info "ok: managedNetworks/default isolationMode=$iso_mode" + else + die "managedNetworks/default isolationMode mismatch: got '$iso_mode', want AllowOnlyApprovedOutbound" + fi +} + +# grant the Foundry project managed identity repository read on the BYO +# registry. This ACR uses RBAC+ABAC, so the correct role is the ABAC-aware +# "Container Registry Repository Reader" (not the legacy AcrPull). Only needed +# for the gated deploy phase (image pull). +grant_acr_pull() { + local acr_login acr_name acr_id project_id pid + acr_login="${IMAGE%%/*}" + acr_name="${acr_login%%.*}" + acr_id="$(az acr show -n "$acr_name" --query id -o tsv 2>/dev/null || echo '')" + if [[ -z "$acr_id" ]]; then + warn "could not resolve ACR '$acr_name' id; grant the project MI 'Container Registry Repository Reader' manually" + return 0 + fi + + project_id="$(cd "$REAL_DIR" && azd env get-value AZURE_AI_PROJECT_ID 2>/dev/null || echo '')" + if [[ -n "$project_id" ]]; then + pid="$(az rest --method get \ + --url "https://management.azure.com${project_id}?api-version=2025-04-01-preview" \ + --query identity.principalId -o tsv 2>/dev/null || echo '')" + fi + # Fallback for older RP/API shapes, but the hosted-agent image pull uses the + # project MI when a project-scoped identity exists. 
+ if [[ -z "${pid:-}" || "$pid" == "null" ]]; then + pid="$(az cognitiveservices account show -g "$RG" -n "$ACCOUNT_NAME" \ + --query identity.principalId -o tsv 2>/dev/null || echo '')" + fi + if [[ -z "${pid:-}" || "$pid" == "null" ]]; then + warn "could not resolve project MI principalId; grant repository read manually" + return 0 + fi + run_capture "30-acr-pull" az role assignment create --assignee-object-id "$pid" \ + --assignee-principal-type ServicePrincipal --role "Container Registry Repository Reader" \ + --scope "$acr_id" || \ + warn "repository-read grant failed (may already exist or need an ABAC condition)" +} + +# --- phase 4: eject idempotency (Scenario 2) --------------------------------- + +phase4_eject() { + info "### phase 4: eject idempotency on the provisioned env" + ( cd "$REAL_DIR" + run_capture "40-eject" azd ai agent init --infra + # the ejected params must preserve the ${VAR} token (regression guard). + assert_contains "$(cat infra/main.parameters.json)" '${AZURE_VNET_ID}' \ + 'ejected params preserve vnet ${VAR} placeholder' + # what-if against the live account: ejected on-disk template + provision-time + # ${VAR} resolution must reproduce the same topology -> no changes. + preview_capture "41-eject-whatif" + if grep -qiE 'no changes|nothing to (deploy|change)' "$OUT_DIR/41-eject-whatif.log"; then + info "ok: eject what-if reports no changes (idempotent)" + else + warn "eject what-if shows changes; inspect $OUT_DIR/41-eject-whatif.log" + fi + ) +} + +# --- phase 5: deploy + invoke (gated: RUN_DEPLOY=true) ----------------------- + +phase5_deploy_invoke() { + if [[ "$RUN_DEPLOY" != "true" ]]; then + warn "RUN_DEPLOY!=true: skipping deploy+invoke." + warn "deploy+invoke also needs VNet line-of-sight to the westus account PE; if using a jumpbox, put it in ${CLIENT_LOCATION} (peered to the westus VNet)." 
+ return 0 + fi + info "### phase 5: deploy + invoke under the VNet" + jumpbox_up # line-of-sight VM (in-VNet, or peered fallback) + jumpbox_socks_open # local SOCKS5 proxy into the VNet + grant_acr_pull # repository read for the BYO image pull + # Tunnel azd's data-plane HTTPS to the private account FQDNs through the + # jumpbox. SOCKS5 (socks5h) does remote DNS on the jumpbox, which resolves + # the privatelink names to the PE IP. azd itself runs here (our extension). + ( cd "$REAL_DIR" + export HTTPS_PROXY="socks5://127.0.0.1:${JB_SOCKS_PORT}" + export HTTP_PROXY="$HTTPS_PROXY" ALL_PROXY="$HTTPS_PROXY" NO_PROXY="127.0.0.1,localhost" + STEP_TIMEOUT=1800 run_capture "50-deploy" azd deploy --no-prompt + azd ai agent show --output json >"$OUT_DIR/51-show.json" 2>&1 || true + STEP_TIMEOUT=300 run_capture "52-invoke" azd ai agent invoke --new-session "hello, are you up?" + ) +} + +# --- phase 6: teardown ------------------------------------------------------- + +phase6_teardown() { + if [[ "$KEEP" == "true" ]]; then warn "KEEP=true: skipping teardown"; return 0; fi + info "### phase 6: teardown" + jumpbox_down + if [[ -n "${REAL_DIR:-}" && -d "$REAL_DIR" ]]; then + ( cd "$REAL_DIR" && run_capture "60-down" azd down --force --purge ) || \ + warn "azd down failed; clean up manually" + fi + run_capture "61-del-vnet-rg" az group delete -n "$VNET_RG" --yes --no-wait || true + run_capture "61-del-dns-rg" az group delete -n "$DNS_RG" --yes --no-wait || true +} + +# --- main -------------------------------------------------------------------- + +main() { + require_tools + SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}" + { + echo "run_id=$RUN_ID" + echo "subscription=$SUBSCRIPTION_ID" + echo "account_location=$ACCOUNT_LOCATION" + echo "client_location=$CLIENT_LOCATION max_parallel=$MAX_PARALLEL" + echo "image=$IMAGE" + echo "build_image=$BUILD_IMAGE" + echo "echo_dual_dir=$ECHO_DUAL_DIR" + echo "acr_name=$ACR_NAME acr_rg=$ACR_RG" + echo 
"target_rg=$TARGET_RG" + echo "run_deploy=$RUN_DEPLOY" + echo "work_dir=$WORK_DIR" + echo "out_dir=$OUT_DIR" + echo "vnet_rg=$VNET_RG dns_rg=$DNS_RG vnet=$VNET_NAME" + azd version + } >"$OUT_DIR/00-context.txt" + + trap 'phase6_teardown' EXIT + # MAX_PHASE lets you stop early while iterating (e.g. MAX_PHASE=2 for the cheap + # VNet + what-if gates). Teardown still runs via the EXIT trap unless KEEP=true. + local max="${MAX_PHASE:-6}" + phase0_local_gates + if (( max >= 1 )); then phase1_shared_infra; fi + build_byo_image + if (( max >= 2 )); then phase2_whatif_matrix; fi + if (( max >= 3 )); then phase3_real_provision; fi + if (( max >= 3 )); then phase3b_managed_iso; fi + if (( max >= 4 )); then phase4_eject; fi + if (( max >= 5 )); then phase5_deploy_invoke; fi + # teardown runs via trap + info "E2E complete (through phase $max). Logs: $OUT_DIR" +} + +# Run main only when executed directly; sourcing (e.g. a phase-5 iteration +# driver) gets the functions/vars without kicking off a full run. +if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then + main "$@" +fi diff --git a/docs/reference/telemetry-data.md b/docs/reference/telemetry-data.md index d95af6256f3..b4855b28f2d 100644 --- a/docs/reference/telemetry-data.md +++ b/docs/reference/telemetry-data.md @@ -390,6 +390,16 @@ Emitted on `azd provision` / `azd up` to measure adoption and safety of `infra.l | `provision.layer.explicit_dependson_count` | measurement | Layers using the explicit `infra.layers[].dependsOn` override | +
+Foundry Private Networking + +Emitted at provision start by the `microsoft.foundry` provisioning provider (the `azure.ai.agents` extension) to measure secured-agent adoption and the BYO-vs-managed split. + +| Field Key | Type | Description | +|-----------|------|-------------| +| `provision.network_mode` | string | `none` (public account, no `network:` block), `byo` (customer VNet), or `managed` (Foundry-managed VNet) | +
+
Environment Management