Commit 9b76f20

committed
fix: address comments
1 parent d4b1aec commit 9b76f20

1 file changed

+56
-119
lines changed

docs/proposals/20250818-multi-az-apiserver-loadbalancer.md

Lines changed: 56 additions & 119 deletions
@@ -19,6 +19,7 @@ Add first-class Multi-AZ support for the Kubernetes control plane LoadBalancer i
1919
- Managing or provisioning DNS records.
2020
- Provider-specific topologies such as ACTIVE_STANDBY across fault domains.
2121
- Service type LoadBalancer for worker Services.
22+
- Automatic creation of the network and subnets; the operator is responsible for creating the network and subnets before enabling multi-AZ LoadBalancer.
2223

2324
## User Stories
2425
1) As a platform engineer, I want per-AZ LBs so a full AZ outage leaves the cluster reachable via DNS multi-A records that resolve to the remaining AZs.
@@ -48,130 +49,85 @@ flowchart LR
4849
end
4950
```
5051

51-
## Integration with External Global Server Load Balancing (GSLB)
52-
53-
External GSLB systems (e.g., Route 53 health-checked records, Akamai GTM, Cloudflare Load Balancing, NS1, F5 DNS/GTM) pair naturally with this Multi-AZ LB design:
54-
55-
- Clear targets: Each AZ has its own LB with a stable IP (floating IP or provider VIP) and deterministic name. These per-AZ endpoints are ideal GSLB health-check targets.
56-
- Health-aware failover: GSLB continuously probes each per-AZ LB (TCP 6443 or an alternative port configured via additionalPorts) and automatically removes unhealthy AZ endpoints from DNS responses.
57-
- Improved blast-radius isolation: An AZ outage only affects the corresponding AZ LB. GSLB maintains service by answering with remaining healthy AZ LB IPs.
58-
- Policy flexibility: GSLB policies (failover, weighted round-robin, latency/geo) can prefer:
59-
- Same-region/same-AZ endpoints for lowest latency
60-
- Spillover to other AZs only on failure
61-
- Weighted distribution across AZs for capacity utilization
62-
63-
Recommended GSLB patterns
64-
- Record model: Use a single control plane FQDN (the cluster’s spec.controlPlaneEndpoint.Host) and publish multiple A/AAAA records—one per AZ LB IP.
65-
- Health checks:
66-
- Protocol: TCP on the API port (default 6443). For providers that support L7 checks, TCP is generally sufficient for the Kubernetes API.
67-
- Source IPs: Ensure GSLB checker IPs are permitted if using allowedCIDRs on listeners.
68-
- TTL guidance:
69-
- Use low TTL (e.g., 30–60s) to accelerate failover while balancing resolver load.
70-
- Be aware that some clients cache beyond TTL; plan operationally for a brief grace period during failover.
71-
- IP sourcing:
72-
- Floating IPs typically simplify routing and are stable across LB re-creation.
73-
- If using fixed VIPs (no floating), ensure they are routable to your GSLB health-check network and external resolvers that must reach them.
74-
- Automation hooks:
75-
- Deterministic LB naming (per-AZ suffix) and tags facilitate discovery by GSLB automation to register/update record sets.
76-
- A controller or out-of-band job can list per-AZ LBs and synchronize GSLB records and health checks.
77-
78-
Failure scenarios and behavior
79-
- Single AZ failure: The corresponding per-AZ LB becomes unhealthy; GSLB health checks fail; DNS answers exclude that AZ until recovery. Existing connections may break depending on client TCP retry behavior; new connections will target healthy AZs.
80-
- Partial AZ degradation (e.g., only some members or monitor thresholds): Octavia monitor status influences LB health; ensure GSLB health thresholds align with Octavia monitor sensitivity to avoid premature removal or flapping.
81-
- Network partitions from health-check vantage points:
82-
- If GSLB checkers reside outside the cloud, confirm egress paths to per-AZ IPs and allowedCIDRs permit probes from those checkers.
83-
- Consider diverse checker regions to avoid false positives due to upstream routing issues.
84-
85-
Operational considerations
86-
- Access control: When using allowedCIDRs, include:
87-
- Management cluster egress IPs (so CAPO can reconcile listeners/pools/monitors)
88-
- Bastion/router IPs as needed for administration
89-
- GSLB health-check source IP ranges
90-
- Observability:
91-
- Track per-AZ LB health and GSLB health check status together to diagnose discrepancies (LB marked healthy, but GSLB marks unhealthy often indicates ACL/routing issues).
92-
- Multi-region future: This proposal focuses on multi-AZ within a region. If multi-region is introduced later, the same per-AZ model composes naturally: per-AZ LBs per region, with GSLB distributing across regions using latency- or geo-based policies and regional failover priorities.
93-
94-
This integration enables operators to achieve health-aware, low-latency, and failure-tolerant access to the Kubernetes API without CAPO managing DNS, while leveraging the explicit per-AZ LB separation for precise GSLB control.
95-
9652
## API Changes (additive)
9753

9854
All changes are confined to the OpenStackCluster API and are backward compatible. Proposed changes in:
99-
- [api/v1beta1/openstackcluster_types.go](api/v1beta1/openstackcluster_types.go)
100-
- [api/v1beta1/types.go](api/v1beta1/types.go)
55+
- [api/v1beta1/openstackcluster_types.go](../../api/v1beta1/openstackcluster_types.go)
56+
- [api/v1beta1/types.go](../../api/v1beta1/types.go)
10157

102-
### Spec additions on APIServerLoadBalancer
103-
- availabilityZoneSubnets []AZSubnetMapping (required to enable multi-AZ)
58+
### Spec additions on `APIServerLoadBalancer`
59+
- `availabilityZoneSubnets []AZSubnetMapping` (required to enable multi-AZ)
10460
- Explicit mapping; each entry includes:
105-
- availabilityZone string
106-
- subnet SubnetParam
107-
- The LB network MUST be specified when using this mapping via spec.apiServerLoadBalancer.network. Each mapped subnet MUST belong to that network.
108-
- allowCrossAZLoadBalancerMembers *bool
109-
- Default false.
110-
- When true, register control plane nodes to all per-AZ LBs; otherwise same-AZ only.
111-
- additionalPorts []int
61+
- `availabilityZone string`
62+
- `subnet SubnetParam`
63+
- The LB network MUST be specified when using this mapping via `spec.apiServerLoadBalancer.network`. Each mapped subnet MUST belong to that network.
64+
- `allowCrossAZLoadBalancerMembers *bool`
65+
- Default `false`.
66+
- When `true`, register control plane nodes to all per-AZ LBs; otherwise same-AZ only.
67+
- `additionalPorts []int`
11268
- Optional extra listener ports besides the Kubernetes API port.
113-
- allowedCIDRs []string
69+
- `allowedCIDRs []string`
11470
- Optional VIP ACL list when supported by the Octavia provider.
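
As a rough illustration of the additive fields listed above, the following Go sketch shows one way they could be shaped. It is not the proposed implementation: `SubnetParam` is reduced to a stand-in with only an ID, the JSON tags are assumed, and the helper methods `MultiAZEnabled` and `CrossAZMembers` are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// SubnetParam is a simplified stand-in for the existing CAPO type;
// only an ID is modelled here (assumption for illustration).
type SubnetParam struct {
	ID string `json:"id,omitempty"`
}

// AZSubnetMapping pairs one availability zone with the VIP subnet of its LB.
type AZSubnetMapping struct {
	AvailabilityZone string      `json:"availabilityZone"`
	Subnet           SubnetParam `json:"subnet"`
}

// APIServerLoadBalancer shows only the fields added by this proposal.
type APIServerLoadBalancer struct {
	AvailabilityZoneSubnets         []AZSubnetMapping `json:"availabilityZoneSubnets,omitempty"`
	AllowCrossAZLoadBalancerMembers *bool             `json:"allowCrossAZLoadBalancerMembers,omitempty"`
	AdditionalPorts                 []int             `json:"additionalPorts,omitempty"`
	AllowedCIDRs                    []string          `json:"allowedCIDRs,omitempty"`
}

// MultiAZEnabled (hypothetical helper) reports whether at least one AZ
// mapping is present; without one, legacy single-LB behavior applies.
func (lb APIServerLoadBalancer) MultiAZEnabled() bool {
	return len(lb.AvailabilityZoneSubnets) > 0
}

// CrossAZMembers (hypothetical helper) resolves the optional pointer,
// treating an unset value as the documented default of false.
func (lb APIServerLoadBalancer) CrossAZMembers() bool {
	return lb.AllowCrossAZLoadBalancerMembers != nil && *lb.AllowCrossAZLoadBalancerMembers
}

func main() {
	lb := APIServerLoadBalancer{
		AvailabilityZoneSubnets: []AZSubnetMapping{
			{AvailabilityZone: "az1", Subnet: SubnetParam{ID: "subnet-a"}},
		},
	}
	fmt.Println(lb.MultiAZEnabled(), lb.CrossAZMembers())
}
```

Using a `*bool` keeps "unset" distinguishable from an explicit `false`, which is why the default is resolved in a helper rather than read directly.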
11571

11672
Notes:
117-
- The existing single-value availabilityZone field (if present) is treated as a legacy single-AZ shorthand; multi-AZ requires availabilityZoneSubnets.
73+
- The existing single-value `availabilityZone` field (if present) is treated as a legacy single-AZ shorthand; multi-AZ requires `availabilityZoneSubnets`.
11874

11975
### Status additions
120-
- apiServerLoadBalancers []LoadBalancer
121-
- A list-map keyed by availabilityZone (kubebuilder listMapKey=availabilityZone).
122-
- Each entry includes: name, id, ip, internalIP, tags, availabilityZone, loadBalancerNetwork, allowedCIDRs.
76+
- `apiServerLoadBalancers []LoadBalancer`
77+
- A list-map keyed by `availabilityZone` (kubebuilder `listMapKey=availabilityZone`).
78+
- Each entry includes: `name`, `id`, `ip`, `internalIP`, `tags`, `availabilityZone`, `loadBalancerNetwork`, `allowedCIDRs`.
12379

12480
### Validation (CRD and controller)
125-
- No duplicate availabilityZone values in availabilityZoneSubnets.
126-
- Each availabilityZoneSubnets.subnet MUST resolve to a subnet that belongs to the specified LB network.
81+
- No duplicate `availabilityZone` values in `availabilityZoneSubnets`.
82+
- Each `availabilityZoneSubnets.subnet` MUST resolve to a subnet that belongs to the specified LB network.
12783
- No duplicate subnets across mappings.
12884
- At least one mapping is required to enable multi-AZ; otherwise behavior is legacy single-LB.
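
The validation rules above could be checked in the controller along the lines of the sketch below. It assumes each mapping has already been resolved to a subnet ID and the ID of the network that subnet belongs to; the `azSubnet` type and `validateMappings` function are hypothetical names for illustration, not part of the proposal.

```go
package main

import "fmt"

// azSubnet is a resolved mapping entry (hypothetical type): the AZ, the
// subnet ID, and the network the subnet was found to belong to.
type azSubnet struct {
	AZ        string
	SubnetID  string
	NetworkID string
}

// validateMappings applies the listed rules to already-resolved entries.
// An empty slice is not an error: it simply means legacy single-LB behavior.
func validateMappings(lbNetworkID string, mappings []azSubnet) error {
	seenAZ := map[string]bool{}
	seenSubnet := map[string]bool{}
	for _, m := range mappings {
		if seenAZ[m.AZ] {
			return fmt.Errorf("duplicate availabilityZone %q", m.AZ)
		}
		if seenSubnet[m.SubnetID] {
			return fmt.Errorf("subnet %q is mapped more than once", m.SubnetID)
		}
		if m.NetworkID != lbNetworkID {
			return fmt.Errorf("subnet %q does not belong to LB network %q", m.SubnetID, lbNetworkID)
		}
		seenAZ[m.AZ] = true
		seenSubnet[m.SubnetID] = true
	}
	return nil
}

func main() {
	// Two entries for the same AZ should be rejected.
	err := validateMappings("net-1", []azSubnet{
		{AZ: "az1", SubnetID: "s1", NetworkID: "net-1"},
		{AZ: "az1", SubnetID: "s2", NetworkID: "net-1"},
	})
	fmt.Println(err)
}
```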
12985

13086
CRD updates in:
131-
- [config/crd/bases/](config/crd/bases/)
132-
- [config/crd/patches/](config/crd/patches/)
87+
- [config/crd/bases/](../../config/crd/bases/)
88+
- [config/crd/patches/](../../config/crd/patches/)
13389

13490
## Controller Design
13591

13692
Changes span these components:
137-
- [controllers/openstackcluster_controller.go](controllers/openstackcluster_controller.go)
138-
- [pkg/cloud/services/loadbalancer/](pkg/cloud/services/loadbalancer/)
139-
- [pkg/cloud/services/networking/](pkg/cloud/services/networking/)
93+
- [controllers/openstackcluster_controller.go](../../controllers/openstackcluster_controller.go)
94+
- [pkg/cloud/services/loadbalancer/](../../pkg/cloud/services/loadbalancer/)
95+
- [pkg/cloud/services/networking/](../../pkg/cloud/services/networking/)
14096

14197
### VIP network and subnet resolution
142-
- When spec.apiServerLoadBalancer.network is specified with availabilityZoneSubnets:
143-
- Resolve each SubnetParam in order; validate that each belongs to the given LB network.
98+
- When `spec.apiServerLoadBalancer.network` is specified with `availabilityZoneSubnets`:
99+
- Resolve each `SubnetParam` in order; validate that each belongs to the given LB network.
144100
- Derive the AZ list directly from the mapping entries.
145-
- Persist the LB network and the ordered subnets into status.apiServerLoadBalancer.loadBalancerNetwork.
101+
- Persist the LB network and the ordered subnets into `status.apiServerLoadBalancer.loadBalancerNetwork`.
146102
- Legacy single-AZ behavior (no mapping provided):
147103
- If an LB network is specified but no mapping is provided, treat as single-LB and select a subnet per legacy rules (unchanged).
148-
- If no LB network is specified, default to the cluster networks subnets (unchanged single-LB behavior).
104+
- If no LB network is specified, default to the cluster network's subnets (unchanged single-LB behavior).
149105

150-
Initialize or update status.apiServerLoadBalancers entries to carry the LB network reference.
106+
Initialize or update `status.apiServerLoadBalancers` entries to carry the LB network reference.
151107

152108
### Per-AZ LoadBalancer reconciliation
153-
For each AZ in availabilityZoneSubnets:
109+
For each AZ in `availabilityZoneSubnets`:
154110
- Determine the VIP subnet from the mapping and create or adopt a LoadBalancer named:
155111
- k8s-clusterapi-cluster-${NAMESPACE}-${CLUSTER_NAME}-${AZ}-kubeapi
156-
- Set Octavia AvailabilityZone hint when supported by the provider.
157-
- Create or adopt listeners, pools, and monitors for the API port and any additionalPorts.
112+
- Set Octavia `AvailabilityZone` hint when supported by the provider.
113+
- Create or adopt listeners, pools, and monitors for the API port and any `additionalPorts`.
158114
- If floating IPs are not disabled, allocate and associate a floating IP to the LB VIP port when needed.
159-
- Update or insert the AZ entry in status.apiServerLoadBalancers, including name, id, internalIP, optional ip, tags, allowedCIDRs, and loadBalancerNetwork.
115+
- Update or insert the AZ entry in `status.apiServerLoadBalancers`, including `name`, `id`, `internalIP`, optional `ip`, `tags`, `allowedCIDRs`, and `loadBalancerNetwork`.
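
The "update or insert" step for the list-map keyed by `availabilityZone` might look like the sketch below. The `lbStatus` struct is a trimmed, hypothetical stand-in for the status entry and `upsertByAZ` is an illustrative name; the real status type carries more fields.

```go
package main

import "fmt"

// lbStatus is a trimmed, hypothetical stand-in for one status entry.
type lbStatus struct {
	AvailabilityZone string
	Name             string
	ID               string
	InternalIP       string
	IP               string // optional floating IP
}

// upsertByAZ replaces the entry whose AvailabilityZone matches, or appends
// a new entry when none matches. Existing order is preserved.
func upsertByAZ(list []lbStatus, entry lbStatus) []lbStatus {
	for i := range list {
		if list[i].AvailabilityZone == entry.AvailabilityZone {
			list[i] = entry
			return list
		}
	}
	return append(list, entry)
}

func main() {
	var status []lbStatus
	status = upsertByAZ(status, lbStatus{AvailabilityZone: "az1", Name: "lb-az1", ID: "1"})
	status = upsertByAZ(status, lbStatus{AvailabilityZone: "az2", Name: "lb-az2", ID: "2"})
	// A second write for az1 updates in place instead of adding a duplicate.
	status = upsertByAZ(status, lbStatus{AvailabilityZone: "az1", Name: "lb-az1", ID: "1", IP: "198.51.100.7"})
	fmt.Println(len(status), status[0].IP)
}
```

Keying on the AZ keeps repeated reconciliations idempotent: the list never grows beyond one entry per configured AZ.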
160116

161117
### Legacy adoption and migration
162118
- Discover legacy single-LB resources named:
163-
- k8s-clusterapi-cluster-${NAMESPACE}-${CLUSTER_NAME}-kubeapi
164-
- When multi-AZ is enabled (availabilityZoneSubnets provided), rename legacy resources to the AZ-specific name for the first configured AZ, or adopt correctly named resources if they already exist.
119+
- `k8s-clusterapi-cluster-${NAMESPACE}-${CLUSTER_NAME}-kubeapi`
120+
- When multi-AZ is enabled (`availabilityZoneSubnets` provided), rename legacy resources to the AZ-specific name for the first configured AZ, or adopt correctly named resources if they already exist.
165121
- Emit clear events and warnings; ensure idempotent operation.
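
To make the naming scheme concrete, here is a small sketch that builds the legacy and AZ-specific names and derives the rename target for the first configured AZ. The functions `lbName` and `renamePlan` are hypothetical helpers for illustration; only the name patterns come from this proposal.

```go
package main

import "fmt"

// lbName builds the LoadBalancer name. An empty az yields the legacy
// single-LB name; otherwise the AZ is inserted before the "-kubeapi" suffix.
func lbName(namespace, cluster, az string) string {
	base := fmt.Sprintf("k8s-clusterapi-cluster-%s-%s", namespace, cluster)
	if az == "" {
		return base + "-kubeapi"
	}
	return base + "-" + az + "-kubeapi"
}

// renamePlan returns the legacy name and the AZ-specific name it would be
// renamed to when multi-AZ is enabled. ok is false when no AZ is configured.
func renamePlan(namespace, cluster string, azs []string) (from, to string, ok bool) {
	if len(azs) == 0 {
		return "", "", false
	}
	return lbName(namespace, cluster, ""), lbName(namespace, cluster, azs[0]), true
}

func main() {
	from, to, _ := renamePlan("default", "demo", []string{"az1", "az2"})
	fmt.Println(from, "->", to)
}
```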
166122

167123
### Member registration behavior
168124
- Determine the machine failure domain (AZ) from the owning control plane machine.
169-
- Default behavior: register the node only with the LoadBalancer whose availabilityZone matches the nodes AZ; if the legacy LB exists without an AZ, include it as a fallback.
170-
- When allowCrossAZLoadBalancerMembers is true: register the node with all per-AZ LBs.
171-
- Reconcile membership across the API port and any additionalPorts.
125+
- Default behavior: register the node only with the LoadBalancer whose `availabilityZone` matches the node's AZ; if the legacy LB exists without an AZ, include it as a fallback.
126+
- When `allowCrossAZLoadBalancerMembers` is `true`: register the node with all per-AZ LBs.
127+
- Reconcile membership across the API port and any `additionalPorts`.
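
The member selection rule can be sketched as a simple filter. This is an illustration under assumptions, not the controller code: `lbEntry` and `lbsForNode` are hypothetical names, a legacy LB is represented by an empty AZ, and the sketch assumes the legacy LB is always included when it exists, which is one possible reading of "fallback".

```go
package main

import "fmt"

// lbEntry is a hypothetical, trimmed-down view of one LoadBalancer;
// AZ is empty for a legacy LB created before multi-AZ was enabled.
type lbEntry struct {
	Name string
	AZ   string
}

// lbsForNode returns the names of the LBs a control plane node should be
// registered with. By default only the same-AZ LB (plus any legacy LB) is
// selected; with allowCrossAZ every LB is selected.
func lbsForNode(nodeAZ string, lbs []lbEntry, allowCrossAZ bool) []string {
	var names []string
	for _, lb := range lbs {
		if allowCrossAZ || lb.AZ == "" || lb.AZ == nodeAZ {
			names = append(names, lb.Name)
		}
	}
	return names
}

func main() {
	lbs := []lbEntry{{Name: "lb-az1", AZ: "az1"}, {Name: "lb-az2", AZ: "az2"}}
	fmt.Println(lbsForNode("az1", lbs, false))
	fmt.Println(lbsForNode("az1", lbs, true))
}
```

The same selection would then be applied once per port, covering the API port and each entry in `additionalPorts`.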
172128

173129
### Control plane endpoint
174-
- Preserve a user-provided DNS in spec.controlPlaneEndpoint when set and valid.
130+
- Preserve a user-provided DNS in `spec.controlPlaneEndpoint` when set and valid.
175131
- Otherwise choose:
176132
- The LB floating IP if present, else the VIP for an LB.
177133
- If no LB host is available and floating IPs are allowed, allocate or adopt a floating IP for the cluster endpoint when applicable.
@@ -185,7 +141,7 @@ For each AZ in availabilityZoneSubnets:
185141

186142
## Example configurations
187143

188-
Explicit AZ→Subnet mapping (required for multi-AZ)
144+
### Explicit AZ→Subnet mapping (required for multi-AZ)
189145
```yaml
190146
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
191147
kind: OpenStackCluster
@@ -207,7 +163,7 @@ spec:
207163
    allowCrossAZLoadBalancerMembers: false
208164
```
209165
210-
Allow cross-AZ member registration
166+
### Allow cross-AZ member registration
211167
```yaml
212168
spec:
213169
  apiServerLoadBalancer:
@@ -224,55 +180,36 @@ spec:
224180
    allowCrossAZLoadBalancerMembers: true
225181
```
226182
227-
Restrict access using allowed CIDRs
228-
```yaml
229-
spec:
230-
  apiServerLoadBalancer:
231-
    enabled: true
232-
    network:
233-
      id: 6c90b532-7ba0-418a-a276-5ae55060b5b0
234-
    availabilityZoneSubnets:
235-
      - availabilityZone: az1
236-
        subnet:
237-
          id: cad5a91a-36de-4388-823b-b0cc82cadfdc
238-
      - availabilityZone: az2
239-
        subnet:
240-
          id: e2407c18-c4e7-4d3d-befa-8eec5d8756f2
241-
    allowedCIDRs:
242-
      - 192.0.2.0/24
243-
      - 203.0.113.10
244-
```
245-
246183
## Backward compatibility and migration
247184
248185
- Default behavior remains single-LB when no multi-AZ mapping is provided.
249186
- Enabling multi-AZ:
250-
- Operators add availabilityZoneSubnets (and optionally additionalPorts, allowedCIDRs, allowCrossAZLoadBalancerMembers) and must specify the LB network.
187+
- Operators add `availabilityZoneSubnets` (and optionally `additionalPorts`, `allowedCIDRs`, `allowCrossAZLoadBalancerMembers`) and must specify the LB network.
251188
- Controller renames or adopts legacy resources into AZ-specific naming.
252-
- status.apiServerLoadBalancers is populated alongside legacy status until further cleanup.
189+
- `status.apiServerLoadBalancers` is populated alongside legacy status until further cleanup.
253190
- Disabling multi-AZ:
254191
- Remove the mapping; controller maintains single-LB behavior.
255-
- Per-AZ LBs are not automatically deleted; operators may clean up unused resources.
192+
- Per-AZ LBs are not automatically deleted to prevent accidental data loss and to allow operators to gracefully migrate traffic before cleanup. Operators are responsible for manually cleaning up unused LoadBalancer resources.
256193

257194
## Testing strategy
258195

259-
Unit tests
196+
### Unit tests
260197
- Validation: duplicate AZs, duplicate subnets in mapping, wrong network-subnet associations.
261198
- LB reconciliation: AZ hint propagation, per-port resource creation and updates.
262199
- Migration/adoption: renaming legacy resources and adopting correctly-named resources.
263200
- Member registration: defaults and cross-AZ opt-in.
264201
- Allowed CIDRs: canonicalization and provider capability handling.
265202

266-
E2E tests
203+
### E2E tests
267204
- Multi-AZ suite to verify per-AZ LBs exist with expected names and ports.
268-
- status.apiServerLoadBalancers contains per-AZ entries including LB network and IPs.
205+
- `status.apiServerLoadBalancers` contains per-AZ entries including LB network and IPs.
269206
- Control plane nodes register to same-AZ LB (or to all LBs when cross-AZ is enabled).
270207
- DNS records remain out of scope for e2e.
271208

272209
Test code locations:
273-
- [pkg/cloud/services/loadbalancer/](pkg/cloud/services/loadbalancer/)
274-
- [controllers/](controllers/)
275-
- [test/e2e/](test/e2e/)
210+
- [pkg/cloud/services/loadbalancer/](../../pkg/cloud/services/loadbalancer/)
211+
- [controllers/](../../controllers/)
212+
- [test/e2e/](../../test/e2e/)
276213

277214
## Risks and mitigations
278215
- Mapping/network mismatches: reject with clear validation messages; enforce via CRD CEL where feasible and in-controller checks.
@@ -282,15 +219,15 @@ Test code locations:
282219

283220
## Rollout plan
284221
1) API and CRD changes:
285-
- Add new fields and list-map keyed status to OpenStackCluster types in [api/v1beta1/](api/v1beta1/).
286-
- Update CRDs in [config/crd/bases/](config/crd/bases/) and patches in [config/crd/patches/](config/crd/patches/).
222+
- Add new fields and list-map keyed status to OpenStackCluster types in [api/v1beta1/](../../api/v1beta1/).
223+
- Update CRDs in [config/crd/bases/](../../config/crd/bases/) and patches in [config/crd/patches/](../../config/crd/patches/).
287224
2) Controller implementation:
288-
- VIP network/subnet resolution and explicit AZ mapping in [controllers/openstackcluster_controller.go](controllers/openstackcluster_controller.go).
289-
- Per-AZ LB reconciliation, rename/adoption, member selection, and optional floating IPs in [pkg/cloud/services/loadbalancer/](pkg/cloud/services/loadbalancer/).
225+
- VIP network/subnet resolution and explicit AZ mapping in [controllers/openstackcluster_controller.go](../../controllers/openstackcluster_controller.go).
226+
- Per-AZ LB reconciliation, rename/adoption, member selection, and optional floating IPs in [pkg/cloud/services/loadbalancer/](../../pkg/cloud/services/loadbalancer/).
290227
3) Documentation:
291-
- Update configuration guide and examples in [docs/book/src/clusteropenstack/configuration.md](docs/book/src/clusteropenstack/configuration.md).
228+
- Update configuration guide and examples in [docs/book/src/clusteropenstack/configuration.md](../book/src/clusteropenstack/configuration.md).
292229
4) Testing:
293-
- Unit tests across controller and services; e2e suite updates in [test/e2e/](test/e2e/).
230+
- Unit tests across controller and services; e2e suite updates in [test/e2e/](../../test/e2e/).
294231
5) Optional metrics:
295232
- Add observability for per-AZ LB counts and reconciliation timings (non-breaking).
296233
