Commit 660f634

feat: proposal for multi-az apiserver loadbalancer
Add a design proposal for first-class Multi-AZ support for the Kubernetes control plane LoadBalancer in CAPO. The feature reconciles one Octavia LoadBalancer per Availability Zone via an explicit AZ-to-Subnet mapping, registers control plane nodes with the LB in their AZ by default, and relies on external DNS multi-value A records for client-side failover.
1 parent a36532d commit 660f634

1 file changed, 235 additions and 0 deletions.
@@ -0,0 +1,235 @@
1+
# Multi-AZ API Server LoadBalancer for CAPO
2+
3+
## Summary
4+
Add first-class Multi-AZ support for the Kubernetes control plane LoadBalancer in Cluster API Provider OpenStack (CAPO). The feature reconciles one Octavia LoadBalancer per Availability Zone (AZ), places each VIP in the intended subnet for that AZ via an explicit AZ→Subnet mapping, and by default registers control plane nodes only with the LB in the same AZ. Operators expose the control plane endpoint via external DNS multi-value A records that point at the per-AZ LB IPs. This proposal is additive and backward compatible.
5+
6+
## Motivation
7+
- Achieve true multi-AZ resilience for the control plane by avoiding a single VIP dependency.
8+
- Align control plane networking with existing multi-AZ compute placement goals.
9+
- Provide clear primitives that are portable across Octavia providers: native AZ hints where supported, plus an explicit, unambiguous mapping between AZs and VIP subnets.
10+
11+
## Goals
12+
- Create and manage one API server LoadBalancer per configured AZ.
13+
- Support explicit AZ→Subnet mapping only (no positional mapping).
14+
- Default to same-AZ LB membership for control plane nodes; allow opt-in cross-AZ registration.
15+
- Keep the API additive with strong validation, clear events and documentation.
16+
- Preserve user-provided DNS endpoints; DNS record management remains out of scope.
17+
18+
## Non-Goals
19+
- Managing or provisioning DNS records.
20+
- Provider-specific topologies such as ACTIVE_STANDBY across fault domains.
21+
- Service type LoadBalancer for worker Services.
22+
- Automatic creation of the network and subnets; the operator is responsible for creating the network and subnets before enabling multi-AZ LoadBalancer.
23+
24+
## User Stories
25+
1) As a platform engineer, I want per-AZ LBs so a full AZ outage leaves the cluster reachable via DNS multi-A records that resolve to the remaining AZs.
26+
2) As an operator, I want a safe migration path from single-LB clusters to per-AZ LBs without downtime.
27+
3) As an operator, I pre-create the network and subnets for each AZ and then configure CAPO to place a LoadBalancer in each one.
28+
29+
## Design Overview
30+
31+
### High-level behavior
32+
- When enabled and configured with an explicit mapping, CAPO reconciles one LoadBalancer per Availability Zone (AZ).
33+
- VIP placement is controlled only by an explicit mapping list that binds each AZ to a specific subnet on the LB network.
34+
- Each per-AZ LB is named with an AZ suffix.
35+
- Control plane nodes are registered as LB members only in their AZ by default; opt-in cross-AZ membership is supported.
36+
- Operators expose an external DNS name for the control plane endpoint with one A/AAAA record per AZ LB IP.
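
Client-side failover over a multi-value record can be pictured as trying each resolved address in turn. The following is a minimal sketch, assuming the resolver has already returned the per-AZ LB IPs; `first_reachable` and the `probe` callback are hypothetical names used only for illustration, and real clients may order or retry addresses differently.

```python
from typing import Callable, Iterable, Optional


def first_reachable(addresses: Iterable[str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first address for which probe() succeeds, or None if all fail."""
    for address in addresses:
        if probe(address):
            return address
    return None
```

With two per-AZ addresses and the first AZ down, the function returns the second address; with every AZ down it returns `None`.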
37+
38+
### Architecture diagram
39+
```mermaid
flowchart LR
  Clients --> DNS[External DNS zone]
  DNS -->|A record per AZ| LBa[LB az1]
  DNS -->|A record per AZ| LBb[LB az2]
  DNS -->|A record per AZ| LBn[LB azN]
  subgraph OpenStack
    LBa --> LaL[Listeners] --> Pa[Pools] --> CP1[Control plane nodes in az1]
    LBb --> LbL[Listeners] --> Pb[Pools] --> CP2[Control plane nodes in az2]
    LBn --> LnL[Listeners] --> Pn[Pools] --> CPn[Control plane nodes in azN]
  end
```
51+
52+
## API Changes (additive)
53+
54+
All changes are confined to the OpenStackCluster API and are backward compatible. Proposed changes in:
55+
- [api/v1beta1/openstackcluster_types.go](../../api/v1beta1/openstackcluster_types.go)
56+
- [api/v1beta1/types.go](../../api/v1beta1/types.go)
57+
58+
### Spec additions on `APIServerLoadBalancer`
59+
- `availabilityZoneSubnets []AZSubnetMapping` (required to enable multi-AZ)
  - Explicit mapping; each entry includes:
    - `availabilityZone string`
    - `subnet SubnetParam`
  - The LB network MUST be specified when using this mapping via `spec.apiServerLoadBalancer.network`. Each mapped subnet MUST belong to that network.
- `allowCrossAZLoadBalancerMembers *bool`
  - Default `false`.
  - When `true`, register control plane nodes to all per-AZ LBs; otherwise same-AZ only.
- `additionalPorts []int`
  - Optional extra listener ports besides the Kubernetes API port.
- `allowedCIDRs []string`
  - Optional VIP ACL list when supported by the Octavia provider.
71+
72+
Notes:
73+
- The existing single-value `availabilityZone` field (if present) is treated as a legacy single-AZ shorthand; multi-AZ requires `availabilityZoneSubnets`.
74+
75+
### Status additions
76+
- `apiServerLoadBalancers []LoadBalancer`
77+
- A list-map keyed by `availabilityZone` (kubebuilder `listMapKey=availabilityZone`).
78+
- Each entry includes: `name`, `id`, `ip`, `internalIP`, `tags`, `availabilityZone`, `loadBalancerNetwork`, `allowedCIDRs`.
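
The list-map semantics mean that at most one status entry exists per AZ, and an update for a known AZ merges into the existing entry instead of appending a duplicate. The sketch below models that behavior in Python for illustration only; it is not the controller's Go code, and `upsert_by_az` is a hypothetical helper name.

```python
from typing import Dict, List


def upsert_by_az(entries: List[Dict], entry: Dict) -> List[Dict]:
    """Return a copy of entries with entry merged in, keyed by availabilityZone."""
    az = entry["availabilityZone"]
    result = []
    found = False
    for existing in entries:
        if existing["availabilityZone"] == az:
            # Values from the new entry win; keys it omits are kept.
            result.append({**existing, **entry})
            found = True
        else:
            result.append(existing)
    if not found:
        result.append(dict(entry))
    return result
```

Calling it repeatedly with the same AZ leaves the list length unchanged, which is the property the `listMapKey` is meant to guarantee.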
79+
80+
### Validation (CRD and controller)
81+
- No duplicate `availabilityZone` values in `availabilityZoneSubnets`.
82+
- Each `availabilityZoneSubnets.subnet` MUST resolve to a subnet that belongs to the specified LB network.
83+
- No duplicate subnets across mappings.
84+
- At least one mapping is required to enable multi-AZ; otherwise behavior is legacy single-LB.
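
The first three rules can be sketched as a single pass over the mapping. This is an illustrative Python model, not the proposed CRD or controller code. It assumes each `SubnetParam` has already been resolved to a subnet ID and that a subnet-to-network lookup is available; `AZSubnetMapping` here is a simplified stand-in, and `validate_mappings` and `subnet_to_network` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AZSubnetMapping:
    availability_zone: str
    subnet_id: str  # assumption: SubnetParam already resolved to a subnet ID


def validate_mappings(mappings: List[AZSubnetMapping], lb_network_id: str,
                      subnet_to_network: Dict[str, str]) -> List[str]:
    """Return validation error messages; an empty list means the mapping is valid."""
    errors: List[str] = []
    seen_azs = set()
    seen_subnets = set()
    for m in mappings:
        if m.availability_zone in seen_azs:
            errors.append(f"duplicate availabilityZone: {m.availability_zone}")
        seen_azs.add(m.availability_zone)
        if m.subnet_id in seen_subnets:
            errors.append(f"duplicate subnet: {m.subnet_id}")
        seen_subnets.add(m.subnet_id)
        # Unknown subnets fail this check too, because the lookup returns None.
        if subnet_to_network.get(m.subnet_id) != lb_network_id:
            errors.append(f"subnet {m.subnet_id} is not on LB network {lb_network_id}")
    return errors
```

Collecting all errors instead of stopping at the first one lets a single rejection message list every problem in the mapping.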
85+
86+
CRD updates in:
87+
- [config/crd/bases/](../../config/crd/bases/)
88+
- [config/crd/patches/](../../config/crd/patches/)
89+
90+
## Controller Design
91+
92+
Changes span these components:
93+
- [controllers/openstackcluster_controller.go](../../controllers/openstackcluster_controller.go)
94+
- [pkg/cloud/services/loadbalancer/](../../pkg/cloud/services/loadbalancer/)
95+
- [pkg/cloud/services/networking/](../../pkg/cloud/services/networking/)
96+
97+
### VIP network and subnet resolution
98+
- When `spec.apiServerLoadBalancer.network` is specified with `availabilityZoneSubnets`:
99+
- Resolve each `SubnetParam` in order; validate that each belongs to the given LB network.
100+
- Derive the AZ list directly from the mapping entries.
101+
- Persist the LB network and the ordered subnets into `status.apiServerLoadBalancer.loadBalancerNetwork`.
102+
- Legacy single-AZ behavior (no mapping provided):
103+
- If an LB network is specified but no mapping is provided, treat as single-LB and select a subnet per legacy rules (unchanged).
104+
- If no LB network is specified, default to the cluster network's subnets (unchanged single-LB behavior).
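
The three cases above reduce to a small decision on two inputs. The sketch below is a Python illustration under that reading; `resolution_mode` and the mode strings are hypothetical and exist only to label the branches.

```python
def resolution_mode(network_specified: bool, mapping_count: int) -> str:
    """Classify how the VIP network and subnets are resolved."""
    if mapping_count > 0:
        # Multi-AZ requires an explicit LB network alongside the mapping.
        if not network_specified:
            raise ValueError(
                "availabilityZoneSubnets requires spec.apiServerLoadBalancer.network"
            )
        return "multi-az"
    if network_specified:
        return "legacy-lb-network"  # single LB, subnet chosen by legacy rules
    return "legacy-cluster-network"  # single LB on the cluster network's subnets
```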
105+
106+
Initialize or update `status.apiServerLoadBalancers` entries to carry the LB network reference.
107+
108+
### Per-AZ LoadBalancer reconciliation
109+
For each AZ in `availabilityZoneSubnets`:
110+
- Determine the VIP subnet from the mapping and create or adopt a LoadBalancer named:
111+
- `k8s-clusterapi-cluster-${NAMESPACE}-${CLUSTER_NAME}-${AZ}-kubeapi`
112+
- Set Octavia `AvailabilityZone` hint when supported by the provider.
113+
- Create or adopt listeners, pools, and monitors for the API port and any `additionalPorts`.
114+
- If floating IPs are not disabled, allocate and associate a floating IP to the LB VIP port when needed.
115+
- Update or insert the AZ entry in `status.apiServerLoadBalancers`, including `name`, `id`, `internalIP`, optional `ip`, `tags`, `allowedCIDRs`, and `loadBalancerNetwork`.
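
The per-AZ loop starts from a desired set derived from the mapping: one LoadBalancer per entry, named with the AZ suffix and pinned to the mapped subnet. A minimal Python sketch of that derivation follows. The name pattern is taken from the text above; `desired_load_balancers` and the `vipSubnetID` key are hypothetical, and subnets are assumed to be already resolved to IDs.

```python
from typing import Dict, List, Tuple


def lb_name(namespace: str, cluster_name: str, az: str) -> str:
    """Build the AZ-suffixed LoadBalancer name."""
    return f"k8s-clusterapi-cluster-{namespace}-{cluster_name}-{az}-kubeapi"


def desired_load_balancers(namespace: str, cluster_name: str,
                           az_subnets: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Return one desired LB description per (AZ, subnet ID) pair, in mapping order."""
    return [
        {
            "name": lb_name(namespace, cluster_name, az),
            "availabilityZone": az,
            "vipSubnetID": subnet_id,  # hypothetical key for the mapped VIP subnet
        }
        for az, subnet_id in az_subnets
    ]
```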
116+
117+
### Legacy adoption and migration
118+
- Discover legacy single-LB resources named:
119+
- `k8s-clusterapi-cluster-${NAMESPACE}-${CLUSTER_NAME}-kubeapi`
120+
- When multi-AZ is enabled (`availabilityZoneSubnets` provided), rename legacy resources to the AZ-specific name for the first configured AZ, or adopt correctly named resources if they already exist.
121+
- Emit clear events and warnings; ensure idempotent operation.
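
One way to keep this step idempotent is to decide the action purely from which names currently exist. The sketch below is a Python illustration, not the proposed implementation. `plan_legacy_adoption` is a hypothetical helper, and the choice to prefer adoption when both the legacy and the AZ-specific name exist is an assumption that the proposal does not spell out.

```python
from typing import Set, Tuple


def plan_legacy_adoption(existing_names: Set[str], legacy_name: str,
                         target_name: str) -> Tuple[str, str]:
    """Return (action, resource name) for the first configured AZ."""
    if target_name in existing_names:
        # Correctly named resource already present: adopt it as-is.
        return ("adopt", target_name)
    if legacy_name in existing_names:
        # Only the legacy resource exists: rename it to the AZ-specific name.
        return ("rename", legacy_name)
    return ("create", target_name)
```

After a rename has been applied, a second pass sees the AZ-specific name and returns `adopt`, so repeated reconciliations do not rename again.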
122+
123+
### Member registration behavior
124+
- Determine the machine failure domain (AZ) from the owning control plane machine.
125+
- Default behavior: register the node only with the LoadBalancer whose `availabilityZone` matches the node's AZ; if the legacy LB exists without an AZ, include it as a fallback.
126+
- When `allowCrossAZLoadBalancerMembers` is `true`: register the node with all per-AZ LBs.
127+
- Reconcile membership across the API port and any `additionalPorts`.
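
The selection rule can be sketched as a filter over the known LoadBalancers. This Python model is illustrative only. `target_load_balancers` is a hypothetical name, each LB is represented as a dict with `name` and an optional `availabilityZone`, and including the AZ-less legacy LB when cross-AZ is enabled is an assumption.

```python
from typing import Dict, List


def target_load_balancers(machine_az: str, load_balancers: List[Dict],
                          allow_cross_az: bool = False) -> List[str]:
    """Return the names of the LBs a control plane node should be a member of."""
    names = []
    for lb in load_balancers:
        lb_az = lb.get("availabilityZone")
        # A legacy LB has no AZ and is always kept as a fallback.
        if allow_cross_az or not lb_az or lb_az == machine_az:
            names.append(lb["name"])
    return names
```

For a node in `az1`, the default returns only the `az1` LB (plus any legacy LB), while `allow_cross_az=True` returns every LB.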
128+
129+
### Control plane endpoint
130+
- Preserve a user-provided DNS name in `spec.controlPlaneEndpoint` when set and valid.
131+
- Otherwise choose:
132+
- The LB floating IP if present, else the VIP for an LB.
133+
- If no LB host is available and floating IPs are allowed, allocate or adopt a floating IP for the cluster endpoint when applicable.
134+
- If floating IPs are disabled and a fixed IP is provided, use it.
135+
- Operators are expected to configure DNS with one A/AAAA record per AZ LB IP for client-side failover. CAPO does not manage DNS.
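
The precedence described above can be read as an ordered fallback chain. The sketch below is a simplified Python model with hypothetical names. It does not check that the user-provided host is valid, it takes the floating IP and VIP of a single LB as inputs, and it returns `None` where the controller would go on to allocate or adopt a floating IP.

```python
from typing import Optional


def choose_endpoint_host(user_host: Optional[str], lb_floating_ip: Optional[str],
                         lb_vip: Optional[str], fixed_ip: Optional[str],
                         floating_ips_disabled: bool) -> Optional[str]:
    """Pick the control plane endpoint host, or None if one must still be allocated."""
    if user_host:
        return user_host  # a user-provided DNS name is always preserved
    if lb_floating_ip:
        return lb_floating_ip
    if lb_vip:
        return lb_vip
    if floating_ips_disabled:
        return fixed_ip  # may be None when no fixed IP was provided
    return None  # caller would allocate or adopt a floating IP
```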
136+
137+
### Events and metrics
138+
- Emit events for create/update/delete of LBs, listeners, pools, monitors, and floating IPs.
139+
- Emit warnings when provider features are unavailable or when validations fail.
140+
- Optional metrics (non-breaking) for per-AZ LB counts and reconciliation latency.
141+
142+
## Example configurations
143+
144+
### Explicit AZ→Subnet mapping (required for multi-AZ)
145+
```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  apiServerLoadBalancer:
    enabled: true
    network:
      id: 6c90b532-7ba0-418a-a276-5ae55060b5b0
    availabilityZoneSubnets:
      - availabilityZone: az1
        subnet:
          id: cad5a91a-36de-4388-823b-b0cc82cadfdc
      - availabilityZone: az2
        subnet:
          id: e2407c18-c4e7-4d3d-befa-8eec5d8756f2
    allowCrossAZLoadBalancerMembers: false
```
165+
166+
### Allow cross-AZ member registration
167+
```yaml
spec:
  apiServerLoadBalancer:
    enabled: true
    network:
      id: 6c90b532-7ba0-418a-a276-5ae55060b5b0
    availabilityZoneSubnets:
      - availabilityZone: az1
        subnet:
          id: cad5a91a-36de-4388-823b-b0cc82cadfdc
      - availabilityZone: az2
        subnet:
          id: e2407c18-c4e7-4d3d-befa-8eec5d8756f2
    allowCrossAZLoadBalancerMembers: true
```
182+
183+
## Backward compatibility and migration
184+
185+
- Default behavior remains single-LB when no multi-AZ mapping is provided.
186+
- Enabling multi-AZ:
187+
- Operators add `availabilityZoneSubnets` (and optionally `additionalPorts`, `allowedCIDRs`, `allowCrossAZLoadBalancerMembers`) and must specify the LB network.
188+
- Controller renames or adopts legacy resources into AZ-specific naming.
189+
- `status.apiServerLoadBalancers` is populated alongside legacy status until further cleanup.
190+
- Disabling multi-AZ:
191+
- Operators redirect all traffic to a single LB (e.g. update DNS records) and then remove the AZ mapping from the spec.
192+
- The controller automatically deletes per-AZ LBs whose mappings have been removed and reverts to single-LB behavior.
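
The cleanup step amounts to a set difference between the AZs recorded in status and the AZs still present in the mapping. A minimal Python sketch follows, with `azs_to_delete` as a hypothetical helper name:

```python
from typing import Iterable, List


def azs_to_delete(status_azs: Iterable[str], mapped_azs: Iterable[str]) -> List[str]:
    """Return AZs that have a per-AZ LB in status but no mapping in the spec."""
    mapped = set(mapped_azs)
    return [az for az in status_azs if az not in mapped]
```

Removing the whole mapping makes every recorded AZ eligible for deletion, which is why traffic should be redirected before the spec change.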
193+
194+
## Testing strategy
195+
196+
### Unit tests
197+
- Validation: duplicate AZs, duplicate subnets in mapping, wrong network-subnet associations.
198+
- LB reconciliation: AZ hint propagation, per-port resource creation and updates.
199+
- Migration/adoption: renaming legacy resources and adopting correctly-named resources.
200+
- Member registration: defaults and cross-AZ opt-in.
201+
- Allowed CIDRs: canonicalization and provider capability handling.
202+
203+
### E2E tests
204+
- Multi-AZ suite to verify per-AZ LBs exist with expected names and ports.
205+
- `status.apiServerLoadBalancers` contains per-AZ entries including LB network and IPs.
206+
- Control plane nodes register to same-AZ LB (or to all LBs when cross-AZ is enabled).
207+
- DNS records remain out of scope for e2e.
208+
209+
Test code locations:
210+
- [pkg/cloud/services/loadbalancer/](../../pkg/cloud/services/loadbalancer/)
211+
- [controllers/](../../controllers/)
212+
- [test/e2e/](../../test/e2e/)
213+
214+
## Risks and mitigations
215+
- Mapping/network mismatches: reject with clear validation messages; enforce via CRD CEL rules where feasible and via in-controller checks otherwise.
216+
- Providers ignoring AZ hints: VIP subnet mapping still ensures deterministic placement; document expected variance.
217+
- Increased resource usage: multiple LBs per cluster increase quota consumption; highlight in docs and operations guidance.
218+
- DNS misconfiguration: documented as operator responsibility.
219+
220+
## Rollout plan
221+
1) API and CRD changes:
222+
- Add new fields and list-map keyed status to OpenStackCluster types in [api/v1beta1/](../../api/v1beta1/).
223+
- Update CRDs in [config/crd/bases/](../../config/crd/bases/) and patches in [config/crd/patches/](../../config/crd/patches/).
224+
2) Controller implementation:
225+
- VIP network/subnet resolution and explicit AZ mapping in [controllers/openstackcluster_controller.go](../../controllers/openstackcluster_controller.go).
226+
- Per-AZ LB reconciliation, rename/adoption, member selection, and optional floating IPs in [pkg/cloud/services/loadbalancer/](../../pkg/cloud/services/loadbalancer/).
227+
3) Documentation:
228+
- Update configuration guide and examples in [docs/book/src/clusteropenstack/configuration.md](../book/src/clusteropenstack/configuration.md).
229+
4) Testing:
230+
- Unit tests across controller and services; e2e suite updates in [test/e2e/](../../test/e2e/).
231+
5) Optional metrics:
232+
- Add observability for per-AZ LB counts and reconciliation timings (non-breaking).
233+
234+
## Open questions
235+
- Should we add a future explicit field to declare the endpoint strategy (single VIP vs external DNS multi-A)? Current design preserves user-provided DNS and documents multi-A.
