
📖 proposal for multi-az apiserver loadbalancer #2660

Open

sebltm wants to merge 2 commits into kubernetes-sigs:main from sebltm:multi-az-lb-proposal



@sebltm commented Aug 18, 2025

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot added the do-not-merge/hold label Aug 18, 2025

netlify bot commented Aug 18, 2025

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

  • 🔨 Latest commit: b0ee6ad
  • 🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-cluster-api-openstack/deploys/698f01867481950008b54d33
  • 😎 Deploy Preview: https://deploy-preview-2660--kubernetes-sigs-cluster-api-openstack.netlify.app


@k8s-ci-robot added the cncf-cla: yes, size/L, and needs-ok-to-test labels Aug 18, 2025
@k8s-ci-robot
Contributor

Hi @sebltm. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lentzi90
Contributor

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Sep 18, 2025
Contributor

@lentzi90 left a comment


Interesting proposal! It will need some polishing though.
I think all the links are broken, so that would perhaps be the first thing to fix.

Comment threads (6, outdated) on docs/proposals/20250818-multi-az-apiserver-loadbalancer.md
- status.apiServerLoadBalancers is populated alongside the legacy status until further cleanup.
- Disabling multi-AZ:
  - Remove the mapping; the controller maintains single-LB behavior.
  - Per-AZ LBs are not automatically deleted; operators may clean up unused resources.
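If unused per-AZ load balancers were to be cleaned up automatically, the controller would first have to work out which existing ones are no longer desired, for example after an AZ is dropped from the mapping or the mapping is removed. A minimal sketch of that set difference, assuming a hypothetical AZ-to-load-balancer-name map for both the existing and the desired state; the staleLoadBalancers helper is illustrative only, and the actual Octavia deletion is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// staleLoadBalancers returns the names of existing per-AZ load balancers whose
// AZ is no longer in the desired mapping, sorted for stable output. Both maps
// are keyed by AZ and hold a load balancer name (a hypothetical shape).
func staleLoadBalancers(existing, desired map[string]string) []string {
	var stale []string
	for az, name := range existing {
		if _, keep := desired[az]; !keep {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	existing := map[string]string{
		"az1": "demo-az1-kubeapi",
		"az2": "demo-az2-kubeapi",
		"az3": "demo-az3-kubeapi",
	}
	// The mapping has shrunk to az1 only.
	desired := map[string]string{"az1": "demo-az1-kubeapi"}
	fmt.Println(staleLoadBalancers(existing, desired)) // [demo-az2-kubeapi demo-az3-kubeapi]
}
```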
Contributor


Why would we not clean them up automatically?

Comment thread docs/proposals/20250818-multi-az-apiserver-loadbalancer.md Outdated
Comment thread docs/proposals/20250818-multi-az-apiserver-loadbalancer.md
Comment thread docs/proposals/20250818-multi-az-apiserver-loadbalancer.md
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 17, 2025
@sebltm
Author

sebltm commented Dec 29, 2025

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Dec 29, 2025
Contributor

@lentzi90 left a comment


Looking pretty good!
Could you squash the commits?

I would also be interested in other maintainers opinion on this.
/cc @mandre @stephenfin

Comment thread docs/proposals/20250818-multi-az-apiserver-loadbalancer.md Outdated
Add a design proposal for first-class Multi-AZ support for the
Kubernetes control plane LoadBalancer in CAPO. The feature reconciles
one Octavia LoadBalancer per Availability Zone via an explicit
AZ-to-Subnet mapping, registers control plane nodes with the LB in
their AZ by default, and relies on external DNS multi-value A records
for client-side failover.
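The reconciliation described in the commit message above can be sketched in Go. This is a minimal illustration under stated assumptions, not CAPO's implementation: the AZSubnet type, the lbName naming scheme and both helper functions are hypothetical, and only the default rule (a control plane machine registers with the load balancer in its own AZ) is modelled. Creating the Octavia resources themselves is left out.

```go
package main

import "fmt"

// AZSubnet is a hypothetical entry in the AZ-to-Subnet mapping: an
// Availability Zone and the subnet that zone's load balancer VIP uses.
type AZSubnet struct {
	AvailabilityZone string
	SubnetID         string
}

// lbName is a hypothetical naming scheme for a per-AZ load balancer.
func lbName(cluster, az string) string {
	return fmt.Sprintf("%s-%s-kubeapi", cluster, az)
}

// desiredLoadBalancers returns one load balancer name per mapped AZ and
// rejects a mapping that lists the same AZ twice.
func desiredLoadBalancers(cluster string, mapping []AZSubnet) (map[string]string, error) {
	lbs := make(map[string]string, len(mapping))
	for _, m := range mapping {
		if _, dup := lbs[m.AvailabilityZone]; dup {
			return nil, fmt.Errorf("availability zone %q is mapped more than once", m.AvailabilityZone)
		}
		lbs[m.AvailabilityZone] = lbName(cluster, m.AvailabilityZone)
	}
	return lbs, nil
}

// lbForMachine returns the load balancer a control plane machine registers
// with by default: the one in the machine's own AZ.
func lbForMachine(cluster, machineAZ string, mapping []AZSubnet) (string, error) {
	lbs, err := desiredLoadBalancers(cluster, mapping)
	if err != nil {
		return "", err
	}
	name, ok := lbs[machineAZ]
	if !ok {
		return "", fmt.Errorf("no load balancer is mapped for availability zone %q", machineAZ)
	}
	return name, nil
}

func main() {
	mapping := []AZSubnet{
		{AvailabilityZone: "az1", SubnetID: "subnet-a"},
		{AvailabilityZone: "az2", SubnetID: "subnet-b"},
	}
	name, err := lbForMachine("demo", "az2", mapping)
	fmt.Println(name, err) // demo-az2-kubeapi <nil>
}
```

Treating an unmapped machine AZ as an error, rather than silently registering the machine with another zone's load balancer, is a choice made for this sketch only.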
@lentzi90
Contributor

@nikParasyr have you seen this proposal? I would be interested in your thoughts, especially when considering v1beta2 changes we want to make (e.g. #2899).

Contributor

@nikParasyr left a comment


I have 3 main "concerns" here:

  1. The change to OSC.status will be a breaking change, as we will need to go from status.apiServerLoadBalancer to status.apiServerLoadBalancer[]. We could introduce it as a new field and deprecate the other, but it's something to keep in mind.
  2. There are some fields of the current API that are not covered by the proposal, for example .spec.apiServerFloatingIP. I suspect that being able to predefine the floating IP per LB/AZ would be nice, as it would allow users to create the FIPs and DNS records beforehand (or through a lifecycleHook) and let the cluster roll out nicely. This would affect the currently proposed v1beta2 changes, as we would probably have to opt for option 2 and might even have to change managedLoadbalancer to a list.
  3. I think the request in #2999 has to be incorporated into this proposal. It doesn't feel usable to me (though I might be missing something, as OpenStack deployments vary) if we can create LBs per AZ on different subnets but cannot put the control-plane nodes on these subnets.

My other concern is that this will be tricky to test in e2e, but let's see.

- Add observability for per-AZ LB counts and reconciliation timings (non-breaking).

## Open questions
- Should we add an explicit field in the future to declare the endpoint strategy (single VIP vs external DNS multi-A)? The current design preserves user-provided DNS and documents multi-A.
Contributor


Not sure about the field, but there should be clear documentation on how to achieve the different strategies

@sebltm
Author

sebltm commented Feb 13, 2026

Thanks for the review feedback. I took a pass at addressing some of the comments.
Main changes:

  1. operator-managed network/subnet responsibility is explicit
  2. endpoint strategy is now strict in multi-AZ (explicit DNS controlPlaneEndpoint required)
  3. status replacement/deprecation is now deferred to the v2 migrations
  4. FIP behavior and scope boundaries are explicit (existing v1beta1 behavior retained; per-AZ endpoint/FIP API redesign deferred to #2899, "✨ v1beta2 Group apiServer related fields on OpenStackCluster")
  5. alignment with #2999 ("Support multiple subnets for control-plane nodes across Availability Zones (similar to CAPA)") is documented as complementary (no same-subnet requirement, but explicit reachability prerequisites)

Let me know if that aligns with what you were thinking.
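Point 2 in the list above (an explicit DNS controlPlaneEndpoint is required in multi-AZ mode) amounts to a validation rule. A minimal sketch, assuming a hypothetical validateMultiAZEndpoint helper and a boolean that says whether the AZ-to-Subnet mapping is set; it only rejects empty hosts and IP literals and does not check DNS name syntax.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// validateMultiAZEndpoint is a hypothetical check for the rule that multi-AZ
// mode requires an explicit DNS name as the controlPlaneEndpoint host.
func validateMultiAZEndpoint(multiAZ bool, host string) error {
	if !multiAZ {
		return nil // single-LB mode keeps the existing behaviour
	}
	if host == "" {
		return errors.New("multi-AZ requires an explicit controlPlaneEndpoint host")
	}
	if net.ParseIP(host) != nil {
		return fmt.Errorf("controlPlaneEndpoint host %q is an IP address; multi-AZ requires a DNS name", host)
	}
	return nil
}

func main() {
	for _, host := range []string{"api.example.com", "192.0.2.10", ""} {
		fmt.Printf("%q: %v\n", host, validateMultiAZEndpoint(true, host))
	}
}
```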

Contributor

@lentzi90 left a comment


I am happy with this now. Let's see what others think.
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lentzi90

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Mar 26, 2026
@lentzi90 requested a review from nikParasyr April 15, 2026 07:38
@lentzi90
Contributor

/cc @bnallapeta

@k8s-ci-robot requested a review from bnallapeta April 15, 2026 07:38

Labels

  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)

Projects

Status: Inbox


5 participants