docs(dr): fix Talos upgrade runbook — version pin drives upgrades, not the ISO#2178

Merged: devantler merged 1 commit into main from claude/upbeat-moore-e4d259 on Jun 19, 2026

@devantler

Contributor

Problem

The DR runbook's Scenario 2 — Planned rolling Talos / Kubernetes upgrade told operators to "bump the Talos ISO ID in ksail.prod.yaml … ksail cordons and replaces nodes one at a time."

That is not how the deployed KSail upgrades existing nodes. A change to `spec.cluster.talos.iso` is classified as in-place and only affects newly provisioned nodes (autoscaler scale-ups, full rebuilds); bumping it never rolls the running control planes or workers. Following the runbook as written would leave a planned upgrade with no effect on the existing cluster.
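
As a rough illustration of that distinction, the sketch below models which config changes reach nodes that are already running. It is not KSail's implementation: the two sets and the `affects_existing_nodes` helper are hypothetical names, and only the field paths are taken from this PR.

```python
# Illustrative model only; KSail's real change classification is not shown in this PR.
# Field paths come from the PR text; the set and function names are hypothetical.

# Changes that, per this PR, only influence nodes provisioned in the future.
NEW_NODES_ONLY_FIELDS = {"spec.cluster.talos.iso"}

# Version pins that, per this PR, trigger an in-place rolling upgrade.
ROLLING_UPGRADE_FIELDS = {
    "spec.cluster.talos.version",
    "spec.cluster.kubernetesVersion",
}


def affects_existing_nodes(changed_fields):
    """Return True if any changed field would roll the running nodes."""
    return any(field in ROLLING_UPGRADE_FIELDS for field in changed_fields)


if __name__ == "__main__":
    print(affects_existing_nodes(["spec.cluster.talos.iso"]))      # False: ISO bump alone
    print(affects_existing_nodes(["spec.cluster.talos.version"]))  # True: version pin bump
```

Under this model, an ISO-only change never reports an effect on existing nodes, which is the gap the old runbook wording hid.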

Fix

Rewrite Scenario 2 to reflect the real mechanism:

  • Talos OS / Kubernetes upgrades are driven by the version pins (`spec.cluster.talos.version` plus the matching `machine.install.image` installer tag, and `spec.cluster.kubernetesVersion`), which `ksail cluster update` applies as an in-place rolling upgrade: one node at a time, workers first, rebooting each into the new installer image.
  • Clarify that the `iso` field is for new-node provisioning only, not a lever for upgrading existing nodes.
  • Add a recovery note for an interrupted upgrade that leaves a mixed-version cluster (the failure mode behind devantler-tech/ksail#5359), including the one-off `talosctl upgrade` per stuck node.
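
The recovery step in the last bullet could be sketched roughly as follows: find the nodes still on an old Talos version and build one `talosctl upgrade` invocation per stuck node. This is a minimal sketch, not the runbook's procedure. The helper names, node names, IP address, version strings, and installer image reference are all made-up placeholders; the real values come from the cluster and from the pinned `machine.install.image`.

```python
# Sketch under stated assumptions: node names, IPs, versions, and the installer
# image used in the example are placeholders, not values from the runbook.
import shlex


def stuck_nodes(node_versions, target_version):
    """Return the sorted names of nodes not yet running target_version."""
    return sorted(
        name for name, version in node_versions.items() if version != target_version
    )


def upgrade_command(node_ip, installer_image):
    """Build a one-off `talosctl upgrade` command line for a single node."""
    return "talosctl upgrade --nodes {} --image {}".format(
        shlex.quote(node_ip), shlex.quote(installer_image)
    )


if __name__ == "__main__":
    # Hypothetical state after an interrupted rolling upgrade.
    versions = {"cp-1": "v1.2.3", "worker-1": "v1.2.2", "worker-2": "v1.2.3"}
    print(stuck_nodes(versions, "v1.2.3"))
    print(upgrade_command("10.0.0.11", "ghcr.io/siderolabs/installer:v1.2.3"))
```

The sketch only prints the command rather than running it, so an operator can review each node's upgrade before applying it.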

Docs-only; no manifests touched.

🤖 Generated with Claude Code

…t the ISO

Scenario 2 said to "bump the Talos ISO ID ... ksail cordons and replaces nodes
one at a time", but in the deployed KSail an `iso` change is classified in-place
and only affects newly provisioned nodes — bumping it never rolls the existing
ones. Talos OS / Kubernetes upgrades are driven by the version pins
(`spec.cluster.talos.version` + the matching installer image, and
`spec.cluster.kubernetesVersion`), which `ksail cluster update` applies as an
in-place rolling upgrade.

Correct the procedure, clarify the ISO's actual role (new-node provisioning
only), and add a recovery note for an interrupted upgrade leaving a
mixed-version cluster — the failure mode behind devantler-tech/ksail#5359.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@devantler devantler marked this pull request as ready for review June 19, 2026 21:51
@devantler devantler added this pull request to the merge queue Jun 19, 2026
Merged via the queue into main with commit 36eba9f Jun 19, 2026
10 checks passed
@devantler devantler deleted the claude/upbeat-moore-e4d259 branch June 19, 2026 21:52
@github-project-automation github-project-automation Bot moved this from 🫴 Ready to ✅ Done in 🌊 Project Board Jun 19, 2026
@botantler

botantler Bot commented Jun 19, 2026

Contributor

🎉 This PR is included in version 1.69.8 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@botantler botantler Bot added the released label Jun 19, 2026