fix(ci): pin ksail to v7.65.0 in system-test to avoid the 7.66.0 validate race #2184

Open
devantler wants to merge 1 commit into main from claude/fix-ci-ksail-validate-race

@devantler


🤖 Generated by Claude Code (unblocking #2180)

Problem

The System Test (ksail workload validate) fails intermittently on k8s PRs — it blocked #2180. The failure is a YAML parse error in the rendered manifests, but the location and error type vary every run:

| Run | Result |
| --- | --- |
| CI (#2180) | line 73 / 91: mapping values are not allowed in this context |
| local 7.66.0 run 1 | line 225 / 245 / 306: could not find expected ':' / did not find expected key |
| local 7.66.0 runs | failed 2, 2, then 1 kustomization(s); count varies |

Root cause

The ksail-cluster action on this step is pinned to v7.65.0, but it installs the ksail binary with its default ksail-version: latest — currently 7.66.0. ksail 7.66.0 has a non-deterministic race in workload validate's in-process parallel Helm rendering: it corrupts the rendered manifest stream at a random point, so validation fails on a random kustomization with a random YAML error each run. The varying line numbers/error types are the signature of the race (a real manifest defect fails at a fixed location).

Verified locally with the exact CI version:

  • ksail 7.65.0 → ✔ 97 kustomizations validated deterministically (both ksail.yaml and ksail.prod.yaml, repeated).
  • ksail 7.66.0 → reproduces the failure; kubectl kustomize of the same overlays passes (so it's the helm-render path, not the manifests).

The affected overlays (providers/{docker,hetzner}/infrastructure/controllers) aren't even touched by #2180 — confirming it's the tool, not the change.
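
The "varying location means race, fixed location means real defect" reasoning above can be checked mechanically. The following is a minimal sketch, not part of this PR: it runs a command several times, reduces each run to a signature (pass, or the set of `line N` numbers in the output), and classifies the result. The bare `ksail workload validate` invocation and the `line N` output pattern are assumptions based on the error text quoted in this description; adjust both to the real invocation and output.

```python
import re
import subprocess
from collections import Counter
from typing import Callable, List, Tuple

# Assumed invocation; the PR does not show the exact flags or config selection.
DEFAULT_CMD = ["ksail", "workload", "validate"]


def run_once(cmd: List[str]) -> Tuple[int, str]:
    """Run the command once and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def signature(exit_code: int, output: str) -> str:
    """Reduce one run to a comparable string: 'pass' or the failing line numbers."""
    if exit_code == 0:
        return "pass"
    # Assumes errors mention their location as 'line N', as in the quoted messages.
    lines = sorted(set(re.findall(r"line (\d+)", output)), key=int)
    return "fail@" + ",".join(lines)


def classify(runs: int, runner: Callable[[], Tuple[int, str]]) -> str:
    """Run `runner` repeatedly and report whether the outcomes agree."""
    sigs = Counter(signature(*runner()) for _ in range(runs))
    if set(sigs) == {"pass"}:
        return "deterministic pass"
    if len(sigs) == 1:
        return "deterministic failure"
    return "non-deterministic"
```

A typical call would be `classify(5, lambda: run_once(DEFAULT_CMD))`. A "deterministic failure" points at the manifests; "non-deterministic" is consistent with a race in the tool, though a small number of runs can miss an infrequent race.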

Fix

Pin the installed binary to ksail-version: "7.65.0", matching the action ref already pinned on this step. Deterministic, and reversible — drop the pin once a ksail release fixes the race.

Follow-up (separate, in the ksail repo)

The underlying bug is in ksail itself (devantler-tech/ksail): workload validate should serialize or isolate per-kustomization Helm rendering so concurrent renders can't corrupt each other's output. Worth a dedicated issue/fix there; this PR is just the platform-side unblock.
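
To illustrate what "isolate per-kustomization rendering" could mean, here is a small sketch of the general pattern. It is not ksail's actual code; `render_isolated` and `fake_render` are hypothetical names, and `fake_render` merely stands in for a Helm render. Each parallel task writes into its own private buffer, and results are collected in input order, so no render can interleave with another's output.

```python
from concurrent.futures import ThreadPoolExecutor
from io import StringIO
from typing import Callable, Dict, List


def render_isolated(
    names: List[str],
    render: Callable[[str, StringIO], None],
    workers: int = 4,
) -> Dict[str, str]:
    """Render each kustomization in parallel, each into its own buffer."""

    def task(name: str) -> str:
        buf = StringIO()  # private to this task; never shared between renders
        render(name, buf)
        return buf.getvalue()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return dict(zip(names, pool.map(task, names)))


def fake_render(name: str, out: StringIO) -> None:
    """Hypothetical stand-in for a render step; writes output in small chunks."""
    for chunk in ("apiVersion: v1\n", "kind: ConfigMap\n", "metadata:\n", f"  name: {name}\n"):
        out.write(chunk)
```

The design choice is to keep parallelism but remove shared mutable output; the alternative named above, serializing the renders, would trade speed for the same guarantee.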

🤖 Generated with Claude Code

The system-test's ksail-cluster action installs the ksail binary with its
default `ksail-version: latest`, which currently resolves to 7.66.0. That
release has a non-deterministic race in `ksail workload validate`'s
in-process parallel Helm rendering: it corrupts the rendered manifest
stream at a random point and fails validation with varying YAML parse
errors ("mapping values are not allowed in this context", "could not find
expected ':'", "did not find expected key") on a different kustomization
(and a different line) each run. This intermittently fails the System
Test on k8s PRs (observed blocking #2180; reproduced locally — 7.66.0
fails, 7.65.0 passes 97/97 across ksail.yaml and ksail.prod.yaml).

Pin the installed binary to v7.65.0, matching the action ref already
pinned on this step. Reversible: drop the pin once a ksail release fixes
the race.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@devantler


Upstream bug filed: devantler-tech/ksail#5362 — drop this pin once a ksail release ships the fix.

@devantler devantler added this pull request to the merge queue Jun 20, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 20, 2026