
ci(fleet): guard example-suite tooling pins against drifting below the feature floor#427

Merged
joshua-temple merged 1 commit into main from ci/tooling-pin-floor-guard
Jul 1, 2026


@joshua-temple

Collaborator

What

Closes #406. Adds a floor-consistency check so that an example repo's scenario-suite setup-cli tooling pin can never silently drift below the current feature floor (the class of failure behind the 4env v0.5.1 and branch-protection v0.2.1 'unknown command' fleet failures).

Floor + mechanism

Floor = latest stable cascade release. The fleet repin rewrites each repo's manifest cli_version to the rc under test but never touches the suite's own stable setup-cli bootstrap pin; that pin is what drifts. Latest-stable is the right floor: every released command is present, and during an rc run the latest stable is the prior release (no chicken-and-egg with the unreleased rc), so at/above-floor suites pass. A moving ref (@main) is skipped; only strictly-below fails, so no false positives.

  • .github/scripts/check-suite-tooling-floor.sh: reads each roster repo's suite via gh api, extracts semver pins from setup-cli@ and version:, compares with sort -V.
  • .github/workflows/suite-tooling-floor.yaml: daily schedule + dispatch (catches drift off the release cadence).
  • fleet-e2e.yaml: a new floor-check job gates repin and is wired into the aggregate gate, so a stale pin reds an rc run before any live dispatch, with a clear message instead of a cryptic mid-suite failure.
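The extract-and-compare step described above could look roughly like the following. This is a minimal sketch, not the contents of check-suite-tooling-floor.sh: the function names, the sample suite text, the `example/setup-cli` action path, and the `v0.9.0` floor are all hypothetical, and it assumes pins are plain `MAJOR.MINOR.PATCH` tags with an optional leading `v`.

```shell
#!/usr/bin/env bash
# Sketch only: names, sample text, and the floor value are hypothetical.

# Print every semver-looking pin that follows "setup-cli@" or "version:" on stdin.
# A moving ref such as @main does not match the pattern, so it is skipped.
extract_pins() {
  grep -oE "(setup-cli@|version:[[:space:]]*)[\"']?v?[0-9]+\.[0-9]+\.[0-9]+" \
    | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' || true
}

# Succeed (exit 0) only when the pin is strictly below the floor.
is_below_floor() {
  local pin="${1#v}" floor="${2#v}"
  # Equal versions are at the floor, not below it.
  [ "$pin" = "$floor" ] && return 1
  # sort -V orders version strings; if the pin sorts first, it is the lower one.
  [ "$(printf '%s\n%s\n' "$pin" "$floor" | sort -V | head -n1)" = "$pin" ]
}

# Usage: one stale pin, one moving ref, one pin exactly at the floor.
FLOOR="${FLOOR:-v0.9.0}"   # placeholder floor, read from the environment
suite='uses: example/setup-cli@v0.2.1
uses: example/setup-cli@main
version: v0.9.0'

for pin in $(printf '%s\n' "$suite" | extract_pins); do
  if is_below_floor "$pin" "$FLOOR"; then
    echo "STALE $pin (floor $FLOOR)"
  else
    echo "OK $pin"
  fi
done
```

Using `sort -V` rather than a plain string comparison matters once a component reaches two digits: `0.10.0` sorts after `0.9.0` under version ordering but before it lexically.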

Current suites brought to floor (already live, bot-DCO)

All bumped to the current latest stable: 2env, artifact-a, 3env, single-env, release-only, no-env (from v0.2.1) and 4env (from v0.5.1). Already-current: primary, rollback-dispatch. No pin (@main, not flagged): artifact-b, callbacks. Each replacement targeted the exact stale value; scenario data literals untouched (0 collateral).

Fail-on-stale demonstration

With FLOOR set to the current latest stable, before the bumps: FAIL (exit 1), listing all 7 stale repos. With FLOOR set below every pin: PASS. After the bumps: PASS, with the entire roster at or above the floor, so the gate starts green and does not destabilize the fleet.

Verification

shellcheck + actionlint clean (fleet-e2e.yaml + suite-tooling-floor.yaml); no Go changed; injection-safe (floor passed via env); reuses CASCADE_STATE_TOKEN, no new secret.
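As an illustration of the "floor passed via env" point, a script can treat the floor purely as data and refuse anything that is not a plain version tag. The sketch below is an assumption about how that might be done, not the PR's code: `validate_floor` is a hypothetical helper, the `v0.9.0` default is a placeholder, and the strict-semver check is an added safeguard that the PR does not mention.

```shell
#!/usr/bin/env bash
# Sketch only: validate_floor is a hypothetical helper, not the PR's script.

# Accept only a plain semver tag (optional leading "v"); reject anything else.
validate_floor() {
  [[ "$1" =~ ^v?[0-9]+\.[0-9]+\.[0-9]+$ ]]
}

# The workflow would supply FLOOR through an env: mapping rather than pasting
# the value into the script body, so the shell only ever sees it as a variable.
FLOOR="${FLOOR:-v0.9.0}"   # placeholder default for local runs

if validate_floor "$FLOOR"; then
  echo "floor ok: $FLOOR"
else
  echo "refusing malformed floor" >&2
  exit 1
fi
```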

Each cascade-example repo bootstraps a cascade CLI in its scenario suite,
pinned by hand to a fixed release. Nothing kept that pin moving forward, so a
suite could sit on a release that predates a command it now invokes and fail a
live fleet lane with a cryptic unknown-command error mid-fan-out.

Add a shared check that fails when a suite's setup-cli pin is below the latest
stable cascade release (the feature floor). Run it two ways: a daily Suite
Tooling Floor workflow that surfaces drift off the release cadence, and a
floor-check gate in fleet-e2e that reds an rc run before any repin or dispatch.
A pin at or above the floor passes; a suite tracking a moving ref is not
flagged.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
joshua-temple merged commit 7a1ea94 into main on Jul 1, 2026
14 checks passed


Development

Successfully merging this pull request may close these issues.

Prevent example-suite tooling pins drifting behind the cli_version_sha feature floor
