vault-snapshot CronJob takes no snapshot: add real raft snapshots after the HA cutover #1970

@devantler

Description

🤖 Generated by the Daily AI Assistant

Gap

k8s/bases/infrastructure/vault-backup/cronjob.yaml (CronJob vault-snapshot, nightly 03:30) does not take a snapshot: it authenticates, runs bao status, and exits (a seal-state health check). But:

  • docs/dr/openbao-raft-ha-migration.md step 1 says "rely on the existing vault-backup CronJob's latest snapshot", an output that does not exist.
  • The OpenBao data PVCs sit on hcloud block storage, which has no CSI snapshot support, so Velero falls back to file-system backup of a live, open raft database. That is crash-consistent at best and not a supported OpenBao restore path.
  • Rotated DB credentials and tenant-pushed app secrets are not reproducible from vault-seed (the runbook itself notes this), so a real snapshot is the only clean restore for them.

Proposal (post raft cutover, #1907)

Once the 3-node Raft cutover completes, bao operator raft snapshot save becomes available. It produces a consistent, online snapshot with a supported restore path via bao operator raft snapshot restore. Rework the CronJob to:

  1. run bao operator raft snapshot save /backup/openbao-$(date).snap against openbao-active (needs a token/policy with read on sys/storage/raft/snapshot; the existing vault-snapshot k8s-auth role can be extended in vault-config),
  2. persist it where Velero's nightly run picks it up (small PVC with N-day retention), or push directly to R2,
  3. keep the seal-state assertion as a side check (PR #1964, "fix(openbao): route clients to the active service so sealed standbys stop 503ing", already points it at openbao-active so it stops failing spuriously on sealed standbys).
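
Steps 1 and 2 could be sketched roughly as the shell functions below. This is a minimal sketch, not the final CronJob script: BACKUP_DIR, RETENTION_DAYS and BAO_CMD are hypothetical names introduced here, the 7-day default is a placeholder for "N", and it assumes the job's existing login step has already exported a token that carries the snapshot read policy. The file name uses an explicit date format because plain date output contains spaces and colons.

```shell
#!/bin/sh
# Sketch of a snapshot step for the vault-snapshot CronJob.
# Assumptions (not from the issue): BACKUP_DIR, RETENTION_DAYS and BAO_CMD are
# hypothetical names; the address and token are already set by the login step.

BACKUP_DIR="${BACKUP_DIR:-/backup}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"
BAO_CMD="${BAO_CMD:-bao}"

# Build a target path with a UTC timestamp that is safe in file names.
snapshot_path() {
  printf '%s/openbao-%s.snap\n' "$BACKUP_DIR" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Save to a temporary name first and rename on success, so a failed run
# never leaves a truncated *.snap file for the backup tool to pick up.
take_snapshot() {
  target="$(snapshot_path)"
  tmp="${target}.partial"
  if "$BAO_CMD" operator raft snapshot save "$tmp" && [ -s "$tmp" ]; then
    mv "$tmp" "$target"
    printf '%s\n' "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Remove snapshots last modified more than RETENTION_DAYS full days ago.
prune_snapshots() {
  find "$BACKUP_DIR" -maxdepth 1 -type f -name 'openbao-*.snap' \
    -mtime +"$RETENTION_DAYS" -exec rm -f {} +
}
```

A job entrypoint would call take_snapshot and, only if it succeeds, prune_snapshots, so that a run of failed nights never deletes the last good snapshot before a new one exists.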

Sequencing: blocked on the raft cutover completing (openbao-0 currently still runs the legacy file backend, where no raft snapshot API exists).
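
Until the cutover lands, the reworked job could guard the snapshot call so it does not fail on the legacy backend. The helper below is a sketch under one assumption: that bao status -format=json reports a "storage_type" field, as upstream Vault's status output does. The function name is hypothetical.

```shell
#!/bin/sh
# Reads status JSON on stdin and succeeds only when the storage type is raft.
# Assumption: the JSON contains a "storage_type" key (true for upstream Vault;
# verify against the deployed OpenBao version).
is_raft_backend() {
  grep -Eq '"storage_type"[[:space:]]*:[[:space:]]*"raft"'
}
```

It would be used as bao status -format=json | is_raft_backend, with the snapshot step skipped (and the seal-state check still run) when it returns non-zero.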
