feat(backups): surface repo maintenance + alert on failed runs #285
Merged
Adds a "Repo maintenance" panel to the group backup page: an at-a-glance health summary (last successful maintenance / failed / running) plus a table of recent kopia maintenance cycles, mirroring the recent-runs panel. recent_maintenance was already returned by the stats endpoint but never rendered.

Adds a group-level backup-maintenance-error incident (Error, paging) raised when the most recently *finished* maintenance run failed, cleared by a later successful run. This complements the existing backup-maintenance-stale absence-of-success check, which catches a different failure mode (maintenance not running at all).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
getByText("success"/"running") also matched the "Last successful
maintenance" caption / the "Running" summary chip. Use exact matches.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Snapshot expiry (kopia snapshot expire --delete) runs on every maintenance cycle, not just full; full additionally reclaims the freed space.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🤖 The group backup page tracked maintenance runs in the DB and returned them from the `stats` endpoint as `recent_maintenance`, but never rendered them, and nothing alerted when a maintenance run failed (only the absence-of-success staleness check existed). This adds both halves.
UI: "Repo maintenance" panel
A new panel on the group backup page with:
- An at-a-glance health summary: Healthy / Last run failed / Running, plus when maintenance last completed successfully (or that it never has).
- A table of recent kopia maintenance cycles, mirroring the recent-runs panel.

This directly answers "has maintenance/expiry run, and did it succeed?", which previously had no UI indication at all.
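A rough sketch of how the summary line could be derived from the `recent_maintenance` rows. The row shape (`finishedAt`, `outcome`), the `summarize` helper, and the rule that an in-flight run takes display precedence are all assumptions made for illustration; the PR does not spell them out.

```typescript
// Hypothetical row shape; the real recent_maintenance fields may differ.
type PanelStatus = "Healthy" | "Last run failed" | "Running";

interface CycleRow {
  finishedAt: string | null; // ISO 8601 timestamp, null while in flight
  outcome: "success" | "failed" | null; // null until the cycle finishes
}

interface PanelSummary {
  status: PanelStatus | null; // null when there are no cycles at all
  lastSuccessAt: string | null; // null when maintenance has never succeeded
}

function summarize(rows: CycleRow[]): PanelSummary {
  let running = false;
  let latestTime = -Infinity;
  let latestOutcome: "success" | "failed" | null = null;
  let lastSuccessTime = -Infinity;
  let lastSuccessAt: string | null = null;

  for (const row of rows) {
    if (row.finishedAt === null || row.outcome === null) {
      running = true; // in-flight cycle, no outcome yet
      continue;
    }
    const t = Date.parse(row.finishedAt);
    if (t > latestTime) {
      latestTime = t;
      latestOutcome = row.outcome;
    }
    if (row.outcome === "success" && t > lastSuccessTime) {
      lastSuccessTime = t;
      lastSuccessAt = row.finishedAt;
    }
  }

  // Assumption: an in-flight cycle is shown as "Running" regardless of history.
  let status: PanelStatus | null = null;
  if (running) status = "Running";
  else if (latestOutcome === "failed") status = "Last run failed";
  else if (latestOutcome === "success") status = "Healthy";

  return { status, lastSuccessAt };
}
```

Keeping `lastSuccessAt` separate from `status` lets the panel show when maintenance last succeeded even while the headline reads "Last run failed" or "Running".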
Alerting: `backup-maintenance-error`
A new group-scoped issue raised when the most recently finished maintenance run failed, cleared by a later successful run. `Error` severity, so it opens an incident and pages, consistent with the existing `backup-maintenance-stale` check. In-flight runs (no outcome yet) are ignored, so a started-but-unfinished run never clears an open failure.

This complements `backup-maintenance-stale` (no successful maintenance for 8 days), which catches a different failure mode: maintenance not running at all vs. running but erroring every time.

Per the discussion, the existing 8-day staleness threshold is left untouched.
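The raise/clear rule for `backup-maintenance-error` can be sketched as follows. This is a minimal illustration, not the code from this PR: the `MaintenanceRun` shape and the `maintenanceErrorActive` name are hypothetical.

```typescript
// Hypothetical run record; field names are assumptions for illustration.
interface MaintenanceRun {
  finishedAt: Date | null; // null while the run is still in flight
  outcome: "success" | "failed" | null; // null until the run finishes
}

// Returns true when the backup-maintenance-error issue should be open:
// the most recently *finished* run failed. In-flight runs are skipped,
// so a started-but-unfinished run can neither raise nor clear the issue.
function maintenanceErrorActive(runs: MaintenanceRun[]): boolean {
  let latestTime = -Infinity;
  let latestOutcome: "success" | "failed" | null = null;

  for (const run of runs) {
    if (run.finishedAt === null || run.outcome === null) continue;
    const t = run.finishedAt.getTime();
    if (t > latestTime) {
      latestTime = t;
      latestOutcome = run.outcome;
    }
  }

  return latestOutcome === "failed";
}
```

With no finished runs at all, the function returns false; that case is left to the `backup-maintenance-stale` check, which covers maintenance that never runs.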
Linear: TAM-6877