From f2a9202d743f84054ee47a52de1b1d91c439241a Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 18:46:22 +0300 Subject: [PATCH 01/13] docs: add RFC for incremental NAS backup support (KVM) Adds the design document for incremental NAS backups using QEMU dirty bitmaps and libvirt's backup-begin API. Reduces daily backup storage 80-95% for large VMs. Refs: apache/cloudstack#12899 --- docs/rfcs/incremental-nas-backup.md | 406 ++++++++++++++++++++++++++++ 1 file changed, 406 insertions(+) create mode 100644 docs/rfcs/incremental-nas-backup.md diff --git a/docs/rfcs/incremental-nas-backup.md b/docs/rfcs/incremental-nas-backup.md new file mode 100644 index 000000000000..aa2e55ff55bf --- /dev/null +++ b/docs/rfcs/incremental-nas-backup.md @@ -0,0 +1,406 @@ +# RFC: Incremental NAS Backup Support for KVM Hypervisor + +| Field | Value | +|---------------|--------------------------------------------| +| **Author** | James Peru, Xcobean Systems Limited | +| **Status** | Draft | +| **Created** | 2026-03-27 | +| **Target** | Apache CloudStack 4.23+ | +| **Component** | Backup & Recovery (NAS Backup Provider) | + +--- + +## Summary + +This RFC proposes adding incremental backup support to CloudStack's NAS backup provider for KVM hypervisors. By leveraging QEMU dirty bitmaps and libvirt's `backup-begin` API, CloudStack can track changed disk blocks between backups and export only the delta, reducing daily backup storage consumption by 80--95% and shortening backup windows from hours to minutes for large VMs. The feature is opt-in at the zone level, backward-compatible with existing full-backup behavior, and gracefully degrades on older QEMU/libvirt versions. + +--- + +## Motivation + +CloudStack's NAS backup provider currently performs a full disk copy every time a backup is taken. For a 500 GB VM with five daily backups retained, that amounts to 2.5 TB of storage consumed. At scale -- tens or hundreds of VMs -- this becomes a serious operational and financial burden. + +**Problems with the current approach:** + +1. **Storage waste.** Every backup is a full copy of the entire virtual disk, regardless of how little data actually changed since the last backup. +2. **Long backup windows.** Copying hundreds of gigabytes over NFS or SMB takes hours, increasing the risk of I/O contention on production workloads. +3. **Network bandwidth pressure.** Full-disk transfers saturate the storage network during backup windows, impacting other VMs on the same host. +4. **Uncompetitive feature set.** VMware (Changed Block Tracking / VADP), Proxmox Backup Server, and Veeam all support incremental backups natively. CloudStack's lack of incremental backup is a common complaint on the users@ mailing list and a blocker for adoption in environments with large VMs. + +**What incremental backup achieves:** + +- Only changed blocks are transferred and stored after the initial full backup. +- A typical daily incremental for a 500 GB VM with moderate write activity is 5--15 GB, a reduction of 97--99% compared to a full copy. +- Backup completes in minutes rather than hours. +- Retention of 30+ daily restore points becomes economically feasible. + +--- + +## Proposed Design + +### Backup Chain Model + +Incremental backups form a chain anchored by a periodic full backup: + +``` +Full (Day 0) -> Inc 1 (Day 1) -> Inc 2 (Day 2) -> ... -> Inc 6 (Day 6) -> Full (Day 7) -> ... +``` + +Restoring to any point in time requires the full backup plus every incremental up to the desired restore point. To bound restore complexity and protect against chain corruption, a new full backup is forced at a configurable interval. + +**Global settings (zone scope):** + +| Setting | Type | Default | Description | +|--------------------------------------|---------|---------|------------------------------------------------------| +| `nas.backup.incremental.enabled` | Boolean | `false` | Enable incremental backup for the zone | +| `nas.backup.full.interval` | Integer | `7` | Days between full backups | +| `nas.backup.incremental.max.chain` | Integer | `6` | Max incremental backups before forcing a new full | + +When `nas.backup.incremental.enabled` is `false` (the default), behavior is identical to today -- every backup is a full copy. Existing deployments are unaffected. + +--- + +### Technical Approach + +#### 1. Dirty Bitmap Tracking (QEMU Layer) + +QEMU 4.0 introduced persistent dirty bitmaps: per-disk bitmaps that record which blocks have been written since the bitmap was created. These bitmaps survive QEMU restarts (they are stored in the qcow2 image header) and are the foundation for incremental backup. + +**Lifecycle:** + +1. When incremental backup is enabled for a VM, the agent creates a persistent dirty bitmap on each virtual disk via QMP: + ```json + { + "execute": "block-dirty-bitmap-add", + "arguments": { + "node": "drive-virtio-disk0", + "name": "backup-20260327", + "persistent": true + } + } + ``` +2. QEMU automatically sets bits in this bitmap whenever the guest writes to a block. +3. At backup time, the bitmap tells the backup process exactly which blocks to read. +4. After a successful backup, a new bitmap is created for the next cycle and the old bitmap is optionally removed. + +#### 2. Backup Flow + +**Full backup (Day 0 or every `nas.backup.full.interval` days):** + +```bash +# 1. Export the entire disk to the NAS mount +qemu-img convert -f qcow2 -O qcow2 \ + /var/lib/libvirt/images/vm-disk.qcow2 \ + /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2 + +# 2. Create a new dirty bitmap to track changes from this point +virsh qemu-monitor-command $DOMAIN --hmp \ + 'block-dirty-bitmap-add drive-virtio-disk0 backup-20260327 persistent=true' +``` + +**Incremental backup (Day 1 through Day N):** + +```bash +# 1. Use libvirt backup-begin with incremental mode +# This exports only blocks dirty since bitmap "backup-20260327" +cat > /tmp/backup.xml <<'XML' + + + + + + + + backup-20260327 + +XML + +virsh backup-begin $DOMAIN /tmp/backup.xml + +# 2. Wait for completion +virsh domjobinfo $DOMAIN --completed + +# 3. Rotate bitmaps: remove old, create new +virsh qemu-monitor-command $DOMAIN --hmp \ + 'block-dirty-bitmap-remove drive-virtio-disk0 backup-20260327' +virsh qemu-monitor-command $DOMAIN --hmp \ + 'block-dirty-bitmap-add drive-virtio-disk0 backup-20260328 persistent=true' +``` + +**New full backup cycle (Day 7):** + +```bash +# Remove all existing bitmaps +virsh qemu-monitor-command $DOMAIN --hmp \ + 'block-dirty-bitmap-remove drive-virtio-disk0 backup-20260327' + +# Take a full backup (same as Day 0) +# Optionally prune expired chains from NAS +``` + +#### 3. Restore Flow + +Restoring from an incremental chain requires replaying the full backup plus all incrementals up to the target restore point. This is handled entirely within `nasbackup.sh` and is transparent to the management server and the end user. + +**Example: Restore to Day 3 (full + 3 incrementals):** + +```bash +# 1. Create a working copy from the full backup +cp /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2 /tmp/restored.qcow2 + +# 2. Apply each incremental in order using qemu-img rebase +# Each incremental is a thin qcow2 containing only changed blocks. +# Rebasing merges the incremental's blocks into the chain. +qemu-img rebase -u -b /tmp/restored.qcow2 \ + /mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2 + +qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2 \ + /mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2 + +qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2 \ + /mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2 + +# 3. Flatten the chain into a single image +qemu-img convert -f qcow2 -O qcow2 \ + /mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2 \ + /tmp/vm-restored-final.qcow2 + +# 4. Return the flattened image for CloudStack to import +``` + +An alternative approach uses `qemu-img commit` to merge each layer down. The implementation will benchmark both methods and choose the faster one for large images. + +#### 4. Database Schema Changes + +**Modified table: `backups`** + +| Column | Type | Description | +|--------------------|--------------|------------------------------------------------| +| `backup_type` | VARCHAR(16) | `FULL` or `INCREMENTAL` | +| `parent_backup_id` | BIGINT (FK) | For incremental: ID of the previous backup | +| `bitmap_name` | VARCHAR(128) | QEMU dirty bitmap identifier for this backup | +| `chain_id` | BIGINT (FK) | Links to the backup chain this backup belongs to | + +**New table: `backup_chains`** + +| Column | Type | Description | +|-------------------|-------------|------------------------------------------------| +| `id` | BIGINT (PK) | Auto-increment primary key | +| `vm_instance_id` | BIGINT (FK) | The VM this chain belongs to | +| `full_backup_id` | BIGINT (FK) | The full backup anchoring this chain | +| `state` | VARCHAR(16) | `ACTIVE`, `SEALED`, `EXPIRED` | +| `created` | DATETIME | When the chain was started | + +**Schema migration** will be provided as a Liquibase changeset, consistent with CloudStack's existing migration framework. The new columns are nullable to maintain backward compatibility with existing backup records. + +#### 5. Management Server Changes + +**`BackupManagerImpl` (orchestration):** + +- Before taking a backup, query the active chain for the VM. +- If no active chain exists, or the chain has reached `nas.backup.incremental.max.chain` incrementals, or `nas.backup.full.interval` days have elapsed since the last full backup: schedule a full backup and start a new chain. +- Otherwise: schedule an incremental backup linked to the previous backup in the chain. +- On backup failure: if the bitmap is suspected corrupt, mark the chain as `SEALED` and force a full backup on the next run. + +**`NASBackupProvider.takeBackup()`:** + +- Accept a new parameter `BackupType` (FULL or INCREMENTAL). +- For incremental: pass the parent backup's bitmap name and NAS path to the agent command. + +**`TakeBackupCommand` / `TakeBackupAnswer`:** + +- Add fields: `backupType` (FULL/INCREMENTAL), `parentBackupId`, `bitmapName`, `parentBackupPath`. +- The answer includes the actual size of the backup (important for incrementals, which are much smaller than the disk size). + +**`RestoreBackupCommand`:** + +- Add field: `backupChain` (ordered list of backup paths from full through the target incremental). +- The agent reconstructs the full image from the chain before importing. + +#### 6. KVM Agent Changes + +**`LibvirtTakeBackupCommandWrapper`:** + +- For `FULL` backups: existing behavior (qemu-img convert), plus create the initial dirty bitmap. +- For `INCREMENTAL` backups: use `virsh backup-begin` with `` XML, then rotate bitmaps. +- Pre-flight check: verify QEMU version >= 4.0 and libvirt version >= 6.0. If not met, fall back to full backup and log a warning. + +**`nasbackup.sh` enhancements:** + +- New flag `-i` for incremental mode. +- New flag `-p ` to specify the parent backup on the NAS. +- New flag `-b ` to specify which dirty bitmap to use. +- New subcommand `restore-chain` that accepts an ordered list of backup paths and produces a flattened image. + +**`LibvirtRestoreBackupCommandWrapper`:** + +- If the restore target is an incremental backup, request the full chain from the management server and pass it to `nasbackup.sh restore-chain`. + +#### 7. API Changes + +**Existing API: `createBackup`** + +No change to the API signature. The management server automatically decides full vs. incremental based on the zone configuration and the current chain state. Callers do not need to specify the backup type. + +**Existing API: `listBackups`** + +Response gains two new fields: +- `backuptype` (string): `Full` or `Incremental` +- `parentbackupid` (string): UUID of the parent backup (null for full backups) + +**Existing API: `restoreBackup`** + +No change. The management server resolves the full chain internally. + +#### 8. UI Changes + +- **Backup list view:** Add a "Type" column showing `Full` or `Incremental`, with a visual indicator (e.g., a small chain icon for incrementals). +- **Backup detail view:** Show the backup chain as a vertical timeline: full backup at the top, incrementals branching down, with sizes and timestamps. +- **Restore dialog:** When the user selects an incremental restore point, display a note: "This restore will replay N backups (total chain size: X GB)." +- **Backup schedule settings** (zone-level): Toggle for incremental backup, full backup interval slider, max chain length input. + +--- + +### Storage Savings Projections + +The following estimates assume a moderate write workload (2--5% of disk blocks changed per day), which is typical for application servers, databases with WAL, and file servers. + +| Scenario | Full Backups Only | With Incremental | Savings | +|-----------------------------------|-------------------|--------------------|------------| +| 500 GB VM, 7 daily backups | 3.5 TB | ~550 GB | **84%** | +| 1 TB VM, 30 daily backups | 30 TB | ~1.3 TB | **96%** | +| 100 VMs x 100 GB, weekly cycle | 70 TB/week | ~12 TB/week | **83%** | +| 50 VMs x 200 GB, 30-day retain | 300 TB | ~18 TB | **94%** | + +For environments with higher change rates (e.g., heavy database writes), incremental sizes will be larger, but savings still typically exceed 60%. + +--- + +### Requirements + +| Requirement | Minimum Version | Notes | +|-------------------------|----------------|-------------------------------------------------| +| QEMU | 4.0+ | Dirty bitmap support. Ubuntu 20.04+, RHEL 8+. | +| libvirt | 6.0+ | `virsh backup-begin` support. Ubuntu 22.04+, RHEL 8.3+. | +| CloudStack | 4.19+ | NAS backup provider must already be present. | +| NAS storage | NFS or SMB | No special requirements beyond existing NAS backup support. | + +**Graceful degradation:** If a KVM host runs QEMU < 4.0 or libvirt < 6.0, the agent will detect this at startup and report `incrementalBackupCapable=false` to the management server. Backups for VMs on that host will remain full-only, with a warning logged. No manual intervention is required. + +--- + +### Risks and Mitigations + +| Risk | Impact | Mitigation | +|------|--------|------------| +| **Bitmap corruption** (host crash during backup, QEMU bug) | Incremental backup produces an incomplete or incorrect image | Detect bitmap inconsistency via QMP query; force a new full backup and start a fresh chain. Data in previous full backup is unaffected. | +| **Chain too long** (missed full backup schedule) | Restore time increases; single corrupt link breaks the chain | Enforce `nas.backup.incremental.max.chain` hard limit. If exceeded, the next backup is automatically a full. | +| **Restore complexity** | User confusion about which backup to pick; longer restore for deep chains | Restore logic is fully automated in `nasbackup.sh`. The UI shows a single "Restore" button per restore point, with the chain replayed transparently. | +| **VM live migration during backup** | Dirty bitmap may be lost if migrated mid-backup | Check VM state before backup; abort and retry if migration is in progress. Bitmaps persist across clean shutdowns and restarts but not across live migration in older QEMU versions. For QEMU 6.2+, bitmaps survive migration. | +| **Backward compatibility** | Existing full-backup users should not be affected | Feature is disabled by default. No schema changes affect existing rows (new columns are nullable). Full-backup code path is unchanged. | +| **Disk space during restore** | Flattening a chain requires temporary disk space equal to the full disk size | Use the same scratch space already used for full backup restores. Document the requirement. | + +--- + +### Implementation Plan + +| Phase | Scope | Estimated Effort | +|-------|-------|------------------| +| **Phase 1** | Core incremental backup and restore in `nasbackup.sh` and KVM agent wrappers. Dirty bitmap lifecycle management. Manual testing with `virsh` and `qemu-img`. | 2--3 weeks | +| **Phase 2** | Management server changes: chain management, scheduling logic, global settings, database schema migration, API response changes. | 2 weeks | +| **Phase 3** | UI changes: backup type column, chain visualization, restore dialog enhancements, zone-level settings. | 1 week | +| **Phase 4** | Integration testing (full cycle: enable, backup, restore, disable, upgrade from older version). Edge case testing (host crash, bitmap loss, migration, mixed QEMU versions). Documentation. | 2 weeks | + +**Total estimated effort: 7--8 weeks.** + +We (Xcobean Systems) intend to implement this and submit PRs against the `main` branch. We would appreciate early design feedback before starting implementation to avoid rework. + +--- + +### Prior Art + +- **VMware VADP / Changed Block Tracking (CBT):** VMware's CBT is the industry-standard approach. A change tracking driver inside the hypervisor records changed blocks, and backup vendors query the CBT via the vSphere API. This RFC's approach is analogous, using QEMU dirty bitmaps as the CBT equivalent. + +- **Proxmox Backup Server (PBS):** PBS uses QEMU dirty bitmaps to implement incremental backups natively. Their implementation validates that the dirty bitmap approach is production-ready for KVM/QEMU environments. PBS has been stable since Proxmox VE 6.4 (2020). + +- **Veeam Backup & Replication:** Veeam uses a "reverse incremental" model where the most recent backup is always a synthetic full, and older backups are stored as reverse deltas. This simplifies restore (always restore from the latest full) at the cost of more I/O during backup. We chose the forward-incremental model for simplicity and because it aligns with how QEMU dirty bitmaps work natively. + +- **libvirt backup API:** The `virsh backup-begin` command and its underlying `virDomainBackupBegin()` API were specifically designed for this use case. The libvirt documentation includes examples of incremental backup using dirty bitmaps. See: https://libvirt.org/kbase/incremental-backup.html + +--- + +### About the Author + +Xcobean Systems Limited operates a production Apache CloudStack deployment providing IaaS to 50+ client VMs. We use the NAS backup provider daily and have contributed several improvements to it: + +- PR [#12805](https://github.com/apache/cloudstack/pull/12805) -- NAS backup NPE fix +- PR [#12822](https://github.com/apache/cloudstack/pull/12822) -- Backup restore improvements +- PR [#12826](https://github.com/apache/cloudstack/pull/12826) -- NAS backup script hardening +- PRs [#12843](https://github.com/apache/cloudstack/pull/12843)--[#12848](https://github.com/apache/cloudstack/pull/12848) -- Various NAS backup fixes +- PR [#12872](https://github.com/apache/cloudstack/pull/12872) -- Additional backup provider fixes + +We experience the storage and bandwidth cost of full-only backups firsthand and are motivated to solve this problem upstream rather than maintaining a fork. + +--- + +## Open Questions for Discussion + +We welcome feedback from the community on the following: + +1. **Interest level.** Is there sufficient demand for this feature to justify the implementation effort? We believe so based on mailing list threads, but would like confirmation. + +2. **Dirty bitmaps vs. alternatives.** Are there concerns about relying on QEMU dirty bitmaps? Alternative approaches include file-level deduplication on the NAS (less efficient, not hypervisor-aware) or `qemu-img compare` (slower, requires reading both images). + +3. **Target release.** Should this target CloudStack 4.23, or is a later release more appropriate given the scope? + +4. **Chain model.** We proposed forward-incremental with periodic full backups. Would the community prefer a different model (e.g., reverse-incremental like Veeam, or forever-incremental with periodic synthetic fulls)? + +5. **Scope of first PR.** Should we submit the entire feature as one PR, or break it into smaller PRs (e.g., nasbackup.sh changes first, then agent, then management server, then UI)? + +6. **Testing infrastructure.** We can test against our production environment (Ubuntu 22.04, QEMU 6.2, libvirt 8.0). Are there CI environments or community test labs available for broader testing (RHEL, Rocky, older QEMU versions)? + +--- + +*This RFC is posted as a GitHub Discussion to gather community feedback before implementation begins. Please share your thoughts, concerns, and suggestions.* + +--- + +## Appendix: Related Proposal — CloudStack Infrastructure Backup to NAS + +### Problem +CloudStack's NAS backup provider only backs up VM disks. The management server database, agent configurations, SSL certificates, and global settings are not backed up. If the management server fails, all metadata is lost unless someone manually configured mysqldump. + +### Proposed Solution +Add a new scheduled task that automatically backs up CloudStack infrastructure to the same NAS backup storage. + +**What gets backed up:** +| Component | Method | Size | +|-----------|--------|------| +| CloudStack database (`cloud`, `cloud_usage`) | mysqldump | ~50-500MB | +| Management server config (`/etc/cloudstack/management/`) | tar | <1MB | +| Agent configs (`/etc/cloudstack/agent/`) | tar | <1MB | +| SSL certificates and keystores | tar | <1MB | +| Global settings export | SQL dump | <1MB | + +**Configuration:** +- `nas.infra.backup.enabled` (global, default: false) +- `nas.infra.backup.schedule` (cron expression, default: `0 2 * * *` — daily at 2am) +- `nas.infra.backup.retention` (number of backups to keep, default: 7) + +**Implementation:** +- New class: `InfrastructureBackupTask` extending `ManagedContextRunnable` +- Runs on management server (not KVM agent) +- Uses existing NAS mount point from backup storage pool +- Creates timestamped directory: `infra-backup/2026-03-27/` +- Runs `mysqldump --single-transaction` for hot backup +- Tars config directories +- Manages retention (delete backups older than N days) +- Logs to CloudStack events for audit trail + +**Restore:** +- Manual via CLI: `mysql cloud < backup.sql` + extract config tars +- Future: one-click restore from UI + +This is a much simpler change (~200 lines of Java) that addresses a real operational gap. Could target 4.22.1 or 4.23. + From 1981469099f3ea8bd08108d4ec94068dabb1505d Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 18:49:38 +0300 Subject: [PATCH 02/13] feat(backup): add chain-metadata keys + nas.backup.full.every config NASBackupChainKeys defines the keys this provider stores under the existing backup_details kv table (parent_backup_id, bitmap_name, chain_id, chain_position, type). This keeps the backups table provider-agnostic per the RFC review. nas.backup.full.every is a zone-scoped ConfigKey that controls how often a full backup is taken; the remaining backups in the cycle are incremental. Counts backups (not days), so it works for hourly, daily, and ad-hoc schedules. Default 10. Set to 1 to disable incrementals (every backup is full). Refs: apache/cloudstack#12899 --- .../cloudstack/backup/NASBackupChainKeys.java | 47 +++++++++++++++++++ .../cloudstack/backup/NASBackupProvider.java | 13 ++++- 2 files changed, 59 insertions(+), 1 deletion(-) create mode 100644 plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java new file mode 100644 index 000000000000..542844e19bf2 --- /dev/null +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java @@ -0,0 +1,47 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +package org.apache.cloudstack.backup; + +/** + * Keys used by the NAS backup provider when storing incremental-chain metadata + * in the existing {@code backup_details} key/value table. Stored here (not on + * the {@code backups} table) so other providers do not need a schema change to + * support their own incremental implementations. + */ +public final class NASBackupChainKeys { + + /** UUID of the parent backup (full or previous incremental). Empty for full backups. */ + public static final String PARENT_BACKUP_ID = "nas.parent_backup_id"; + + /** QEMU dirty-bitmap name created by this backup, used as the {@code } reference for the next one. */ + public static final String BITMAP_NAME = "nas.bitmap_name"; + + /** Identifier shared by every backup in the same chain (the full anchors a chain; its incrementals inherit the id). */ + public static final String CHAIN_ID = "nas.chain_id"; + + /** Position within the chain: 0 for the full, 1 for the first incremental, and so on. */ + public static final String CHAIN_POSITION = "nas.chain_position"; + + /** Backup type marker: {@value #TYPE_FULL} or {@value #TYPE_INCREMENTAL}. Mirrors {@code backups.type} for fast lookup without a join. */ + public static final String TYPE = "nas.type"; + + public static final String TYPE_FULL = "full"; + public static final String TYPE_INCREMENTAL = "incremental"; + + private NASBackupChainKeys() { + } +} diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java index df9336026f4d..856f8de8c997 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java @@ -85,6 +85,16 @@ public class NASBackupProvider extends AdapterBase implements BackupProvider, Co true, BackupFrameworkEnabled.key()); + ConfigKey NASBackupFullEvery = new ConfigKey<>("Advanced", Integer.class, + "nas.backup.full.every", + "10", + "Take a full NAS backup every Nth backup; remaining backups in between are incremental. " + + "Counts backups, not days, so it works for hourly, daily, and ad-hoc schedules. " + + "Set to 1 to disable incrementals (every backup is full).", + true, + ConfigKey.Scope.Zone, + BackupFrameworkEnabled.key()); + @Inject private BackupDao backupDao; @@ -629,7 +639,8 @@ public Boolean crossZoneInstanceCreationEnabled(BackupOffering backupOffering) { @Override public ConfigKey[] getConfigKeys() { return new ConfigKey[]{ - NASBackupRestoreMountTimeout + NASBackupRestoreMountTimeout, + NASBackupFullEvery }; } From fbb916b2546a2f1b238a7ffb035d2085569952d9 Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 18:53:20 +0300 Subject: [PATCH 03/13] feat(backup): nasbackup.sh full+incremental modes via backup-begin Adds three new optional CLI flags to nasbackup.sh: -M|--mode --bitmap-new (checkpoint to create with this backup) --bitmap-parent (incremental: parent bitmap to read changes since) --parent-path (incremental: parent backup file for rebase) Behavior: - When -M is omitted, behavior is unchanged (legacy full-only, no checkpoint created), so existing callers are not affected. - With -M full + --bitmap-new, a full backup is taken AND a libvirt checkpoint of that name is registered atomically (via backup-begin's --checkpointxml), giving the next incremental its starting bitmap. - With -M incremental, libvirt's element references the parent bitmap; only changed blocks are written. After completion, qemu-img rebase wires the new file to its parent so the chain on the NAS is self-describing for restore. - Stopped VMs cannot use backup-begin; if -M incremental is requested while VM is stopped, the script falls back to a full and emits INCREMENTAL_FALLBACK= on stderr so the orchestrator can record it correctly in the chain. - The script echoes BITMAP_CREATED= on success so the Java caller can store it under backup_details (NASBackupChainKeys.BITMAP_NAME). Works across local file, NFS-file, and LINSTOR primary storage. Ceph RBD running-VM support is a pre-existing limitation of this script, not affected by this change. Refs: apache/cloudstack#12899 --- scripts/vm/hypervisor/kvm/nasbackup.sh | 131 +++++++++++++++++++++++-- 1 file changed, 124 insertions(+), 7 deletions(-) diff --git a/scripts/vm/hypervisor/kvm/nasbackup.sh b/scripts/vm/hypervisor/kvm/nasbackup.sh index 441312f35e86..50c90588e99b 100755 --- a/scripts/vm/hypervisor/kvm/nasbackup.sh +++ b/scripts/vm/hypervisor/kvm/nasbackup.sh @@ -33,9 +33,15 @@ MOUNT_OPTS="" BACKUP_DIR="" DISK_PATHS="" QUIESCE="" +# Incremental backup parameters (all optional; legacy callers omit them) +MODE="" # "full" or "incremental"; empty => legacy full-only behavior (no checkpoint created) +BITMAP_NEW="" # Bitmap/checkpoint name to create with this backup (e.g. "backup-1711586400") +BITMAP_PARENT="" # For incremental: parent bitmap name to read changes since +PARENT_PATH="" # For incremental: parent backup file path (used as backing for qemu-img rebase) logFile="/var/log/cloudstack/agent/agent.log" EXIT_CLEANUP_FAILED=20 +EXIT_INCREMENTAL_UNSUPPORTED=21 log() { [[ "$verb" -eq 1 ]] && builtin echo "$@" @@ -113,20 +119,70 @@ backup_running_vm() { mount_operation mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit 1; } + # Determine effective mode for this run. + # Legacy callers (no -M argument) get the original full-only behavior with no checkpoint. + local effective_mode="${MODE:-legacy-full}" + local make_checkpoint=0 + case "$effective_mode" in + incremental) + if [[ -z "$BITMAP_PARENT" || -z "$BITMAP_NEW" || -z "$PARENT_PATH" ]]; then + echo "incremental mode requires --bitmap-parent, --bitmap-new, and --parent-path" + cleanup + exit 1 + fi + make_checkpoint=1 + ;; + full) + if [[ -z "$BITMAP_NEW" ]]; then + echo "full mode requires --bitmap-new (the bitmap to create for the next incremental)" + cleanup + exit 1 + fi + make_checkpoint=1 + ;; + legacy-full) + make_checkpoint=0 + ;; + *) + echo "Unknown mode: $effective_mode" + cleanup + exit 1 + ;; + esac + + # Build backup XML (and matching checkpoint XML when applicable). name="root" - echo "" > $dest/backup.xml + echo "" > $dest/backup.xml + if [[ "$effective_mode" == "incremental" ]]; then + echo "$BITMAP_PARENT" >> $dest/backup.xml + fi + echo "" >> $dest/backup.xml + if [[ $make_checkpoint -eq 1 ]]; then + echo "$BITMAP_NEW" > $dest/checkpoint.xml + fi while read -r disk fullpath; do if [[ "$fullpath" == /dev/drbd/by-res/* ]]; then volUuid=$(get_linstor_uuid_from_path "$fullpath") else volUuid="${fullpath##*/}" fi - echo "" >> $dest/backup.xml + if [[ "$effective_mode" == "incremental" ]]; then + # Incremental disk entry — no backupmode attr, libvirt picks it up from . + echo "" >> $dest/backup.xml + else + echo "" >> $dest/backup.xml + fi + if [[ $make_checkpoint -eq 1 ]]; then + echo "" >> $dest/checkpoint.xml + fi name="datadisk" done < <( virsh -c qemu:///system domblklist "$VM" --details 2>/dev/null | awk '$2=="disk"{print $3, $4}' ) echo "" >> $dest/backup.xml + if [[ $make_checkpoint -eq 1 ]]; then + echo "" >> $dest/checkpoint.xml + fi local thaw=0 if [[ ${QUIESCE} == "true" ]]; then @@ -135,10 +191,16 @@ backup_running_vm() { fi fi - # Start push backup + # Start push backup, atomically registering the new checkpoint when applicable. local backup_begin=0 - if virsh -c qemu:///system backup-begin --domain $VM --backupxml $dest/backup.xml 2>&1 > /dev/null; then - backup_begin=1; + if [[ $make_checkpoint -eq 1 ]]; then + if virsh -c qemu:///system backup-begin --domain $VM --backupxml $dest/backup.xml --checkpointxml $dest/checkpoint.xml 2>&1 > /dev/null; then + backup_begin=1; + fi + else + if virsh -c qemu:///system backup-begin --domain $VM --backupxml $dest/backup.xml 2>&1 > /dev/null; then + backup_begin=1; + fi fi if [[ $thaw -eq 1 ]]; then @@ -172,9 +234,25 @@ backup_running_vm() { sleep 5 done - # Use qemu-img convert to sparsify linstor backups which get bloated due to virsh backup-begin. + # Sparsify behavior: + # - For LINSTOR backups (existing): qemu-img convert sparsifies the bloated output. + # - For INCREMENTAL: rebase the resulting thin qcow2 onto its parent so the chain is self-describing + # (so a future restore can flatten without external chain metadata). name="root" while read -r disk fullpath; do + if [[ "$effective_mode" == "incremental" ]]; then + volUuid="${fullpath##*/}" + if [[ "$fullpath" == /dev/drbd/by-res/* ]]; then + volUuid=$(get_linstor_uuid_from_path "$fullpath") + fi + if ! qemu-img rebase -u -b "$PARENT_PATH" -F qcow2 "$dest/$name.$volUuid.qcow2" >> "$logFile" 2> >(cat >&2); then + echo "qemu-img rebase failed for $dest/$name.$volUuid.qcow2 onto $PARENT_PATH" + cleanup + exit 1 + fi + name="datadisk" + continue + fi if [[ "$fullpath" != /dev/drbd/by-res/* ]]; then continue fi @@ -191,18 +269,30 @@ backup_running_vm() { virsh -c qemu:///system domblklist "$VM" --details 2>/dev/null | awk '$2=="disk"{print $3, $4}' ) - rm -f $dest/backup.xml + rm -f $dest/backup.xml $dest/checkpoint.xml sync # Print statistics virsh -c qemu:///system domjobinfo $VM --completed du -sb $dest | cut -f1 + if [[ -n "$BITMAP_NEW" ]]; then + # Echo the bitmap name on its own line so the Java caller can capture it for backup_details. + echo "BITMAP_CREATED=$BITMAP_NEW" + fi umount $mount_point rmdir $mount_point } backup_stopped_vm() { + # Stopped VMs cannot use libvirt's backup-begin (no QEMU process). Take a full + # backup via qemu-img convert. If the caller asked for incremental, fall back + # to full and signal the fallback so the orchestrator can record it as a full + # in the chain. + if [[ "$MODE" == "incremental" ]]; then + echo "INCREMENTAL_FALLBACK=full (VM stopped — incremental requires running VM)" >&2 + fi + mount_operation mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit 1; } @@ -278,6 +368,13 @@ cleanup() { function usage { echo "" echo "Usage: $0 -o -v|--vm -t -s -m -p -d -q|--quiesce " + echo " [-M|--mode ] [--bitmap-new ] [--bitmap-parent ] [--parent-path ]" + echo "" + echo "Incremental backup options (running VMs only; requires QEMU >= 4.2 and libvirt >= 7.2):" + echo " -M|--mode full Take a full backup AND create a checkpoint (--bitmap-new required) for future incrementals." + echo " -M|--mode incremental Take an incremental backup since --bitmap-parent and create new checkpoint --bitmap-new." + echo " Requires --bitmap-parent, --bitmap-new, and --parent-path (parent backup file for rebase)." + echo " Without -M, behaves as legacy full-only backup with no checkpoint creation." echo "" exit 1 } @@ -324,6 +421,26 @@ while [[ $# -gt 0 ]]; do shift shift ;; + -M|--mode) + MODE="$2" + shift + shift + ;; + --bitmap-new) + BITMAP_NEW="$2" + shift + shift + ;; + --bitmap-parent) + BITMAP_PARENT="$2" + shift + shift + ;; + --parent-path) + PARENT_PATH="$2" + shift + shift + ;; -h|--help) usage shift From 1f2aebca36ab10585e40a81916342161062b20a0 Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 19:07:24 +0300 Subject: [PATCH 04/13] feat(backup): orchestrate full vs incremental in NAS provider MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the Java side of the incremental NAS backup feature: TakeBackupCommand + mode, bitmapNew, bitmapParent, parentPath fields (null for legacy callers — script preserves its existing behaviour when these are omitted). BackupAnswer + bitmapCreated (echoed by the agent on success) + incrementalFallback (true when an incremental was requested but the agent had to fall back to full because the VM was stopped). LibvirtTakeBackupCommandWrapper - Forwards the new fields to nasbackup.sh. - Strips the new BITMAP_CREATED= / INCREMENTAL_FALLBACK= marker lines out of stdout before the existing numeric-suffix size parser runs, so the script can keep the same "size as last line(s)" contract. - Surfaces both markers on the BackupAnswer. NASBackupProvider - decideChain(vm) walks backup_details (chain_id, chain_position, bitmap_name) for the latest BackedUp backup of the VM and decides: * Stopped VM -> full (libvirt backup-begin needs running QEMU) * No prior chain -> full (chain_position=0) * chain_position+1 >= nas.backup.full.every -> new full * otherwise -> incremental, parent=last bitmap - Generates timestamp-based bitmap names ("backup-") matching what the script then registers as the libvirt checkpoint name. - persistChainMetadata() writes parent_backup_id, bitmap_name, chain_id, chain_position, type into the existing backup_details key/value table (per the RFC review — no new columns on backups). - Honours the agent's INCREMENTAL_FALLBACK= signal: re-records the backup as a full and starts a fresh chain. - createBackupObject() now takes a type argument so the BackupVO reflects the actual decision instead of always being "FULL". Refs: apache/cloudstack#12899 --- .../cloudstack/backup/BackupAnswer.java | 22 ++ .../cloudstack/backup/TakeBackupCommand.java | 38 ++++ .../cloudstack/backup/NASBackupProvider.java | 188 +++++++++++++++++- .../LibvirtTakeBackupCommandWrapper.java | 56 +++++- 4 files changed, 295 insertions(+), 9 deletions(-) diff --git a/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java b/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java index ffc67b628a7e..7882b1fa0a34 100644 --- a/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java +++ b/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java @@ -29,6 +29,12 @@ public class BackupAnswer extends Answer { private Long virtualSize; private Map volumes; Boolean needsCleanup; + // Set by the NAS backup provider after a checkpoint/bitmap was created during this backup. + // The provider persists it in backup_details under NASBackupChainKeys.BITMAP_NAME. + private String bitmapCreated; + // Set when an incremental was requested but the agent had to fall back to a full + // (e.g. VM was stopped). Provider should record this backup as type=full. + private Boolean incrementalFallback; public BackupAnswer(final Command command, final boolean success, final String details) { super(command, success, details); @@ -68,4 +74,20 @@ public Boolean getNeedsCleanup() { public void setNeedsCleanup(Boolean needsCleanup) { this.needsCleanup = needsCleanup; } + + public String getBitmapCreated() { + return bitmapCreated; + } + + public void setBitmapCreated(String bitmapCreated) { + this.bitmapCreated = bitmapCreated; + } + + public Boolean getIncrementalFallback() { + return incrementalFallback != null && incrementalFallback; + } + + public void setIncrementalFallback(Boolean incrementalFallback) { + this.incrementalFallback = incrementalFallback; + } } diff --git a/core/src/main/java/org/apache/cloudstack/backup/TakeBackupCommand.java b/core/src/main/java/org/apache/cloudstack/backup/TakeBackupCommand.java index 5402b6b24760..3f5b911bdb6e 100644 --- a/core/src/main/java/org/apache/cloudstack/backup/TakeBackupCommand.java +++ b/core/src/main/java/org/apache/cloudstack/backup/TakeBackupCommand.java @@ -36,6 +36,12 @@ public class TakeBackupCommand extends Command { @LogLevel(LogLevel.Log4jLevel.Off) private String mountOptions; + // Incremental backup fields (NAS provider; null/empty for legacy full-only callers). + private String mode; // "full" or "incremental"; null => legacy behaviour (script default) + private String bitmapNew; // Checkpoint/bitmap name to create with this backup (timestamp-based) + private String bitmapParent; // Incremental: parent bitmap to read changes since + private String parentPath; // Incremental: parent backup file path on the mounted NAS (for qemu-img rebase) + public TakeBackupCommand(String vmName, String backupPath) { super(); this.vmName = vmName; @@ -106,6 +112,38 @@ public void setQuiesce(Boolean quiesce) { this.quiesce = quiesce; } + public String getMode() { + return mode; + } + + public void setMode(String mode) { + this.mode = mode; + } + + public String getBitmapNew() { + return bitmapNew; + } + + public void setBitmapNew(String bitmapNew) { + this.bitmapNew = bitmapNew; + } + + public String getBitmapParent() { + return bitmapParent; + } + + public void setBitmapParent(String bitmapParent) { + this.bitmapParent = bitmapParent; + } + + public String getParentPath() { + return parentPath; + } + + public void setParentPath(String parentPath) { + this.parentPath = parentPath; + } + @Override public boolean executeInSequence() { return true; diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java index 856f8de8c997..c8c56a65dce9 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java @@ -48,6 +48,7 @@ import org.apache.cloudstack.backup.dao.BackupDao; +import org.apache.cloudstack.backup.dao.BackupDetailsDao; import org.apache.cloudstack.backup.dao.BackupRepositoryDao; import org.apache.cloudstack.engine.subsystem.api.storage.DataStore; import org.apache.cloudstack.engine.subsystem.api.storage.DataStoreManager; @@ -140,6 +141,9 @@ public class NASBackupProvider extends AdapterBase implements BackupProvider, Co @Inject private DiskOfferingDao diskOfferingDao; + @Inject + private BackupDetailsDao backupDetailsDao; + private Long getClusterIdFromRootVolume(VirtualMachine vm) { VolumeVO rootVolume = volumeDao.getInstanceRootVolume(vm.getId()); StoragePoolVO rootDiskPool = primaryDataStoreDao.findById(rootVolume.getPoolId()); @@ -178,6 +182,168 @@ protected Host getVMHypervisorHost(VirtualMachine vm) { return resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, vm.getDataCenterId()); } + /** + * Returned by {@link #decideChain(VirtualMachine)} to describe the next backup's place in + * the chain: full vs incremental, the bitmap name to create, and (for incrementals) the + * parent bitmap and parent file path. + */ + static final class ChainDecision { + final String mode; // "full" or "incremental" + final String bitmapNew; + final String bitmapParent; // null for full + final String parentPath; // null for full + final String chainId; // chain identifier this backup belongs to + final int chainPosition; // 0 for full, N for the Nth incremental in the chain + + private ChainDecision(String mode, String bitmapNew, String bitmapParent, String parentPath, + String chainId, int chainPosition) { + this.mode = mode; + this.bitmapNew = bitmapNew; + this.bitmapParent = bitmapParent; + this.parentPath = parentPath; + this.chainId = chainId; + this.chainPosition = chainPosition; + } + + static ChainDecision fullStart(String bitmapName) { + return new ChainDecision(NASBackupChainKeys.TYPE_FULL, bitmapName, null, null, + UUID.randomUUID().toString(), 0); + } + + static ChainDecision incremental(String bitmapNew, String bitmapParent, String parentPath, + String chainId, int chainPosition) { + return new ChainDecision(NASBackupChainKeys.TYPE_INCREMENTAL, bitmapNew, bitmapParent, + parentPath, chainId, chainPosition); + } + + boolean isIncremental() { + return NASBackupChainKeys.TYPE_INCREMENTAL.equals(mode); + } + } + + /** + * Decides whether the next backup for {@code vm} should be a fresh full or an incremental + * appended to the existing chain. Stopped VMs are always full (libvirt {@code backup-begin} + * requires a running QEMU process). The {@code nas.backup.full.every} ConfigKey controls + * how many backups (full + incrementals) form one chain before a new full is forced. + */ + protected ChainDecision decideChain(VirtualMachine vm) { + final String newBitmap = "backup-" + System.currentTimeMillis() / 1000L; + + // Stopped VMs cannot do incrementals — script will also fall back, but we make the + // decision here so we register the right type up-front. + if (VirtualMachine.State.Stopped.equals(vm.getState())) { + return ChainDecision.fullStart(newBitmap); + } + + Integer fullEvery = NASBackupFullEvery.valueIn(vm.getDataCenterId()); + if (fullEvery == null || fullEvery <= 1) { + // Disabled or every-backup-is-full mode. + return ChainDecision.fullStart(newBitmap); + } + + // Walk this VM's backups newest→oldest, find the most recent BackedUp backup that has a + // bitmap stored. If we don't find one, this is the first backup in a chain — start full. + List history = backupDao.listByVmId(vm.getDataCenterId(), vm.getId()); + if (history == null || history.isEmpty()) { + return ChainDecision.fullStart(newBitmap); + } + history.sort(Comparator.comparing(Backup::getDate).reversed()); + + Backup parent = null; + String parentBitmap = null; + String parentChainId = null; + int parentChainPosition = -1; + for (Backup b : history) { + if (!Backup.Status.BackedUp.equals(b.getStatus())) { + continue; + } + String bm = readDetail(b, NASBackupChainKeys.BITMAP_NAME); + if (bm == null) { + continue; + } + parent = b; + parentBitmap = bm; + parentChainId = readDetail(b, NASBackupChainKeys.CHAIN_ID); + String posStr = readDetail(b, NASBackupChainKeys.CHAIN_POSITION); + try { + parentChainPosition = posStr == null ? 0 : Integer.parseInt(posStr); + } catch (NumberFormatException e) { + parentChainPosition = 0; + } + break; + } + if (parent == null || parentBitmap == null || parentChainId == null) { + return ChainDecision.fullStart(newBitmap); + } + + // Force a fresh full when the chain has reached the configured length. + if (parentChainPosition + 1 >= fullEvery) { + return ChainDecision.fullStart(newBitmap); + } + + // The script needs the parent backup's on-NAS file path so it can rebase the new + // qcow2 onto it. The path is stored relative to the NAS mount point — the script + // resolves it inside its mount session. + String parentPath = composeParentBackupPath(parent); + return ChainDecision.incremental(newBitmap, parentBitmap, parentPath, + parentChainId, parentChainPosition + 1); + } + + private String readDetail(Backup backup, String key) { + BackupDetailVO d = backupDetailsDao.findDetail(backup.getId(), key); + return d == null ? null : d.getValue(); + } + + /** + * Compose the on-NAS path of a parent backup's root-disk qcow2. Relative to the NAS mount, + * matches the layout written by {@code nasbackup.sh} ({@code /root..qcow2}). + */ + private String composeParentBackupPath(Backup parent) { + // backupPath is stored as externalId by createBackupObject — e.g. "i-2-1234-VM/2026.04.27.13.45.00". + // Volume UUID for the root volume is what the script keys backup files on. + VolumeVO rootVolume = volumeDao.getInstanceRootVolume(parent.getVmId()); + String volUuid = rootVolume == null ? "root" : rootVolume.getUuid(); + return parent.getExternalId() + "/root." + volUuid + ".qcow2"; + } + + /** + * Persist chain metadata under backup_details. Stored here (not on the backups table) so + * other providers can implement their own chain semantics without schema changes. + */ + private void persistChainMetadata(Backup backup, ChainDecision decision, String bitmapFromAgent) { + // Prefer the bitmap name confirmed by the agent (BITMAP_CREATED= line). Fall back to + // what we asked it to create — they should match. + String bitmap = bitmapFromAgent != null ? bitmapFromAgent : decision.bitmapNew; + if (bitmap != null) { + backupDetailsDao.persist(new BackupDetailVO(backup.getId(), NASBackupChainKeys.BITMAP_NAME, bitmap, true)); + } + backupDetailsDao.persist(new BackupDetailVO(backup.getId(), NASBackupChainKeys.CHAIN_ID, decision.chainId, true)); + backupDetailsDao.persist(new BackupDetailVO(backup.getId(), NASBackupChainKeys.CHAIN_POSITION, + String.valueOf(decision.chainPosition), true)); + backupDetailsDao.persist(new BackupDetailVO(backup.getId(), NASBackupChainKeys.TYPE, decision.mode, true)); + if (decision.isIncremental()) { + // Resolve the parent backup's UUID so restore can walk the chain by id, not by path. + String parentUuid = lookupParentBackupUuid(backup.getVmId(), decision.bitmapParent); + if (parentUuid != null) { + backupDetailsDao.persist(new BackupDetailVO(backup.getId(), NASBackupChainKeys.PARENT_BACKUP_ID, parentUuid, true)); + } + } + } + + private String lookupParentBackupUuid(long vmId, String parentBitmap) { + if (parentBitmap == null) { + return null; + } + for (Backup b : backupDao.listByVmId(null, vmId)) { + String bm = readDetail(b, NASBackupChainKeys.BITMAP_NAME); + if (parentBitmap.equals(bm)) { + return b.getUuid(); + } + } + return null; + } + protected Host getVMHypervisorHostForBackup(VirtualMachine vm) { Long hostId = vm.getHostId(); if (hostId == null && VirtualMachine.State.Running.equals(vm.getState())) { @@ -215,12 +381,20 @@ public Pair takeBackup(final VirtualMachine vm, Boolean quiesce final String backupPath = String.format("%s/%s", vm.getInstanceName(), new SimpleDateFormat("yyyy.MM.dd.HH.mm.ss").format(creationDate)); - BackupVO backupVO = createBackupObject(vm, backupPath); + // Decide full vs incremental for this backup. Stopped VMs are always full + // (libvirt backup-begin requires a running QEMU process). + ChainDecision decision = decideChain(vm); + + BackupVO backupVO = createBackupObject(vm, backupPath, decision.isIncremental() ? "INCREMENTAL" : "FULL"); TakeBackupCommand command = new TakeBackupCommand(vm.getInstanceName(), backupPath); command.setBackupRepoType(backupRepository.getType()); command.setBackupRepoAddress(backupRepository.getAddress()); command.setMountOptions(backupRepository.getMountOptions()); command.setQuiesce(quiesceVM); + command.setMode(decision.mode); + command.setBitmapNew(decision.bitmapNew); + command.setBitmapParent(decision.bitmapParent); + command.setParentPath(decision.parentPath); if (VirtualMachine.State.Stopped.equals(vm.getState())) { List vmVolumes = volumeDao.findByInstance(vm.getId()); @@ -249,9 +423,17 @@ public Pair takeBackup(final VirtualMachine vm, Boolean quiesce backupVO.setDate(new Date()); backupVO.setSize(answer.getSize()); backupVO.setStatus(Backup.Status.BackedUp); + // If the agent fell back to full (stopped VM mid-incremental cycle), record this + // backup as a full and start a new chain. + ChainDecision effective = decision; + if (answer.getIncrementalFallback()) { + effective = ChainDecision.fullStart(decision.bitmapNew); + backupVO.setType("FULL"); + } List volumes = new ArrayList<>(volumeDao.findByInstance(vm.getId())); backupVO.setBackedUpVolumes(backupManager.createVolumeInfoFromVolumes(volumes)); if (backupDao.update(backupVO.getId(), backupVO)) { + persistChainMetadata(backupVO, effective, answer.getBitmapCreated()); return new Pair<>(true, backupVO); } else { throw new CloudRuntimeException("Failed to update backup"); @@ -270,11 +452,11 @@ public Pair takeBackup(final VirtualMachine vm, Boolean quiesce } } - private BackupVO createBackupObject(VirtualMachine vm, String backupPath) { + private BackupVO createBackupObject(VirtualMachine vm, String backupPath, String type) { BackupVO backup = new BackupVO(); backup.setVmId(vm.getId()); backup.setExternalId(backupPath); - backup.setType("FULL"); + backup.setType(type); backup.setDate(new Date()); long virtualSize = 0L; for (final Volume volume: volumeDao.findByInstance(vm.getId())) { diff --git a/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java index 42953aa9f835..8427242fc1c7 100644 --- a/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java +++ b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java @@ -69,8 +69,7 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir } } - List commands = new ArrayList<>(); - commands.add(new String[]{ + List argv = new ArrayList<>(Arrays.asList( libvirtComputingResource.getNasBackupPath(), "-o", "backup", "-v", vmName, @@ -80,7 +79,27 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir "-p", backupPath, "-q", command.getQuiesce() != null && command.getQuiesce() ? "true" : "false", "-d", diskPaths.isEmpty() ? "" : String.join(",", diskPaths) - }); + )); + // Incremental NAS backup args (only added when the orchestrator asked for full/inc mode). + if (command.getMode() != null && !command.getMode().isEmpty()) { + argv.add("-M"); + argv.add(command.getMode()); + } + if (command.getBitmapNew() != null && !command.getBitmapNew().isEmpty()) { + argv.add("--bitmap-new"); + argv.add(command.getBitmapNew()); + } + if (command.getBitmapParent() != null && !command.getBitmapParent().isEmpty()) { + argv.add("--bitmap-parent"); + argv.add(command.getBitmapParent()); + } + if (command.getParentPath() != null && !command.getParentPath().isEmpty()) { + argv.add("--parent-path"); + argv.add(command.getParentPath()); + } + + List commands = new ArrayList<>(); + commands.add(argv.toArray(new String[0])); Pair result = Script.executePipedCommands(commands, timeout); @@ -94,21 +113,46 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir return answer; } + // Strip out our incremental marker lines before parsing size, so the legacy + // numeric-suffix parser keeps working. + String stdout = result.second().trim(); + String bitmapCreated = null; + boolean incrementalFallback = false; + StringBuilder filtered = new StringBuilder(); + for (String line : stdout.split("\n")) { + String trimmed = line.trim(); + if (trimmed.startsWith("BITMAP_CREATED=")) { + bitmapCreated = trimmed.substring("BITMAP_CREATED=".length()); + continue; + } + if (trimmed.startsWith("INCREMENTAL_FALLBACK=")) { + incrementalFallback = true; + continue; + } + if (filtered.length() > 0) { + filtered.append("\n"); + } + filtered.append(line); + } + String numericOutput = filtered.toString().trim(); + long backupSize = 0L; if (CollectionUtils.isNullOrEmpty(diskPaths)) { - List outputLines = Arrays.asList(result.second().trim().split("\n")); + List outputLines = Arrays.asList(numericOutput.split("\n")); if (!outputLines.isEmpty()) { backupSize = Long.parseLong(outputLines.get(outputLines.size() - 1).trim()); } } else { - String[] outputLines = result.second().trim().split("\n"); + String[] outputLines = numericOutput.split("\n"); for(String line : outputLines) { backupSize = backupSize + Long.parseLong(line.split(" ")[0].trim()); } } - BackupAnswer answer = new BackupAnswer(command, true, result.second().trim()); + BackupAnswer answer = new BackupAnswer(command, true, stdout); answer.setSize(backupSize); + answer.setBitmapCreated(bitmapCreated); + answer.setIncrementalFallback(incrementalFallback); return answer; } } From 43e2f7504aad25a8d5ec46288d1dd86527ad079a Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 19:10:46 +0300 Subject: [PATCH 05/13] feat(backup): on-demand bitmap recreation for incremental NAS backup CloudStack rebuilds the libvirt domain XML on every VM start, which means persistent QEMU dirty bitmaps don't survive a stop/start cycle. Rather than hooking into the VM start lifecycle (intrusive across the orchestration layer), this commit handles the missing bitmap *lazily* at the next backup attempt: nasbackup.sh - When -M incremental is requested, the script first checks `virsh checkpoint-list` for the parent bitmap. If absent, it recreates the checkpoint on the running domain so libvirt accepts the reference. The next incremental will be larger than usual (it captures all writes since recreate, not since the previous incremental) but is correct; subsequent ones return to normal size. - On recreation, emits BITMAP_RECREATED= on stdout for the orchestrator to record. BackupAnswer + bitmapRecreated field surfaced from the agent. LibvirtTakeBackupCommandWrapper - Strips BITMAP_RECREATED= line from stdout before size parsing. - Sets answer.setBitmapRecreated(...). NASBackupChainKeys + BITMAP_RECREATED key for backup_details. NASBackupProvider - When the agent reports a recreated bitmap, persists it under backup_details and logs an info-level message so operators can correlate larger-than-usual incrementals with VM restarts. This satisfies the bitmap-loss-on-VM-restart concern from the RFC review without touching VirtualMachineManager / StartCommand / agent lifecycle. Refs: apache/cloudstack#12899 --- .../cloudstack/backup/BackupAnswer.java | 13 +++++++++++ .../cloudstack/backup/NASBackupChainKeys.java | 3 +++ .../cloudstack/backup/NASBackupProvider.java | 6 +++++ .../LibvirtTakeBackupCommandWrapper.java | 6 +++++ scripts/vm/hypervisor/kvm/nasbackup.sh | 23 +++++++++++++++++++ 5 files changed, 51 insertions(+) diff --git a/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java b/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java index 7882b1fa0a34..9e8282b16a80 100644 --- a/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java +++ b/core/src/main/java/org/apache/cloudstack/backup/BackupAnswer.java @@ -35,6 +35,11 @@ public class BackupAnswer extends Answer { // Set when an incremental was requested but the agent had to fall back to a full // (e.g. VM was stopped). Provider should record this backup as type=full. private Boolean incrementalFallback; + // Set when the agent had to recreate the parent bitmap before this incremental + // (e.g. CloudStack rebuilt the domain XML on the previous VM start, losing bitmaps). + // The first incremental after a recreate is larger than usual; subsequent + // incrementals return to normal size. Informational — recorded in backup_details. + private String bitmapRecreated; public BackupAnswer(final Command command, final boolean success, final String details) { super(command, success, details); @@ -90,4 +95,12 @@ public Boolean getIncrementalFallback() { public void setIncrementalFallback(Boolean incrementalFallback) { this.incrementalFallback = incrementalFallback; } + + public String getBitmapRecreated() { + return bitmapRecreated; + } + + public void setBitmapRecreated(String bitmapRecreated) { + this.bitmapRecreated = bitmapRecreated; + } } diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java index 542844e19bf2..a3e811889114 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java @@ -42,6 +42,9 @@ public final class NASBackupChainKeys { public static final String TYPE_FULL = "full"; public static final String TYPE_INCREMENTAL = "incremental"; + /** Set to the bitmap name when this incremental had to recreate its parent bitmap on the host (informational; this incremental is larger than usual). */ + public static final String BITMAP_RECREATED = "nas.bitmap_recreated"; + private NASBackupChainKeys() { } } diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java index c8c56a65dce9..d4f95dfd0487 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java @@ -434,6 +434,12 @@ public Pair takeBackup(final VirtualMachine vm, Boolean quiesce backupVO.setBackedUpVolumes(backupManager.createVolumeInfoFromVolumes(volumes)); if (backupDao.update(backupVO.getId(), backupVO)) { persistChainMetadata(backupVO, effective, answer.getBitmapCreated()); + if (answer.getBitmapRecreated() != null) { + backupDetailsDao.persist(new BackupDetailVO(backupVO.getId(), + NASBackupChainKeys.BITMAP_RECREATED, answer.getBitmapRecreated(), true)); + logger.info("NAS incremental for VM {} recreated parent bitmap {} (likely VM was restarted since last backup)", + vm.getInstanceName(), answer.getBitmapRecreated()); + } return new Pair<>(true, backupVO); } else { throw new CloudRuntimeException("Failed to update backup"); diff --git a/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java index 8427242fc1c7..3654116869c5 100644 --- a/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java +++ b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java @@ -117,6 +117,7 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir // numeric-suffix parser keeps working. String stdout = result.second().trim(); String bitmapCreated = null; + String bitmapRecreated = null; boolean incrementalFallback = false; StringBuilder filtered = new StringBuilder(); for (String line : stdout.split("\n")) { @@ -125,6 +126,10 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir bitmapCreated = trimmed.substring("BITMAP_CREATED=".length()); continue; } + if (trimmed.startsWith("BITMAP_RECREATED=")) { + bitmapRecreated = trimmed.substring("BITMAP_RECREATED=".length()); + continue; + } if (trimmed.startsWith("INCREMENTAL_FALLBACK=")) { incrementalFallback = true; continue; @@ -152,6 +157,7 @@ public Answer execute(TakeBackupCommand command, LibvirtComputingResource libvir BackupAnswer answer = new BackupAnswer(command, true, stdout); answer.setSize(backupSize); answer.setBitmapCreated(bitmapCreated); + answer.setBitmapRecreated(bitmapRecreated); answer.setIncrementalFallback(incrementalFallback); return answer; } diff --git a/scripts/vm/hypervisor/kvm/nasbackup.sh b/scripts/vm/hypervisor/kvm/nasbackup.sh index 50c90588e99b..04c7d254cdf0 100755 --- a/scripts/vm/hypervisor/kvm/nasbackup.sh +++ b/scripts/vm/hypervisor/kvm/nasbackup.sh @@ -150,6 +150,29 @@ backup_running_vm() { ;; esac + # When incremental, verify the parent bitmap still exists on the running domain. + # CloudStack rebuilds the libvirt domain XML on every VM start, so persistent bitmaps + # are lost across stop/start. If the parent is missing, recreate it as a fresh bitmap + # so libvirt accepts the reference. The first backup after a recreate + # captures all writes since the recreate point — slightly larger than ideal, but correct. + if [[ "$effective_mode" == "incremental" ]]; then + if ! virsh -c qemu:///system checkpoint-list "$VM" --name 2>/dev/null | grep -qx "$BITMAP_PARENT"; then + cat > $dest/recreate-checkpoint.xml <$BITMAP_PARENT +$(virsh -c qemu:///system domblklist "$VM" --details 2>/dev/null | awk '$2=="disk"{printf "\n", $3}') + +XML + if ! virsh -c qemu:///system checkpoint-create "$VM" --xmlfile $dest/recreate-checkpoint.xml > /dev/null 2>&1; then + echo "Failed to recreate parent bitmap $BITMAP_PARENT for $VM" + cleanup + exit 1 + fi + # Marker for the orchestrator: this incremental is larger because the bitmap was rebuilt. + echo "BITMAP_RECREATED=$BITMAP_PARENT" + rm -f $dest/recreate-checkpoint.xml + fi + fi + # Build backup XML (and matching checkpoint XML when applicable). name="root" echo "" > $dest/backup.xml From 39303fbf88edeb1b44dc064e0aff601eedc58fa1 Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 19:18:33 +0300 Subject: [PATCH 06/13] feat(backup): restore path follows incremental backing-chain Two changes that together let an incremental NAS backup be restored without manual chain assembly: scripts/vm/hypervisor/kvm/nasbackup.sh - qemu-img rebase now writes a backing-file path that is RELATIVE to the new qcow2's directory (e.g. ..//root..qcow2) rather than the absolute path on the current mount point. NAS mount points are ephemeral (mktemp -d), so an absolute reference would not resolve when the backup is re-mounted at restore time. Relative references are resolved by qemu-img against the file's own directory, so the chain stays valid no matter where the NAS is mounted next. - Verifies the parent file exists on the NAS before rebasing. LibvirtRestoreBackupCommandWrapper - For file-based primary storage (local, NFS-file), the existing code rsync'd the source qcow2 to the volume. That copies only the differential blocks of an incremental, leaving a volume whose backing-file reference points at a path the primary storage host doesn't have. Now: detect a backing-chain via qemu-img info JSON and flatten via 'qemu-img convert -O qcow2', which follows the chain and produces a self-contained qcow2. Full backups continue to use rsync (faster, no chain to flatten). - The block-storage path (RBD/Linstor) already used qemu-img convert via the QemuImg helper, which auto-flattens chains, so that path needed no change. Refs: apache/cloudstack#12899 --- .../LibvirtRestoreBackupCommandWrapper.java | 26 +++++++++++++++++++ scripts/vm/hypervisor/kvm/nasbackup.sh | 16 ++++++++++-- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapper.java b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapper.java index 22dbfbdd67a2..cc2a0868fe17 100644 --- a/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapper.java +++ b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapper.java @@ -60,6 +60,15 @@ public class LibvirtRestoreBackupCommandWrapper extends CommandWrapper/dev/null | grep -q '\"backing-filename\"'"; private String getVolumeUuidFromPath(String volumePath, PrimaryDataStoreTO volumePool) { if (Storage.StoragePoolType.Linstor.equals(volumePool.getPoolType())) { @@ -270,10 +279,27 @@ private boolean replaceVolumeWithBackup(KVMStoragePoolManager storagePoolMgr, Pr return replaceBlockDeviceWithBackup(storagePoolMgr, volumePool, volumePath, backupPath, timeout, createTargetVolume, size); } + // For NAS-backed incremental backups, the source qcow2 has a backing-file + // reference to its parent (set by nasbackup.sh's qemu-img rebase). A plain + // rsync would copy only the differential blocks, leaving a volume that + // depends on a backing file the primary storage doesn't have. Flatten the + // chain via qemu-img convert, which follows the backing-file links and + // produces a single self-contained qcow2. + if (hasBackingChain(backupPath)) { + int flattenExit = Script.runSimpleBashScriptForExitValue( + String.format(QEMU_IMG_FLATTEN_COMMAND, backupPath, volumePath), timeout, false); + return flattenExit == 0; + } + int exitValue = Script.runSimpleBashScriptForExitValue(String.format(RSYNC_COMMAND, backupPath, volumePath), timeout, false); return exitValue == 0; } + private boolean hasBackingChain(String qcow2Path) { + return Script.runSimpleBashScriptForExitValue( + String.format(QEMU_IMG_HAS_BACKING_COMMAND, qcow2Path)) == 0; + } + private boolean replaceBlockDeviceWithBackup(KVMStoragePoolManager storagePoolMgr, PrimaryDataStoreTO volumePool, String volumePath, String backupPath, int timeout, boolean createTargetVolume, Long size) { KVMStoragePool volumeStoragePool = storagePoolMgr.getStoragePool(volumePool.getPoolType(), volumePool.getUuid()); QemuImg qemu; diff --git a/scripts/vm/hypervisor/kvm/nasbackup.sh b/scripts/vm/hypervisor/kvm/nasbackup.sh index 04c7d254cdf0..5611bae962d0 100755 --- a/scripts/vm/hypervisor/kvm/nasbackup.sh +++ b/scripts/vm/hypervisor/kvm/nasbackup.sh @@ -268,8 +268,20 @@ XML if [[ "$fullpath" == /dev/drbd/by-res/* ]]; then volUuid=$(get_linstor_uuid_from_path "$fullpath") fi - if ! qemu-img rebase -u -b "$PARENT_PATH" -F qcow2 "$dest/$name.$volUuid.qcow2" >> "$logFile" 2> >(cat >&2); then - echo "qemu-img rebase failed for $dest/$name.$volUuid.qcow2 onto $PARENT_PATH" + # PARENT_PATH from the orchestrator is the parent backup's path relative to the + # NAS mount root (e.g. "i-2-X/2026.04.27.12.00.00/root.UUID.qcow2"). Convert it to + # a path relative to THIS new qcow2's directory so the backing reference resolves + # correctly the next time the NAS is mounted (mount points are ephemeral). + local parent_abs="$mount_point/$PARENT_PATH" + if [[ ! -f "$parent_abs" ]]; then + echo "Parent backup file does not exist on NAS: $parent_abs" + cleanup + exit 1 + fi + local parent_rel + parent_rel=$(realpath --relative-to="$dest" "$parent_abs") + if ! qemu-img rebase -u -b "$parent_rel" -F qcow2 "$dest/$name.$volUuid.qcow2" >> "$logFile" 2> >(cat >&2); then + echo "qemu-img rebase failed for $dest/$name.$volUuid.qcow2 onto $parent_rel" cleanup exit 1 fi From b8d069e1279134dcc25732b18c3f0c38d904424f Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 19:24:02 +0300 Subject: [PATCH 07/13] feat(backup): cascade-delete + chain repair for NAS incrementals Adds the delete-with-chain-repair semantics agreed in the RFC review: scripts/vm/hypervisor/kvm/nasbackup.sh - New '-o rebase' operation: rebases an existing on-NAS qcow2 onto a new backing parent. Uses a SAFE rebase (no -u) so the target absorbs blocks of the about-to-be-deleted parent before the backing pointer is moved up to the grandparent. Writes the new backing reference relative to the target's directory so it survives mount-point changes. - New CLI flags --rebase-target, --rebase-new-backing (both passed mount-relative). RebaseBackupCommand + LibvirtRebaseBackupCommandWrapper - New agent command that wraps the script's rebase operation. The provider sends one of these per child that needs re-pointing. NASBackupProvider.deleteBackup - Now plans the chain repair before touching files via computeChainRepair(): * No chain metadata -> single-file delete (legacy behaviour) * Tail incremental -> single delete, no rebase * Middle incremental -> rebase immediate child onto our parent, then delete; shift chain_position of all later descendants by -1 * Full with descendants -> refuse unless forced=true; with forced=true delete full + every descendant newest-first - Updates parent_backup_id, chain_position metadata in backup_details after each rebase so the model in the DB matches the on-disk chain. This implements the cascade-delete behaviour requested in @abh1sar's review point #7. Refs: apache/cloudstack#12899 --- .../backup/RebaseBackupCommand.java | 73 ++++++ .../cloudstack/backup/NASBackupProvider.java | 246 +++++++++++++++++- .../LibvirtRebaseBackupCommandWrapper.java | 59 +++++ scripts/vm/hypervisor/kvm/nasbackup.sh | 60 +++++ 4 files changed, 425 insertions(+), 13 deletions(-) create mode 100644 core/src/main/java/org/apache/cloudstack/backup/RebaseBackupCommand.java create mode 100644 plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRebaseBackupCommandWrapper.java diff --git a/core/src/main/java/org/apache/cloudstack/backup/RebaseBackupCommand.java b/core/src/main/java/org/apache/cloudstack/backup/RebaseBackupCommand.java new file mode 100644 index 000000000000..e31114461263 --- /dev/null +++ b/core/src/main/java/org/apache/cloudstack/backup/RebaseBackupCommand.java @@ -0,0 +1,73 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +// + +package org.apache.cloudstack.backup; + +import com.cloud.agent.api.Command; +import com.cloud.agent.api.LogLevel; + +/** + * Tells the KVM agent to rebase a NAS backup qcow2 onto a new backing parent. Used by the + * NAS backup provider during chain repair when a middle incremental is being deleted: the + * immediate child must absorb the soon-to-be-deleted parent's blocks and then re-link to + * the grandparent. Both target and new-backing paths are NAS-mount-relative. + */ +public class RebaseBackupCommand extends Command { + private String targetPath; // mount-relative path of the qcow2 to repoint + private String newBackingPath; // mount-relative path of the new backing parent + private String backupRepoType; + private String backupRepoAddress; + @LogLevel(LogLevel.Log4jLevel.Off) + private String mountOptions; + + public RebaseBackupCommand(String targetPath, String newBackingPath, + String backupRepoType, String backupRepoAddress, String mountOptions) { + super(); + this.targetPath = targetPath; + this.newBackingPath = newBackingPath; + this.backupRepoType = backupRepoType; + this.backupRepoAddress = backupRepoAddress; + this.mountOptions = mountOptions; + } + + public String getTargetPath() { + return targetPath; + } + + public String getNewBackingPath() { + return newBackingPath; + } + + public String getBackupRepoType() { + return backupRepoType; + } + + public String getBackupRepoAddress() { + return backupRepoAddress; + } + + public String getMountOptions() { + return mountOptions == null ? "" : mountOptions; + } + + @Override + public boolean executeInSequence() { + return true; + } +} diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java index d4f95dfd0487..aca3dc108f27 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java @@ -693,24 +693,244 @@ public boolean deleteBackup(Backup backup, boolean forced) { throw new CloudRuntimeException(String.format("Unable to find a running KVM host in zone %d to delete backup %s", backup.getZoneId(), backup.getUuid())); } - DeleteBackupCommand command = new DeleteBackupCommand(backup.getExternalId(), backupRepository.getType(), - backupRepository.getAddress(), backupRepository.getMountOptions()); + // Repair the chain (if any) before removing the backup file. For chained backups, + // children that point at this backup must be re-pointed at this backup's parent + // (with their blocks merged via qemu-img rebase). For a full at the head of a chain + // with surviving children, refuse unless forced — `forced=true` then deletes the + // full plus every descendant. + ChainRepairPlan plan = computeChainRepair(backup, forced); + if (!plan.proceed) { + throw new CloudRuntimeException(plan.reason); + } - BackupAnswer answer; - try { - answer = (BackupAnswer) agentManager.send(host.getId(), command); - } catch (AgentUnavailableException e) { - throw new CloudRuntimeException("Unable to contact backend control plane to initiate backup"); - } catch (OperationTimedoutException e) { - throw new CloudRuntimeException("Operation to delete backup timed out, please try again"); + // Issue rebase commands for each child that needs re-pointing (ordered so each rebase + // operates on a chain that still resolves: children first if there are nested ones). + for (RebaseStep step : plan.rebaseSteps) { + RebaseBackupCommand rebase = new RebaseBackupCommand(step.targetMountRelativePath, + step.newBackingMountRelativePath, backupRepository.getType(), + backupRepository.getAddress(), backupRepository.getMountOptions()); + BackupAnswer rebaseAnswer; + try { + rebaseAnswer = (BackupAnswer) agentManager.send(host.getId(), rebase); + } catch (AgentUnavailableException e) { + throw new CloudRuntimeException("Unable to contact backend control plane to repair backup chain"); + } catch (OperationTimedoutException e) { + throw new CloudRuntimeException("Backup chain repair (rebase) timed out, please try again"); + } + if (rebaseAnswer == null || !rebaseAnswer.getResult()) { + throw new CloudRuntimeException(String.format( + "Backup chain repair failed: rebase of %s onto %s returned %s", + step.targetMountRelativePath, step.newBackingMountRelativePath, + rebaseAnswer == null ? "no answer" : rebaseAnswer.getDetails())); + } + // Update the rebased child's parent reference + position in backup_details. + BackupDetailVO parentDetail = backupDetailsDao.findDetail(step.childBackupId, NASBackupChainKeys.PARENT_BACKUP_ID); + if (parentDetail != null) { + parentDetail.setValue(step.newParentUuid == null ? "" : step.newParentUuid); + backupDetailsDao.update(parentDetail.getId(), parentDetail); + } else if (step.newParentUuid != null) { + backupDetailsDao.persist(new BackupDetailVO(step.childBackupId, + NASBackupChainKeys.PARENT_BACKUP_ID, step.newParentUuid, true)); + } + BackupDetailVO posDetail = backupDetailsDao.findDetail(step.childBackupId, NASBackupChainKeys.CHAIN_POSITION); + if (posDetail != null) { + posDetail.setValue(String.valueOf(step.newChainPosition)); + backupDetailsDao.update(posDetail.getId(), posDetail); + } } - if (answer != null && answer.getResult()) { - return backupDao.remove(backup.getId()); + // Now delete this backup's files. For a forced delete of a full with descendants we + // also delete all descendants' files (newest first so each rm targets a leaf). + for (Backup victim : plan.toDelete) { + DeleteBackupCommand command = new DeleteBackupCommand(victim.getExternalId(), backupRepository.getType(), + backupRepository.getAddress(), backupRepository.getMountOptions()); + BackupAnswer answer; + try { + answer = (BackupAnswer) agentManager.send(host.getId(), command); + } catch (AgentUnavailableException e) { + throw new CloudRuntimeException("Unable to contact backend control plane to initiate backup"); + } catch (OperationTimedoutException e) { + throw new CloudRuntimeException("Operation to delete backup timed out, please try again"); + } + if (answer == null || !answer.getResult()) { + logger.warn("Failed to delete backup file for {} ({}); leaving DB row intact", victim.getUuid(), victim.getExternalId()); + return false; + } + backupDao.remove(victim.getId()); } - logger.debug("There was an error removing the backup with id {}", backup.getId()); - return false; + // Shift chain_position down by 1 for any survivors deeper in the chain than the + // backup we just removed (their direct parent reference is unchanged, but their + // numeric position needs to stay consistent so future full-every cadence math works). + if (plan.shiftPositionsBelow != null) { + for (Backup b : backupDao.listByVmId(null, backup.getVmId())) { + if (!plan.shiftPositionsBelow.chainId.equals(readDetail(b, NASBackupChainKeys.CHAIN_ID))) { + continue; + } + int pos = chainPosition(b); + if (pos > plan.shiftPositionsBelow.afterPosition && pos != Integer.MAX_VALUE) { + BackupDetailVO posDetail = backupDetailsDao.findDetail(b.getId(), NASBackupChainKeys.CHAIN_POSITION); + if (posDetail != null) { + posDetail.setValue(String.valueOf(pos - 1)); + backupDetailsDao.update(posDetail.getId(), posDetail); + } + } + } + } + + return true; + } + + private static final class PositionShift { + final String chainId; + final int afterPosition; // shift positions strictly greater than this by -1 + PositionShift(String chainId, int afterPosition) { + this.chainId = chainId; + this.afterPosition = afterPosition; + } + } + + /** + * Result of {@link #computeChainRepair}: whether to proceed, what to rebase, what to delete. + */ + private static final class ChainRepairPlan { + final boolean proceed; + final String reason; + final List rebaseSteps; + final List toDelete; + final PositionShift shiftPositionsBelow; + + private ChainRepairPlan(boolean proceed, String reason, List rebaseSteps, List toDelete, + PositionShift shiftPositionsBelow) { + this.proceed = proceed; + this.reason = reason; + this.rebaseSteps = rebaseSteps; + this.toDelete = toDelete; + this.shiftPositionsBelow = shiftPositionsBelow; + } + + static ChainRepairPlan refuse(String reason) { + return new ChainRepairPlan(false, reason, Collections.emptyList(), Collections.emptyList(), null); + } + + static ChainRepairPlan proceed(List rebaseSteps, List toDelete) { + return new ChainRepairPlan(true, null, rebaseSteps, toDelete, null); + } + + static ChainRepairPlan proceed(List rebaseSteps, List toDelete, PositionShift shift) { + return new ChainRepairPlan(true, null, rebaseSteps, toDelete, shift); + } + } + + private static final class RebaseStep { + final long childBackupId; + final String targetMountRelativePath; + final String newBackingMountRelativePath; + final String newParentUuid; // null when re-pointed onto an existing full's UUID is desired but unavailable + final int newChainPosition; + + RebaseStep(long childBackupId, String targetMountRelativePath, String newBackingMountRelativePath, + String newParentUuid, int newChainPosition) { + this.childBackupId = childBackupId; + this.targetMountRelativePath = targetMountRelativePath; + this.newBackingMountRelativePath = newBackingMountRelativePath; + this.newParentUuid = newParentUuid; + this.newChainPosition = newChainPosition; + } + } + + /** + * Compute the chain-repair plan for deleting {@code backup}. Conservative semantics: + * - Backups outside any tracked chain (no NAS chain metadata) are deleted as-is. + * - A standalone backup with no children is deleted as-is. + * - A middle incremental: rebase its immediate child onto its own parent, then delete it. + * Descendants of that child are unaffected (their backing chain still resolves). + * - A full with surviving descendants: refuse unless {@code forced=true}; then delete + * full + every descendant (newest first). + */ + private ChainRepairPlan computeChainRepair(Backup backup, boolean forced) { + String chainId = readDetail(backup, NASBackupChainKeys.CHAIN_ID); + if (chainId == null) { + // Pre-incremental backups (or callers that never wrote chain metadata) — single delete. + return ChainRepairPlan.proceed(Collections.emptyList(), Collections.singletonList(backup)); + } + + // Gather every backup in the same chain for this VM. + List chain = new ArrayList<>(); + for (Backup b : backupDao.listByVmId(null, backup.getVmId())) { + if (chainId.equals(readDetail(b, NASBackupChainKeys.CHAIN_ID))) { + chain.add(b); + } + } + chain.sort(Comparator.comparingInt(b -> chainPosition(b))); + + int targetPos = chainPosition(backup); + boolean isFull = targetPos == 0; + List descendants = chain.stream() + .filter(b -> chainPosition(b) > targetPos) + .collect(Collectors.toList()); + + if (isFull) { + if (descendants.isEmpty()) { + return ChainRepairPlan.proceed(Collections.emptyList(), Collections.singletonList(backup)); + } + if (!forced) { + return ChainRepairPlan.refuse(String.format( + "Backup %s is the full anchor of a chain with %d incremental(s). Delete the incrementals first, " + + "or pass forced=true to remove the entire chain.", + backup.getUuid(), descendants.size())); + } + // Forced delete: remove descendants newest first, then the full. + List victims = new ArrayList<>(descendants); + victims.sort(Comparator.comparingInt((Backup b) -> chainPosition(b)).reversed()); + victims.add(backup); + return ChainRepairPlan.proceed(Collections.emptyList(), victims); + } + + // Middle (or tail) incremental. + if (descendants.isEmpty()) { + // Tail: nothing to rebase, just delete. + return ChainRepairPlan.proceed(Collections.emptyList(), Collections.singletonList(backup)); + } + + // Middle: only the immediate child needs to absorb our blocks and rebase onto our parent. + Backup immediateChild = descendants.stream() + .min(Comparator.comparingInt(b -> chainPosition(b))) + .orElseThrow(() -> new CloudRuntimeException("Internal error: no immediate child found for chain repair")); + Backup ourParent = chain.stream() + .filter(b -> chainPosition(b) == targetPos - 1) + .findFirst() + .orElseThrow(() -> new CloudRuntimeException(String.format( + "Cannot delete %s: its parent (chain_position=%d) is missing from the chain", + backup.getUuid(), targetPos - 1))); + + VolumeVO rootVolume = volumeDao.getInstanceRootVolume(backup.getVmId()); + String volUuid = rootVolume == null ? "root" : rootVolume.getUuid(); + String childPath = immediateChild.getExternalId() + "/root." + volUuid + ".qcow2"; + String parentPath = ourParent.getExternalId() + "/root." + volUuid + ".qcow2"; + + RebaseStep step = new RebaseStep(immediateChild.getId(), childPath, parentPath, + ourParent.getUuid(), chainPosition(immediateChild) - 1); + + // After we delete the middle backup, every descendant's numeric chain_position + // becomes stale (off by one). Their backing-file pointers don't need re-writing + // (only the immediate child changed parents) but their position metadata does. + return ChainRepairPlan.proceed( + Collections.singletonList(step), + Collections.singletonList(backup), + new PositionShift(chainId, targetPos)); + } + + private int chainPosition(Backup b) { + String s = readDetail(b, NASBackupChainKeys.CHAIN_POSITION); + if (s == null) { + return Integer.MAX_VALUE; // no metadata => sort to end + } + try { + return Integer.parseInt(s); + } catch (NumberFormatException e) { + return Integer.MAX_VALUE; + } } public void syncBackupMetrics(Long zoneId) { diff --git a/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRebaseBackupCommandWrapper.java b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRebaseBackupCommandWrapper.java new file mode 100644 index 000000000000..3238e0393c94 --- /dev/null +++ b/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRebaseBackupCommandWrapper.java @@ -0,0 +1,59 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +// + +package com.cloud.hypervisor.kvm.resource.wrapper; + +import com.cloud.agent.api.Answer; +import com.cloud.hypervisor.kvm.resource.LibvirtComputingResource; +import com.cloud.resource.CommandWrapper; +import com.cloud.resource.ResourceWrapper; +import com.cloud.utils.Pair; +import com.cloud.utils.script.Script; +import org.apache.cloudstack.backup.BackupAnswer; +import org.apache.cloudstack.backup.RebaseBackupCommand; + +import java.util.ArrayList; +import java.util.List; + +@ResourceWrapper(handles = RebaseBackupCommand.class) +public class LibvirtRebaseBackupCommandWrapper extends CommandWrapper { + @Override + public Answer execute(RebaseBackupCommand command, LibvirtComputingResource libvirtComputingResource) { + List commands = new ArrayList<>(); + commands.add(new String[]{ + libvirtComputingResource.getNasBackupPath(), + "-o", "rebase", + "-t", command.getBackupRepoType(), + "-s", command.getBackupRepoAddress(), + "-m", command.getMountOptions(), + "--rebase-target", command.getTargetPath(), + "--rebase-new-backing", command.getNewBackingPath() + }); + + Pair result = Script.executePipedCommands(commands, libvirtComputingResource.getCmdsTimeout()); + logger.debug("Backup rebase result: {} , exit code: {}", result.second(), result.first()); + + if (result.first() != 0) { + logger.warn("Failed to rebase backup file {} onto {}: {}", + command.getTargetPath(), command.getNewBackingPath(), result.second()); + return new BackupAnswer(command, false, result.second()); + } + return new BackupAnswer(command, true, null); + } +} diff --git a/scripts/vm/hypervisor/kvm/nasbackup.sh b/scripts/vm/hypervisor/kvm/nasbackup.sh index 5611bae962d0..9058a6a0fc98 100755 --- a/scripts/vm/hypervisor/kvm/nasbackup.sh +++ b/scripts/vm/hypervisor/kvm/nasbackup.sh @@ -38,6 +38,9 @@ MODE="" # "full" or "incremental"; empty => legacy full-only behav BITMAP_NEW="" # Bitmap/checkpoint name to create with this backup (e.g. "backup-1711586400") BITMAP_PARENT="" # For incremental: parent bitmap name to read changes since PARENT_PATH="" # For incremental: parent backup file path (used as backing for qemu-img rebase) +# Rebase operation parameters (used only with -o rebase, for chain repair on delete-middle) +REBASE_TARGET="" # The qcow2 file to repoint at a new backing (mount-relative path) +REBASE_NEW_BACKING="" # The new backing parent file (mount-relative path) logFile="/var/log/cloudstack/agent/agent.log" EXIT_CLEANUP_FAILED=20 @@ -363,6 +366,51 @@ delete_backup() { rmdir $mount_point } +# Rebase an existing backup qcow2 (e.g. a chain child) onto a new backing parent so the chain +# stays valid after a middle backup is deleted. Both --target and --new-backing are passed as +# paths relative to the NAS mount root; we resolve them under $mount_point and write the new +# backing reference relative to the target file's directory (mount points are ephemeral). +rebase_backup() { + mount_operation + + if [[ -z "$REBASE_TARGET" || -z "$REBASE_NEW_BACKING" ]]; then + echo "rebase requires --rebase-target and --rebase-new-backing" + cleanup + exit 1 + fi + + local target_abs="$mount_point/$REBASE_TARGET" + local backing_abs="$mount_point/$REBASE_NEW_BACKING" + if [[ ! -f "$target_abs" ]]; then + echo "Rebase target file does not exist: $target_abs" + cleanup + exit 1 + fi + if [[ ! -f "$backing_abs" ]]; then + echo "New backing file does not exist: $backing_abs" + cleanup + exit 1 + fi + local target_dir + target_dir=$(dirname "$target_abs") + local backing_rel + backing_rel=$(realpath --relative-to="$target_dir" "$backing_abs") + + # SAFE rebase (no -u): qemu-img reads blocks from the old chain and writes them into + # the target where the new chain doesn't cover them. This is the "merge into" semantic + # required when we're about to delete the old immediate parent — the target needs to + # absorb the to-be-deleted parent's blocks so the chain remains consistent against the + # new (further-back) backing. + if ! qemu-img rebase -b "$backing_rel" -F qcow2 "$target_abs" >> "$logFile" 2> >(cat >&2); then + echo "qemu-img rebase failed for $target_abs onto $backing_rel" + cleanup + exit 1 + fi + sync + umount $mount_point + rmdir $mount_point +} + get_backup_stats() { mount_operation @@ -476,6 +524,16 @@ while [[ $# -gt 0 ]]; do shift shift ;; + --rebase-target) + REBASE_TARGET="$2" + shift + shift + ;; + --rebase-new-backing) + REBASE_NEW_BACKING="$2" + shift + shift + ;; -h|--help) usage shift @@ -499,6 +557,8 @@ if [ "$OP" = "backup" ]; then fi elif [ "$OP" = "delete" ]; then delete_backup +elif [ "$OP" = "rebase" ]; then + rebase_backup elif [ "$OP" = "stats" ]; then get_backup_stats fi From 49edc7f22c3ed1f3fc7c2cd46bf7d2bd8b4fe3fe Mon Sep 17 00:00:00 2001 From: James Peru Date: Mon, 27 Apr 2026 19:25:30 +0300 Subject: [PATCH 08/13] test(backup): smoke tests for incremental NAS backup chain MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds five new test cases to test_backup_recovery_nas.py covering the end-to-end behaviour of the incremental NAS backup feature: * test_incremental_chain_cadence - Sets nas.backup.full.every=3, takes 5 backups, verifies the type pattern is FULL, INC, INC, FULL, INC. * test_restore_from_incremental - FULL + 2 INCs, each with a marker file. Restores from the latest INC and verifies all three markers are present (i.e. qemu-img convert flattened the chain correctly). * test_delete_middle_incremental_repairs_chain - Builds FULL, INC1, INC2; deletes INC1 (no force needed); restores from the surviving INC2 and verifies that markers from FULL, INC1 (which was deleted), and INC2 are all present — proving the rebase merged INC1's blocks into INC2. * test_refuse_delete_full_with_children - Verifies plain delete of a FULL that has children fails, and delete with forced=true succeeds and removes the whole chain. * test_stopped_vm_falls_back_to_full - Sets cadence to 2, takes one backup (FULL), stops the VM, triggers another (cadence would say INC). Verifies the second backup is recorded as FULL because the agent fell back when backup-begin couldn't run on a stopped VM. All tests restore nas.backup.full.every to 10 in finally blocks. Refs: apache/cloudstack#12899 --- .../smoke/test_backup_recovery_nas.py | 219 ++++++++++++++++++ 1 file changed, 219 insertions(+) diff --git a/test/integration/smoke/test_backup_recovery_nas.py b/test/integration/smoke/test_backup_recovery_nas.py index 409a08acc9f0..3284e566f95f 100644 --- a/test/integration/smoke/test_backup_recovery_nas.py +++ b/test/integration/smoke/test_backup_recovery_nas.py @@ -265,3 +265,222 @@ def test_vm_backup_create_vm_from_backup_in_another_zone(self): self.assertEqual(backup_repository.crosszoneinstancecreation, True, "Cross-Zone Instance Creation could not be enabled on the backup repository") self.vm_backup_create_vm_from_backup_int(template.id, [network.id]) + + # ------------------------------------------------------------------ + # Incremental backup tests (RFC #12899 / PR #13074) + # ------------------------------------------------------------------ + # These tests exercise the incremental NAS backup chain semantics: + # full -> incN cadence, restore-from-incremental, delete-middle chain + # repair, refuse-delete-full-with-children, and stopped-VM fallback. + # + # All tests set nas.backup.full.every to a small value (3) so a chain + # forms quickly without needing many backup iterations. They restore + # the original value at teardown. + + def _set_full_every(self, value): + Configurations.update(self.apiclient, name='nas.backup.full.every', + value=str(value), zoneid=self.zone.id) + + def _backup_type(self, backup): + # Backup objects expose `type`; for chained backups it's "INCREMENTAL", else "FULL". + return getattr(backup, 'type', 'FULL') or 'FULL' + + @attr(tags=["advanced", "backup"], required_hardware="true") + def test_incremental_chain_cadence(self): + """ + With nas.backup.full.every=3, the sequence of backups should be + FULL, INCREMENTAL, INCREMENTAL, FULL, INCREMENTAL, ... + """ + self.backup_offering.assignOffering(self.apiclient, self.vm.id) + self._set_full_every(3) + try: + ssh_client_vm = self.vm.get_ssh_client(reconnect=True) + ssh_client_vm.execute("touch /root/incremental_marker_1.txt") + + created = [] + for i in range(5): + Backup.create(self.apiclient, self.vm.id, "inc_chain_%d" % i) + # write a small change so each incremental has something to capture + ssh_client_vm.execute("dd if=/dev/urandom of=/root/delta_%d bs=64k count=4 2>/dev/null" % i) + time.sleep(2) + created = Backup.list(self.apiclient, self.vm.id) + + self.assertEqual(len(created), 5, "Expected 5 backups after 5 Backup.create calls") + # Sort oldest-first by date + created.sort(key=lambda b: b.created) + + expected = ['FULL', 'INCREMENTAL', 'INCREMENTAL', 'FULL', 'INCREMENTAL'] + actual = [self._backup_type(b).upper() for b in created] + self.assertEqual(actual, expected, + "With nas.backup.full.every=3, chain pattern should be %s but was %s" % (expected, actual)) + + # Cleanup all backups (newest first to satisfy chain rules without forced=true) + for b in reversed(created): + Backup.delete(self.apiclient, b.id) + finally: + self._set_full_every(10) + self.backup_offering.removeOffering(self.apiclient, self.vm.id) + + @attr(tags=["advanced", "backup"], required_hardware="true") + def test_restore_from_incremental(self): + """ + Take FULL + 2 INCREMENTAL backups, each with a marker file. Restore from the + latest incremental and verify all three markers are present (chain flatten). + """ + self.backup_offering.assignOffering(self.apiclient, self.vm.id) + self._set_full_every(5) + try: + ssh_client_vm = self.vm.get_ssh_client(reconnect=True) + ssh_client_vm.execute("touch /root/marker_full.txt") + Backup.create(self.apiclient, self.vm.id, "rfi_full") + time.sleep(3) + + ssh_client_vm.execute("touch /root/marker_inc1.txt") + Backup.create(self.apiclient, self.vm.id, "rfi_inc1") + time.sleep(3) + + ssh_client_vm.execute("touch /root/marker_inc2.txt") + Backup.create(self.apiclient, self.vm.id, "rfi_inc2") + time.sleep(3) + + backups = Backup.list(self.apiclient, self.vm.id) + backups.sort(key=lambda b: b.created) + self.assertEqual(len(backups), 3) + self.assertEqual(self._backup_type(backups[0]).upper(), 'FULL') + self.assertEqual(self._backup_type(backups[2]).upper(), 'INCREMENTAL') + + new_vm_name = "vm-from-inc-" + str(int(time.time())) + new_vm = Backup.createVMFromBackup(self.apiclient, self.services["small"], + mode=self.services["mode"], backupid=backups[2].id, vmname=new_vm_name, + accountname=self.account.name, domainid=self.account.domainid, + zoneid=self.zone.id) + self.cleanup.append(new_vm) + + ssh_new = new_vm.get_ssh_client(reconnect=True) + for marker in ("marker_full.txt", "marker_inc1.txt", "marker_inc2.txt"): + result = ssh_new.execute("ls /root/%s" % marker) + self.assertIn(marker, result[0], + "Restored VM should have %s (chain flattened correctly)" % marker) + + for b in reversed(backups): + Backup.delete(self.apiclient, b.id) + finally: + self._set_full_every(10) + self.backup_offering.removeOffering(self.apiclient, self.vm.id) + + @attr(tags=["advanced", "backup"], required_hardware="true") + def test_delete_middle_incremental_repairs_chain(self): + """ + Delete a MIDDLE incremental from a FULL -> INC1 -> INC2 chain. + The chain repair should rebase INC2 onto FULL, and the final restore + should still produce a working VM with all expected blocks. + """ + self.backup_offering.assignOffering(self.apiclient, self.vm.id) + self._set_full_every(5) + try: + ssh_client_vm = self.vm.get_ssh_client(reconnect=True) + ssh_client_vm.execute("touch /root/dmi_full.txt") + Backup.create(self.apiclient, self.vm.id, "dmi_full") + time.sleep(3) + ssh_client_vm.execute("touch /root/dmi_inc1.txt") + Backup.create(self.apiclient, self.vm.id, "dmi_inc1") + time.sleep(3) + ssh_client_vm.execute("touch /root/dmi_inc2.txt") + Backup.create(self.apiclient, self.vm.id, "dmi_inc2") + time.sleep(3) + + backups = Backup.list(self.apiclient, self.vm.id) + backups.sort(key=lambda b: b.created) + full, inc1, inc2 = backups[0], backups[1], backups[2] + + # Delete the middle incremental — should succeed via chain repair (no force needed) + Backup.delete(self.apiclient, inc1.id) + remaining = Backup.list(self.apiclient, self.vm.id) + self.assertEqual(len(remaining), 2, "After deleting middle inc, two backups should remain") + + # Restore from the remaining tail (formerly inc2) — must still produce a usable VM + new_vm_name = "vm-after-mid-del-" + str(int(time.time())) + new_vm = Backup.createVMFromBackup(self.apiclient, self.services["small"], + mode=self.services["mode"], backupid=inc2.id, vmname=new_vm_name, + accountname=self.account.name, domainid=self.account.domainid, + zoneid=self.zone.id) + self.cleanup.append(new_vm) + ssh_new = new_vm.get_ssh_client(reconnect=True) + # Both the FULL marker and (importantly) the deleted-INC1 marker should still + # be present, because the rebase merged INC1's blocks into INC2. + for marker in ("dmi_full.txt", "dmi_inc1.txt", "dmi_inc2.txt"): + result = ssh_new.execute("ls /root/%s" % marker) + self.assertIn(marker, result[0], + "After mid-incremental delete and rebase, %s should still be restorable" % marker) + + Backup.delete(self.apiclient, inc2.id) + Backup.delete(self.apiclient, full.id) + finally: + self._set_full_every(10) + self.backup_offering.removeOffering(self.apiclient, self.vm.id) + + @attr(tags=["advanced", "backup"], required_hardware="true") + def test_refuse_delete_full_with_children(self): + """ + Deleting a FULL that has surviving incrementals must fail without forced=true. + With forced=true it must succeed and remove the entire chain. + """ + self.backup_offering.assignOffering(self.apiclient, self.vm.id) + self._set_full_every(5) + try: + Backup.create(self.apiclient, self.vm.id, "rdc_full") + time.sleep(3) + Backup.create(self.apiclient, self.vm.id, "rdc_inc") + time.sleep(3) + + backups = Backup.list(self.apiclient, self.vm.id) + backups.sort(key=lambda b: b.created) + full = backups[0] + + failed = False + try: + Backup.delete(self.apiclient, full.id) + except Exception: + failed = True + self.assertTrue(failed, "Deleting a FULL with children should be refused without forced=true") + + # Forced delete should succeed and clear the whole chain + Backup.delete(self.apiclient, full.id, forced=True) + remaining = Backup.list(self.apiclient, self.vm.id) + self.assertIsNone(remaining, "Forced delete of FULL should remove the entire chain") + finally: + self._set_full_every(10) + self.backup_offering.removeOffering(self.apiclient, self.vm.id) + + @attr(tags=["advanced", "backup"], required_hardware="true") + def test_stopped_vm_falls_back_to_full(self): + """ + When a backup is requested while the VM is stopped, even if the chain cadence + would call for an incremental, the agent must fall back to a full and start a + new chain. The incrementalFallback flag should be reflected in backup.type=FULL. + """ + self.backup_offering.assignOffering(self.apiclient, self.vm.id) + self._set_full_every(2) # next backup after the first should be incremental + try: + Backup.create(self.apiclient, self.vm.id, "svf_first") + time.sleep(3) + + # Stop the VM and trigger another backup — should fall back to FULL + self.vm.stop(self.apiclient) + time.sleep(5) + Backup.create(self.apiclient, self.vm.id, "svf_second") + time.sleep(3) + + backups = Backup.list(self.apiclient, self.vm.id) + backups.sort(key=lambda b: b.created) + self.assertEqual(len(backups), 2) + self.assertEqual(self._backup_type(backups[0]).upper(), 'FULL') + self.assertEqual(self._backup_type(backups[1]).upper(), 'FULL', + "Stopped-VM backup must be a FULL even when cadence would have asked for an INCREMENTAL") + + self.vm.start(self.apiclient) + for b in reversed(backups): + Backup.delete(self.apiclient, b.id) + finally: + self._set_full_every(10) + self.backup_offering.removeOffering(self.apiclient, self.vm.id) From d80ed1672374a1142920500e13e268b32132fe3a Mon Sep 17 00:00:00 2001 From: James Peru Date: Tue, 28 Apr 2026 11:36:53 +0300 Subject: [PATCH 09/13] test(backup): mock returns no-backing-chain for rsync-failure test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 6 added a hasBackingChain() check before rsync that uses qemu-img info to detect chained incrementals. The existing testExecuteWithRsyncFailure test mocks Script.runSimpleBashScriptForExitValue to return 0 for any command, so the new qemu-img info check incorrectly evaluates as "has backing chain" and routes the test through the chain-flatten path instead of rsync — the test then asserts a failure that never occurs. Add a clause to the mock that returns 1 (no backing chain) for the qemu-img info backing-filename probe, so the test continues to exercise the rsync path it was designed for. --- .../wrapper/LibvirtRestoreBackupCommandWrapperTest.java | 2 ++ 1 file changed, 2 insertions(+) diff --git a/plugins/hypervisors/kvm/src/test/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapperTest.java b/plugins/hypervisors/kvm/src/test/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapperTest.java index ef6b5c08189d..fd8a3b02e0a0 100644 --- a/plugins/hypervisors/kvm/src/test/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapperTest.java +++ b/plugins/hypervisors/kvm/src/test/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRestoreBackupCommandWrapperTest.java @@ -407,6 +407,8 @@ public void testExecuteWithRsyncFailure() throws Exception { return 0; // File exists } else if (command.contains("qemu-img check")) { return 0; // File is valid + } else if (command.contains("qemu-img info") && command.contains("backing-filename")) { + return 1; // No backing chain — exercise the rsync path (full backups) } return 0; // Other commands success }); From 976402535852499c10570b696c4ee4e977d7c89f Mon Sep 17 00:00:00 2001 From: James Peru Date: Wed, 29 Apr 2026 00:02:59 +0300 Subject: [PATCH 10/13] docs: move RFC out of repo per reviewer feedback @bernardodemarco pointed out that design docs / RFCs go in the project wiki or as a separate issue rather than into the source tree. The RFC content has been posted as a comment on the existing tracking issue #12899 (which is where the design discussion already lives), and the docs/rfcs/ directory is removed from this PR. --- docs/rfcs/incremental-nas-backup.md | 406 ---------------------------- 1 file changed, 406 deletions(-) delete mode 100644 docs/rfcs/incremental-nas-backup.md diff --git a/docs/rfcs/incremental-nas-backup.md b/docs/rfcs/incremental-nas-backup.md deleted file mode 100644 index aa2e55ff55bf..000000000000 --- a/docs/rfcs/incremental-nas-backup.md +++ /dev/null @@ -1,406 +0,0 @@ -# RFC: Incremental NAS Backup Support for KVM Hypervisor - -| Field | Value | -|---------------|--------------------------------------------| -| **Author** | James Peru, Xcobean Systems Limited | -| **Status** | Draft | -| **Created** | 2026-03-27 | -| **Target** | Apache CloudStack 4.23+ | -| **Component** | Backup & Recovery (NAS Backup Provider) | - ---- - -## Summary - -This RFC proposes adding incremental backup support to CloudStack's NAS backup provider for KVM hypervisors. By leveraging QEMU dirty bitmaps and libvirt's `backup-begin` API, CloudStack can track changed disk blocks between backups and export only the delta, reducing daily backup storage consumption by 80--95% and shortening backup windows from hours to minutes for large VMs. The feature is opt-in at the zone level, backward-compatible with existing full-backup behavior, and gracefully degrades on older QEMU/libvirt versions. - ---- - -## Motivation - -CloudStack's NAS backup provider currently performs a full disk copy every time a backup is taken. For a 500 GB VM with five daily backups retained, that amounts to 2.5 TB of storage consumed. At scale -- tens or hundreds of VMs -- this becomes a serious operational and financial burden. - -**Problems with the current approach:** - -1. **Storage waste.** Every backup is a full copy of the entire virtual disk, regardless of how little data actually changed since the last backup. -2. **Long backup windows.** Copying hundreds of gigabytes over NFS or SMB takes hours, increasing the risk of I/O contention on production workloads. -3. **Network bandwidth pressure.** Full-disk transfers saturate the storage network during backup windows, impacting other VMs on the same host. -4. **Uncompetitive feature set.** VMware (Changed Block Tracking / VADP), Proxmox Backup Server, and Veeam all support incremental backups natively. CloudStack's lack of incremental backup is a common complaint on the users@ mailing list and a blocker for adoption in environments with large VMs. - -**What incremental backup achieves:** - -- Only changed blocks are transferred and stored after the initial full backup. -- A typical daily incremental for a 500 GB VM with moderate write activity is 5--15 GB, a reduction of 97--99% compared to a full copy. -- Backup completes in minutes rather than hours. -- Retention of 30+ daily restore points becomes economically feasible. - ---- - -## Proposed Design - -### Backup Chain Model - -Incremental backups form a chain anchored by a periodic full backup: - -``` -Full (Day 0) -> Inc 1 (Day 1) -> Inc 2 (Day 2) -> ... -> Inc 6 (Day 6) -> Full (Day 7) -> ... -``` - -Restoring to any point in time requires the full backup plus every incremental up to the desired restore point. To bound restore complexity and protect against chain corruption, a new full backup is forced at a configurable interval. - -**Global settings (zone scope):** - -| Setting | Type | Default | Description | -|--------------------------------------|---------|---------|------------------------------------------------------| -| `nas.backup.incremental.enabled` | Boolean | `false` | Enable incremental backup for the zone | -| `nas.backup.full.interval` | Integer | `7` | Days between full backups | -| `nas.backup.incremental.max.chain` | Integer | `6` | Max incremental backups before forcing a new full | - -When `nas.backup.incremental.enabled` is `false` (the default), behavior is identical to today -- every backup is a full copy. Existing deployments are unaffected. - ---- - -### Technical Approach - -#### 1. Dirty Bitmap Tracking (QEMU Layer) - -QEMU 4.0 introduced persistent dirty bitmaps: per-disk bitmaps that record which blocks have been written since the bitmap was created. These bitmaps survive QEMU restarts (they are stored in the qcow2 image header) and are the foundation for incremental backup. - -**Lifecycle:** - -1. When incremental backup is enabled for a VM, the agent creates a persistent dirty bitmap on each virtual disk via QMP: - ```json - { - "execute": "block-dirty-bitmap-add", - "arguments": { - "node": "drive-virtio-disk0", - "name": "backup-20260327", - "persistent": true - } - } - ``` -2. QEMU automatically sets bits in this bitmap whenever the guest writes to a block. -3. At backup time, the bitmap tells the backup process exactly which blocks to read. -4. After a successful backup, a new bitmap is created for the next cycle and the old bitmap is optionally removed. - -#### 2. Backup Flow - -**Full backup (Day 0 or every `nas.backup.full.interval` days):** - -```bash -# 1. Export the entire disk to the NAS mount -qemu-img convert -f qcow2 -O qcow2 \ - /var/lib/libvirt/images/vm-disk.qcow2 \ - /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2 - -# 2. Create a new dirty bitmap to track changes from this point -virsh qemu-monitor-command $DOMAIN --hmp \ - 'block-dirty-bitmap-add drive-virtio-disk0 backup-20260327 persistent=true' -``` - -**Incremental backup (Day 1 through Day N):** - -```bash -# 1. Use libvirt backup-begin with incremental mode -# This exports only blocks dirty since bitmap "backup-20260327" -cat > /tmp/backup.xml <<'XML' - - - - - - - - backup-20260327 - -XML - -virsh backup-begin $DOMAIN /tmp/backup.xml - -# 2. Wait for completion -virsh domjobinfo $DOMAIN --completed - -# 3. Rotate bitmaps: remove old, create new -virsh qemu-monitor-command $DOMAIN --hmp \ - 'block-dirty-bitmap-remove drive-virtio-disk0 backup-20260327' -virsh qemu-monitor-command $DOMAIN --hmp \ - 'block-dirty-bitmap-add drive-virtio-disk0 backup-20260328 persistent=true' -``` - -**New full backup cycle (Day 7):** - -```bash -# Remove all existing bitmaps -virsh qemu-monitor-command $DOMAIN --hmp \ - 'block-dirty-bitmap-remove drive-virtio-disk0 backup-20260327' - -# Take a full backup (same as Day 0) -# Optionally prune expired chains from NAS -``` - -#### 3. Restore Flow - -Restoring from an incremental chain requires replaying the full backup plus all incrementals up to the target restore point. This is handled entirely within `nasbackup.sh` and is transparent to the management server and the end user. - -**Example: Restore to Day 3 (full + 3 incrementals):** - -```bash -# 1. Create a working copy from the full backup -cp /mnt/nas/backups/vm-uuid/backup-full-20260327.qcow2 /tmp/restored.qcow2 - -# 2. Apply each incremental in order using qemu-img rebase -# Each incremental is a thin qcow2 containing only changed blocks. -# Rebasing merges the incremental's blocks into the chain. -qemu-img rebase -u -b /tmp/restored.qcow2 \ - /mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2 - -qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260328.qcow2 \ - /mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2 - -qemu-img rebase -u -b /mnt/nas/backups/vm-uuid/backup-inc-20260329.qcow2 \ - /mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2 - -# 3. Flatten the chain into a single image -qemu-img convert -f qcow2 -O qcow2 \ - /mnt/nas/backups/vm-uuid/backup-inc-20260330.qcow2 \ - /tmp/vm-restored-final.qcow2 - -# 4. Return the flattened image for CloudStack to import -``` - -An alternative approach uses `qemu-img commit` to merge each layer down. The implementation will benchmark both methods and choose the faster one for large images. - -#### 4. Database Schema Changes - -**Modified table: `backups`** - -| Column | Type | Description | -|--------------------|--------------|------------------------------------------------| -| `backup_type` | VARCHAR(16) | `FULL` or `INCREMENTAL` | -| `parent_backup_id` | BIGINT (FK) | For incremental: ID of the previous backup | -| `bitmap_name` | VARCHAR(128) | QEMU dirty bitmap identifier for this backup | -| `chain_id` | BIGINT (FK) | Links to the backup chain this backup belongs to | - -**New table: `backup_chains`** - -| Column | Type | Description | -|-------------------|-------------|------------------------------------------------| -| `id` | BIGINT (PK) | Auto-increment primary key | -| `vm_instance_id` | BIGINT (FK) | The VM this chain belongs to | -| `full_backup_id` | BIGINT (FK) | The full backup anchoring this chain | -| `state` | VARCHAR(16) | `ACTIVE`, `SEALED`, `EXPIRED` | -| `created` | DATETIME | When the chain was started | - -**Schema migration** will be provided as a Liquibase changeset, consistent with CloudStack's existing migration framework. The new columns are nullable to maintain backward compatibility with existing backup records. - -#### 5. Management Server Changes - -**`BackupManagerImpl` (orchestration):** - -- Before taking a backup, query the active chain for the VM. -- If no active chain exists, or the chain has reached `nas.backup.incremental.max.chain` incrementals, or `nas.backup.full.interval` days have elapsed since the last full backup: schedule a full backup and start a new chain. -- Otherwise: schedule an incremental backup linked to the previous backup in the chain. -- On backup failure: if the bitmap is suspected corrupt, mark the chain as `SEALED` and force a full backup on the next run. - -**`NASBackupProvider.takeBackup()`:** - -- Accept a new parameter `BackupType` (FULL or INCREMENTAL). -- For incremental: pass the parent backup's bitmap name and NAS path to the agent command. - -**`TakeBackupCommand` / `TakeBackupAnswer`:** - -- Add fields: `backupType` (FULL/INCREMENTAL), `parentBackupId`, `bitmapName`, `parentBackupPath`. -- The answer includes the actual size of the backup (important for incrementals, which are much smaller than the disk size). - -**`RestoreBackupCommand`:** - -- Add field: `backupChain` (ordered list of backup paths from full through the target incremental). -- The agent reconstructs the full image from the chain before importing. - -#### 6. KVM Agent Changes - -**`LibvirtTakeBackupCommandWrapper`:** - -- For `FULL` backups: existing behavior (qemu-img convert), plus create the initial dirty bitmap. -- For `INCREMENTAL` backups: use `virsh backup-begin` with `` XML, then rotate bitmaps. -- Pre-flight check: verify QEMU version >= 4.0 and libvirt version >= 6.0. If not met, fall back to full backup and log a warning. - -**`nasbackup.sh` enhancements:** - -- New flag `-i` for incremental mode. -- New flag `-p ` to specify the parent backup on the NAS. -- New flag `-b ` to specify which dirty bitmap to use. -- New subcommand `restore-chain` that accepts an ordered list of backup paths and produces a flattened image. - -**`LibvirtRestoreBackupCommandWrapper`:** - -- If the restore target is an incremental backup, request the full chain from the management server and pass it to `nasbackup.sh restore-chain`. - -#### 7. API Changes - -**Existing API: `createBackup`** - -No change to the API signature. The management server automatically decides full vs. incremental based on the zone configuration and the current chain state. Callers do not need to specify the backup type. - -**Existing API: `listBackups`** - -Response gains two new fields: -- `backuptype` (string): `Full` or `Incremental` -- `parentbackupid` (string): UUID of the parent backup (null for full backups) - -**Existing API: `restoreBackup`** - -No change. The management server resolves the full chain internally. - -#### 8. UI Changes - -- **Backup list view:** Add a "Type" column showing `Full` or `Incremental`, with a visual indicator (e.g., a small chain icon for incrementals). -- **Backup detail view:** Show the backup chain as a vertical timeline: full backup at the top, incrementals branching down, with sizes and timestamps. -- **Restore dialog:** When the user selects an incremental restore point, display a note: "This restore will replay N backups (total chain size: X GB)." -- **Backup schedule settings** (zone-level): Toggle for incremental backup, full backup interval slider, max chain length input. - ---- - -### Storage Savings Projections - -The following estimates assume a moderate write workload (2--5% of disk blocks changed per day), which is typical for application servers, databases with WAL, and file servers. - -| Scenario | Full Backups Only | With Incremental | Savings | -|-----------------------------------|-------------------|--------------------|------------| -| 500 GB VM, 7 daily backups | 3.5 TB | ~550 GB | **84%** | -| 1 TB VM, 30 daily backups | 30 TB | ~1.3 TB | **96%** | -| 100 VMs x 100 GB, weekly cycle | 70 TB/week | ~12 TB/week | **83%** | -| 50 VMs x 200 GB, 30-day retain | 300 TB | ~18 TB | **94%** | - -For environments with higher change rates (e.g., heavy database writes), incremental sizes will be larger, but savings still typically exceed 60%. - ---- - -### Requirements - -| Requirement | Minimum Version | Notes | -|-------------------------|----------------|-------------------------------------------------| -| QEMU | 4.0+ | Dirty bitmap support. Ubuntu 20.04+, RHEL 8+. | -| libvirt | 6.0+ | `virsh backup-begin` support. Ubuntu 22.04+, RHEL 8.3+. | -| CloudStack | 4.19+ | NAS backup provider must already be present. | -| NAS storage | NFS or SMB | No special requirements beyond existing NAS backup support. | - -**Graceful degradation:** If a KVM host runs QEMU < 4.0 or libvirt < 6.0, the agent will detect this at startup and report `incrementalBackupCapable=false` to the management server. Backups for VMs on that host will remain full-only, with a warning logged. No manual intervention is required. - ---- - -### Risks and Mitigations - -| Risk | Impact | Mitigation | -|------|--------|------------| -| **Bitmap corruption** (host crash during backup, QEMU bug) | Incremental backup produces an incomplete or incorrect image | Detect bitmap inconsistency via QMP query; force a new full backup and start a fresh chain. Data in previous full backup is unaffected. | -| **Chain too long** (missed full backup schedule) | Restore time increases; single corrupt link breaks the chain | Enforce `nas.backup.incremental.max.chain` hard limit. If exceeded, the next backup is automatically a full. | -| **Restore complexity** | User confusion about which backup to pick; longer restore for deep chains | Restore logic is fully automated in `nasbackup.sh`. The UI shows a single "Restore" button per restore point, with the chain replayed transparently. | -| **VM live migration during backup** | Dirty bitmap may be lost if migrated mid-backup | Check VM state before backup; abort and retry if migration is in progress. Bitmaps persist across clean shutdowns and restarts but not across live migration in older QEMU versions. For QEMU 6.2+, bitmaps survive migration. | -| **Backward compatibility** | Existing full-backup users should not be affected | Feature is disabled by default. No schema changes affect existing rows (new columns are nullable). Full-backup code path is unchanged. | -| **Disk space during restore** | Flattening a chain requires temporary disk space equal to the full disk size | Use the same scratch space already used for full backup restores. Document the requirement. | - ---- - -### Implementation Plan - -| Phase | Scope | Estimated Effort | -|-------|-------|------------------| -| **Phase 1** | Core incremental backup and restore in `nasbackup.sh` and KVM agent wrappers. Dirty bitmap lifecycle management. Manual testing with `virsh` and `qemu-img`. | 2--3 weeks | -| **Phase 2** | Management server changes: chain management, scheduling logic, global settings, database schema migration, API response changes. | 2 weeks | -| **Phase 3** | UI changes: backup type column, chain visualization, restore dialog enhancements, zone-level settings. | 1 week | -| **Phase 4** | Integration testing (full cycle: enable, backup, restore, disable, upgrade from older version). Edge case testing (host crash, bitmap loss, migration, mixed QEMU versions). Documentation. | 2 weeks | - -**Total estimated effort: 7--8 weeks.** - -We (Xcobean Systems) intend to implement this and submit PRs against the `main` branch. We would appreciate early design feedback before starting implementation to avoid rework. - ---- - -### Prior Art - -- **VMware VADP / Changed Block Tracking (CBT):** VMware's CBT is the industry-standard approach. A change tracking driver inside the hypervisor records changed blocks, and backup vendors query the CBT via the vSphere API. This RFC's approach is analogous, using QEMU dirty bitmaps as the CBT equivalent. - -- **Proxmox Backup Server (PBS):** PBS uses QEMU dirty bitmaps to implement incremental backups natively. Their implementation validates that the dirty bitmap approach is production-ready for KVM/QEMU environments. PBS has been stable since Proxmox VE 6.4 (2020). - -- **Veeam Backup & Replication:** Veeam uses a "reverse incremental" model where the most recent backup is always a synthetic full, and older backups are stored as reverse deltas. This simplifies restore (always restore from the latest full) at the cost of more I/O during backup. We chose the forward-incremental model for simplicity and because it aligns with how QEMU dirty bitmaps work natively. - -- **libvirt backup API:** The `virsh backup-begin` command and its underlying `virDomainBackupBegin()` API were specifically designed for this use case. The libvirt documentation includes examples of incremental backup using dirty bitmaps. See: https://libvirt.org/kbase/incremental-backup.html - ---- - -### About the Author - -Xcobean Systems Limited operates a production Apache CloudStack deployment providing IaaS to 50+ client VMs. We use the NAS backup provider daily and have contributed several improvements to it: - -- PR [#12805](https://github.com/apache/cloudstack/pull/12805) -- NAS backup NPE fix -- PR [#12822](https://github.com/apache/cloudstack/pull/12822) -- Backup restore improvements -- PR [#12826](https://github.com/apache/cloudstack/pull/12826) -- NAS backup script hardening -- PRs [#12843](https://github.com/apache/cloudstack/pull/12843)--[#12848](https://github.com/apache/cloudstack/pull/12848) -- Various NAS backup fixes -- PR [#12872](https://github.com/apache/cloudstack/pull/12872) -- Additional backup provider fixes - -We experience the storage and bandwidth cost of full-only backups firsthand and are motivated to solve this problem upstream rather than maintaining a fork. - ---- - -## Open Questions for Discussion - -We welcome feedback from the community on the following: - -1. **Interest level.** Is there sufficient demand for this feature to justify the implementation effort? We believe so based on mailing list threads, but would like confirmation. - -2. **Dirty bitmaps vs. alternatives.** Are there concerns about relying on QEMU dirty bitmaps? Alternative approaches include file-level deduplication on the NAS (less efficient, not hypervisor-aware) or `qemu-img compare` (slower, requires reading both images). - -3. **Target release.** Should this target CloudStack 4.23, or is a later release more appropriate given the scope? - -4. **Chain model.** We proposed forward-incremental with periodic full backups. Would the community prefer a different model (e.g., reverse-incremental like Veeam, or forever-incremental with periodic synthetic fulls)? - -5. **Scope of first PR.** Should we submit the entire feature as one PR, or break it into smaller PRs (e.g., nasbackup.sh changes first, then agent, then management server, then UI)? - -6. **Testing infrastructure.** We can test against our production environment (Ubuntu 22.04, QEMU 6.2, libvirt 8.0). Are there CI environments or community test labs available for broader testing (RHEL, Rocky, older QEMU versions)? - ---- - -*This RFC is posted as a GitHub Discussion to gather community feedback before implementation begins. Please share your thoughts, concerns, and suggestions.* - ---- - -## Appendix: Related Proposal — CloudStack Infrastructure Backup to NAS - -### Problem -CloudStack's NAS backup provider only backs up VM disks. The management server database, agent configurations, SSL certificates, and global settings are not backed up. If the management server fails, all metadata is lost unless someone manually configured mysqldump. - -### Proposed Solution -Add a new scheduled task that automatically backs up CloudStack infrastructure to the same NAS backup storage. - -**What gets backed up:** -| Component | Method | Size | -|-----------|--------|------| -| CloudStack database (`cloud`, `cloud_usage`) | mysqldump | ~50-500MB | -| Management server config (`/etc/cloudstack/management/`) | tar | <1MB | -| Agent configs (`/etc/cloudstack/agent/`) | tar | <1MB | -| SSL certificates and keystores | tar | <1MB | -| Global settings export | SQL dump | <1MB | - -**Configuration:** -- `nas.infra.backup.enabled` (global, default: false) -- `nas.infra.backup.schedule` (cron expression, default: `0 2 * * *` — daily at 2am) -- `nas.infra.backup.retention` (number of backups to keep, default: 7) - -**Implementation:** -- New class: `InfrastructureBackupTask` extending `ManagedContextRunnable` -- Runs on management server (not KVM agent) -- Uses existing NAS mount point from backup storage pool -- Creates timestamped directory: `infra-backup/2026-03-27/` -- Runs `mysqldump --single-transaction` for hot backup -- Tars config directories -- Manages retention (delete backups older than N days) -- Logs to CloudStack events for audit trail - -**Restore:** -- Manual via CLI: `mysql cloud < backup.sql` + extract config tars -- Future: one-click restore from UI - -This is a much simpler change (~200 lines of Java) that addresses a real operational gap. Could target 4.22.1 or 4.23. - From 72f967aa6d6591d61dcfcc2756f019c40eb3edde Mon Sep 17 00:00:00 2001 From: jmsperu Date: Tue, 5 May 2026 11:23:20 +0300 Subject: [PATCH 11/13] test(backup): mock BackupDetailsDao to fix NPE in NASBackupProviderTest Adds @Mock injection for BackupDetailsDao so NASBackupProvider's backupDetailsDao field is wired during testDeleteBackup and takeBackupSuccessfully, fixing the NPE flagged by @harikrishna-patnala. --- .../org/apache/cloudstack/backup/NASBackupProviderTest.java | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java b/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java index a512292cd28f..5a44562c583d 100644 --- a/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java +++ b/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java @@ -52,6 +52,7 @@ import com.cloud.vm.dao.VMInstanceDao; import org.apache.cloudstack.backup.dao.BackupDao; +import org.apache.cloudstack.backup.dao.BackupDetailsDao; import org.apache.cloudstack.backup.dao.BackupRepositoryDao; import org.apache.cloudstack.backup.dao.BackupOfferingDao; import org.apache.cloudstack.storage.datastore.db.PrimaryDataStoreDao; @@ -96,6 +97,9 @@ public class NASBackupProviderTest { @Mock private VMSnapshotDao vmSnapshotDaoMock; + @Mock + private BackupDetailsDao backupDetailsDao; + @Test public void testDeleteBackup() throws OperationTimedoutException, AgentUnavailableException { Long hostId = 1L; From 7e1691b8dd97cbf05922787f0a2ee33ae4fe38d4 Mon Sep 17 00:00:00 2001 From: jmsperu Date: Sat, 16 May 2026 19:45:45 +0300 Subject: [PATCH 12/13] feat(backup): anchor incremental chain on VM active_checkpoint_id MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per @abh1sar's review at NASBackupProvider.java:251, the "find the latest backup with a bitmap" heuristic in decideChain breaks after a restore — the restored disk image has no QEMU dirty-bitmap, so taking the last-backed-up row as parent for the next incremental produces a chain whose first link references blocks that no longer exist. Track the active checkpoint per-VM in vm_instance_details under the new nas.active_checkpoint_id key: * takeBackup writes it after every successful backup (cleared on the stopped-VM offline path where no bitmap is created). * restoreVMFromBackup / restoreBackupToVM clear it on success, so the next backup is automatically a full and starts a fresh chain. * decideChain reads it first: null => full; non-null => find the matching bitmap_name on a BackedUp backup row; if none, fall back to full. The new VM_ACTIVE_CHECKPOINT_ID key lives next to the existing chain keys in NASBackupChainKeys (it's a vm_instance_details detail, not a backup detail, so no schema migration is needed). --- .../cloudstack/backup/NASBackupChainKeys.java | 9 ++ .../cloudstack/backup/NASBackupProvider.java | 139 ++++++++++++++---- .../backup/NASBackupProviderTest.java | 80 ++++++++++ 3 files changed, 199 insertions(+), 29 deletions(-) diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java index a3e811889114..1bc888a245de 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java @@ -45,6 +45,15 @@ public final class NASBackupChainKeys { /** Set to the bitmap name when this incremental had to recreate its parent bitmap on the host (informational; this incremental is larger than usual). */ public static final String BITMAP_RECREATED = "nas.bitmap_recreated"; + /** + * VM-scoped detail (stored in {@code vm_instance_details}) holding the QEMU dirty-bitmap + * name that currently exists on the running VM and is therefore the only valid parent + * for the next incremental backup. Written by {@link #BITMAP_NAME} on each successful + * backup; cleared on restore (the restored disk image has no bitmap, so the next backup + * must be a fresh full). When the VM has no detail, {@code decideChain} forces full. + */ + public static final String VM_ACTIVE_CHECKPOINT_ID = "nas.active_checkpoint_id"; + private NASBackupChainKeys() { } } diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java index aca3dc108f27..63df4df5b324 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java @@ -38,8 +38,10 @@ import com.cloud.utils.Pair; import com.cloud.utils.component.AdapterBase; import com.cloud.utils.exception.CloudRuntimeException; +import com.cloud.vm.VMInstanceDetailVO; import com.cloud.vm.VirtualMachine; import com.cloud.vm.dao.VMInstanceDao; +import com.cloud.vm.dao.VMInstanceDetailsDao; import com.cloud.vm.snapshot.VMSnapshot; import com.cloud.vm.snapshot.VMSnapshotDetailsVO; import com.cloud.vm.snapshot.VMSnapshotVO; @@ -117,6 +119,9 @@ public class NASBackupProvider extends AdapterBase implements BackupProvider, Co @Inject private VMInstanceDao vmInstanceDao; + @Inject + private VMInstanceDetailsDao vmInstanceDetailsDao; + @Inject private PrimaryDataStoreDao primaryDataStoreDao; @@ -226,6 +231,12 @@ boolean isIncremental() { * appended to the existing chain. Stopped VMs are always full (libvirt {@code backup-begin} * requires a running QEMU process). The {@code nas.backup.full.every} ConfigKey controls * how many backups (full + incrementals) form one chain before a new full is forced. + * + *

The decision is anchored on the VM's {@code nas.active_checkpoint_id} detail, which + * records the bitmap that currently exists on the running QEMU. After a restore that + * detail is cleared, so the next backup is automatically full — even though there may be + * a more recent "last backup taken" row in the database. This matches the prescription in + * the PR review (avoid relying on "last backup" because that breaks after restore).

*/ protected ChainDecision decideChain(VirtualMachine vm) { final String newBitmap = "backup-" + System.currentTimeMillis() / 1000L; @@ -242,38 +253,29 @@ protected ChainDecision decideChain(VirtualMachine vm) { return ChainDecision.fullStart(newBitmap); } - // Walk this VM's backups newest→oldest, find the most recent BackedUp backup that has a - // bitmap stored. If we don't find one, this is the first backup in a chain — start full. - List history = backupDao.listByVmId(vm.getDataCenterId(), vm.getId()); - if (history == null || history.isEmpty()) { + // 1. If the VM has no active_checkpoint_id, there is no bitmap on the host to use as + // a parent. This is the case after restore (we clear it), after VM was just assigned + // to the offering, or on the very first backup. + String activeCheckpoint = readVmActiveCheckpoint(vm.getId()); + if (activeCheckpoint == null) { return ChainDecision.fullStart(newBitmap); } - history.sort(Comparator.comparing(Backup::getDate).reversed()); - Backup parent = null; - String parentBitmap = null; - String parentChainId = null; - int parentChainPosition = -1; - for (Backup b : history) { - if (!Backup.Status.BackedUp.equals(b.getStatus())) { - continue; - } - String bm = readDetail(b, NASBackupChainKeys.BITMAP_NAME); - if (bm == null) { - continue; - } - parent = b; - parentBitmap = bm; - parentChainId = readDetail(b, NASBackupChainKeys.CHAIN_ID); - String posStr = readDetail(b, NASBackupChainKeys.CHAIN_POSITION); - try { - parentChainPosition = posStr == null ? 0 : Integer.parseInt(posStr); - } catch (NumberFormatException e) { - parentChainPosition = 0; - } - break; + // 2. Find the latest BackedUp backup of this VM whose BITMAP_NAME matches the VM's + // active_checkpoint_id. Only that backup is a safe parent — any earlier backup + // would have a bitmap that QEMU may have rotated out. Per the review: + // "The latest backup should have the bitmap_name equal to the VM's + // active_checkpoint_id which will become the parent backup. If not, force full." + Backup parent = findLatestBackedUpBackupWithBitmap(vm.getId(), activeCheckpoint); + if (parent == null) { + LOG.debug("VM {} has active_checkpoint_id={} but no matching backup found — forcing full", + vm.getInstanceName(), activeCheckpoint); + return ChainDecision.fullStart(newBitmap); } - if (parent == null || parentBitmap == null || parentChainId == null) { + + String parentChainId = readDetail(parent, NASBackupChainKeys.CHAIN_ID); + int parentChainPosition = chainPosition(parent); + if (parentChainId == null || parentChainPosition == Integer.MAX_VALUE) { return ChainDecision.fullStart(newBitmap); } @@ -286,10 +288,44 @@ protected ChainDecision decideChain(VirtualMachine vm) { // qcow2 onto it. The path is stored relative to the NAS mount point — the script // resolves it inside its mount session. String parentPath = composeParentBackupPath(parent); - return ChainDecision.incremental(newBitmap, parentBitmap, parentPath, + return ChainDecision.incremental(newBitmap, activeCheckpoint, parentPath, parentChainId, parentChainPosition + 1); } + /** + * Read the {@code nas.active_checkpoint_id} VM detail. Returns {@code null} when no detail + * exists (post-restore, first backup, or after explicit reset). + */ + private String readVmActiveCheckpoint(long vmId) { + VMInstanceDetailVO d = vmInstanceDetailsDao.findDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID); + if (d == null) { + return null; + } + String v = d.getValue(); + return (v == null || v.isEmpty()) ? null : v; + } + + /** + * Locate the most-recent {@code BackedUp} backup for {@code vmId} whose bitmap name equals + * {@code bitmapName}. Used by {@link #decideChain} to anchor the next incremental. + */ + private Backup findLatestBackedUpBackupWithBitmap(long vmId, String bitmapName) { + List history = backupDao.listByVmId(null, vmId); + if (history == null || history.isEmpty()) { + return null; + } + history.sort(Comparator.comparing(Backup::getDate).reversed()); + for (Backup b : history) { + if (!Backup.Status.BackedUp.equals(b.getStatus())) { + continue; + } + if (bitmapName.equals(readDetail(b, NASBackupChainKeys.BITMAP_NAME))) { + return b; + } + } + return null; + } + private String readDetail(Backup backup, String key) { BackupDetailVO d = backupDetailsDao.findDetail(backup.getId(), key); return d == null ? null : d.getValue(); @@ -331,6 +367,34 @@ private void persistChainMetadata(Backup backup, ChainDecision decision, String } } + /** + * Upsert the VM's {@code nas.active_checkpoint_id} detail to {@code bitmapName}. Called + * after every successful backup so the next backup's parent-bitmap decision is anchored + * on what actually exists on QEMU, not on "last backup taken". + */ + private void upsertVmActiveCheckpoint(long vmId, String bitmapName) { + VMInstanceDetailVO existing = vmInstanceDetailsDao.findDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID); + if (existing == null) { + vmInstanceDetailsDao.addDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID, bitmapName, false); + return; + } + existing.setValue(bitmapName); + vmInstanceDetailsDao.update(existing.getId(), existing); + } + + /** + * Remove the VM's {@code nas.active_checkpoint_id} detail. Called from the restore paths: + * after restore the disk image has no QEMU bitmap attached, so any future incremental + * would be based on stale state. Clearing forces the next backup to be a fresh full. + */ + private void clearVmActiveCheckpoint(long vmId) { + VMInstanceDetailVO existing = vmInstanceDetailsDao.findDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID); + if (existing != null) { + vmInstanceDetailsDao.removeDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID); + LOG.debug("Cleared nas.active_checkpoint_id for VM id={} (was {})", vmId, existing.getValue()); + } + } + private String lookupParentBackupUuid(long vmId, String parentBitmap) { if (parentBitmap == null) { return null; @@ -440,6 +504,18 @@ public Pair takeBackup(final VirtualMachine vm, Boolean quiesce logger.info("NAS incremental for VM {} recreated parent bitmap {} (likely VM was restarted since last backup)", vm.getInstanceName(), answer.getBitmapRecreated()); } + // Pin the VM's active_checkpoint_id to whichever bitmap the agent actually + // created. This is the only valid parent for the next incremental (see + // decideChain). For the stopped-VM offline path BITMAP_CREATED is null — + // no bitmap exists on the host, so we also clear any stale detail from a + // prior online backup. Either way, after this step the detail accurately + // reflects what's on the running QEMU (or absence thereof). + String confirmedBitmap = answer.getBitmapCreated(); + if (confirmedBitmap != null) { + upsertVmActiveCheckpoint(vm.getId(), confirmedBitmap); + } else { + clearVmActiveCheckpoint(vm.getId()); + } return new Pair<>(true, backupVO); } else { throw new CloudRuntimeException("Failed to update backup"); @@ -531,6 +607,11 @@ private Pair restoreVMBackup(VirtualMachine vm, Backup backup) } catch (OperationTimedoutException e) { throw new CloudRuntimeException("Operation to restore backup timed out, please try again"); } + // After a restore the QEMU dirty-bitmap chain is gone — clear active_checkpoint_id so + // the next backup is taken as a fresh full and starts a new chain. See decideChain. + if (answer != null && answer.getResult()) { + clearVmActiveCheckpoint(vm.getId()); + } return new Pair<>(answer.getResult(), answer.getDetails()); } diff --git a/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java b/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java index 5a44562c583d..438a4a3a6325 100644 --- a/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java +++ b/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java @@ -48,8 +48,10 @@ import com.cloud.storage.VolumeVO; import com.cloud.storage.dao.VolumeDao; import com.cloud.utils.Pair; +import com.cloud.vm.VMInstanceDetailVO; import com.cloud.vm.VMInstanceVO; import com.cloud.vm.dao.VMInstanceDao; +import com.cloud.vm.dao.VMInstanceDetailsDao; import org.apache.cloudstack.backup.dao.BackupDao; import org.apache.cloudstack.backup.dao.BackupDetailsDao; @@ -100,6 +102,9 @@ public class NASBackupProviderTest { @Mock private BackupDetailsDao backupDetailsDao; + @Mock + private VMInstanceDetailsDao vmInstanceDetailsDao; + @Test public void testDeleteBackup() throws OperationTimedoutException, AgentUnavailableException { Long hostId = 1L; @@ -357,4 +362,79 @@ public void testGetVMHypervisorHostFallbackToZoneWideKVMHost() { Mockito.verify(hostDao).findHypervisorHostInCluster(clusterId); Mockito.verify(resourceManager).findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId); } + + // -- decideChain anchored on VM's active_checkpoint_id ------------------------------- + + /** + * No active_checkpoint_id on the VM (post-restore, first-ever backup, or detail purged) => + * decideChain must return a fresh full. This is the regression abh1sar called out at + * NASBackupProvider.java:251 ("This logic breaks after restore. We can't use the last + * backup taken as parent in that case."). + */ + @Test + public void decideChainReturnsFullWhenVmHasNoActiveCheckpoint() { + Long zoneId = 1L; + Long vmId = 42L; + VMInstanceVO vm = mock(VMInstanceVO.class); + Mockito.when(vm.getId()).thenReturn(vmId); + Mockito.when(vm.getDataCenterId()).thenReturn(zoneId); + Mockito.when(vm.getState()).thenReturn(VMInstanceVO.State.Running); + + Mockito.when(vmInstanceDetailsDao.findDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID)).thenReturn(null); + + NASBackupProvider.ChainDecision decision = nasBackupProvider.decideChain(vm); + Assert.assertNotNull(decision); + Assert.assertEquals(NASBackupChainKeys.TYPE_FULL, decision.mode); + Assert.assertNull(decision.bitmapParent); + Assert.assertEquals(0, decision.chainPosition); + } + + /** + * After a successful restoreVMFromBackup, decideChain on the next backup must produce + * a full. We verify by checking that vmInstanceDetailsDao.removeDetail is called with + * the active_checkpoint_id key. + */ + @Test + public void restoreClearsActiveCheckpointDetail() throws AgentUnavailableException, OperationTimedoutException { + Long vmId = 7L; + Long hostId = 8L; + Long backupOfferingId = 9L; + + VMInstanceVO vm = mock(VMInstanceVO.class); + Mockito.when(vm.getId()).thenReturn(vmId); + Mockito.when(vm.getLastHostId()).thenReturn(hostId); + Mockito.when(vm.getRemoved()).thenReturn(null); + Mockito.when(vm.getName()).thenReturn("vm7"); + + HostVO host = mock(HostVO.class); + Mockito.when(host.getStatus()).thenReturn(Status.Up); + Mockito.when(host.getId()).thenReturn(hostId); + Mockito.when(hostDao.findById(hostId)).thenReturn(host); + + BackupVO backup = new BackupVO(); + backup.setVmId(vmId); + backup.setBackupOfferingId(backupOfferingId); + backup.setExternalId("i-2-7-VM/2026.05.16.10.00.00"); + ReflectionTestUtils.setField(backup, "id", 100L); + // backedUpVolumes defaults to null => BackupVO.getBackedUpVolumes returns emptyList(). + + BackupRepositoryVO repo = new BackupRepositoryVO(1L, "nas", "test-repo", + "nfs", "address", "sync", 1024L, null); + Mockito.when(backupRepositoryDao.findByBackupOfferingId(backupOfferingId)).thenReturn(repo); + + Mockito.when(volumeDao.findByInstance(vmId)).thenReturn(Collections.emptyList()); + + BackupAnswer answer = mock(BackupAnswer.class); + Mockito.when(answer.getResult()).thenReturn(true); + Mockito.when(agentManager.send(Mockito.anyLong(), Mockito.any(RestoreBackupCommand.class))).thenReturn(answer); + + // Pre-existing checkpoint detail so removeDetail has something to "clear". + VMInstanceDetailVO existing = mock(VMInstanceDetailVO.class); + Mockito.when(existing.getValue()).thenReturn("backup-1715000000"); + Mockito.when(vmInstanceDetailsDao.findDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID)).thenReturn(existing); + + boolean ok = nasBackupProvider.restoreVMFromBackup(vm, backup); + Assert.assertTrue(ok); + Mockito.verify(vmInstanceDetailsDao).removeDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID); + } } From b7b74c4f883167e9b8973eccacfc25572006540b Mon Sep 17 00:00:00 2001 From: jmsperu Date: Sat, 16 May 2026 19:46:12 +0300 Subject: [PATCH 13/13] refactor(backup): mirror snapshot-style delete chain for NAS incrementals MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per @abh1sar's review at NASBackupProvider.java:696, the previous rebase-the-child / chain-repair logic is hard to reason about and not consistent with how CloudStack handles incremental-snapshot deletion elsewhere. Replace it with the same pattern used by DefaultSnapshotStrategy#deleteSnapshotChain: * Delete request for backup B1: * If B1 has live children, mark B1 with nas.delete_pending=true and return. The on-NAS file and DB row stay until the last descendant is gone. * If B1 has no live children, physically delete its qcow2 directory and DB row, then walk up the chain — for each ancestor that is delete-pending and now childless, sweep it too. * forced=true cascades the entire subtree leaf-first in one call (for operators who want a chain gone immediately). Drops the rebase code path entirely from NASBackupProvider (the old ChainRepairPlan / RebaseStep / PositionShift inner classes and the RebaseBackupCommand dispatch). The RebaseBackupCommand class and its libvirt wrapper remain in tree as dead code for now — separate cleanup. Tests: * New deleteWithLiveChildMarksDeletePendingAndPreservesFile verifies the tombstone-on-busy-parent path: no agent traffic, no DB removal, a DELETE_PENDING detail is persisted. * New deletingLeafSweepsUpDeletePendingParent verifies the sweep: deleting the last child cascades up and physically deletes the tombstoned parent. * Existing testDeleteBackup still passes — without a CHAIN_ID detail the backup hits the no-chain fast path. --- .../cloudstack/backup/NASBackupChainKeys.java | 8 + .../cloudstack/backup/NASBackupProvider.java | 350 ++++++++---------- .../backup/NASBackupProviderTest.java | 159 +++++++- 3 files changed, 321 insertions(+), 196 deletions(-) diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java index 1bc888a245de..1931a24f6a2e 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java @@ -45,6 +45,14 @@ public final class NASBackupChainKeys { /** Set to the bitmap name when this incremental had to recreate its parent bitmap on the host (informational; this incremental is larger than usual). */ public static final String BITMAP_RECREATED = "nas.bitmap_recreated"; + /** + * Tombstone key stored in {@code backup_details} when a backup is requested for deletion + * but still has live children. The on-NAS file is preserved until the last child is gone, + * at which point cascade deletion collects every tombstoned ancestor. Mirrors the snapshot + * subsystem's {@code Hidden} state semantics (see {@code DefaultSnapshotStrategy}). + */ + public static final String DELETE_PENDING = "nas.delete_pending"; + /** * VM-scoped detail (stored in {@code vm_instance_details}) holding the QEMU dirty-bitmap * name that currently exists on the running VM and is therefore the only valid parent diff --git a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java index 63df4df5b324..2c9cc6153108 100644 --- a/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java +++ b/plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java @@ -774,232 +774,194 @@ public boolean deleteBackup(Backup backup, boolean forced) { throw new CloudRuntimeException(String.format("Unable to find a running KVM host in zone %d to delete backup %s", backup.getZoneId(), backup.getUuid())); } - // Repair the chain (if any) before removing the backup file. For chained backups, - // children that point at this backup must be re-pointed at this backup's parent - // (with their blocks merged via qemu-img rebase). For a full at the head of a chain - // with surviving children, refuse unless forced — `forced=true` then deletes the - // full plus every descendant. - ChainRepairPlan plan = computeChainRepair(backup, forced); - if (!plan.proceed) { - throw new CloudRuntimeException(plan.reason); - } - - // Issue rebase commands for each child that needs re-pointing (ordered so each rebase - // operates on a chain that still resolves: children first if there are nested ones). - for (RebaseStep step : plan.rebaseSteps) { - RebaseBackupCommand rebase = new RebaseBackupCommand(step.targetMountRelativePath, - step.newBackingMountRelativePath, backupRepository.getType(), - backupRepository.getAddress(), backupRepository.getMountOptions()); - BackupAnswer rebaseAnswer; - try { - rebaseAnswer = (BackupAnswer) agentManager.send(host.getId(), rebase); - } catch (AgentUnavailableException e) { - throw new CloudRuntimeException("Unable to contact backend control plane to repair backup chain"); - } catch (OperationTimedoutException e) { - throw new CloudRuntimeException("Backup chain repair (rebase) timed out, please try again"); - } - if (rebaseAnswer == null || !rebaseAnswer.getResult()) { - throw new CloudRuntimeException(String.format( - "Backup chain repair failed: rebase of %s onto %s returned %s", - step.targetMountRelativePath, step.newBackingMountRelativePath, - rebaseAnswer == null ? "no answer" : rebaseAnswer.getDetails())); - } - // Update the rebased child's parent reference + position in backup_details. - BackupDetailVO parentDetail = backupDetailsDao.findDetail(step.childBackupId, NASBackupChainKeys.PARENT_BACKUP_ID); - if (parentDetail != null) { - parentDetail.setValue(step.newParentUuid == null ? "" : step.newParentUuid); - backupDetailsDao.update(parentDetail.getId(), parentDetail); - } else if (step.newParentUuid != null) { - backupDetailsDao.persist(new BackupDetailVO(step.childBackupId, - NASBackupChainKeys.PARENT_BACKUP_ID, step.newParentUuid, true)); - } - BackupDetailVO posDetail = backupDetailsDao.findDetail(step.childBackupId, NASBackupChainKeys.CHAIN_POSITION); - if (posDetail != null) { - posDetail.setValue(String.valueOf(step.newChainPosition)); - backupDetailsDao.update(posDetail.getId(), posDetail); - } + // Backups outside any tracked chain (legacy or non-incremental providers) are + // deleted straight away — no children semantics apply. + if (readDetail(backup, NASBackupChainKeys.CHAIN_ID) == null) { + return deleteBackupFileAndRow(backup, backupRepository, host); } - // Now delete this backup's files. For a forced delete of a full with descendants we - // also delete all descendants' files (newest first so each rm targets a leaf). - for (Backup victim : plan.toDelete) { - DeleteBackupCommand command = new DeleteBackupCommand(victim.getExternalId(), backupRepository.getType(), - backupRepository.getAddress(), backupRepository.getMountOptions()); - BackupAnswer answer; - try { - answer = (BackupAnswer) agentManager.send(host.getId(), command); - } catch (AgentUnavailableException e) { - throw new CloudRuntimeException("Unable to contact backend control plane to initiate backup"); - } catch (OperationTimedoutException e) { - throw new CloudRuntimeException("Operation to delete backup timed out, please try again"); - } - if (answer == null || !answer.getResult()) { - logger.warn("Failed to delete backup file for {} ({}); leaving DB row intact", victim.getUuid(), victim.getExternalId()); - return false; - } - backupDao.remove(victim.getId()); + // Snapshot-style cascade: defer the on-NAS rm + DB row while there are live children, + // mark this backup as delete-pending, and let the leaf's deletion sweep it up later. + // See DefaultSnapshotStrategy#deleteSnapshotChain for the same pattern on incremental + // snapshots. forced=true means the caller wants the entire subtree gone right now. + if (forced) { + return cascadeDeleteSubtree(backup, backupRepository, host); } - // Shift chain_position down by 1 for any survivors deeper in the chain than the - // backup we just removed (their direct parent reference is unchanged, but their - // numeric position needs to stay consistent so future full-every cadence math works). - if (plan.shiftPositionsBelow != null) { - for (Backup b : backupDao.listByVmId(null, backup.getVmId())) { - if (!plan.shiftPositionsBelow.chainId.equals(readDetail(b, NASBackupChainKeys.CHAIN_ID))) { - continue; - } - int pos = chainPosition(b); - if (pos > plan.shiftPositionsBelow.afterPosition && pos != Integer.MAX_VALUE) { - BackupDetailVO posDetail = backupDetailsDao.findDetail(b.getId(), NASBackupChainKeys.CHAIN_POSITION); - if (posDetail != null) { - posDetail.setValue(String.valueOf(pos - 1)); - backupDetailsDao.update(posDetail.getId(), posDetail); - } - } - } + List liveChildren = findLiveChildren(backup); + if (!liveChildren.isEmpty()) { + markDeletePending(backup); + LOG.debug("Backup {} has {} live child backup(s); marking as delete-pending. The on-NAS file " + + "and DB row will be removed once the last descendant is gone, or pass forced=true.", + backup.getUuid(), liveChildren.size()); + return true; } - return true; - } - - private static final class PositionShift { - final String chainId; - final int afterPosition; // shift positions strictly greater than this by -1 - PositionShift(String chainId, int afterPosition) { - this.chainId = chainId; - this.afterPosition = afterPosition; - } + // No live children — physically delete this backup, then walk up the chain and + // collect any ancestors that were left in delete-pending state. + return deleteBackupAndSweepPendingAncestors(backup, backupRepository, host); } /** - * Result of {@link #computeChainRepair}: whether to proceed, what to rebase, what to delete. + * The single physical-delete step: rm the on-NAS directory, then remove the DB row. + * Returns {@code false} (and leaves both intact) if the agent reports failure, so the + * caller's recursion stops cleanly. */ - private static final class ChainRepairPlan { - final boolean proceed; - final String reason; - final List rebaseSteps; - final List toDelete; - final PositionShift shiftPositionsBelow; - - private ChainRepairPlan(boolean proceed, String reason, List rebaseSteps, List toDelete, - PositionShift shiftPositionsBelow) { - this.proceed = proceed; - this.reason = reason; - this.rebaseSteps = rebaseSteps; - this.toDelete = toDelete; - this.shiftPositionsBelow = shiftPositionsBelow; + private boolean deleteBackupFileAndRow(Backup backup, BackupRepository repo, Host host) { + DeleteBackupCommand command = new DeleteBackupCommand(backup.getExternalId(), repo.getType(), + repo.getAddress(), repo.getMountOptions()); + BackupAnswer answer; + try { + answer = (BackupAnswer) agentManager.send(host.getId(), command); + } catch (AgentUnavailableException e) { + throw new CloudRuntimeException("Unable to contact backend control plane to initiate backup"); + } catch (OperationTimedoutException e) { + throw new CloudRuntimeException("Operation to delete backup timed out, please try again"); } - - static ChainRepairPlan refuse(String reason) { - return new ChainRepairPlan(false, reason, Collections.emptyList(), Collections.emptyList(), null); + if (answer == null || !answer.getResult()) { + logger.warn("Failed to delete backup file for {} ({}); leaving DB row intact", + backup.getUuid(), backup.getExternalId()); + return false; } + backupDao.remove(backup.getId()); + return true; + } - static ChainRepairPlan proceed(List rebaseSteps, List toDelete) { - return new ChainRepairPlan(true, null, rebaseSteps, toDelete, null); + /** + * Mark {@code backup} as delete-pending in {@code backup_details}. Idempotent. + */ + private void markDeletePending(Backup backup) { + BackupDetailVO existing = backupDetailsDao.findDetail(backup.getId(), NASBackupChainKeys.DELETE_PENDING); + if (existing == null) { + backupDetailsDao.persist(new BackupDetailVO(backup.getId(), + NASBackupChainKeys.DELETE_PENDING, "true", true)); } + } - static ChainRepairPlan proceed(List rebaseSteps, List toDelete, PositionShift shift) { - return new ChainRepairPlan(true, null, rebaseSteps, toDelete, shift); - } + /** + * @return true if this backup carries the delete-pending tombstone. + */ + private boolean isDeletePending(Backup backup) { + BackupDetailVO d = backupDetailsDao.findDetail(backup.getId(), NASBackupChainKeys.DELETE_PENDING); + return d != null && "true".equalsIgnoreCase(d.getValue()); } - private static final class RebaseStep { - final long childBackupId; - final String targetMountRelativePath; - final String newBackingMountRelativePath; - final String newParentUuid; // null when re-pointed onto an existing full's UUID is desired but unavailable - final int newChainPosition; - - RebaseStep(long childBackupId, String targetMountRelativePath, String newBackingMountRelativePath, - String newParentUuid, int newChainPosition) { - this.childBackupId = childBackupId; - this.targetMountRelativePath = targetMountRelativePath; - this.newBackingMountRelativePath = newBackingMountRelativePath; - this.newParentUuid = newParentUuid; - this.newChainPosition = newChainPosition; + /** + * Return the live (not delete-pending, not Removed) children of {@code parent} within the + * same chain. Equivalent to "incrementals whose parent_backup_id points at parent". + */ + private List findLiveChildren(Backup parent) { + String parentUuid = parent.getUuid(); + String chainId = readDetail(parent, NASBackupChainKeys.CHAIN_ID); + if (parentUuid == null || chainId == null) { + return Collections.emptyList(); + } + List children = new ArrayList<>(); + for (Backup b : backupDao.listByVmId(null, parent.getVmId())) { + if (b.getId() == parent.getId()) { + continue; + } + if (!chainId.equals(readDetail(b, NASBackupChainKeys.CHAIN_ID))) { + continue; + } + if (!parentUuid.equals(readDetail(b, NASBackupChainKeys.PARENT_BACKUP_ID))) { + continue; + } + if (isDeletePending(b)) { + // Tombstoned children don't keep us alive — they're already on the way out. + continue; + } + children.add(b); } + return children; } /** - * Compute the chain-repair plan for deleting {@code backup}. Conservative semantics: - * - Backups outside any tracked chain (no NAS chain metadata) are deleted as-is. - * - A standalone backup with no children is deleted as-is. - * - A middle incremental: rebase its immediate child onto its own parent, then delete it. - * Descendants of that child are unaffected (their backing chain still resolves). - * - A full with surviving descendants: refuse unless {@code forced=true}; then delete - * full + every descendant (newest first). + * Look up this backup's immediate parent in the chain (by {@code PARENT_BACKUP_ID}). + * Returns {@code null} if this is the full (no parent) or the parent row is gone. */ - private ChainRepairPlan computeChainRepair(Backup backup, boolean forced) { - String chainId = readDetail(backup, NASBackupChainKeys.CHAIN_ID); - if (chainId == null) { - // Pre-incremental backups (or callers that never wrote chain metadata) — single delete. - return ChainRepairPlan.proceed(Collections.emptyList(), Collections.singletonList(backup)); + private Backup findChainParent(Backup backup) { + String parentUuid = readDetail(backup, NASBackupChainKeys.PARENT_BACKUP_ID); + if (parentUuid == null || parentUuid.isEmpty()) { + return null; } - - // Gather every backup in the same chain for this VM. - List chain = new ArrayList<>(); for (Backup b : backupDao.listByVmId(null, backup.getVmId())) { - if (chainId.equals(readDetail(b, NASBackupChainKeys.CHAIN_ID))) { - chain.add(b); + if (parentUuid.equals(b.getUuid())) { + return b; } } - chain.sort(Comparator.comparingInt(b -> chainPosition(b))); - - int targetPos = chainPosition(backup); - boolean isFull = targetPos == 0; - List descendants = chain.stream() - .filter(b -> chainPosition(b) > targetPos) - .collect(Collectors.toList()); + return null; + } - if (isFull) { - if (descendants.isEmpty()) { - return ChainRepairPlan.proceed(Collections.emptyList(), Collections.singletonList(backup)); - } - if (!forced) { - return ChainRepairPlan.refuse(String.format( - "Backup %s is the full anchor of a chain with %d incremental(s). Delete the incrementals first, " + - "or pass forced=true to remove the entire chain.", - backup.getUuid(), descendants.size())); + /** + * Physically delete {@code backup}, then walk up the chain while each ancestor is in + * delete-pending state with no other live children. Mirrors the snapshot subsystem + * pattern: once a leaf is gone, garbage-collect any tombstoned parents. + */ + private boolean deleteBackupAndSweepPendingAncestors(Backup backup, BackupRepository repo, Host host) { + if (!deleteBackupFileAndRow(backup, repo, host)) { + return false; + } + Backup parent = findChainParent(backup); + while (parent != null && isDeletePending(parent) && findLiveChildren(parent).isEmpty()) { + Backup nextParent = findChainParent(parent); + if (!deleteBackupFileAndRow(parent, repo, host)) { + // Stop the sweep; the rest of the tombstoned chain will be collected on a + // future delete that re-runs the sweep. + return true; } - // Forced delete: remove descendants newest first, then the full. - List victims = new ArrayList<>(descendants); - victims.sort(Comparator.comparingInt((Backup b) -> chainPosition(b)).reversed()); - victims.add(backup); - return ChainRepairPlan.proceed(Collections.emptyList(), victims); + parent = nextParent; } + return true; + } - // Middle (or tail) incremental. - if (descendants.isEmpty()) { - // Tail: nothing to rebase, just delete. - return ChainRepairPlan.proceed(Collections.emptyList(), Collections.singletonList(backup)); + /** + * Forced delete: remove this backup plus every descendant, leaf-first. Used when the + * caller explicitly passes {@code forced=true} and wants the whole subtree gone now. + */ + private boolean cascadeDeleteSubtree(Backup root, BackupRepository repo, Host host) { + List subtree = collectSubtree(root); + // Sort by chain_position descending so leaves go first — this keeps every rm + // operating on a chain that still resolves, even if the user is watching mid-flight. + subtree.sort(Comparator.comparingInt((Backup b) -> chainPosition(b)).reversed()); + boolean ok = true; + for (Backup victim : subtree) { + if (!deleteBackupFileAndRow(victim, repo, host)) { + ok = false; + break; + } } + return ok; + } - // Middle: only the immediate child needs to absorb our blocks and rebase onto our parent. - Backup immediateChild = descendants.stream() - .min(Comparator.comparingInt(b -> chainPosition(b))) - .orElseThrow(() -> new CloudRuntimeException("Internal error: no immediate child found for chain repair")); - Backup ourParent = chain.stream() - .filter(b -> chainPosition(b) == targetPos - 1) - .findFirst() - .orElseThrow(() -> new CloudRuntimeException(String.format( - "Cannot delete %s: its parent (chain_position=%d) is missing from the chain", - backup.getUuid(), targetPos - 1))); - - VolumeVO rootVolume = volumeDao.getInstanceRootVolume(backup.getVmId()); - String volUuid = rootVolume == null ? "root" : rootVolume.getUuid(); - String childPath = immediateChild.getExternalId() + "/root." + volUuid + ".qcow2"; - String parentPath = ourParent.getExternalId() + "/root." + volUuid + ".qcow2"; - - RebaseStep step = new RebaseStep(immediateChild.getId(), childPath, parentPath, - ourParent.getUuid(), chainPosition(immediateChild) - 1); - - // After we delete the middle backup, every descendant's numeric chain_position - // becomes stale (off by one). Their backing-file pointers don't need re-writing - // (only the immediate child changed parents) but their position metadata does. - return ChainRepairPlan.proceed( - Collections.singletonList(step), - Collections.singletonList(backup), - new PositionShift(chainId, targetPos)); + /** + * Collect {@code root} plus every transitive child in the same chain. BFS-style. + */ + private List collectSubtree(Backup root) { + List result = new ArrayList<>(); + result.add(root); + int idx = 0; + while (idx < result.size()) { + Backup cur = result.get(idx++); + // findLiveChildren skips delete-pending — for forced cascade we want them too. + String parentUuid = cur.getUuid(); + String chainId = readDetail(cur, NASBackupChainKeys.CHAIN_ID); + if (parentUuid == null || chainId == null) { + continue; + } + for (Backup b : backupDao.listByVmId(null, cur.getVmId())) { + if (b.getId() == cur.getId()) { + continue; + } + if (!chainId.equals(readDetail(b, NASBackupChainKeys.CHAIN_ID))) { + continue; + } + if (parentUuid.equals(readDetail(b, NASBackupChainKeys.PARENT_BACKUP_ID))) { + result.add(b); + } + } + } + return result; } private int chainPosition(Backup b) { diff --git a/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java b/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java index 438a4a3a6325..287695578d1d 100644 --- a/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java +++ b/plugins/backup/nas/src/test/java/org/apache/cloudstack/backup/NASBackupProviderTest.java @@ -28,6 +28,7 @@ import org.junit.Assert; import org.junit.Test; import org.junit.runner.RunWith; +import org.mockito.ArgumentCaptor; import org.mockito.InjectMocks; import org.mockito.Mock; import org.mockito.Mockito; @@ -389,10 +390,12 @@ public void decideChainReturnsFullWhenVmHasNoActiveCheckpoint() { Assert.assertEquals(0, decision.chainPosition); } + // -- restore clears active_checkpoint_id --------------------------------------------- + /** * After a successful restoreVMFromBackup, decideChain on the next backup must produce - * a full. We verify by checking that vmInstanceDetailsDao.removeDetail is called with - * the active_checkpoint_id key. + * a full. We verify this end-to-end by checking that vmInstanceDetailsDao.removeDetail + * is called with the active_checkpoint_id key. */ @Test public void restoreClearsActiveCheckpointDetail() throws AgentUnavailableException, OperationTimedoutException { @@ -437,4 +440,156 @@ public void restoreClearsActiveCheckpointDetail() throws AgentUnavailableExcepti Assert.assertTrue(ok); Mockito.verify(vmInstanceDetailsDao).removeDetail(vmId, NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID); } + + // -- delete-pending cascade ---------------------------------------------------------- + + /** + * Deleting an incremental that has a live child must mark the incremental as + * delete-pending in backup_details and NOT touch the on-NAS file or the backups row. + * This is the snapshot-style pattern abh1sar requested at NASBackupProvider.java:696. + */ + @Test + public void deleteWithLiveChildMarksDeletePendingAndPreservesFile() + throws AgentUnavailableException, OperationTimedoutException { + Long zoneId = 1L; + Long vmId = 2L; + Long hostId = 3L; + Long offeringId = 4L; + + BackupVO parent = new BackupVO(); + parent.setVmId(vmId); + parent.setBackupOfferingId(offeringId); + parent.setExternalId("i-2-2-VM/2026.05.10.10.00.00"); + parent.setZoneId(zoneId); + ReflectionTestUtils.setField(parent, "id", 50L); + ReflectionTestUtils.setField(parent, "uuid", "parent-uuid"); + + VMInstanceVO vm = mock(VMInstanceVO.class); + Mockito.when(vm.getLastHostId()).thenReturn(hostId); + HostVO host = mock(HostVO.class); + Mockito.when(host.getStatus()).thenReturn(Status.Up); + Mockito.when(host.getId()).thenReturn(hostId); + Mockito.when(hostDao.findById(hostId)).thenReturn(host); + + BackupRepositoryVO repo = new BackupRepositoryVO(1L, "nas", "test-repo", + "nfs", "address", "sync", 1024L, null); + Mockito.when(backupRepositoryDao.findByBackupOfferingId(offeringId)).thenReturn(repo); + Mockito.when(vmInstanceDao.findByIdIncludingRemoved(vmId)).thenReturn(vm); + + // CHAIN_ID on the parent => not the no-chain fast path. + BackupDetailVO chainIdDetail = new BackupDetailVO(50L, NASBackupChainKeys.CHAIN_ID, "chain-1", true); + Mockito.when(backupDetailsDao.findDetail(50L, NASBackupChainKeys.CHAIN_ID)).thenReturn(chainIdDetail); + + // A live child references parent-uuid via PARENT_BACKUP_ID. + BackupVO child = new BackupVO(); + child.setVmId(vmId); + child.setBackupOfferingId(offeringId); + child.setExternalId("i-2-2-VM/2026.05.10.10.30.00"); + child.setZoneId(zoneId); + child.setStatus(Backup.Status.BackedUp); + ReflectionTestUtils.setField(child, "id", 51L); + ReflectionTestUtils.setField(child, "uuid", "child-uuid"); + + BackupDetailVO childChainId = new BackupDetailVO(51L, NASBackupChainKeys.CHAIN_ID, "chain-1", true); + BackupDetailVO childParent = new BackupDetailVO(51L, NASBackupChainKeys.PARENT_BACKUP_ID, "parent-uuid", true); + Mockito.when(backupDetailsDao.findDetail(51L, NASBackupChainKeys.CHAIN_ID)).thenReturn(childChainId); + Mockito.when(backupDetailsDao.findDetail(51L, NASBackupChainKeys.PARENT_BACKUP_ID)).thenReturn(childParent); + Mockito.when(backupDetailsDao.findDetail(51L, NASBackupChainKeys.DELETE_PENDING)).thenReturn(null); + + Mockito.when(backupDao.listByVmId(null, vmId)).thenReturn(List.of(parent, child)); + + boolean result = nasBackupProvider.deleteBackup(parent, false); + Assert.assertTrue(result); + + // No agent traffic — the on-NAS file must be preserved while children are alive. + Mockito.verify(agentManager, Mockito.never()).send(Mockito.anyLong(), Mockito.any(DeleteBackupCommand.class)); + // No DB row removal — the row is the tombstone marker. + Mockito.verify(backupDao, Mockito.never()).remove(50L); + // A DELETE_PENDING detail was persisted. + ArgumentCaptor captor = ArgumentCaptor.forClass(BackupDetailVO.class); + Mockito.verify(backupDetailsDao).persist(captor.capture()); + Assert.assertEquals(NASBackupChainKeys.DELETE_PENDING, captor.getValue().getName()); + Assert.assertEquals("true", captor.getValue().getValue()); + } + + /** + * Deleting a leaf incremental whose parent is delete-pending must (a) delete the leaf and + * then (b) sweep up the tombstoned parent. Mirrors DefaultSnapshotStrategy's + * "delete leaf, then walk up while parent is destroying-and-childless". + */ + @Test + public void deletingLeafSweepsUpDeletePendingParent() + throws AgentUnavailableException, OperationTimedoutException { + Long zoneId = 1L; + Long vmId = 2L; + Long hostId = 3L; + Long offeringId = 4L; + + BackupVO leaf = new BackupVO(); + leaf.setVmId(vmId); + leaf.setBackupOfferingId(offeringId); + leaf.setExternalId("i-2-2-VM/2026.05.10.11.00.00"); + leaf.setZoneId(zoneId); + ReflectionTestUtils.setField(leaf, "id", 51L); + ReflectionTestUtils.setField(leaf, "uuid", "leaf-uuid"); + + BackupVO parent = new BackupVO(); + parent.setVmId(vmId); + parent.setBackupOfferingId(offeringId); + parent.setExternalId("i-2-2-VM/2026.05.10.10.30.00"); + parent.setZoneId(zoneId); + ReflectionTestUtils.setField(parent, "id", 50L); + ReflectionTestUtils.setField(parent, "uuid", "parent-uuid"); + + VMInstanceVO vm = mock(VMInstanceVO.class); + Mockito.when(vm.getLastHostId()).thenReturn(hostId); + HostVO host = mock(HostVO.class); + Mockito.when(host.getStatus()).thenReturn(Status.Up); + Mockito.when(host.getId()).thenReturn(hostId); + Mockito.when(hostDao.findById(hostId)).thenReturn(host); + + BackupRepositoryVO repo = new BackupRepositoryVO(1L, "nas", "test-repo", + "nfs", "address", "sync", 1024L, null); + Mockito.when(backupRepositoryDao.findByBackupOfferingId(offeringId)).thenReturn(repo); + Mockito.when(vmInstanceDao.findByIdIncludingRemoved(vmId)).thenReturn(vm); + + // Leaf details. + BackupDetailVO leafChainId = new BackupDetailVO(51L, NASBackupChainKeys.CHAIN_ID, "chain-1", true); + BackupDetailVO leafParent = new BackupDetailVO(51L, NASBackupChainKeys.PARENT_BACKUP_ID, "parent-uuid", true); + Mockito.when(backupDetailsDao.findDetail(51L, NASBackupChainKeys.CHAIN_ID)).thenReturn(leafChainId); + Mockito.when(backupDetailsDao.findDetail(51L, NASBackupChainKeys.PARENT_BACKUP_ID)).thenReturn(leafParent); + + // Parent is tombstoned. + BackupDetailVO parentChainId = new BackupDetailVO(50L, NASBackupChainKeys.CHAIN_ID, "chain-1", true); + BackupDetailVO parentPending = new BackupDetailVO(50L, NASBackupChainKeys.DELETE_PENDING, "true", true); + Mockito.when(backupDetailsDao.findDetail(50L, NASBackupChainKeys.CHAIN_ID)).thenReturn(parentChainId); + Mockito.when(backupDetailsDao.findDetail(50L, NASBackupChainKeys.DELETE_PENDING)).thenReturn(parentPending); + // Parent has no parent of its own (it's the full anchor). + Mockito.when(backupDetailsDao.findDetail(50L, NASBackupChainKeys.PARENT_BACKUP_ID)).thenReturn(null); + + // listByVmId is called repeatedly. We use a mutable list so the sweep observes the + // leaf has been removed and treats the parent as childless. + java.util.List liveBackups = new java.util.ArrayList<>(List.of(parent, leaf)); + Mockito.when(backupDao.listByVmId(null, vmId)).thenAnswer(inv -> new java.util.ArrayList<>(liveBackups)); + + // Agent acknowledges every delete. + Mockito.when(agentManager.send(Mockito.anyLong(), Mockito.any(DeleteBackupCommand.class))) + .thenReturn(new BackupAnswer(new DeleteBackupCommand(null, null, null, null), true, "ok")); + // backupDao.remove(id) drops the corresponding row from the live list so the next + // listByVmId call reflects post-delete state — mirrors the real DAO contract. + Mockito.when(backupDao.remove(Mockito.anyLong())).thenAnswer(inv -> { + Long id = inv.getArgument(0); + liveBackups.removeIf(b -> b.getId() == id); + return true; + }); + + boolean result = nasBackupProvider.deleteBackup(leaf, false); + Assert.assertTrue(result); + + // Both backups must be physically deleted (leaf first, then tombstoned parent). + Mockito.verify(agentManager, Mockito.times(2)) + .send(Mockito.anyLong(), Mockito.any(DeleteBackupCommand.class)); + Mockito.verify(backupDao).remove(51L); + Mockito.verify(backupDao).remove(50L); + } }