
Automated (unattended) installation from a YAML answer file #180

Open
ggiesen wants to merge 105 commits into linuxmint:master from ggiesen:automated-install

ggiesen commented Jun 13, 2026


Addresses #145.

We can add automation support in live-installer, but so far we haven't. The mint developers, myself included, are not experienced with PXE, automation and large scale deployments... If you had an ini file with all the automations you wanted to see happen, what would it look like?

That question has been open since 2022. Rather than answer it on paper, I built a working version and tested it end to end, so there is something concrete to react to.

This is an MVP for unattended installation: mergeable as is, and built to be extended. I am putting it up so you can review it and decide on the direction. It works end to end today (see below), and everything here is open to change.

I want to be clear about one thing up front. The goal is for this to work on Mint, but Mint does not ship live-installer yet (it arrives with Mint 23), so I have only been able to test on LMDE. I cannot test the Mint side until the Mint 23 beta is available. The code has a Mint vs LMDE branch, and I have pinned that branch with unit tests so it will not silently break, but the Mint path has not run end to end. There is more on this further down.

What works today, as automated installs in a VM driven only by an answer file, all on LMDE:

  • BIOS and UEFI, on the simple, lvm, and lvm-on-luks layouts
  • fully custom partition layouts: explicit partitions, custom LVM (VGs/LVs), btrfs subvolumes (@/@home), and software RAID (md, levels 0/1/5/10, including LVM-on-RAID and a RAID /boot, with GRUB installed to every member disk), validated and verified on the booted system
  • static networking: static IPv4/IPv6 addresses, gateways, DNS, routes, 802.1Q VLANs, wifi (WPA-PSK / open), and 802.1X/EAP (wired and WPA-Enterprise; PEAP/TTLS/TLS, certs by reference), expressed in netplan's v2 schema and rendered to NetworkManager keyfiles directly (no netplan binary dependency)
  • target disk selection by stable attribute (by-id, by-path, model, size-min, first-non-removable), including picking the right disk out of several, and multi-disk selection for RAID
  • a LUKS-encrypted machine that prompts for its passphrase on the serial console at boot and unlocks unattended — by keyfile, or with no passphrase in the answer file at all (prompt-on-first-boot: the installer formats with a random throwaway key embedded in the initramfs so the first boot is unattended, then a one-shot service prompts on the console/serial for the real passphrase, rekeys, and reboots)
  • users with SSH keys, packages, package removal, third-party apt repositories (cloud-init's apt: shape), an http(s) proxy, a custom CA trust store (ca_certs:), proprietary/DKMS drivers (drivers:), flatpak remotes and apps, Timeshift snapshots on btrfs, multi-layout keyboards + supplementary locales, pre-install early_commands (%pre-style), post-install commands, kernel command line, serial console
  • a full PXE / iPXE netboot install on BIOS and UEFI: the same unattended install with no media at all, network-delivered end to end, verified over SSH on the installed system; plus an IPv6 data-path install (rootfs and answer file fetched over IPv6)
  • answer-file delivery over local file, http(s), NFS (v3/v4.2, IPv4/IPv6/hostname), or TFTP (RFC 2347 blksize negotiation with RFC 1350 fallback), plus auto: identity-based discovery so one boot entry serves a whole fleet (it finds its own file by MAC / SMBIOS serial / UUID)
  • the GUI installer behaves exactly as today when no answer file is present
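The networking bullet says the netplan v2 `network:` section is rendered straight to NetworkManager keyfiles, with no netplan binary. To show roughly what that rendering involves, here is a minimal Python sketch for one static ethernet. It is not the PR's netconfig.py: it skips VLANs, wifi, and 802.1X, and omits keys a real renderer would also emit (such as a connection uuid). Falling back to `method=auto` for an address family with no static address is an assumption of the sketch.

```python
import configparser
import io
import ipaddress


def render_ethernet_keyfile(name, eth):
    """Render one netplan-v2-style static ethernet as NetworkManager keyfile text.

    Sketch only: covers match.macaddress, addresses, a default route and
    nameservers. Families without static addresses fall back to method=auto
    (an assumption, not necessarily what netconfig.py does).
    """
    kf = configparser.ConfigParser(interpolation=None)
    kf.optionxform = str  # keep keys exactly as written
    kf["connection"] = {"id": name, "type": "ethernet"}
    mac = eth.get("match", {}).get("macaddress")
    if mac:
        kf["ethernet"] = {"mac-address": mac.upper()}

    # The default route's gateway belongs to the address family of its "via".
    gateways = {}
    for route in eth.get("routes", []):
        if route.get("to") == "default":
            gateways[ipaddress.ip_address(route["via"]).version] = route["via"]

    ns = eth.get("nameservers", {})
    for version, section in ((4, "ipv4"), (6, "ipv6")):
        addrs = [a for a in eth.get("addresses", [])
                 if ipaddress.ip_interface(a).version == version]
        if not addrs:
            kf[section] = {"method": "auto"}
            continue
        opts = {"method": "manual"}
        for i, addr in enumerate(addrs, start=1):
            # keyfile form: addressN=ADDR/PREFIX[,GATEWAY]; gateway on the first only
            gw = gateways.get(version) if i == 1 else None
            opts[f"address{i}"] = f"{addr},{gw}" if gw else addr
        dns = [d for d in ns.get("addresses", [])
               if ipaddress.ip_address(d).version == version]
        if dns:
            opts["dns"] = ";".join(dns) + ";"
        if version == 4 and ns.get("search"):
            opts["dns-search"] = ";".join(ns["search"]) + ";"
        kf[section] = opts

    out = io.StringIO()
    kf.write(out, space_around_delimiters=False)
    return out.getvalue()
```

NetworkManager only loads keyfiles that are owned by root and not readable by group or others, so the rendered text would be written with mode 0600 under /etc/NetworkManager/system-connections/.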

It is covered by 356 unit tests and an integration suite that performs real installs in QEMU/KVM (including custom LVM/btrfs layouts, software RAID 1 and RAID 5 + LVM, a first-boot LUKS passphrase prompt over serial, flatpak app installs, static dual-stack IP + VLAN networking, and a no-media PXE netboot install), plus a real-NFS suite that fetches an answer file from a live NFS server over NFSv3 and NFSv4.2 across IPv4, IPv6, and a hostname. All run green in GitHub Actions on standard hosted runners, with no special hardware.

Bugs this turned up

Building and testing this found three pre-existing issues in the installer that affect more than automated installs. I am offering these as fixes regardless of what happens with the feature.

  1. A udev race in full_disk_format() (partitioning.py). Each parted call triggers a partition-table rescan, during which udev briefly removes and recreates the partition device nodes. On a multi-partition EFI install, mkfs can race a node that has not reappeared yet. Because the mkfs failure was tolerated, the root mount silently failed and the file copy landed in the live session's tmpfs until it filled. This is the sort of sporadic "install failed" or "won't boot" that is very hard to diagnose from a user bug report. The fix is to run udevadm settle before the node check and to treat a failed mkfs as fatal. The integration harness reproduces it reliably.

  2. The LUKS passphrase was logged in cleartext. The engine pipes it with echo <pass> | cryptsetup, which puts it in the process table (ps) and, once commands are logged, on the serial console and in the journal. Serial output is routinely captured off real hardware over BMC and serial-over-LAN. The fix is to feed it to cryptsetup on stdin (--key-file -), and to redact secrets from logs as a backstop.

  3. Encrypted installs made from the live environment do not boot without help. The engine runs update-initramfs, but in the live session live-tools diverts it to a no-op, so the crypttab never reaches the initramfs and the encrypted root cannot be unlocked; boot drops to a rescue shell. Plain and LVM installs are not affected because they boot from the stock initramfs, which is why this stayed hidden. The fix is in the automated path for now, and I am happy to discuss whether the engine should handle it for the GUI as well.

I am not certain that (2) and (3) reproduce on a normally booted GUI install the same way they do in my harness. They may be partly artifacts of how the test boots. (1) is firmware level and looks real on hardware. The reproductions are in the test suite.

What the answer file looks like

I went with YAML rather than ini. It expresses the lists this needs (users, packages, commands) without ad-hoc conventions. Where it genuinely overlaps with cloud-init (identity, users, packages, apt repositories, CA certs, proxy, and network config), it reuses cloud-init's own key names and structure, so anyone who has used cloud-init or Ubuntu autoinstall should find it familiar. Where the semantics differ, I use the name that matches the behaviour rather than overloading a cloud-init key: post-install commands run in the target at install time, so they are late_commands (Ubuntu autoinstall's term, and kickstart's %post), not cloud-init's first-boot runcmd. It is not a drop-in for either format, but the shared parts are deliberately the same shape. PyYAML is already in the live ISO dependency chain.

version: 1

locale: en_US.UTF-8            # cloud-init style, top level
timezone: America/Toronto
hostname: mint-ws-01

keyboard:
  model: pc105
  layout: us
  additional_layouts: [{layout: ca, variant: fr}]   # multi-layout
  toggle: grp:alt_shift_toggle

additional_locales: [fr_CA.UTF-8]                    # generated alongside `locale`

users:
  - name: admin
    gecos: Workstation Admin
    passwd: "$6$rounds=..."    # sha512crypt; plaintext is rejected
    groups: [sudo]             # cloud-init idiom: 'sudo' in groups grants admin
    ssh_authorized_keys:
      - "ssh-ed25519 AAAA... admin@laptop"

storage:
  target:
    match:                       # selected by stable attribute, never /dev/sdX
      by-id: "nvme-Samsung_SSD_980_PRO_*"
      # alternatives: by-path, model, size-min, first-non-removable
    on_no_match: abort
  layout: lvm-on-luks            # presets: simple | lvm | lvm-on-luks (or custom)
  luks:
    passphrase_source: prompt-on-first-boot   # default: nothing secret in the file
    # or: passphrase_source: keyfile + keyfile: "https://cfg.example.com/keys/ws-01.key"

network:                         # netplan v2 subset -> NetworkManager keyfiles
  version: 2
  ethernets:
    lan0:
      match: {macaddress: "52:54:00:aa:bb:01"}
      addresses: [192.168.50.10/24, 2001:db8:50::10/64]   # dual-stack
      routes: [{to: default, via: 192.168.50.1}]
      nameservers: {addresses: [192.168.50.53], search: [lab.example]}
      # auth: {method: peap, identity: ..., ca-certificate: /etc/ssl/certs/corp.pem, ...}  # 802.1X
  vlans:
    vlan50: {id: 50, link: lan0, addresses: [10.50.0.5/24]}
  # wifis: { wlan0: { access-points: { "SSID": { password: "..." } } } }

packages: [openssh-server]
package_remove: [thunderbird]    # declarative removal (extension; cloud-init has none)
proxy: http://proxy.corp.example:3128

apt:                             # cloud-init's apt: shape
  sources:
    vendor:
      source: "deb https://example.com/repo trixie main"
      key_url: https://example.com/repo.gpg   # or cloud-init's key / keyid

flatpak:
  remotes: [{name: flathub, url: "https://flathub.org/repo/flathub.flatpakrepo"}]
  install: [org.gnome.Calculator]

drivers: {install: true}         # recommended proprietary/DKMS drivers (Mint)

kernel:
  serial_console: "ttyS0,115200" # headless: console and LUKS prompt on serial

early_commands:                  # %pre-style, in the live env before partitioning
  - mdadm --stop --scan
late_commands:                   # autoinstall-style; runs in the target at install time
  - /cdrom/scripts/site-setup.sh

on_failure:
  early_command_failure: abort
  partition_mismatch: abort
  network_unavailable: continue
  package_install_failure: abort

logging:
  destination: /var/log/live-installer-auto.log
  also_serial: ttyS0

The storage, users, locale, and keyboard sections map almost one to one onto the existing Setup class in installer.py, which is what makes this tractable.
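To make the "almost one to one" claim concrete, here is a minimal sketch of mapping the identity, locale, and keyboard parts of an already-validated answer onto flat Setup fields. It uses a stand-in dataclass. Only `automated` and `password_is_crypted` are field names that appear in this PR. The other field names, and the rule that the first listed user becomes the primary account, are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Setup:
    """Hypothetical stand-in for the engine's Setup object.

    Only `automated` and `password_is_crypted` are names taken from this PR;
    the other field names are illustrative.
    """
    automated: bool = False
    password_is_crypted: bool = False
    language: str = ""
    timezone: str = ""
    hostname: str = ""
    keyboard_model: str = ""
    keyboard_layout: str = ""
    username: str = ""
    real_name: str = ""
    password: str = ""
    extra_users: list = field(default_factory=list)


def answer_to_setup(answer):
    """Map an already-validated answer dict onto flat Setup fields."""
    first, *rest = answer["users"]  # assumption: first user is the primary account
    kb = answer.get("keyboard", {})
    return Setup(
        automated=True,            # flat-field partition path, no GTK widgets
        password_is_crypted=True,  # answer files only ever carry crypt(5) hashes
        language=answer["locale"],
        timezone=answer["timezone"],
        hostname=answer["hostname"],
        keyboard_model=kb.get("model", "pc105"),
        keyboard_layout=kb.get("layout", "us"),
        username=first["name"],
        real_name=first.get("gecos", ""),
        password=first["passwd"],
        extra_users=rest,          # created later, inside the target chroot
    )
```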

Four safety rules

These come from a decade of preseed and kickstart pain.

  1. No raw /dev/sdX. Disk targeting only via stable match expressions. Enumeration order is not stable across NVMe, SATA, and USB, and "wiped the wrong disk" is the bug class that kills trust. No match, or more than one match, aborts with the list of disks. It never guesses.
  2. Crypted password hashes only. Plaintext is rejected outright, with no override.
  3. Explicit failure behaviour. Every failure mode has a declared abort or continue policy, and the defaults fail closed (abort). An aborted install leaves the machine unbooted, never half installed.
  4. Config is data, not a program. No conditionals, loops, or templating. Generate the YAML beforehand if you need that. (early_commands run shell, but cannot rewrite the partition layout — that stays declarative.)
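Rule 1 can be sketched as follows, assuming the caller has already read /dev/disk/by-id into a mapping of link name to kernel device. This is an illustration, not the PR's diskmatch.py: it covers only the by-id glob, ignores partition links, and de-duplicates because one disk usually has several by-id links. Zero or several matches raise an error listing what is present instead of guessing.

```python
import fnmatch
import re


class DiskMatchError(Exception):
    pass


PARTITION_LINK = re.compile(r"-part\d+$")


def select_disk(pattern, by_id_links):
    """Return /dev/<disk> for the single whole disk whose by-id name matches pattern.

    by_id_links maps /dev/disk/by-id link names to the kernel device they
    resolve to, e.g. {"nvme-Samsung_SSD_980_PRO_S1": "nvme0n1"}.
    """
    if pattern.startswith("/dev/"):
        raise DiskMatchError(f"raw device paths are rejected: {pattern!r}")
    # One disk usually has several by-id links, so de-duplicate by device,
    # and ignore partition links ("...-part1").
    devices = sorted({dev for link, dev in by_id_links.items()
                      if not PARTITION_LINK.search(link)
                      and fnmatch.fnmatchcase(link, pattern)})
    if len(devices) == 1:
        return "/dev/" + devices[0]
    found = "no disk" if not devices else f"{len(devices)} disks ({', '.join(devices)})"
    listing = "\n  ".join(f"{link} -> {dev}" for link, dev in sorted(by_id_links.items()))
    raise DiskMatchError(f"{found} matched {pattern!r}; refusing to guess. Available:\n  {listing}")
```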

The whole file is validated strictly. Unknown keys, wrong types, and malformed YAML are hard errors before any disk is touched. A --check mode validates an answer file without touching disks, so it can run in CI, and --list-disks prints a machine's stable disk attributes so an operator can build a match expression.
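The PR validates with pydantic. Question 2 below offers a hand-rolled fallback with no new dependency, and this is a minimal sketch of what that might look like for a single `users:` entry, operating on the dict PyYAML returns. The set of accepted crypt(5) prefixes ($6$ sha512crypt, $5$, $y$/$gy$ yescrypt, $2a/b/y$ bcrypt) is an assumption. The PR only says that plaintext is rejected.

```python
import re

ALLOWED_USER_KEYS = {"name", "gecos", "passwd", "groups", "ssh_authorized_keys"}
# Assumed whitelist of crypt(5) hash prefixes; anything else counts as plaintext.
CRYPT_HASH = re.compile(r"^\$(?:6|5|y|gy|2[aby])\$\S+$")


class ValidationError(Exception):
    pass


def validate_user(user, where="users[0]"):
    """Strictly check one users: entry; raise before any disk is touched."""
    if not isinstance(user, dict):
        raise ValidationError(f"{where}: expected a mapping")
    unknown = set(user) - ALLOWED_USER_KEYS
    if unknown:  # typos like "pasword" are errors, not silently ignored
        raise ValidationError(f"{where}: unknown key(s) {sorted(unknown)}")
    if not isinstance(user.get("name"), str):
        raise ValidationError(f"{where}.name: expected a string")
    pw = user.get("passwd")
    if pw is not None and not (isinstance(pw, str) and CRYPT_HASH.match(pw)):
        raise ValidationError(f"{where}.passwd: must be a crypt(5) hash; plaintext is rejected")
    groups = user.get("groups", [])
    # No YAML coercion guessing: "groups: sudo" is a string, not a list.
    if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
        raise ValidationError(f"{where}.groups: expected a list of strings")
```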

How it is invoked

  • A file at a known path on the install media, for example /cdrom/auto-install.yaml. No infrastructure, covers USB and remastered ISO.
  • A kernel argument live-installer.auto=<source>, where source is a path, an http(s)://, nfs://, or tftp:// URL, or auto:<base-url> for fleet-wide identity-based discovery (it finds its own file by MAC / SMBIOS serial / UUID). Cleartext transports that carry secrets are refused unless live-installer.auto-insecure is also passed.
  • The same argument is how a full PXE / iPXE netboot triggers the install. Netboot has a few non-obvious image requirements (a single combined squashfs, the kernel/initrd/manifests carried in the rootfs, a NetworkManager-based live system); these are written up in docs/pxe-netboot.md.
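For `auto:<base-url>` discovery, the machine has to turn its own identity into candidate URLs. The sketch below reads MACs and the SMBIOS serial/UUID from standard sysfs locations and builds an ordered candidate list. The file-naming scheme (`serial-…`, `uuid-…`, `mac-…` with a `.yaml` suffix, most specific first) is entirely hypothetical. The PR does not spell out the scheme discovery.py uses.

```python
from pathlib import Path


def read_identity(sys_root="/sys"):
    """Return (macs, smbios_serial, smbios_uuid) from sysfs; missing values are None."""
    root = Path(sys_root)

    def read(rel):
        try:
            return (root / rel).read_text().strip() or None
        except OSError:  # absent file, or not readable without root
            return None

    macs = []
    net = root / "class/net"
    if net.is_dir():
        for iface in sorted(net.iterdir()):
            mac = read(f"class/net/{iface.name}/address")
            if mac and mac != "00:00:00:00:00:00":  # skip loopback and similar
                macs.append(mac)
    return macs, read("class/dmi/id/product_serial"), read("class/dmi/id/product_uuid")


def discovery_candidates(base_url, macs, serial=None, uuid=None):
    """Ordered URLs to try for auto:<base-url> (hypothetical naming scheme)."""
    base = base_url.rstrip("/")
    names = []
    if serial:
        names.append(f"serial-{serial.strip()}")
    if uuid:
        names.append(f"uuid-{uuid.strip().lower()}")
    for mac in macs:
        names.append("mac-" + mac.strip().lower().replace(":", "-"))
    # De-duplicate while keeping the most-specific-first order.
    return [f"{base}/{n}.yaml" for n in dict.fromkeys(names)]
```

Each candidate would then be fetched through the same transport and security checks as an explicit source, stopping at the first one that exists.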

With no answer file, the GUI runs exactly as today.
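The tftp:// source negotiates a larger block size and falls back to plain TFTP when the server does not support options. Here is a minimal sketch of the two packet-level pieces, following RFC 1350 (base protocol), RFC 2347 (option extension), and RFC 2348 (the blksize option). The 1468-byte default is an assumption: it is a common choice because 1468 + 4 (TFTP) + 8 (UDP) + 20 (IPv4) = 1500, a standard Ethernet MTU.

```python
import struct

OP_RRQ, OP_DATA, OP_ERROR, OP_OACK = 1, 3, 5, 6
RFC1350_BLKSIZE = 512


class TftpError(Exception):
    def __init__(self, code, message):
        super().__init__(f"TFTP error {code}: {message}")
        self.code = code


def build_rrq(filename, blksize=1468):
    """RRQ (RFC 1350) with a blksize option appended (RFC 2347/2348)."""
    fields = [filename, "octet", "blksize", str(blksize)]
    return struct.pack("!H", OP_RRQ) + b"".join(f.encode("ascii") + b"\0" for f in fields)


def negotiated_blksize(first_reply):
    """Block size to use, judged from the server's first reply to build_rrq()."""
    (opcode,) = struct.unpack("!H", first_reply[:2])
    if opcode == OP_OACK:
        # Options acknowledged; the client must ACK block 0 before data flows.
        parts = first_reply[2:].split(b"\0")
        opts = {k.lower(): v for k, v in zip(parts[0::2], parts[1::2])}
        return int(opts.get(b"blksize", RFC1350_BLKSIZE))
    if opcode == OP_DATA:
        # Server ignored the options: plain RFC 1350 transfer, 512-byte blocks.
        return RFC1350_BLKSIZE
    if opcode == OP_ERROR:
        (code,) = struct.unpack("!H", first_reply[2:4])
        raise TftpError(code, first_reply[4:].split(b"\0", 1)[0].decode("ascii", "replace"))
    raise TftpError(0, f"unexpected opcode {opcode}")
```

If a server answers with error code 8 (option negotiation refused, RFC 2347), a client can retry with a bare RRQ and use 512-byte blocks.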

How it fits the existing code

This is what made me think it was worth doing. The engine is already shaped for it.

InstallerEngine has no GTK imports and talks to the UI through two callbacks, set_progress_hook and set_error_hook. The headless driver, auto_installer.py, is a sibling of InstallerWindow that registers console and log implementations of those same hooks. It does not change the engine's control flow. The automated partition path (setup.automated = True) already runs from flat Setup fields and never touches the GTK partition widgets.
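The hook-based seam described above can be sketched as a small class that supplies console/log versions of the two callbacks and registers them through `set_progress_hook` and `set_error_hook`. The callback signatures used here (`progress(fail, done, total, message)` and `error(message)`) are assumptions for illustration. The PR does not show the engine's actual signatures.

```python
import sys


class ConsoleHooks:
    """Console/log implementations of the engine's two UI callbacks (assumed signatures)."""

    def __init__(self, stream=sys.stdout):
        self.stream = stream
        self.log = []
        self.failed = False

    def progress(self, fail=False, done=0, total=0, message=""):
        pct = f"{100 * done // total:3d}%" if total else "    "
        self._emit(f"[{pct}] {message}")

    def error(self, message):
        self.failed = True  # the driver prints its failure marker when this is set
        self._emit(f"ERROR: {message}")

    def _emit(self, line):
        self.log.append(line)
        print(line, file=self.stream)


def attach(engine, hooks):
    # The same two registration calls the GTK window makes; no GTK involved.
    engine.set_progress_hook(hooks.progress)
    engine.set_error_hook(hooks.error)
```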

The new modules are self-contained:

  • auto_installer.py: the headless driver
  • schema.py: answer-file validation
  • diskmatch.py: target disk selection
  • discovery.py: answer-file discovery
  • netconfig.py: renders the network: section to NetworkManager keyfiles
  • pkgbackend.py: a swappable package backend, so the repo/install step is not hardcoded to apt
  • catrust.py: renders ca_certs: into the system trust store
  • snapshotbackend.py: renders snapshots: to Timeshift
  • commandrunner.py: the single place all shell-outs go through, for logging and secret handling

The swappable backends (package, CA trust, snapshot) all sit behind the commandrunner chroot boundary, so adding dnf/zypper, or a Snapper/rsync snapshot backend, is a drop-in rather than a rewrite.

The changes to shared engine code are small and additive, and the CD/GUI path is unaffected:

  • routing the engine's shell-outs through commandrunner (behaviour-preserving);
  • one optional before_unmount_hook parameter on finish_installation(), so the driver can do post-install work while the chroot is still mounted;
  • for netboot, reading the kernel/initrd/package manifests from the rootfs when they are absent from the medium (a no-op on CD/USB, where they sit next to the squashfs).
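What "swappable package backend" means in practice can be shown with a minimal sketch: the install step depends on a small protocol that returns argv lists, and the caller runs them in the target chroot. The method names and the dnf commands are illustrative assumptions. The PR only states that pkgbackend.py exists so that dnf/zypper can be dropped in.

```python
from collections.abc import Sequence
from typing import Protocol


class PackageBackend(Protocol):
    """What the install step needs from a package manager: argv lists to run in the target."""
    def refresh(self) -> list[list[str]]: ...
    def install(self, pkgs: Sequence[str]) -> list[list[str]]: ...
    def remove(self, pkgs: Sequence[str]) -> list[list[str]]: ...


class AptBackend:
    def refresh(self):
        return [["apt-get", "update"]]

    def install(self, pkgs):
        return [["apt-get", "install", "-y", *pkgs]] if pkgs else []

    def remove(self, pkgs):
        return [["apt-get", "remove", "-y", *pkgs]] if pkgs else []


class DnfBackend:  # illustrative drop-in; not part of the PR
    def refresh(self):
        return [["dnf", "makecache"]]

    def install(self, pkgs):
        return [["dnf", "install", "-y", *pkgs]] if pkgs else []

    def remove(self, pkgs):
        return [["dnf", "remove", "-y", *pkgs]] if pkgs else []


def package_step(backend: PackageBackend, add, drop, run_in_chroot):
    """Apply packages: / package_remove: through whichever backend is configured."""
    for argv in backend.refresh() + backend.install(add) + backend.remove(drop):
        run_in_chroot(argv)  # hypothetical callable crossing the chroot boundary
```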

Tests and CI

There is no CI in the repo today. This ships with three GitHub Actions workflows you can adopt incrementally.

Unit tests run pytest on Python 3.11 to 3.13, on every push and PR. They need no special hardware, since GTK and parted are stubbed. There are 356 of them.

Integration tests perform real installs in QEMU/KVM, nightly and on demand. The whole matrix (BIOS and UEFI across simple, LVM, and LUKS; multi-disk by-id; custom layouts with LVM and with btrfs subvolumes; software RAID 1 and RAID 5 + LVM-on-RAID; a prompt-on-first-boot LUKS passphrase typed over the serial console; flatpak remote + app installs; Timeshift-on-btrfs; static dual-stack IP + 802.1Q VLAN networking; no-media PXE/iPXE netboot installs on both BIOS and UEFI plus an IPv6 data-path install; and the malformed-input, no-disk-match, bios_grub-on-UEFI, and snapshots-on-ext4 failure cases) runs green on standard GitHub hosted runners. They have KVM, so no self-hosted lab is needed; the PXE scenarios use QEMU's built-in TFTP and iPXE, so they need no real DHCP/TFTP infrastructure. Serial logs are uploaded as artifacts so a failed run is debuggable without reproducing it locally.

A third workflow stands up a real NFS server on the runner (GitHub runners have passwordless sudo and a kernel nfsd) and fetches an answer file through the installer's actual nfs:// transport over the full matrix of protocol version (v3, v4.2) × address form (IPv4, IPv6, hostname), so NFS delivery is exercised end to end rather than mocked.

What it does not do yet

This is a base for the common desktop and workstation case, not a kickstart-class enterprise provisioner. Not implemented:

  • Bonding and bridging. These are not modelled yet. The rest of the netplan v2 model (static IPv4/IPv6, DNS, routes, VLANs, wifi, and 802.1X/EAP) is covered under "What works today", but wifi and EAP are unit-tested only, since QEMU does not emulate 802.11 and there is no RADIUS server in CI.
  • Services enable/disable, firewall, and package groups / metapackages.
  • TPM2 or network-bound (Clevis and Tang) LUKS unlock. The schema reserves tpm2, but the driver does keyfile and prompt-on-first-boot.
  • IPv6 PXE firmware boot. The IPv6 data path is covered (a CI scenario installs with the rootfs and the answer file both fetched over IPv6), but booting the firmware over IPv6 (DHCPv6 + UEFI HTTP boot) cannot be emulated in QEMU's user-mode network, so that part is verified by reasoning, not by CI.

The biggest caveat is Mint itself, so to repeat it plainly: everything above has been tested on LMDE only. Mint does not ship live-installer until Mint 23, so I cannot run the Mint path end to end until the Mint 23 beta. The integration harness also assumes a Debian-live boot environment, and Mint uses casper, so the harness itself will need a Mint variant before it can cover Mint. The Mint vs LMDE code branch is pinned by unit tests so it will not silently regress, but please treat Mint support as designed-for and not yet verified.

Questions

  1. Is YAML acceptable? If you would strongly prefer ini or something else, better to know now.
  2. Strict validation currently uses python3-pydantic, which is packaged in Debian and Ubuntu. I can fall back to hand-rolled validation over PyYAML with no new dependencies if you would rather not add one to the live ISO. What is your policy on new dependencies for the ISO?
  3. Is live-installer.auto= an acceptable kernel argument, and /cdrom/auto-install.yaml a reasonable well-known path? I handle both /cdrom and LMDE's /run/live/medium.
  4. Should the engine own any of the bug fixes above, the udev race especially, or would you rather they stay in the automated path?
  5. With Mint 23 unifying on live-installer, would you prefer this lands after 23.0 settles, or earlier?

ggiesen added 30 commits June 12, 2026 01:19
The project is licensed GPL-2+ but the license text was declared only
in debian/copyright, so GitHub's license detection reports "no license".
Add the canonical GPL-2 text as a top-level COPYING file.

Also fix Upstream-Name in debian/copyright: the upstream project is
live-installer, not livemaker.

Add a unit test suite for the pure-logic helpers in partitioning.py:
get_device_naming_scheme_prefix, to_human_readable, is_efi_supported,
and the Partition class size/classification math.

The conftest stubs gi/parted/dialogs in sys.modules and redirects the
import-time read of /usr/share/live-installer/disk-partitions.html to
the copy in the repo, so the tests run on any machine with only Python
and pytest - no GTK, pyparted or installed package required.

Run with: python3 -m venv .venv && .venv/bin/pip install pytest && .venv/bin/pytest
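The stubbing described above can be sketched as a few lines at the top of a conftest.py. This shows the general `sys.modules` technique, not the PR's actual conftest, and it leaves out the disk-partitions.html redirect. One detail matters: `gi` is a package, so `gi.repository` has to be registered as well, or `from gi.repository import Gtk` fails with "'gi' is not a package".

```python
# conftest.py (sketch): make GUI/hardware modules importable without GTK or pyparted.
import sys
from unittest import mock


def _stub(name, module=None):
    # setdefault: a module that is already imported is left untouched.
    return sys.modules.setdefault(name, module if module is not None else mock.MagicMock(name=name))


gi = _stub("gi")
# Register the submodule explicitly, reusing the parent's attribute so that
# gi.repository and "from gi.repository import ..." see the same object.
_stub("gi.repository", getattr(gi, "repository", None))
_stub("parted")
_stub("dialogs")
```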
Scenario-driven integration harness that boots the pinned LMDE ISO in
QEMU, serves an answer file to the guest over HTTP, waits for the
installer's completion marker on the serial console, then reboots into
the installed disk and runs SSH assertions. Emits JUnit XML.

Supports BIOS, UEFI (OVMF) and UEFI+SecureBoot firmware plus an
emulated TPM2 via swtpm, on both EL and Debian/Ubuntu firmware layouts.
KVM is used when available, with TCG fallback.

The full install phase needs the headless driver (not yet written), so
the baseline scenario currently runs in --smoke mode only: boot the
ISO, confirm the VM stays up, tear down. The bundled answer file
documents the target v1 schema for the bios-simple case.

Strictly-validated YAML answer file (pydantic v2) implementing the
automated-install design rules:

- Disk targeting by stable match expressions only; raw /dev paths are
  rejected with an explanatory error
- crypt(5) password hashes only; plaintext is rejected with no override
- Explicit per-failure-mode abort/continue policy, defaulting to abort
- Versioned format (version: 1 required)
- Strict parsing throughout: unknown keys are errors, YAML type
  coercion surprises fail validation instead of being guessed at

Custom partition layouts are intentionally unsupported in v1 (presets:
simple, lvm, lvm-on-luks); the error message directs users to the GUI.

The integration-test answer fixture is parsed in the unit suite so the
two cannot drift apart.

Replace the engine's 87 bare os.system() calls and 10
subprocess.getoutput() calls with a CommandRunner instance owned by
InstallerEngine (constructor-injectable, defaults to the real thing).
do_run_in_chroot and exec_cmd delegate to the runner; the rsync
progress stream uses runner.popen().

Behaviour-preserving by design: commands still tolerate failure unless
check=True is requested, but every command and every non-zero exit is
now logged in one place instead of failing silently.

This makes the engine drivable and assertable in tests without
touching the host system; component tests using a recording runner are
included. partitioning.py's shell-outs are unchanged (separate step).
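A minimal sketch of the runner pattern this commit describes: one `run()` that logs every command and every non-zero exit, tolerates failure unless `check=True`, and a recording double that component tests inject through the constructor. Names other than CommandRunner, do_run_in_chroot, and CommandError are illustrative. The chroot invocation string in particular is an assumption, not the engine's exact command.

```python
import logging
import shlex
import subprocess

log = logging.getLogger("live-installer")


class CommandError(Exception):
    pass


class CommandRunner:
    """Single choke point for shell-outs: every command and every failure is logged."""

    def run(self, cmd, check=False):
        log.info("EXEC %s", cmd)
        rc = subprocess.run(cmd, shell=True).returncode  # shell string, like os.system()
        return self._finish(cmd, rc, check)

    def _finish(self, cmd, rc, check):
        if rc != 0:
            log.warning("exit %d: %s", rc, cmd)
            if check:
                raise CommandError(f"command failed ({rc}): {cmd}")
        return rc  # failures are tolerated by default, as before


class RecordingRunner(CommandRunner):
    """Test double: records commands and returns canned exit codes instead of running them."""

    def __init__(self, results=None):
        self.calls = []
        self.results = results or {}

    def run(self, cmd, check=False):
        self.calls.append(cmd)
        return self._finish(cmd, self.results.get(cmd, 0), check)


class EngineSlice:
    """Hypothetical slice of InstallerEngine showing constructor injection."""

    def __init__(self, runner=None):
        self.runner = runner or CommandRunner()

    def do_run_in_chroot(self, cmd, check=False):
        # The exact chroot invocation here is illustrative.
        return self.runner.run(f"chroot /target /bin/sh -c {shlex.quote(cmd)}", check=check)
```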
auto_installer.py drives InstallerEngine without a GUI, as a sibling of
the GTK InstallerWindow:

- Acquires the answer file from a local path or URL (plain HTTP refused
  unless --insecure: answer files carry password hashes), validates it
  against the v1 schema, and maps it onto the engine's Setup object
- Resolves the target disk by stable attributes via diskmatch.py
  (by-id/by-path globs, model, size-min, first-non-removable); no match
  or an ambiguous match aborts with the list of available disks
- Registers console/logfile/serial implementations of the engine's
  progress and error hooks, then runs the same start_installation /
  finish_installation sequence the GUI does
- Afterwards re-enters the target to create additional users, apply
  package add/remove and run post-install steps, honouring the answer
  file's per-failure-mode abort/continue policy
- Prints "Automated installation complete" / "... FAILED" as the final
  serial markers; --dry-run validates the config and disk match only

Entry points: live-installer --automated=<source>, or
live-installer.auto=<source> on the kernel command line (detected in
main.py before any GUI setup). The GUI path is unchanged.

Engine change: Setup.password_is_crypted switches setup_user to
chpasswd -e, since unattended installs only ever carry crypt(5) hashes.

The integration harness now watches for the driver's markers and fails
fast when the failure marker appears.
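The harness side of this ("watch for the markers, fail fast on failure") can be sketched as a polling loop over newly read serial output. The marker strings are parameters because the failure marker's full text is not given here, and the test below uses a placeholder for it. The function keeps a running buffer, so a marker split across two reads is still found, and it checks for failure before success.

```python
import time


def wait_for_markers(read_new_text, success, failure, timeout_s,
                     poll_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll serial output until a marker appears; return "success", "failure" or "timeout".

    read_new_text() returns whatever arrived since the previous call ("" if nothing).
    """
    buf = ""
    deadline = clock() + timeout_s
    while clock() < deadline:
        buf += read_new_text()  # keep a buffer: a marker may straddle two reads
        if failure in buf:      # fail fast instead of waiting out the timeout
            return "failure"
        if success in buf:
            return "success"
        sleep(poll_s)
    return "timeout"
```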
To exercise working-tree installer code against a stock ISO, the
harness builds a dev ISO: a small overlay squashfs containing usr/ is
added to the ISO's /live directory (live-boot union-mounts every
*.squashfs found there, so our files shadow the originals — no
root-required remaster of the main squashfs).

- harness/isotools.py: overlay build (mksquashfs -all-root), boot-record
  preserving ISO rewrite (xorriso -boot_image any replay), and
  vmlinuz/initrd extraction for direct-kernel boot
- harness/make_test_iso.py: CLI to (re)build fixtures/lmde-7-dev.iso
- run_scenario.py full mode: direct-kernel boots the dev ISO with
  console=ttyS0 + live-installer.auto=<http url>, generates a per-run
  SSH keypair and injects the public key into the served answer file,
  then verifies over SSH with that key

New systemd unit live-installer-auto.service (shipped enabled, gated by
ExecCondition on live-installer.auto= in /proc/cmdline) launches the
installer in the live session — ConditionKernelCommandLine= can't
prefix-match key=value arguments, hence the ExecCondition.
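An ExecCondition= gate only has to exit 0 when the unit should run. systemd treats exit codes 1 through 254 as "skip the unit cleanly" and 255 as a failure. Here is a minimal sketch of such a check in Python. The script itself and the way the unit would invoke it are assumptions, not the PR's code, and quoted kernel arguments containing spaces are not handled.

```python
KEY = "live-installer.auto="


def wants_auto_install(cmdline):
    """True if the kernel command line carries live-installer.auto=<non-empty source>."""
    # Prefix-match whitespace-separated tokens; "live-installer.auto-insecure"
    # does not match because the "=" is part of the prefix.
    return any(tok.startswith(KEY) and len(tok) > len(KEY) for tok in cmdline.split())


def main(path="/proc/cmdline"):
    """Exit status for ExecCondition=: 0 runs the unit, 1 skips it cleanly."""
    try:
        with open(path) as f:
            return 0 if wants_auto_install(f.read()) else 1
    except OSError:
        return 1  # cannot read the cmdline: skip rather than fail the unit
```

The script's entry point would call `sys.exit(main())`, and the unit would name the script in its ExecCondition= line.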

Schema: users[].ssh_authorized_keys — installs OpenSSH public keys for
any user via the headless driver. Useful for fleet provisioning, and
what the harness's verify phase logs in with.

First end-to-end run surfaced that the LMDE 7 live ISO does not ship
PyYAML (or pydantic) — the headless driver crashed at import. Declare
both in debian/control Depends, and have the integration harness stand
in for the .deb dependencies by bundling the wheels for the target live
system's Python into the overlay squashfs's dist-packages.

Also add an ExecStopPost safety net to live-installer-auto.service: if
the installer process dies without printing its own failure marker
(e.g. an import-time crash), the unit emits the marker to /dev/console
so unattended callers fail fast instead of waiting out their timeout.
A condition skip counts as success, so normal boots stay silent.

The HTTPS-only rule for answer files had an argv escape hatch
(--insecure) but no kernel-cmdline equivalent — and cmdline boots have
no argv, which made plain-HTTP delivery impossible for PXE-style
deployments on trusted networks (and for the integration harness, whose
answer file travels over QEMU's host-only user network).

Boot with live-installer.auto-insecure alongside live-installer.auto=
to accept an http:// source. HTTPS remains the default requirement.
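Taken together with the description's rule that cleartext transports carrying secrets are refused, the cmdline handling can be sketched as two small functions. One parses `live-installer.auto=` and the `live-installer.auto-insecure` flag. The other enforces the transport policy on a source, which also applies to LUKS keyfile URLs. Treating nfs:// and tftp:// as cleartext alongside http:// is an assumption drawn from that sentence. The exact scheme sets are illustrative.

```python
from urllib.parse import urlsplit

SECURE = {"https", "file"}
CLEARTEXT = {"http", "nfs", "tftp"}  # assumption: all refused without auto-insecure


class InsecureSourceError(Exception):
    pass


def parse_cmdline(cmdline):
    """Return (source, insecure) from a kernel command line string."""
    source, insecure = None, False
    for tok in cmdline.split():
        if tok.startswith("live-installer.auto="):
            source = tok.split("=", 1)[1]
        elif tok == "live-installer.auto-insecure":
            insecure = True
    return source, insecure


def check_source(source, insecure=False):
    """Return the transport scheme for a source, enforcing HTTPS-by-default."""
    if source.startswith("auto:"):  # auto:<base-url>: judge the base URL
        source = source[len("auto:"):]
    scheme = urlsplit(source).scheme.lower() if "://" in source else "file"
    if scheme in SECURE or (scheme in CLEARTEXT and insecure):
        return scheme
    if scheme in CLEARTEXT:
        raise InsecureSourceError(
            f"{scheme}:// would carry password hashes and keys in cleartext; "
            "add live-installer.auto-insecure to allow it")
    raise InsecureSourceError(f"unsupported source scheme: {scheme!r}")
```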
full_disk_format() read setup.gptonefi through the module-global
'installer', which is only assigned when the GUI calls
build_partitions(). On the headless path that global never exists and
the automated partition step crashed with NameError. Take the setup
object as a parameter; the engine passes self.setup.

The engine fires the progress hook once per copied file during the
rsync phase (~400k calls for a stock install). The headless driver
logged every call to console, journal and serial, making console I/O
the bottleneck of the whole installation. Only log when the rendered
progress line actually changes (~100 lines per phase).
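The fix amounts to remembering the last rendered line and emitting only when it differs. Here is a minimal sketch, assuming progress is rendered as a whole-number percentage per phase. With that rendering, any number of per-file callbacks collapses to at most 101 lines per phase (0% through 100%).

```python
class ChangeOnlyLogger:
    """Emit a progress line only when its rendered text changes."""

    def __init__(self, emit):
        self.emit = emit   # e.g. console + journal + serial writer
        self.last = None

    def progress(self, phase, done, total):
        pct = 100 * done // total if total else 0
        line = f"{phase}: {pct}%"
        if line != self.last:  # per-file calls that render identically are dropped
            self.last = line
            self.emit(line)
```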
finish_installation() unmounts the target filesystem at its end, so
the headless driver's attempt to re-enter the chroot afterwards found
an empty /target (every bind mount failed rc=32, chroot rc=127).

Add an optional before_unmount_hook parameter to finish_installation(),
invoked after the system is fully configured (post clean_apt) but while
the chroot is still mounted and has working DNS. The driver applies
extra users, SSH keys, package changes and post-install scripts there,
and no longer carries its own mount/unmount plumbing.
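The ordering this commit establishes can be sketched as follows. Everything except the `before_unmount_hook` parameter name is hypothetical: the real finish_installation() takes no `steps` or `unmount` arguments, and they appear here only to make the ordering testable. Because the hook defaults to None, the GUI path is unchanged. Unmounting in a `finally` even when the hook raises is a choice of this sketch, not something the PR states.

```python
def finish_installation(steps, unmount, before_unmount_hook=None):
    """Hypothetical shape of the engine's final phase, showing where the hook runs."""
    for step in steps:              # configure the system, ending with clean_apt
        step()
    try:
        if before_unmount_hook is not None:
            before_unmount_hook()   # chroot still mounted, DNS still works
    finally:
        unmount()                   # always tear down the target mounts
```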
On EFI installs the engine dpkg-installs the bootloader stack
(shim-signed, grub-efi-*) from the ISO pool without its dependency
closure (shim-signed-common, shim-helpers-amd64-signed), leaving dpkg
in a state apt-get refuses to build on — the driver's package step
failed with unmet dependencies on UEFI while passing on BIOS.

Run 'apt-get install -f -y' after apt-get update so the half-installed
bootloader packages are completed from the network before the answer
file's package additions are attempted.

Each parted invocation triggers a partition-table rescan during which
udev removes and recreates the partition device nodes. On EFI installs
(three partitions, plus a trailing 'set 1 boot on') mkfs.ext4 raced a
vanishing /dev/vda3: the existence check passed, mke2fs then found no
node, and because the failure was tolerated the root mount silently
failed and rsync copied the entire system into the live session's
tmpfs until ENOSPC.

Run 'udevadm settle' before the node-existence check, and raise on a
non-zero mkfs exit instead of carrying on with an unformatted target.
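The two-part fix can be sketched as a helper that settles udev, checks the node, and refuses to continue if mkfs fails. `run` and `exists` are injectable so the sketch can be exercised without real disks. The function name, its signature, and the default `mkfs.ext4 -F` argv are assumptions, not the partitioning.py code.

```python
import os
import subprocess


class FormatError(Exception):
    pass


def format_partition(node, mkfs=("mkfs.ext4", "-F"), run=subprocess.run, exists=os.path.exists):
    """Format a freshly created partition, treating every failure as fatal."""
    # parted's table rescan makes udev remove and recreate partition nodes;
    # wait for the event queue to drain before trusting the node check.
    run(["udevadm", "settle"], check=False)
    if not exists(node):
        raise FormatError(f"{node} is missing after partitioning")
    result = run([*mkfs, node], check=False)
    if result.returncode != 0:
        # Carrying on would leave the target unmounted and fill the live tmpfs.
        raise FormatError(f"{' '.join(mkfs)} {node} exited {result.returncode}")
```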
Two scenarios assert that bad input fails fast and cleanly — the
failure marker appears and nothing is half-installed:

- fail-malformed-yaml: syntactically broken answer file
- fail-no-disk-match: valid file whose disk matcher matches nothing
  (on_no_match: abort)

The harness gains expect.outcome: failure — the failure marker is the
expected result, the answer file is served verbatim (it may be
deliberately malformed, so no SSH-key staging), and there is no
boot/verify phase.

Also adds the uefi-lvm scenario (LVM layout preset, root and swap on
the lvmmint VG).

The LUKS keyfile may now be a URL as well as a local path: per-machine
keyfiles served by a provisioning server alongside the answer file.
URLs follow the same HTTPS-only rule (and --insecure escape hatches) as
the answer file; the shared fetch error message is generalized since it
now covers key material too.

Harness: the HTTP server now starts before answer-file staging so its
base URL can be substituted for {server} placeholders, auxiliary files
next to the answer file are served too, and scenarios can set
skip_boot_phase (the LUKS scenario's installed system prompts for the
passphrase at the initramfs, which needs interactive serial support —
its install phase still exercises luksFormat/LVM/crypttab/grub-efi).

The LUKS scenario showed the passphrase verbatim on the serial console
and in the journal: the engine pipes it to cryptsetup via echo, and the
runner logs every command it executes. Serial output is routinely
captured (BMC/SOL logging, test harnesses), so this is a real key-material
leak, not a cosmetic issue.

CommandRunner.run() gains a secrets parameter: the given strings (and
their shell-quoted forms) are replaced with [REDACTED] in the EXEC log
lines and in CommandError messages, while the executed command is
untouched. The engine's luksFormat/luksOpen call sites use it.
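Here is a minimal sketch of the redaction step, assuming it works on the command string only at log time. The executed command stays unchanged. The sketch replaces both the raw secret and its `shlex.quote()` form, longest first, so a quoted form is replaced as a whole before the bare secret inside it could leave stray quote characters behind.

```python
import shlex

REDACTED = "[REDACTED]"


def redact(text, secrets):
    """Replace each secret, and its shell-quoted form, with [REDACTED] (log text only)."""
    forms = set()
    for s in secrets:
        if s:  # an empty secret would match everywhere
            forms.add(s)
            forms.add(shlex.quote(s))
    # Longest first, so "'p@ss word'" is replaced whole before "p@ss word".
    for form in sorted(forms, key=len, reverse=True):
        text = text.replace(form, REDACTED)
    return text
```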
Redacting the passphrase from logs closed the serial/journal leak but
the engine still built 'echo -n <pass> | cryptsetup ...', so the
passphrase was briefly visible in the process table (ps) on the live
system. cryptsetup reads the key from stdin with --key-file -; pass it
through the new CommandRunner.run(stdin=) channel instead. The
passphrase is now never a command argument: not in ps, not in logs.

The secrets= redaction stays as a belt-and-braces guard for any future
case where a secret must appear in a command.
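A minimal sketch of the stdin channel for the two cryptsetup calls. With `--key-file -`, cryptsetup reads the key from standard input and does not strip a trailing newline, which is why the passphrase is sent without one (matching the old `echo -n`). The helper names and the injectable `run` are illustrative, not the engine's API.

```python
import subprocess


def luks_format(device, passphrase, run=subprocess.run):
    """luksFormat with the key on stdin: never in argv (ps) and never in logs."""
    argv = ["cryptsetup", "--batch-mode", "luksFormat", "--key-file", "-", device]
    # No trailing newline: the whole of stdin becomes the key.
    return run(argv, input=passphrase.encode(), check=True)


def luks_open(device, name, passphrase, run=subprocess.run):
    """Open the new container under /dev/mapper/<name>, key again on stdin."""
    argv = ["cryptsetup", "luksOpen", "--key-file", "-", device, name]
    return run(argv, input=passphrase.encode(), check=True)
```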
Upstream has no CI; this adds two workflows it can adopt incrementally.

- unit-tests.yml: runs pytest tests/unit on Python 3.11-3.13 for every
  push and PR. Fast, no special hardware (the conftest stubs gi/parted).
- integration-tests.yml: boots real VMs and performs full unattended
  installs. KVM-dependent and slow (~10-15 min/scenario), so it runs
  nightly and on workflow_dispatch, not per-push. A fast failure-mode
  gate runs first; the four install scenarios then run as a matrix.
- actions/vm-setup: composite action shared by both integration jobs —
  enables KVM on the runner, installs QEMU/OVMF/swtpm + ISO tooling,
  fetches the pinned ISO (cached), and builds the dev ISO.

requirements-test.txt pins the unit suite's deps (pytest + the runtime
deps declared in debian/control) for a bare-virtualenv install.

GitHub is deprecating Node 20 actions (forced to Node 24 on 2026-06-16).
Bump to the current majors: checkout v4->v6, setup-python v5->v6,
cache v4->v5, upload-artifact v4->v7.

Three additions:

- kernel.cmdline_extra (schema + driver): append kernel parameters to
  the installed system's GRUB_CMDLINE_LINUX_DEFAULT and regenerate grub
  in the post-install hook. A real fleet need (serial console, driver
  blacklists) and what puts the LUKS prompt on serial.

- Interactive serial in the harness: serial is now a bidirectional unix
  socket teed to the log file, with VM.send_serial() to type into the
  guest — the channel an admin drives over IPMI Serial-over-LAN. The
  uefi-lvm-luks scenario now does a full boot/verify: it waits for the
  initramfs unlock prompt on serial, types the passphrase, and verifies
  the unlocked system over SSH (no more skip_boot_phase).

- bios-multi-disk scenario: two disks where the by-id target is the
  second and smaller one, proving the installer selects by stable
  attribute rather than enumeration order, and leaves the decoy disk
  untouched. Needed multi-disk support in the VM harness (per-disk
  serials via virtio-blk-pci -> /dev/disk/by-id/virtio-<serial>).
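The kernel.cmdline_extra edit can be sketched as a text transform on /etc/default/grub that appends parameters to GRUB_CMDLINE_LINUX_DEFAULT, skipping ones already present. grub would then be regenerated inside the chroot. This is an illustration, not the driver's code. As a later commit in this PR found, appending console=ttyS0 alone does not move the plymouth LUKS prompt to serial while quiet/splash remain.

```python
import re

# One line, optionally quoted; [ \t]* (not \s*) so no newline is ever consumed.
LINE = re.compile(r'^GRUB_CMDLINE_LINUX_DEFAULT=(?P<q>["\']?)(?P<val>.*?)(?P=q)[ \t]*$', re.M)


def append_cmdline(default_grub, extra):
    """Append extra kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT (idempotent)."""
    def repl(m):
        params = m.group("val").split()
        params += [p for p in extra.split() if p not in params]
        return f'GRUB_CMDLINE_LINUX_DEFAULT="{" ".join(params)}"'

    new, n = LINE.subn(repl, default_grub)
    if n == 0:  # no such line yet: add one
        new = default_grub.rstrip("\n") + f'\nGRUB_CMDLINE_LINUX_DEFAULT="{extra}"\n'
    return new
```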
For upstream maintainability, the automated-install feature now ships
with the documentation and CI a maintainer needs to support it without
automation/VM expertise:

- docs/automated-install.md: answer-file reference, triggering, delivery
  mechanisms, worked examples, failure handling, security notes.
- tests/TESTING.md: architecture of the unattended path, the two test
  layers, how to run/extend the suite, how to debug a scenario from its
  serial log, and the known traps (udev races, the unmount-hook
  ordering, dpkg repair, secret handling).
- README.md: the repo had none; orients a reader and links both guides.

Integration CI hardening: per-job timeout-minutes (a hung VM can no
longer consume the 6h ceiling) and concurrency cancellation of
superseded runs.
The engine reads the squashfs/kernel and grub-title script from
different paths on Mint (Ubuntu/casper) vs LMDE (Debian/live-boot), but
every integration scenario exercises only the LMDE branch (LMDE is the
edition shipping live-installer today; Mint adopts it in Mint 23).

Pin both path sets with unit tests so a wrong Mint path can't regress
silently before Mint 23 ships, and document the coverage boundary and
what Mint-side integration would require (a casper injection variant in
the harness, ideally validated against the Mint 23 beta) in TESTING.md.
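To illustrate the kind of pin described here, a sketch of an edition-keyed path table with pytest-style tests. The paths are assumptions based on the usual casper (`/cdrom/casper/`) and live-boot (`/run/live/medium/live/`) locations, not copied from the installer; `LIVE_PATHS` and `live_paths` are hypothetical names:

```python
from pathlib import Path

# Assumed live-medium layouts (casper on Mint, live-boot on LMDE); verify against real ISOs.
LIVE_PATHS = {
    "mint": {
        "squashfs": Path("/cdrom/casper/filesystem.squashfs"),
        "kernel": Path("/cdrom/casper/vmlinuz"),
    },
    "lmde": {
        "squashfs": Path("/run/live/medium/live/filesystem.squashfs"),
        "kernel": Path("/run/live/medium/live/vmlinuz"),
    },
}


def live_paths(edition: str) -> dict:
    """Return the squashfs/kernel paths for an edition; unknown editions fail loudly."""
    try:
        return LIVE_PATHS[edition]
    except KeyError:
        raise ValueError(f"unknown edition: {edition!r}") from None


def test_mint_paths_pinned():
    # A typo in the integration-untested Mint branch fails here, not on the Mint 23 beta.
    assert live_paths("mint")["squashfs"] == Path("/cdrom/casper/filesystem.squashfs")
    assert live_paths("mint")["kernel"].parent == Path("/cdrom/casper")


def test_lmde_paths_pinned():
    assert live_paths("lmde")["squashfs"] == Path("/run/live/medium/live/filesystem.squashfs")
```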
The multi-disk install correctly targeted the by-id disk (vdb), but
phase 2 booted the first-enumerated disk (vda, the empty decoy) and SSH
never came up. Add VM.start(boot_serial=...) to pin bootindex=0 on the
disk the OS was installed to, and set boot_disk_serial in the scenario.
This is the harness standing in for the firmware boot-order an admin
would configure on real multi-disk hardware.
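A sketch of how a harness might express both the per-disk serials and the boot pin as QEMU arguments. `serial=` and `bootindex=` are virtio-blk-pci device properties; `disk_args`, the qcow2 format and the `diskN` drive ids are assumptions for illustration. In the guest, a disk with serial S should then appear as /dev/disk/by-id/virtio-S:

```python
from typing import List, Optional, Tuple


def disk_args(disks: List[Tuple[str, str]], boot_serial: Optional[str] = None) -> List[str]:
    """Build -drive/-device pairs; the disk whose serial is boot_serial gets bootindex=0."""
    serials = [serial for _, serial in disks]
    if len(set(serials)) != len(serials):
        raise ValueError("disk serials must be unique")
    if boot_serial is not None and boot_serial not in serials:
        raise ValueError(f"boot_serial {boot_serial!r} matches no disk")

    args: List[str] = []
    for index, (image, serial) in enumerate(disks):
        drive_id = f"disk{index}"
        args += ["-drive", f"file={image},if=none,id={drive_id},format=qcow2"]
        device = f"virtio-blk-pci,drive={drive_id},serial={serial}"
        if serial == boot_serial:
            device += ",bootindex=0"  # firmware tries this disk first
        args += ["-device", device]
    return args
```

With the decoy listed first and the target second, enumeration order still puts the decoy at vda, while the boot pin follows the target.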
Driving the LUKS unlock over serial revealed that an appended
console=ttyS0 is not enough: LMDE's default 'quiet splash' makes
plymouth grab the passphrase prompt graphically, so it never reaches
ttyS0. Surfacing it on serial needs full serial-console provisioning
(drop quiet/splash + a GRUB serial terminal), which is more than the
cmdline_extra append can do.

Keep the install-phase verification (which exercises luksFormat via
stdin, LVM-on-LUKS, crypttab, grub-efi and cmdline_extra) and record
the serial-unlock work as a documented follow-up. The harness's
send_serial capability remains for when the system is provisioned for
serial.
The fixtures directory is empty in git (its only contents — the ISO and
sha256sum.txt — are gitignored), so it doesn't exist on a fresh GHA
checkout. The ISO-download step's working-directory then failed with
"No such file or directory" before running, and the whole integration
run aborted in setup. Track the directory with a .gitkeep and mkdir -p
defensively in the step.
Three issues surfaced by the first real GitHub Actions run:

- The serial unix socket lived in the (deep) work directory. AF_UNIX
  paths are capped at ~108 bytes; a CI runner's long workdir prefix blew
  past it so QEMU silently failed to create the socket and every VM
  start aborted. Put the socket in a short tempdir instead (the log
  stays in the workdir). This is why it worked locally but not on CI.

- KVM enablement raced: the udev rule sets /dev/kvm to 0666
  asynchronously, but `test -w` ran before it applied. Apply the mode
  synchronously with a direct chmod (keep the udev rule for persistence).
  Hosted runners DO have KVM — the earlier "not available" was this race.

- The serial reader raised a misleading "could not connect to socket"
  even when QEMU had died at launch. Detect process exit and surface
  QEMU's stderr so the next such failure is diagnosable.
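A sketch of the two harness fixes, assuming the VM is a `subprocess.Popen` with stderr piped; `short_socket_path` and `connect_serial` are hypothetical names. The first keeps the socket path under the Linux `sun_path` limit; the second checks `proc.poll()` on every retry so a dead QEMU is reported with its own stderr instead of a generic connect error:

```python
import os
import socket
import subprocess
import tempfile
import time

SUN_PATH_MAX = 108  # size of sockaddr_un.sun_path on Linux, including the trailing NUL


def short_socket_path(name: str = "serial.sock") -> str:
    """Place the socket in a fresh short temp dir instead of the (possibly deep) workdir."""
    path = os.path.join(tempfile.mkdtemp(prefix="li-"), name)
    if len(os.fsencode(path)) >= SUN_PATH_MAX:
        raise OSError(f"AF_UNIX path too long ({len(path)} bytes): {path}")
    return path


def connect_serial(path: str, proc: subprocess.Popen, timeout: float = 10.0) -> socket.socket:
    """Connect to QEMU's serial socket, surfacing QEMU's stderr if it died at launch."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            err = proc.stderr.read().decode(errors="replace") if proc.stderr else ""
            raise RuntimeError(f"QEMU exited with status {proc.returncode}: {err.strip()}")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError:
            sock.close()
            time.sleep(0.1)  # socket not created yet; retry until the deadline
    raise TimeoutError(f"serial socket {path} never accepted a connection")
```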
ggiesen added 9 commits June 13, 2026 23:27
Item 3 (repo/package format, cloud-init A+C). Adopt cloud-init's apt:
shapes in the answer file and execute via distro-native tooling behind a
swappable backend, rather than a neutral repo schema or a cloud-init dep.

- schema.py: replace repositories: [{source, key_url}] with apt:
  {sources: {<name>: {source, key|keyid|keyserver|key_url, filename}}},
  mirroring cloud-init. Validation: deb/deb-src source line, at most one
  signing key per source, https-only key_url, hex keyid, safe names.
  Breaking answer-file change (repositories -> apt).
- pkgbackend.py (new): PackageBackend interface + AptBackend; get_backend()
  picks per distro family. Applies sources (inline key / keyserver / https
  key fetch), runs update (with the EFI apt-get install -f fixup), installs
  the agnostic packages list, removes package_remove. All via CommandRunner
  so it stays unit-testable and library-free.
- auto_installer.py: _apply_packages now delegates to the backend; keep
  packages:/package_remove: top-level and agnostic.
- 22 new unit tests (schema apt + backend command sequences). Docs updated
  (### apt). Existing integration scenarios already cover the install path
  via packages: openssh-server.
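The apt: validation rules listed above, sketched as plain Python rather than the schema models the commit uses. `validate_apt_source`, the name pattern, and the accepted keyid lengths are assumptions for illustration:

```python
import re
from urllib.parse import urlsplit

_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")   # safe as a sources.list.d file name
_KEYID_RE = re.compile(r"^(0x)?[0-9A-Fa-f]{8,40}$")      # assumed accepted keyid lengths
_KEY_FIELDS = ("key", "keyid", "key_url")                # mutually exclusive signing keys


def validate_apt_source(name: str, spec: dict) -> list:
    """Return validation errors for one apt.sources entry (empty list means valid)."""
    errors = []
    if not _NAME_RE.match(name):
        errors.append(f"{name}: unsafe source name")
    words = spec.get("source", "").split()
    if not words or words[0] not in ("deb", "deb-src"):
        errors.append(f"{name}: source must be a 'deb' or 'deb-src' line")
    keys = [field for field in _KEY_FIELDS if spec.get(field)]
    if len(keys) > 1:
        errors.append(f"{name}: at most one signing key, got {', '.join(keys)}")
    if spec.get("keyid") and not _KEYID_RE.match(spec["keyid"]):
        errors.append(f"{name}: keyid must be hexadecimal")
    if spec.get("key_url") and urlsplit(spec["key_url"]).scheme != "https":
        errors.append(f"{name}: key_url must use https")
    return errors
```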
…xmint#1-linuxmint#3)

Address the three actionable correctness items from the 2026-06-13 review:

#1 [High] /boot/efi without the esp flag is now rejected. The reverse
   check was missing: a vfat /boot/efi partition with empty flags passed
   validation but its GPT entry lacks the ESP type GUID, so UEFI won't
   boot it — a silently-broken install. schema.py CustomPartition now
   requires esp on a /boot/efi partition, with an error explaining why.

#2 [Med] esp-on-BIOS / bios_grub-on-UEFI now rejected engine-side. The
   schema can't know firmware mode (runtime), so installer.py validates it
   in _check_layout_matches_firmware before any partition is created (no
   destructive action on mismatch). Added a fail-bios-grub-on-uefi
   regression scenario to the failure-modes CI job.

#3 [Med] Duplicate LV names within a VG now rejected at validation time
   (Storage._consistency) instead of failing confusingly at lvcreate.

10 new tests (schema + engine + parsed regression fixture); 240 unit
tests pass.
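The split between schema-time and engine-time checks, sketched with hypothetical field names (`mount`, `flags`, `lvs`) and firmware strings (`"bios"`, `"uefi"`). The firmware checks take the runtime mode as an argument because the answer file cannot know it:

```python
def schema_errors(partitions: list, volume_groups: list) -> list:
    """Checks that need only the answer file."""
    errors = []
    for part in partitions:
        if part.get("mount") == "/boot/efi" and "esp" not in part.get("flags", []):
            errors.append("/boot/efi needs the esp flag, or its GPT entry lacks the ESP type GUID")
    for vg in volume_groups:
        seen = set()
        for lv in vg.get("lvs", []):
            if lv["name"] in seen:
                errors.append(f"duplicate LV name {lv['name']!r} in VG {vg['name']!r}")
            seen.add(lv["name"])
    return errors


def firmware_errors(partitions: list, firmware: str) -> list:
    """Checks that need the runtime firmware mode; run before any partition is created."""
    errors = []
    for part in partitions:
        flags = part.get("flags", [])
        if firmware == "bios" and "esp" in flags:
            errors.append("esp partition on a BIOS boot")
        if firmware == "uefi" and "bios_grub" in flags:
            errors.append("bios_grub partition on a UEFI boot")
    return errors
```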
Extend the netplan-v2 network: section with wifis:, rendered to
NetworkManager wifi keyfiles by netconfig.py.

- schema.py: WifiConfig(_IpConfig) reuses all the ethernet IP logic
  (static/DHCP, dual-stack, gateways, DNS, routes, match) and adds an
  access-points map of SSID -> {password?, hidden?}. Validates SSID
  (1..32 bytes), PSK (8..63 chars or 64-hex), at-least-one access point,
  and id uniqueness across ethernets/wifis/vlans. vlans may link to a wifi.
- netconfig.py: one NM wifi keyfile per access point (type=wifi, [wifi]
  ssid/mode/hidden + mac bind, [wifi-security] wpa-psk; open network omits
  security). Filenames suffixed only when a device has >1 AP.
- 15 new unit tests (schema + renderer + a driver test asserting the
  PSK-bearing keyfile is 0600). 255 unit tests pass.

Testing boundary: QEMU does not emulate 802.11, so wifi has no end-to-end
integration scenario (documented) — it is covered by schema + renderer
unit tests. WPA-Enterprise (EAP), bonds, and bridges remain unmodelled.

Docs updated: ### network (wifi subsection), comparison table, limitations.
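A sketch of the SSID/PSK rules and the per-access-point keyfile shape, using NetworkManager's `[wifi]` and `[wifi-security]` keyfile sections. `wifi_errors` and `render_wifi_keyfile` are hypothetical helpers, the IP sections are fixed to DHCP for brevity, and SSIDs that need keyfile escaping are not handled:

```python
import re

_HEX_PSK = re.compile(r"^[0-9A-Fa-f]{64}$")


def wifi_errors(ssid: str, password=None) -> list:
    errors = []
    if not 1 <= len(ssid.encode("utf-8")) <= 32:
        errors.append("SSID must be 1..32 bytes")
    if password is not None and not (8 <= len(password) <= 63 or _HEX_PSK.match(password)):
        errors.append("PSK must be 8..63 characters or 64 hex digits")
    return errors


def render_wifi_keyfile(conn_id, ssid, password=None, hidden=False, mac=None) -> str:
    """Render one NM keyfile per access point; the caller writes it with mode 0600."""
    lines = ["[connection]", f"id={conn_id}", "type=wifi", "",
             "[wifi]", "mode=infrastructure", f"ssid={ssid}"]
    if hidden:
        lines.append("hidden=true")
    if mac:
        lines.append(f"mac-address={mac}")  # bind the profile to one NIC
    if password is not None:              # open networks get no [wifi-security]
        lines += ["", "[wifi-security]", "key-mgmt=wpa-psk", f"psk={password}"]
    lines += ["", "[ipv4]", "method=auto", "", "[ipv6]", "method=auto", ""]
    return "\n".join(lines)
```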
… wifi filter, doc nits)

#4 [Low] discovery now genuinely excludes wifi from by-mac candidates: the
   ARPHRD_ETHER (type 1) check does NOT exclude wifi (wifi is type 1 too),
   so also skip interfaces with a sysfs wireless/ subdir. Comment fixed to
   describe what the check actually does.
#5 [Med] discovery drops sentinel/placeholder DMI values (all-zero/all-FF/
   QEMU-default product_uuid; 'Default string', 'To Be Filled By O.E.M.',
   etc. product_serial) so a by-uuid/ or by-serial/ config can't match every
   defective unit in a fleet; the skip is logged.
#6 [Low,docs] steer network examples to routes: [{to: default, via}] as the
   canonical default-gateway form; gateway4/6 documented as legacy.
#7 [Low,docs] document that ambiguous NM matches are undefined; bind by MAC
   or a unique interface name; no autoconnect-priority is emitted.
#8 [Low] document the TFTP client's RFC 1350 size assumption (small files;
   use HTTP for large) and warn past 256 KiB. (Full RFC2347 negotiation
   remains queued.)
#9 [Low,docs] document on_no_match's future intended values.

6 new tests (260 unit total).
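A sketch of the two discovery filters, with hypothetical helper names. The placeholder set contains only the strings named above (real firmware ships more), and the QEMU-default UUID is left out because its exact value is not given here. The wired check reads sysfs, so it takes the sysfs root as a parameter for testing:

```python
import re
from pathlib import Path

# Placeholder strings named above; vendors ship others, so treat this set as incomplete.
_PLACEHOLDER_SERIALS = {"default string", "to be filled by o.e.m."}


def usable_product_uuid(value: str) -> bool:
    digits = value.strip().replace("-", "").lower()
    if not re.fullmatch(r"[0-9a-f]{32}", digits):
        return False
    return digits not in ("0" * 32, "f" * 32)  # all-zero / all-FF sentinels


def usable_product_serial(value: str) -> bool:
    v = value.strip()
    return bool(v) and v.lower() not in _PLACEHOLDER_SERIALS


def is_wired_ethernet(iface: str, sysfs: Path = Path("/sys/class/net")) -> bool:
    """ARPHRD_ETHER (type 1) alone is not enough: wifi interfaces are type 1 too."""
    dev = sysfs / iface
    try:
        if (dev / "type").read_text().strip() != "1":
            return False
    except OSError:
        return False
    return not (dev / "wireless").is_dir()
```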
…ck (item B)

The built-in TFTP client now requests RFC 2347 options (RFC 2348 blksize
1428, RFC 2349 tsize) and handles the OACK: parse the negotiated blksize,
ACK block 0, then receive at that size. Two fallbacks keep it working
against any server:
- option-unaware server replies with DATA block 1 -> use the 512-byte
  RFC 1350 default already in place (no special handling needed);
- strict server replies ERROR code 8 (option negotiation) -> retry the
  request once with no options.

The transfer's terminal-block test uses the negotiated blksize, not a
hardcoded 512.

Tests cover both modes: the existing roundtrip cases now exercise the
no-OACK fallback (the default test server ignores options); new cases add
blksize=1024 OACK negotiation, ERROR-8 -> bare-retry fallback, and OACK
parsing. 263 unit tests pass. Closes review #8 / backlog item B. Docs
updated (TFTP transport row).
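A sketch of the packet handling this describes: an RRQ carrying RFC 2347 option/value pairs, OACK parsing, and the two fallbacks (DATA block 1 means 512-byte blocks; ERROR code 8 means retry bare). The function names are hypothetical and the socket loop is omitted:

```python
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR, OP_OACK = 1, 3, 4, 5, 6
DEFAULT_BLKSIZE = 512  # RFC 1350


def build_rrq(filename: str, options=None) -> bytes:
    """RRQ: opcode, filename NUL, mode NUL, then option NUL value NUL pairs (RFC 2347)."""
    pkt = struct.pack("!H", OP_RRQ) + filename.encode() + b"\0octet\0"
    for name, value in (options or {}).items():
        pkt += name.encode() + b"\0" + str(value).encode() + b"\0"
    return pkt


def parse_oack(pkt: bytes) -> dict:
    fields = pkt[2:].split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty item after the trailing NUL
    if len(fields) % 2:
        raise ValueError("malformed OACK")
    return {fields[i].decode().lower(): fields[i + 1].decode() for i in range(0, len(fields), 2)}


def negotiated_blksize(reply: bytes):
    """Block size to use, or None when the request should be retried without options."""
    (opcode,) = struct.unpack("!H", reply[:2])
    if opcode == OP_OACK:
        return int(parse_oack(reply).get("blksize", DEFAULT_BLKSIZE))
    if opcode == OP_DATA:
        return DEFAULT_BLKSIZE  # option-unaware server answered with block 1
    if opcode == OP_ERROR and struct.unpack("!H", reply[2:4])[0] == 8:
        return None  # ERROR 8: option negotiation refused
    raise ValueError(f"unexpected TFTP opcode {opcode}")


def ack(block: int) -> bytes:
    return struct.pack("!HH", OP_ACK, block)  # after an OACK, the client ACKs block 0


def is_last_block(payload: bytes, blksize: int) -> bool:
    return len(payload) < blksize  # terminal test uses the negotiated size, not 512
```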
… knob

Add end-to-end NFS coverage against a live server, and a way to force the
protocol version.

- auto_installer: nfs:// URLs accept an optional ?vers= (3/4/4.0/4.1/4.2),
  passed through as a vers= mount option; omitted, mount.nfs negotiates.
  Useful for version-restricted filers, and lets the harness pin each
  version deterministically. Validated; bad versions rejected.
- tests/nfs/: a real-server harness — setup-nfs-server.sh provisions
  nfs-kernel-server exporting a small answer file read-only over NFSv3 and
  NFSv4.2, reachable via IPv4, IPv6, and a hostname; test_nfs_real.py fetches
  it through the actual nfs:// transport across the full {v3,v4.2} x
  {127.0.0.1,[::1],li-nfs-host} matrix plus a negotiated-default case.
  Gated on LI_NFS_TEST=1 (needs root for mount), so the unit suite stays
  hermetic.
- .github/workflows/nfs-tests.yml runs it per push: GHA ubuntu runners have
  passwordless sudo and a kernel nfsd, so a real server works in CI.
- Schema versioning policy (review Q1) documented: option (c) — grow v1
  additively, fork to v2 only for breaking changes while still parsing v1.
- Docs: NFS vers= and dual-stack/hostname support.

267 unit tests pass; 7 NFS tests run in the new CI job.
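A sketch of the `?vers=` handling, assuming the answer file is read by mounting its parent directory. `parse_nfs_url` is hypothetical, and the `ro` option is an assumption rather than something stated above; IPv6 literals are re-bracketed for the mount source:

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

ALLOWED_VERS = {"3", "4", "4.0", "4.1", "4.2"}


def parse_nfs_url(url: str):
    """Split nfs://host/export/dir/file[?vers=N] into (mount source, file name, options)."""
    parts = urlsplit(url)
    if parts.scheme != "nfs" or not parts.hostname or not parts.path.strip("/"):
        raise ValueError(f"not a usable nfs:// URL: {url}")
    options = ["ro"]
    vers = parse_qs(parts.query).get("vers")
    if vers is not None:
        if len(vers) != 1 or vers[0] not in ALLOWED_VERS:
            raise ValueError(f"unsupported NFS version: {vers}")
        options.append(f"vers={vers[0]}")  # omitted: mount.nfs negotiates
    host = parts.hostname
    if ":" in host:
        host = f"[{host}]"  # IPv6 literal, e.g. [::1]
    export_dir, filename = posixpath.split(parts.path)
    return f"{host}:{export_dir}", filename, options
```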
The nfs-kernel-server package does not pre-create /etc/exports.d on the
runner, so the export tee failed. mkdir -p it first.
Cross-checked every doc against the code and fixed the mismatches a
multi-doc audit surfaced.

Code:
- main.py: the `live-installer` automated wrapper only forwarded
  --automated=/--insecure/--dry-run, so the documented `--list-disks` and
  `--check`/`--config` invocations silently did nothing. Forward all other
  flags through to auto_installer unchanged.
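One way to get that pass-through with the standard library is `argparse.ArgumentParser.parse_known_args`, which returns unrecognised flags instead of rejecting them. A sketch with a hypothetical `forward_args` helper; `allow_abbrev=False` stops a prefix like `--dry` from being silently claimed by the wrapper:

```python
import argparse
from typing import List


def forward_args(argv: List[str]) -> List[str]:
    """Parse the wrapper's own flags and pass every other flag through unchanged."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--automated")
    parser.add_argument("--insecure", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    known, passthrough = parser.parse_known_args(argv)

    forwarded = []
    if known.automated is not None:
        forwarded.append(f"--automated={known.automated}")
    if known.insecure:
        forwarded.append("--insecure")
    if known.dry_run:
        forwarded.append("--dry-run")
    return forwarded + passthrough  # e.g. --list-disks, --check, --config=...
```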

Docs (automated-install.md): drop the ${serial} keyfile example (no such
templating exists, and it contradicted the no-templating rule) and clarify
only passphrase_source: keyfile is implemented; note the install-vs-boot
passphrase distinction; add the LV-name-uniqueness, esp<->/boot/efi, and
engine-side firmware-flag rules; add the vlan-self-link and id-reuse network
rules; correct the apt key path (.asc vs .gpg, name vs filename:); hostname
defaults to 'mint', not DHCP.

Docs (comparison.md): add the missing TFTP delivery row (v3/v4.2 NFS detail
too). getting-started.md: layout list includes custom; fix the stale
"four answer files / one per layout" count. README.md: intro lists the
real capabilities; module list adds discovery/netconfig/pkgbackend.

PR #180 body: ${serial} -> concrete keyfile URL; size -> size-min;
NFS/TFTP transport detail. 267 unit tests pass.
True first-boot rekey (the mechanism confirmed with the user): format LUKS
with a random throwaway key so the install is fully unattended, embed that
key in the initramfs to auto-unlock the first boot, then a one-shot service
prompts the operator for the real passphrase, swaps it in, removes the
throwaway key + keyfile, rebuilds the initramfs, and reboots — every
subsequent boot prompts normally.

- build_setup: prompt-on-first-boot (the schema default) now generates a
  random newline-free key (secrets.token_urlsafe) instead of aborting; sets
  setup.luks_rekey_on_first_boot. tpm2 still aborts.
- installer.py: write_crypttab names the embedded keyfile instead of the
  prompting placeholder in the rekey case so the first boot auto-unlocks;
  print_setup no longer logs the LUKS passphrase (it was teed to the log +
  serial, a pre-existing leak the random key would have worsened).
- auto_installer: _setup_luks_first_boot_rekey writes the keyfile byte-exact
  (no trailing newline, 0600), configures the cryptsetup-initramfs hook to
  embed it + UMASK=0077, installs/enables the li-luks-rekey one-shot. Runs
  before the initramfs rebuild. The rekey script adds the new key with
  printf %s (no newline) so it matches the boot-time prompt.

Unit-tested (key byte-exactness, crypttab, hook, service, redaction); 272
unit tests pass. End-to-end two-boot unlock is the next step.
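A sketch of the byte-exact keyfile write and the crypttab entry that names it. `write_throwaway_key` and `crypttab_line` are hypothetical; `secrets.token_urlsafe` draws from the URL-safe base64 alphabet, so the key can never contain a newline:

```python
import os
import secrets


def write_throwaway_key(path: str, nbytes: int = 32) -> str:
    """Write a random LUKS key byte-exact (no trailing newline) with mode 0600."""
    key = secrets.token_urlsafe(nbytes)  # URL-safe base64 alphabet: never contains "\n"
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        os.fchmod(handle.fileno(), 0o600)  # also tightens a pre-existing file
        handle.write(key.encode("ascii"))
    return key


def crypttab_line(name: str, luks_uuid: str, keyfile: str) -> str:
    """crypttab fields: target name, source device, key file, options."""
    return f"{name} UUID={luks_uuid} {keyfile} luks\n"
```

Because the rekey script adds the operator's passphrase with `printf %s`, both keys follow the same no-newline convention the boot-time prompt uses.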
ggiesen force-pushed the automated-install branch from a092622 to a4399df on June 14, 2026 04:57
ggiesen added 20 commits June 14, 2026 01:00
…ekey)

uefi-luks-prompt: unattended LVM-on-LUKS install with no secret in the
answer file. Phase 2 drives the full rekey dance over serial via a new
first_boot_rekey harness branch: first boot auto-unlocks (throwaway key in
the initramfs), the one-shot prompts to set + confirm the passphrase, the
service rekeys and reboots, the second boot prompts at the initramfs, and
the unlocked system is verified over SSH. Also asserts the throwaway
keyfile, the rekey service, and the keyfile crypttab entry are all gone
afterwards.
The first attempt failed in CI: the one-shot ran at ~7s (before
systemd-user-sessions), and systemd-ask-password --no-tty returned non-zero
that early, which the set -e script treated as fatal -> service failed
before prompting. The install + first-boot auto-unlock worked; only the
prompt mechanism was wrong.

Fixes: run the unit late (After=systemd-user-sessions.service) but before any
getty claims the console (Before=getty.target serial-getty@ttyS0.service), and
own the tty (StandardInput=tty-force, TTYPath=/dev/console). Prompt with a
plain read on the console instead of systemd-ask-password, and drop set -e so
a transient blkid cannot kill the rekey.
Queue item 1. Catches up to kickstart/autoinstall on the multi-layout +
extra-locale case (e.g. an en_CA + fr_CA desktop).

- schema: keyboard.additional_layouts (list of {layout, variant}) + keyboard.
  toggle (XKB switch option, only valid with extra layouts); top-level
  additional_locales (each validated like locale). XKB layout/variant/toggle
  validated.
- build_setup: comma-joins layouts/variants (primary first) into the
  keyboard_layout/keyboard_variant Setup fields, passes the toggle as
  keyboard_options, and strips codesets from additional_locales.
- installer.py: setup_locale also runs locale-gen for the supplementary locales (LANG
  stays primary); setup_keyboard writes XKBOPTIONS from keyboard_options
  (honouring the toggle) and along the way fixes a pre-existing bug where the
  XKBOPTIONS line was written without quotes or a trailing newline.
- Integration: bios-simple now installs us + ca/fr with a toggle and an
  fr_CA locale, and verifies /etc/default/keyboard and locale -a on the booted
  system. 283 unit tests pass.
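A sketch of the comma-joining and the quoted, newline-terminated XKB lines, with hypothetical helpers. The primary layout comes first, so an empty primary variant yields a leading comma; `strip_codeset` keeps an `@modifier`, which is an assumption about the intended behaviour:

```python
def keyboard_fields(primary: dict, additional: list, toggle: str = None):
    """Comma-join layouts/variants with the primary first; a toggle needs extra layouts."""
    if toggle and not additional:
        raise ValueError("keyboard.toggle requires additional_layouts")
    entries = [primary] + additional
    layouts = ",".join(e["layout"] for e in entries)
    variants = ",".join(e.get("variant", "") for e in entries)
    return layouts, variants, toggle or ""


def render_default_keyboard(layouts: str, variants: str, options: str) -> str:
    """XKB lines for /etc/default/keyboard: every value quoted, each line newline-terminated."""
    return (f'XKBLAYOUT="{layouts}"\n'
            f'XKBVARIANT="{variants}"\n'
            f'XKBOPTIONS="{options}"\n')


def strip_codeset(locale: str) -> str:
    """fr_CA.UTF-8 -> fr_CA; an @modifier is kept (sr_RS.UTF-8@latin -> sr_RS@latin)."""
    base, dot, rest = locale.partition(".")
    if not dot:
        return locale
    _, at, modifier = rest.partition("@")
    return base + at + modifier
```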
Queue item 2: HTTP(S) proxy. Ubuntu's autoinstall has it; corporate desktops need it.

- schema: top-level proxy: (http(s) URL, validated — scheme, host, and no
  quotes/whitespace so it cannot break out of the apt.conf string or shell
  env quoting).
- auto_installer: _apply_proxy writes /etc/apt/apt.conf.d/00proxy
  (Acquire::http(s)::Proxy) and appends http_proxy/https_proxy (+upper) to
  /etc/environment. Runs BEFORE _apply_packages so the install chroot apt-get
  already uses it; persists for the installed system. Wired into needs_post.

Unit-tested (URL validation + the exact apt.conf/environment content). Not
integration-tested: an unreachable proxy would break apt during install, and
standing up a real proxy in QEMU is disproportionate — the config generation
is fully unit-covered. 291 unit tests pass.
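A sketch of the validation and the two generated fragments, with hypothetical helper names. Rejecting quotes and whitespace up front is what makes it safe to interpolate the URL into the apt.conf string and the environment lines:

```python
from urllib.parse import urlsplit

_ENV_NAMES = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def validate_proxy(url: str) -> str:
    # Quotes or whitespace could break out of the apt.conf string or env quoting.
    if any(ch in "\"'" or ch.isspace() for ch in url):
        raise ValueError("proxy URL must not contain quotes or whitespace")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("proxy must be an http(s) URL with a host")
    return url


def render_proxy(url: str):
    """Return (/etc/apt/apt.conf.d/00proxy content, lines appended to /etc/environment)."""
    validate_proxy(url)
    apt_conf = f'Acquire::http::Proxy "{url}";\nAcquire::https::Proxy "{url}";\n'
    environment = "".join(f"{name}={url}\n" for name in _ENV_NAMES)
    return apt_conf, environment
```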
Queue item 3. Adds CA certs to the system trust store (/etc/ssl/certs) for
corporate MITM-proxy CAs / internal PKI roots.

- schema: ca_certs: {remove_defaults, trusted: [<inline PEM>]} mirroring
  cloud-init. Validation rejects a private key in trusted (a trust store
  holds public certs only) and non-PEM junk; rejects remove_defaults with no
  trusted certs (would leave an empty store / break TLS).
- catrust.py (new): CaTrustBackend + DebianCaTrustBackend mirroring
  pkgbackend.py. Writes each cert to /usr/local/share/ca-certificates/
  li-ca-N.crt (0644, .crt ext required) and runs update-ca-certificates;
  remove_defaults disables the bundled certs and --fresh-rebuilds. Seam for
  update-ca-trust (RHEL) later.
- auto_installer: _apply_ca_certs runs BEFORE _apply_packages so a private
  mirror CA is trusted when apt fetches over HTTPS. Wired into needs_post.
- Deliberately the SYSTEM store only; separate from future per-connection
  802.1X/EAP trust.
- Integration: bios-simple installs a harness-generated self-signed CA and
  verifies it reached /etc/ssl/certs. 303 unit tests pass.
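A sketch of the ca_certs rules as plain checks, with hypothetical helper names. The PEM test is structural only (a CERTIFICATE block is present), which is why the harness placeholder with a non-base64 body still validates; the 0-based `li-ca-N` numbering is an assumption:

```python
import re

_CERT_BLOCK = re.compile(r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL)


def ca_cert_errors(trusted: list, remove_defaults: bool = False) -> list:
    errors = []
    if remove_defaults and not trusted:
        errors.append("remove_defaults with no trusted certs would empty the store")
    for i, pem in enumerate(trusted):
        if "PRIVATE KEY-----" in pem:  # covers RSA/EC/ENCRYPTED private key headers
            errors.append(f"trusted[{i}]: contains a private key")
        elif not _CERT_BLOCK.search(pem):
            errors.append(f"trusted[{i}]: not a PEM certificate")
    return errors


def cert_paths(trusted: list) -> list:
    # update-ca-certificates only picks up *.crt files in this directory.
    return [f"/usr/local/share/ca-certificates/li-ca-{n}.crt" for n in range(len(trusted))]
```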
The previous commit left bios-simple.yaml with a bare {ca_cert} placeholder
that fails schema validation, breaking test_scenario_fixture_parses (a
pytest|tail masked the failure at commit time). Use a structurally valid PEM
placeholder (BEGIN/END CERTIFICATE with an LI_HARNESS_CA_CERT_PLACEHOLDER
body) so the static fixture validates; the harness swaps in a real generated
cert by matching the marker. 303 unit tests pass.
Queue item 4. Adds the autoinstall-style driver-selection primitive.

- schema: drivers: {install: bool} (autoinstall shape).
- auto_installer: _apply_drivers runs ubuntu-drivers install after the
  package phase (apt/repos ready). Guarded by command -v ubuntu-drivers, so
  it is a logged no-op on LMDE/Debian where ubuntu-drivers-common is absent;
  on Mint it installs recommended proprietary/DKMS drivers. Wired into
  needs_post.

The MOK wall is universal: under SecureBoot a freshly built DKMS module
still needs interactive MOK enrollment at next boot — documented, not
something this (or any) installer can automate.

Unit-tested (available + unavailable paths). No integration scenario:
QEMU has no proprietary hardware, so ubuntu-drivers install is a no-op
there. 306 unit tests pass.
Queue item 5 (Feature B). Per-connection EAP for wired 802.1X and wifi
WPA-Enterprise, via netplan v2 auth:, rendered to the NM keyfile [802-1x].

- schema: Auth model on EthernetConfig (wired) and AccessPoint (wifi).
  method tls|peap|ttls; identity/anonymous-identity; ca/client cert + key
  by ABSOLUTE PATH (inline deferred so a private key need not transit the
  answer file); phase2-auth; password; allow-unvalidated. Validation:
  REFUSES EAP without a ca-certificate (rogue-AP credential-theft footgun)
  unless allow-unvalidated; tls needs client-cert+key; peap/ttls need
  identity+password+phase2-auth; an AP cannot set both a WPA-PSK password
  and auth.
- netconfig: shared _8021x_lines renders [802-1x] for both ethernet and
  wifi; wifi EAP uses key-mgmt=wpa-eap. EAP secrets land in the 0600 keyfile.
- Separate from the system ca_certs trust store by design (can share a cert
  file by reference).

Unit-tested (rendering + validation incl. the no-CA refusal). Not
integration-tested: no RADIUS server / 802.11 in QEMU — bare-metal concern,
documented. Deferred: inline certs/keys, a first-boot EAP connectivity
check. 318 unit tests pass.
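The EAP validation rules, sketched over a plain dict keyed like netplan's `auth:` fields (`allow-unvalidated` is this PR's own knob). `eap_errors` is hypothetical, and the AP-level "not both PSK and auth" check is omitted:

```python
_CERT_FIELDS = ("ca-certificate", "client-certificate", "client-key")


def eap_errors(auth: dict) -> list:
    method = auth.get("method")
    if method not in ("tls", "peap", "ttls"):
        return [f"unsupported EAP method {method!r}"]
    errors = []
    if not auth.get("ca-certificate") and not auth.get("allow-unvalidated"):
        errors.append("EAP without ca-certificate lets a rogue AP harvest credentials")
    if method == "tls":
        required = ("client-certificate", "client-key")
    else:
        required = ("identity", "password", "phase2-auth")
    errors += [f"{method} requires {field}" for field in required if not auth.get(field)]
    # Certs and keys are referenced by absolute path, never inlined.
    errors += [f"{field} must be an absolute path" for field in _CERT_FIELDS
               if auth.get(field) and not auth[field].startswith("/")]
    return errors
```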
Queue item 6. Configures Timeshift at install time on a btrfs root.

- schema: snapshots: {enabled, backend: timeshift-btrfs, schedule:
  {boot,daily,weekly,monthly}, initial_snapshot}. THE KEY GUARD: a top-level
  model_validator cross-validates backend vs storage — timeshift-btrfs
  requires the btrfs @/@home root (decided in storage.layout), so a config
  that could not make snapshots fails at validation, not silently after
  install. backend: auto / rsync / snapper rejected in v1.
- snapshotbackend.py (new): SnapshotBackend seam + TimeshiftBtrfsBackend
  mirroring pkgbackend.py. Ensures timeshift is installed, renders
  /etc/timeshift/timeshift.json (btrfs_mode + schedule_/count_ from the neutral
  schedule), and installs a self-removing first-boot one-shot that registers
  the schedule (timeshift --check) and takes the optional initial snapshot.
- auto_installer: _apply_snapshots after the package phase; wired into needs_post.
- Integration: custom-btrfs now enables snapshots and verifies timeshift.json
  btrfs_mode + that timeshift --list parses our config (validates the json
  format against the shipped Timeshift) + the first-boot one-shot self-removed.
  New fail-snapshots-on-ext4 failure scenario asserts the cross-validation
  aborts a btrfs-backend-on-ext4 config at schema time.

329 unit tests pass. Note: timeshift.json keys can vary by Timeshift version;
the integration test validates the format against the real shipped tool.
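A sketch of the cross-validation between snapshots: and the storage plan, with a hypothetical `snapshot_errors` that receives the planned root filesystem and a subvolume-to-mount map. The point is that the check needs both sections, which is why it sits on a top-level validator rather than on either model alone:

```python
def snapshot_errors(snapshots: dict, root_fs: str, subvolumes: dict) -> list:
    """Cross-validate snapshots against the storage plan.

    subvolumes maps btrfs subvolume name -> mount point, e.g. {"@": "/", "@home": "/home"}.
    """
    if not snapshots.get("enabled"):
        return []
    backend = snapshots.get("backend")
    if backend != "timeshift-btrfs":
        return [f"snapshot backend {backend!r} is not supported in v1"]
    if root_fs != "btrfs" or subvolumes.get("@") != "/" or subvolumes.get("@home") != "/home":
        return ["timeshift-btrfs needs a btrfs root with @ on / and @home on /home"]
    return []
```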
The timeshift-config-btrfs-mode verify command embedded "btrfs_mode": "true"
unquoted, so YAML read its colon-space as a mapping separator and raised a
ScannerError before the VM even booted. Quote the command and drop the literal
colon.
Queue item 7, full multi-level + LVM-on-RAID. Stage 1: schema + validation.

- storage.disks: a list of disk matches (multi-disk); storage.target made
  optional (exactly one of target/disks, RAID requires disks).
- storage.raid: RaidArray list (name md0.., level 0/1/5/10, metadata,
  then mount|lvm_pv|subvolumes like a partition/LV).
- CustomPartition.raid: marks a partition a member of a named array (the
  member is replicated on each disk, so device count = number of disks).
- Storage._consistency: RAID referential integrity (members<->arrays),
  per-level minimum disk counts (0/1:2, 5:3, 10:4), LVM PVs may now come from
  RAID arrays, mounts include array+subvolume mounts.

Engine (mdadm create, mdadm.conf, RAID-aware initramfs, grub to every member)
and integration scenarios follow in stage 2. 338 unit tests pass.
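A sketch of the RAID consistency rules, with hypothetical field names. Since each member partition is replicated on every disk, an array's device count equals the number of disks, so the per-level minimums reduce to a disk-count check:

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 10: 4}


def raid_errors(arrays: list, partitions: list, disk_count: int) -> list:
    """Members are replicated on every disk, so each array gets disk_count devices."""
    errors = []
    names = {a["name"] for a in arrays}
    for array in arrays:
        level = array["level"]
        if level not in MIN_DISKS:
            errors.append(f"{array['name']}: unsupported RAID level {level}")
        elif disk_count < MIN_DISKS[level]:
            errors.append(f"{array['name']}: RAID{level} needs {MIN_DISKS[level]} disks, have {disk_count}")
    members = {p["raid"] for p in partitions if p.get("raid")}
    errors += [f"partition names unknown array {n}" for n in sorted(members - names)]
    errors += [f"array {n} has no member partition" for n in sorted(names - members)]
    return errors
```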
…nstead

timeshift --list requires root, but the verify runs as the unprivileged ssh
user, so it returned non-zero on permissions (not a config error). The
config-written + first-boot-one-shot-ran (as root) checks already cover our
code; swap the root-only check for command -v timeshift.
- build_setup: resolve storage.disks to N distinct disks (setup.disks),
  reject overlapping matches; pass custom_raid + the partition raid field;
  a disks= override for tests.
- _create_custom_partitions: multi-disk. Partition every disk identically,
  set the raid flag on member partitions, mdadm --create each array over its
  per-disk members, then mkfs/pvcreate/btrfs on the md device (LVM PVs may now
  be md devices). Non-RAID partitions are used from the first disk (ESP/boot
  copies on the others exist for bootloader redundancy).
- finish_installation: install mdadm, write mdadm.conf (--detail --scan), add
  the md/raid initramfs modules, and grub-install to EVERY member disk so the
  machine boots if one disk fails.

Unit-tested (mdadm/pv/vg/mkfs command sequence over 3 disks, raid flags,
per-disk labels; existing single-disk custom path unchanged). 340 unit tests
pass. Integration (multi-disk RAID install + boot) follows in stage 3.
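A sketch of the command sequence for one array and for the bootloader step, with hypothetical helpers. `--run` keeps mdadm from stopping for a confirmation prompt, which an unattended install cannot answer; the 1.2 metadata default is an assumption:

```python
import re
from typing import List


def partition_device(disk: str, number: int) -> str:
    # Disks whose names end in a digit (nvme0n1, mmcblk0) take a "p" separator.
    return f"{disk}p{number}" if re.search(r"\d$", disk) else f"{disk}{number}"


def mdadm_create(array: str, level: int, disks: List[str], partnum: int,
                 metadata: str = "1.2") -> List[str]:
    members = [partition_device(d, partnum) for d in disks]
    return ["mdadm", "--create", f"/dev/{array}", f"--level={level}",
            f"--raid-devices={len(members)}", f"--metadata={metadata}",
            "--run",  # do not stop for a confirmation prompt
            *members]


def grub_install_commands(disks: List[str]) -> List[List[str]]:
    # One grub-install per member disk so the machine still boots if one disk fails.
    return [["grub-install", disk] for disk in disks]
```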
BIOS RAID1 install across two disks (/boot + /), booted from the array and
verified over SSH: root on an /dev/md device, two active raid1 arrays both
healthy ([UU]), mdadm.conf present. Also: the engine now ensures mdadm is in
the live environment before building arrays (not on every live image).
raid1-bios proved the boot-from-RAID path green on the first try. Add the full
multi-level case (the user preview): BIOS RAID1 /boot + RAID5 across 3 disks
used as an LVM PV, with root on the LV. Verifies root on /dev/mapper/vg0-root,
a raid1 + raid5 both active, the RAID5 array with all three members [UUU], and
the LVM PV sitting on an /dev/md device.
automated-install.md: a Software RAID subsection (disks list, raid arrays,
levels 0/1/5/10, LVM-on-RAID, grub-to-every-member, validation rules).
comparison.md: Software RAID and custom-partitioning rows now check; roadmap
item 4 done. Limitations updated.
RAID5+LVM-on-RAID boots correctly (root on /dev/mapper/vg0-root, RAID5 [UUU]);
the only failing check was pvs, which needs root while the verify runs as the
unprivileged ssh user. Swap for an lsblk dependency-stack check that proves
LVM-on-RAID without root.
Closes the lifecycle gap opposite late_commands.

- schema: top-level early_commands: [..]; on_failure.early_command_failure
  (default abort).
- auto_installer: _run_early_commands runs them in the LIVE environment (not a
  chroot — /target does not exist yet) BEFORE partitioning, as the first step
  of run(). An on-media file is run with sh. Failure -> early_command_failure
  policy (fail before any disk is touched).
- Deliberately CANNOT change the layout (config is data; per-machine layouts
  use generated answer files / auto: discovery) — documented.

For prepping the install environment: tear down a stale array, mount an
out-of-band source, place a cert/keyfile for ca_certs/EAP/LUKS to reference.
7 new tests (347 unit total). Docs + comparison updated (%pre row now check).
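A sketch of the early_commands runner and its failure policy, with a hypothetical `run_early_commands`. Only `abort` (the default) is named above; the `continue`-style branch that collects failures is an illustrative assumption:

```python
import os
import subprocess
from typing import List, Tuple


def run_early_commands(commands: List[str], policy: str = "abort",
                       timeout: float = 600) -> List[Tuple[str, int]]:
    """Run each command in the live environment (no chroot) before partitioning."""
    failures = []
    for command in commands:
        # An on-media script file is run with sh; anything else is a shell command line.
        argv = ["sh", command] if os.path.isfile(command) else ["sh", "-c", command]
        result = subprocess.run(argv, timeout=timeout)
        if result.returncode == 0:
            continue
        if policy == "abort":
            raise RuntimeError(f"early command failed with {result.returncode}: {command}")
        failures.append((command, result.returncode))  # hypothetical non-abort policy
    return failures
```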
Rounds out the package story (apt + flatpak — both package systems Mint uses).

- schema: flatpak: {remotes: [{name, url}], install: [app-ids]}. Validation:
  https remote URLs, valid app ids, unique remote names, install needs >=1
  remote.
- auto_installer: _apply_flatpak ensures flatpak (apt), adds each remote
  (remote-add --if-not-exists), and installs each app in the chroot with
  flatpak install --system -y --noninteractive (root, so no polkit/session
  needed; the long install timeout absorbs the download). After the package
  phase. Wired into needs_post.
- Integration: a flatpak scenario installs Flathub + two small popular apps
  (Flatseal, Calculator — shared GNOME runtime) and verifies them present on
  the booted system (a REAL flatpak pull, not mocked).

Snap is deliberately unsupported (Mint/LMDE disable snapd) but NOT foreclosed:
a future snap: section would be additive and separate, like flatpak:/apt:.
9 new tests (356 unit total). Docs + comparison updated.
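A sketch of the flatpak: validation and the commands it would produce, with hypothetical helpers. The app-id pattern is a simplified reverse-DNS check (at least three dot-separated elements), not flatpak's exact rule, and installing every app from the first remote is an assumption:

```python
import re
from urllib.parse import urlsplit

# Simplified reverse-DNS check; flatpak's own naming rules differ in detail.
_APP_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*(\.[A-Za-z_][A-Za-z0-9_-]*){2,}$")


def flatpak_errors(remotes: list, install: list) -> list:
    errors = []
    names = [r["name"] for r in remotes]
    errors += [f"duplicate remote name {n}" for n in sorted({n for n in names if names.count(n) > 1})]
    errors += [f"remote {r['name']} must use https" for r in remotes if urlsplit(r["url"]).scheme != "https"]
    if install and not remotes:
        errors.append("flatpak.install needs at least one remote")
    errors += [f"invalid app id {a}" for a in install if not _APP_ID.match(a)]
    return errors


def flatpak_commands(remotes: list, install: list) -> list:
    cmds = [["flatpak", "remote-add", "--if-not-exists", r["name"], r["url"]] for r in remotes]
    # Assumption: apps come from the first remote; --system needs no polkit/session as root.
    cmds += [["flatpak", "install", "--system", "-y", "--noninteractive", remotes[0]["name"], app]
             for app in install]
    return cmds
```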
…ak refs

Targeted documentation corrections after the feature queue landed:

- automated-install.md: fix the Limitations and keyboard TOC/section
  anchors; note the keyfile uses the same insecure-transport rules as
  the answer file; correct that swap is a filesystem, not a flag; clarify
  gateway4/6 produce an equivalent default route; add early_command_failure
  to the failure list and note the disk-target mismatch always aborts
  ahead of partition_mismatch; add TFTP to the cleartext-transport list;
  state the logging defaults; fix a double 'and' in Limitations.
- getting-started.md: list the RAID, LUKS-prompt, and flatpak example files.
- README.md: add catrust.py and snapshotbackend.py to the module map and
  the new features (RAID, EAP, first-boot LUKS rekey, Timeshift, etc.) to
  the intro.
- serial-console-and-luks.md: document the prompt-on-first-boot serial
  rekey path and the _setup_luks_first_boot_rekey step.