| 1 | +# PRD: Component Change Detection for Build Queue Determination |
| 2 | + |
| 3 | +**Related ideas:** bn-c5ae, bn-8bb7 (superseded NEVR approach) |
| 4 | +**Priority:** P0 — Critical |
| 5 | +**Status:** Draft |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## Problem |
| 10 | + |
| 11 | +When a PR lands, the build pipeline needs to know which components to queue for building and publishing. Today there is no automated way to answer "what changed?" — developers manually track which components they touched. |
| 12 | + |
| 13 | +The original approach (computing NEVR and comparing against Koji) was abandoned because: |
| 14 | +- It requires a mock root and source downloads for every component (slow) |
| 15 | +- `%autorelease` derives release numbers from synthetic git history, making offline NEVR computation fragile |
| 16 | +- Koji already errors on duplicate NEVRs — that error is real signal worth investigating, not something to pre-filter |
| 17 | + |
| 18 | +The better approach: **detect whether a component's build inputs changed between two commits.** Changed inputs = queue for build. Let Koji handle the rest. |
| 19 | + |
| 20 | +## Target Users |
| 21 | + |
| 22 | +- **CI/CD pipelines** — automatically determine the build queue for a PR or merge |
| 23 | +- **Developers** — quickly check which components a change affects before submitting |
| 24 | +- **Release tooling** — audit what changed between releases |
| 25 | + |
| 26 | +## Proposed Solution |
| 27 | + |
| 28 | +Add a `component identity` subcommand that dumps a deterministic fingerprint of every component's build inputs. CI runs it at two commits and diffs the output. Changed fingerprints = changed components = build queue. |
| 29 | + |
| 30 | +--- |
| 31 | + |
| 32 | +## Design |
| 33 | + |
| 34 | +### Core Principle: Fingerprint Resolved Inputs, Not Configuration Metadata |
| 35 | + |
| 36 | +The fingerprint must capture the **fully-resolved** build inputs after all config merging, inheritance, and path resolution — not raw config values. This ensures that changes to global/distro/group-level defaults automatically propagate to every component they affect. |
| 37 | + |
| 38 | +### What Goes Into the Fingerprint |
| 39 | + |
| 40 | +The fingerprint for a component is a deterministic hash of all inputs that could change RPM output. |
| 41 | + |
| 42 | +#### 1. Resolved Component Config (from merged `ProjectConfig.Components[name]`) |
| 43 | + |
| 44 | +Fingerprinted using struct reflection with `fingerprint:"-"` tags on excluded fields: |
| 45 | + |
| 46 | +| Field | Included | Rationale | |
| 47 | +|-------|----------|-----------| |
| 48 | +| `Spec` (SpecSource) | Yes | Different spec = different package | |
| 49 | +| `Overlays` (functional fields only) | Yes | Any overlay change mutates output. `Overlay.Description` is excluded (`fingerprint:"-"`) as it is purely documentation. | |
| 50 | +| `Build.With`, `Build.Without`, `Build.Defines`, `Build.Undefines` | Yes | Directly affect rpmbuild macros | |
| 51 | +| `Build.Check.Skip` | Yes | Changes %check behavior | |
| 52 | +| `SourceFiles` (filenames + hashes) | Yes | Different sources = different package | |
| 53 | +| `Name` | No (`fingerprint:"-"`) | Metadata, already the map key | |
| 54 | +| `SourceConfigFile` | No (`fingerprint:"-"`) | Internal reference | |
| 55 | +| `Build.Failure.Expected` | No (`fingerprint:"-"`) | CI decision, not a build input | |
| 56 | +| `Build.Hints.Expensive` | No (`fingerprint:"-"`) | Scheduling hint, not a build input | |
| 57 | +| `Build.Check.SkipReason` | No (`fingerprint:"-"`) | Documentation, not a build input | |
| 58 | +| `Build.Failure.ExpectedReason` | No (`fingerprint:"-"`) | Documentation, not a build input | |
| 59 | + |
| 60 | +**Key design**: new fields default to **included** in the fingerprint. Only explicitly excluded fields are tagged `fingerprint:"-"`. A guard test asserts every field has been consciously categorized. |
| 61 | + |
| 62 | +#### 2. Actual File Content Hashes |
| 63 | + |
| 64 | +Config references files by path. The fingerprint must hash their *content*, not their path: |
| 65 | + |
| 66 | +| File | How to hash | |
| 67 | +|------|-------------| |
| 68 | +| Spec file (`Spec.Path` for local specs) | SHA256 of file content | |
| 69 | +| Overlay source files (patches, added files) | SHA256 of each file referenced by overlay `Source` fields | |
| 70 | + |
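The content-over-path rule in the table above can be sketched in Go roughly as follows. This is a minimal illustration, not the tool's implementation: `hashFile` and `tempFile` are hypothetical helper names, and the `sha256:<hex>` output format is assumed from the JSON examples elsewhere in this PRD.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// hashFile (hypothetical name) streams the file at path through SHA-256 and
// returns "sha256:<hex>". Only the bytes are hashed; the path never enters
// the digest.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil)), nil
}

// tempFile is a demo-only helper that writes content to a fresh temporary file.
func tempFile(content string) (string, error) {
	f, err := os.CreateTemp("", "fingerprint-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return f.Name(), err
}

func main() {
	// Two files with identical bytes but different paths.
	a, errA := tempFile("Name: curl\n")
	b, errB := tempFile("Name: curl\n")
	if errA != nil || errB != nil {
		panic("could not create temp files")
	}
	defer os.Remove(a)
	defer os.Remove(b)

	ha, errA := hashFile(a)
	hb, errB := hashFile(b)
	if errA != nil || errB != nil {
		panic("could not hash temp files")
	}
	fmt.Println(ha == hb) // true: same content, different paths
}
```

Streaming with `io.Copy` keeps memory use flat for large patches. Renaming or moving a file without touching its bytes leaves this hash unchanged, although the path may still affect the fingerprint through the resolved config in #1.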
| 71 | +#### 3. Resolved Distro Context |
| 72 | + |
| 73 | +| Field | Included | Rationale | |
| 74 | +|-------|----------|-----------| |
| 75 | +| Effective distro name + version | Yes | Different distro = different build env | |
| 76 | +| Distro-level `DefaultComponentConfig` | Yes | Already merged into component config — captured by #1 | |
| 77 | + |
| 78 | +**Mock config is explicitly excluded** from the per-component fingerprint. A mock config change is a distro-level event (new dist tag, mass rebuild) — not something that should trigger individual component rebuilds via change detection. Mock config changes are handled by a separate coordinated mass-rebuild workflow. |
| 79 | + |
| 80 | +#### 4. Upstream Spec Identity (for `SourceType=upstream`) |
| 81 | + |
| 82 | +| Field | Included | Rationale | |
| 83 | +|-------|----------|-----------| |
| 84 | +| `UpstreamCommit` (pinned or resolved hash, included as-is) | Yes | Different commit = different spec | |
| 85 | +| `UpstreamName` | Yes | Different upstream package name | |
| 86 | +| `UpstreamDistro` (name + version) | Yes | Different distro source | |
| 87 | + |
| 88 | +**Floating upstream specs:** For components without a pinned `UpstreamCommit`, the fingerprint resolves the upstream ref to a concrete commit at runtime and includes that commit hash directly as a string input (not double-hashed). If upstream moved between the base and head identity runs, the resolved commit differs → fingerprint changes → rebuild. This is correct behavior. The tool already requires network access for most operations, so online resolution is not a concern. |
| 89 | + |
| 90 | +**Future: global snapshot time.** When a global snapshot timestamp is introduced, the fingerprint should include the **resolved upstream commit hash directly**, not the snapshot timestamp itself. This prevents a snapshot bump from marking all upstream components as changed when only a handful actually got new content. |
| 91 | + |
| 92 | +#### 5. Affects Commit Count (for synthetic history / %autorelease) |
| 93 | + |
| 94 | +| Field | Included | Rationale | |
| 95 | +|-------|----------|-----------| |
| 96 | +| Count of `Affects: <component>` commits in project repo | Yes | Each Affects commit = +1 release in %autorelease | |
| 97 | + |
| 98 | +This count changes when someone adds a commit with `Affects: curl`, which is exactly when the component needs a rebuild. |
| 99 | + |
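A minimal Go sketch of the counting rule described above. It assumes commit messages have already been collected (for example from the project repo's history) and that each `Affects:` line names exactly one component. Both are assumptions, since this PRD does not define the exact syntax; `countAffects` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// countAffects (hypothetical name) returns how many commit messages carry an
// "Affects: <component>" line. Assumption: one component per Affects line.
func countAffects(messages []string, component string) int {
	count := 0
	for _, msg := range messages {
		for _, line := range strings.Split(msg, "\n") {
			line = strings.TrimSpace(line)
			if !strings.HasPrefix(line, "Affects:") {
				continue
			}
			name := strings.TrimSpace(strings.TrimPrefix(line, "Affects:"))
			if name == component {
				count++
				break // count each commit at most once
			}
		}
	}
	return count
}

func main() {
	messages := []string{
		"Bump curl\n\nAffects: curl",
		"Fix wget patch\n\nAffects: wget",
		"Rebuild against new openssl\n\nAffects: curl\nAffects: wget",
	}
	fmt.Println(countAffects(messages, "curl"))    // 2
	fmt.Println(countAffects(messages, "openssl")) // 0
}
```

Matching the full component name rather than a prefix keeps a commit that affects `curl-minimal` from bumping the count for `curl`.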
| 100 | +### What Is Explicitly Excluded |
| 101 | + |
| 102 | +- Source archive contents — already represented by `SourceFiles[].Hash` in config |
| 103 | +- Log/work/output directory paths — runtime choice, not a build input |
| 104 | +- `ComponentGroupConfig.Description` — documentation |
| 105 | +- Image config — irrelevant to component builds |
| 106 | +- System architecture: determined by the mock config (platform-specific path already resolved), which is itself excluded from the per-component fingerprint |
| 107 | + |
| 108 | +### Global Change Propagation |
| 109 | + |
| 110 | +This is the critical correctness property. Changes to shared config must fan out to all affected components. |
| 111 | + |
| 112 | +**How it works:** The fingerprint operates on the **fully-merged** `ProjectConfig`. Config merging happens at load time: |
| 113 | + |
| 114 | +``` |
| 115 | +defaults.toml → project azldev.toml → included configs → CLI overrides |
| 116 | + ↓ merge |
| 117 | +DistroVersion.DefaultComponentConfig → ComponentGroup.DefaultComponentConfig → ComponentConfig |
| 118 | +``` |
| 119 | + |
| 120 | +If `defaults.toml` changes a distro's `DefaultComponentConfig.Build.With`, every component inheriting from that distro gets a different resolved config → different fingerprint → queued for rebuild. No special propagation logic needed. |
| 121 | + |
| 122 | +**Guard test:** A unit test verifies that mutating any shared config field (distro default, group default) changes the fingerprint of components that inherit from it. |
| 123 | + |
| 124 | +### `fingerprint:"-"` Tag System |
| 125 | + |
| 126 | +```go |
| 127 | +type ComponentBuildConfig struct { |
| 128 | + With []string `toml:"with"` |
| 129 | + Without []string `toml:"without"` |
| 130 | + Defines map[string]string `toml:"defines"` |
| 131 | + Undefines []string `toml:"undefines"` |
| 132 | + Check CheckConfig `toml:"check"` |
| 133 | + Failure ComponentBuildFailureConfig `toml:"failure" fingerprint:"-"` |
| 134 | + Hints ComponentBuildHints `toml:"hints" fingerprint:"-"` |
| 135 | +} |
| 136 | + |
| 137 | +type CheckConfig struct { |
| 138 | + Skip bool `toml:"skip"` |
| 139 | + SkipReason string `toml:"skip-reason" fingerprint:"-"` |
| 140 | +} |
| 141 | +``` |
| 142 | + |
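To illustrate how a reflection walk might honor these tags, here is a minimal Go sketch. The structs are trimmed-down stand-ins for the ones above, `writeValue` and `fingerprint` are hypothetical names, and the byte encoding fed to the hash is an assumption made for the example rather than a specified format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"reflect"
	"sort"
)

// Trimmed-down stand-ins for the config structs, for illustration only.
type CheckConfig struct {
	Skip       bool   `toml:"skip"`
	SkipReason string `toml:"skip-reason" fingerprint:"-"`
}

type BuildConfig struct {
	With    []string          `toml:"with"`
	Defines map[string]string `toml:"defines"`
	Check   CheckConfig       `toml:"check"`
}

// writeValue feeds v into h in a deterministic order, skipping any struct
// field tagged fingerprint:"-". Only the kinds used above are handled.
func writeValue(h hash.Hash, v reflect.Value) {
	switch v.Kind() {
	case reflect.Struct:
		t := v.Type()
		for i := 0; i < t.NumField(); i++ {
			f := t.Field(i)
			if f.Tag.Get("fingerprint") == "-" {
				continue // explicitly excluded
			}
			fmt.Fprintf(h, "field:%s;", f.Name)
			writeValue(h, v.Field(i))
		}
	case reflect.Slice:
		fmt.Fprintf(h, "len:%d;", v.Len())
		for i := 0; i < v.Len(); i++ {
			writeValue(h, v.Index(i))
		}
	case reflect.Map:
		// Go map iteration order is randomized, so sort the keys first.
		keys := v.MapKeys()
		sort.Slice(keys, func(i, j int) bool { return keys[i].String() < keys[j].String() })
		fmt.Fprintf(h, "len:%d;", len(keys))
		for _, k := range keys {
			writeValue(h, k)
			writeValue(h, v.MapIndex(k))
		}
	case reflect.String:
		// Length prefix avoids ambiguity between adjacent strings.
		fmt.Fprintf(h, "%d:%s;", len(v.String()), v.String())
	case reflect.Bool:
		fmt.Fprintf(h, "%t;", v.Bool())
	default:
		panic("unsupported kind: " + v.Kind().String())
	}
}

// fingerprint returns "sha256:<hex>" over the non-excluded fields of cfg.
func fingerprint(cfg interface{}) string {
	h := sha256.New()
	writeValue(h, reflect.ValueOf(cfg))
	return "sha256:" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := BuildConfig{With: []string{"ssl"}, Check: CheckConfig{Skip: true, SkipReason: "flaky"}}
	b := a
	b.Check.SkipReason = "tracked elsewhere"
	fmt.Println(fingerprint(a) == fingerprint(b)) // true: SkipReason is excluded
	b.With = []string{"ssl", "zstd"}
	fmt.Println(fingerprint(a) == fingerprint(b)) // false: With is included
}
```

Sorting map keys and length-prefixing strings are what make the hash deterministic across runs. A field with no `fingerprint` tag falls through to the included path, matching the include-by-default rule.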
| 143 | +**Guard test:** Reflects over all fingerprinted structs and checks each field against an explicit list of decided fields. Fails if a field is missing from that list, or if its `fingerprint:"-"` tag disagrees with the recorded decision. Forces a conscious decision on every field. |
| 144 | + |
| 145 | +```go |
| 146 | +func TestAllFieldsHaveFingerprintDecision(t *testing.T) { |
| 147 | + // For each struct in the fingerprint set: |
| 148 | + // Walk fields via reflect. If a field has fingerprint:"-", it's excluded. |
| 149 | + // If it has no fingerprint tag, it's included (default). |
| 150 | + // Compare each field against an explicit list of decided fields (included or excluded). |
| 151 | + // A new field that is not yet on the list fails the test until someone records a decision. |
| 152 | + // A field whose tag disagrees with the list also fails, which catches accidental tag removal. |
| 153 | +} |
| 154 | +``` |
| 155 | + |
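One way the guard check could be made concrete is sketched below in Go. The `decisions` registry, its `Struct.Field` key format, and the `auditFields` helper are all hypothetical; a real test would call something like `auditFields` from a `testing.T` test and fail when it returns any problems.

```go
package main

import (
	"fmt"
	"reflect"
)

type CheckConfig struct {
	Skip       bool   `toml:"skip"`
	SkipReason string `toml:"skip-reason" fingerprint:"-"`
}

// decisions is a hypothetical registry: "Struct.Field" -> true if the field
// is meant to be included in the fingerprint, false if it is meant to be
// excluded.
var decisions = map[string]bool{
	"CheckConfig.Skip":       true,
	"CheckConfig.SkipReason": false,
}

// auditFields returns one message per field of sample's struct type that is
// missing from the registry or whose fingerprint tag disagrees with it.
func auditFields(sample interface{}, registry map[string]bool) []string {
	t := reflect.TypeOf(sample)
	var problems []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		key := t.Name() + "." + f.Name
		included := f.Tag.Get("fingerprint") != "-"
		want, ok := registry[key]
		switch {
		case !ok:
			problems = append(problems, key+": no fingerprint decision recorded")
		case want != included:
			problems = append(problems, key+": tag disagrees with registry")
		}
	}
	return problems
}

func main() {
	// Every field has a recorded decision that matches its tag.
	fmt.Println(len(auditFields(CheckConfig{}, decisions))) // 0

	// Simulate a newly added field by dropping one entry from the registry.
	incomplete := map[string]bool{"CheckConfig.Skip": true}
	fmt.Println(len(auditFields(CheckConfig{}, incomplete))) // 1
}
```

Keeping the registry separate from the tags means a new field cannot slip in silently, and deleting a `fingerprint:"-"` tag by accident is reported as a disagreement.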
| 156 | +### CLI Design |
| 157 | + |
| 158 | +#### `azldev component identity` |
| 159 | + |
| 160 | +Dumps the fingerprint for selected components. |
| 161 | + |
| 162 | +```bash |
| 163 | +# All components, JSON output for CI |
| 164 | +azldev component identity -a -O json > identity.json |
| 165 | + |
| 166 | +# Single component, table output for dev |
| 167 | +azldev component identity -p curl |
| 168 | + |
| 169 | +# Output: |
| 170 | +# COMPONENT FINGERPRINT |
| 171 | +# curl sha256:a1b2c3d4... |
| 172 | +``` |
| 173 | + |
| 174 | +Uses standard component filter flags (`-p`, `-g`, `-a`, `-s`). |
| 175 | + |
| 176 | +**JSON output structure:** |
| 177 | +```json |
| 178 | +{ |
| 179 | + "curl": { |
| 180 | + "fingerprint": "sha256:a1b2c3d4e5f6...", |
| 181 | + "inputs": { |
| 182 | + "config_hash": "sha256:...", |
| 183 | + "spec_content_hash": "sha256:...", |
| 184 | + "overlay_file_hashes": { |
| 185 | + "patches/fix.patch": "sha256:...", |
| 186 | + "patches/build.patch": "sha256:..." |
| 187 | + }, |
| 188 | + "affects_commit_count": 3, |
| 189 | + "distro": "azl3", |
| 190 | + "distro_version": "3.0" |
| 191 | + } |
| 192 | + } |
| 193 | +} |
| 194 | +``` |
| 195 | + |
| 196 | +The `inputs` breakdown is for debugging — the top-level `fingerprint` is the single hash for comparison. |
| 197 | + |
| 198 | +#### `azldev component diff-identity` |
| 199 | + |
| 200 | +Compares two identity dumps and outputs changed components. |
| 201 | + |
| 202 | +```bash |
| 203 | +azldev component diff-identity base.json head.json |
| 204 | +``` |
| 205 | + |
| 206 | +**Output (table):** |
| 207 | +``` |
| 208 | +COMPONENT STATUS DETAIL |
| 209 | +curl changed spec_content_hash, overlay_file_hashes |
| 210 | +wget added (new component) |
| 211 | +libfoo removed (component removed) |
| 212 | +openssl unchanged |
| 213 | +``` |
| 214 | + |
| 215 | +**Output (JSON, for CI):** |
| 216 | +```json |
| 217 | +{ |
| 218 | + "changed": ["curl"], |
| 219 | + "added": ["wget"], |
| 220 | + "removed": ["libfoo"], |
| 221 | + "unchanged": ["openssl"] |
| 222 | +} |
| 223 | +``` |
| 224 | + |
| 225 | +CI builds the queue from `changed` + `added`. |
| 226 | + |
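The comparison behind `diff-identity` can be sketched in Go as below. This is an illustration under assumptions: it compares only the top-level `fingerprint` strings, and the `Diff` type, `diffIdentity`, and `buildQueue` are hypothetical names, not the tool's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff mirrors the JSON output shown above.
type Diff struct {
	Changed   []string `json:"changed"`
	Added     []string `json:"added"`
	Removed   []string `json:"removed"`
	Unchanged []string `json:"unchanged"`
}

// diffIdentity compares two component -> fingerprint maps.
func diffIdentity(base, head map[string]string) Diff {
	var d Diff
	for name, fp := range head {
		old, ok := base[name]
		switch {
		case !ok:
			d.Added = append(d.Added, name)
		case old != fp:
			d.Changed = append(d.Changed, name)
		default:
			d.Unchanged = append(d.Unchanged, name)
		}
	}
	for name := range base {
		if _, ok := head[name]; !ok {
			d.Removed = append(d.Removed, name)
		}
	}
	// Sort so the output does not depend on map iteration order.
	sort.Strings(d.Changed)
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Unchanged)
	return d
}

// buildQueue returns changed plus added components, sorted.
func buildQueue(d Diff) []string {
	queue := append(append([]string{}, d.Changed...), d.Added...)
	sort.Strings(queue)
	return queue
}

func main() {
	base := map[string]string{"curl": "sha256:aaa", "libfoo": "sha256:bbb", "openssl": "sha256:ccc"}
	head := map[string]string{"curl": "sha256:ddd", "wget": "sha256:eee", "openssl": "sha256:ccc"}
	d := diffIdentity(base, head)
	fmt.Println(d.Changed, d.Added, d.Removed, d.Unchanged) // [curl] [wget] [libfoo] [openssl]
	fmt.Println(buildQueue(d))                              // [curl wget]
}
```

Removed components are reported but left out of the queue, since there is nothing to build for them.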
| 227 | +### CI Workflow |
| 228 | + |
| 229 | +```bash |
| 230 | +# At PR gate: |
| 231 | +git checkout $BASE_REF |
| 232 | +azldev component identity -a -O json > /tmp/base-identity.json |
| 233 | + |
| 234 | +git checkout $HEAD_REF |
| 235 | +azldev component identity -a -O json > /tmp/head-identity.json |
| 236 | + |
| 237 | +# Diff |
| 238 | +TO_BUILD=$(azldev component diff-identity /tmp/base-identity.json /tmp/head-identity.json -O json | jq -r '.changed[], .added[]') |
| 239 | + |
| 240 | +# Queue builds for changed and newly added components |
| 241 | +for component in $TO_BUILD; do |
| 242 | + koji-build queue "$component" |
| 243 | +done |
| 244 | +``` |
| 245 | + |
| 246 | +--- |
| 247 | + |
| 248 | +## Scope |
| 249 | + |
| 250 | +### In Scope |
| 251 | + |
| 252 | +1. `component identity` subcommand — dump deterministic fingerprints |
| 253 | +2. `component diff-identity` subcommand — compare two identity files |
| 254 | +3. `fingerprint:"-"` struct tag system with guard tests |
| 255 | +4. Fingerprint library function callable outside CLI |
| 256 | +5. Content hashing for spec files and overlay sources |
| 257 | +6. Affects commit count integration (from PR #17 synthetic history work) |
| 258 | + |
| 259 | +### Out of Scope |
| 260 | + |
| 261 | +- Git-aware diffing within the tool (Option B from earlier discussion) — CI does the checkout |
| 262 | +- Transitive build dependency detection (if B changed and A `BuildRequires: B`) — periodic checks handle this |
| 263 | +- MCP integration |
| 264 | +- NEVR computation (deprioritized, separate backlog item) |
| 265 | +- Global snapshot time resolution (future feature — PRD notes the design constraint) |
| 266 | + |
| 267 | +--- |
| 268 | + |
| 269 | +## Implementation Plan |
| 270 | + |
| 271 | +1. **Add `fingerprint:"-"` tags** to existing config structs + guard test — small, self-contained |
| 272 | +2. **Implement fingerprint computation** — walk resolved config, hash fields, hash referenced files |
| 273 | +3. **Add `component identity` subcommand** — standard CLI pattern, returns fingerprint struct |
| 274 | +4. **Add `component diff-identity` subcommand** — reads two JSON files, computes diff |
| 275 | +5. **Integration test** — scenario test that modifies config and verifies fingerprint changes |
| 276 | +6. **CI pipeline integration** — documentation/example for PR gate workflow |
| 277 | + |
| 278 | +### Dependencies |
| 279 | + |
| 280 | +- PR #17 (synthetic history / Affects commits) for `affects_commit_count` — can stub with 0 until merged |
| 281 | +- Existing config loading and component resolution infrastructure |
| 282 | + |
| 283 | +--- |
| 284 | + |
| 285 | +## Success Criteria |
| 286 | + |
| 287 | +- Changing any build-relevant config field changes the fingerprint of all affected components |
| 288 | +- Changing a distro default propagates to all inheriting components |
| 289 | +- Adding a new struct field without deciding on fingerprint inclusion causes a test failure |
| 290 | +- `diff-identity` correctly identifies added, removed, changed, and unchanged components |
| 291 | +- CI can determine the build queue in <30s for a typical project (no mock root, no source downloads) |
| 292 | +- No false negatives (missed rebuilds) for any config-driven change |
| 293 | + |
| 294 | +## Risks |
| 295 | + |
| 296 | +- **Floating upstream specs** (no pinned commit): upstream ref is resolved at runtime. If upstream moves between identity runs, the component is flagged as changed. This is correct, but it means the fingerprint depends on when it is computed and cannot be reproduced without network access. This is acceptable since the tool is inherently online. |
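| 297 | +- **External changes outside config** (e.g., an upstream force-push that rewrites history around a pinned commit): a Git commit hash identifies its content, so a force-push cannot change what a pinned hash refers to; at worst the pinned commit becomes unreachable upstream. Extremely unlikely and not worth defending against. |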