Commit 7731e28

committed
wip
1 parent 8db810c commit 7731e28

19 files changed: +1252 -24 lines changed

change-detection-prd.md

Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@
1+
# PRD: Component Change Detection for Build Queue Determination
2+
3+
**Related ideas:** bn-c5ae, bn-8bb7 (superseded NEVR approach)
4+
**Priority:** P0 — Critical
5+
**Status:** Draft
6+
7+
---
8+
9+
## Problem
10+
11+
When a PR lands, the build pipeline needs to know which components to queue for building and publishing. Today there is no automated way to answer "what changed?" — developers manually track which components they touched.
12+
13+
The original approach (computing NEVR and comparing against Koji) was abandoned because:
14+
- It requires a mock root and source downloads for every component (slow)
15+
- `%autorelease` derives release numbers from synthetic git history, making offline NEVR computation fragile
16+
- Koji already errors on duplicate NEVRs — that error is real signal worth investigating, not something to pre-filter
17+
18+
The better approach: **detect whether a component's build inputs changed between two commits.** Changed inputs = queue for build. Let Koji handle the rest.
19+
20+
## Target Users
21+
22+
- **CI/CD pipelines** — automatically determine the build queue for a PR or merge
23+
- **Developers** — quickly check which components a change affects before submitting
24+
- **Release tooling** — audit what changed between releases
25+
26+
## Proposed Solution
27+
28+
Add a `component identity` subcommand that dumps a deterministic fingerprint of every component's build inputs. CI runs it at two commits and diffs the output. Changed fingerprints = changed components = build queue.
29+
30+
---
31+
32+
## Design
33+
34+
### Core Principle: Fingerprint Resolved Inputs, Not Configuration Metadata
35+
36+
The fingerprint must capture the **fully-resolved** build inputs after all config merging, inheritance, and path resolution — not raw config values. This ensures that changes to global/distro/group-level defaults automatically propagate to every component they affect.
37+
38+
### What Goes Into the Fingerprint
39+
40+
The fingerprint for a component is a deterministic hash of all inputs that could change RPM output.
41+
42+
#### 1. Resolved Component Config (from merged `ProjectConfig.Components[name]`)
43+
44+
Fingerprinted using struct reflection with `fingerprint:"-"` tags on excluded fields:
45+
46+
| Field | Included | Rationale |
|-------|----------|-----------|
| `Spec` (SpecSource) | Yes | Different spec = different package |
| `Overlays` (functional fields only) | Yes | Any overlay change mutates output. `Overlay.Description` is excluded (`fingerprint:"-"`) as it is purely documentation. |
| `Build.With`, `Build.Without`, `Build.Defines`, `Build.Undefines` | Yes | Directly affect rpmbuild macros |
| `Build.Check.Skip` | Yes | Changes %check behavior |
| `SourceFiles` (filenames + hashes) | Yes | Different sources = different package |
| `Name` | No (`fingerprint:"-"`) | Metadata, already the map key |
| `SourceConfigFile` | No (`fingerprint:"-"`) | Internal reference |
| `Build.Failure.Expected` | No (`fingerprint:"-"`) | CI decision, not a build input |
| `Build.Hints.Expensive` | No (`fingerprint:"-"`) | Scheduling hint, not a build input |
| `Build.Check.SkipReason` | No (`fingerprint:"-"`) | Documentation, not a build input |
| `Build.Failure.ExpectedReason` | No (`fingerprint:"-"`) | Documentation, not a build input |
59+
60+
**Key design**: new fields default to **included** in the fingerprint. Only explicitly excluded fields are tagged `fingerprint:"-"`. A guard test asserts every field has been consciously categorized.
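The tag-driven inclusion rule can be sketched as follows. This is a minimal illustration, not the tool's implementation: the `BuildConfig` struct is a trimmed stand-in for the real config types, and the `included` and `Fingerprint` helpers, the JSON encoding, and the `sha256:` prefix format are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"reflect"
)

// CheckConfig and BuildConfig are trimmed, hypothetical stand-ins for the real config structs.
type CheckConfig struct {
	Skip       bool   `toml:"skip"`
	SkipReason string `toml:"skip-reason" fingerprint:"-"`
}

type BuildConfig struct {
	With    []string          `toml:"with"`
	Defines map[string]string `toml:"defines"`
	Check   CheckConfig       `toml:"check"`
}

// included turns v into a tree of maps and values that omits every struct
// field tagged fingerprint:"-". Untagged fields are kept (included by default).
// Structs nested inside slices or pointers would need the same walk; omitted here.
func included(v reflect.Value) any {
	if v.Kind() != reflect.Struct {
		return v.Interface()
	}
	out := map[string]any{}
	for i := 0; i < v.NumField(); i++ {
		field := v.Type().Field(i)
		if field.Tag.Get("fingerprint") == "-" {
			continue
		}
		out[field.Name] = included(v.Field(i))
	}
	return out
}

// Fingerprint hashes the included fields. encoding/json sorts map keys,
// so the encoded bytes are deterministic for equal inputs.
func Fingerprint(cfg any) string {
	data, err := json.Marshal(included(reflect.ValueOf(cfg)))
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	a := BuildConfig{With: []string{"ssl"}, Check: CheckConfig{Skip: true, SkipReason: "flaky"}}
	b := a
	b.Check.SkipReason = "network tests" // excluded field: fingerprint unchanged
	c := a
	c.With = []string{"ssl", "zstd"} // included field: fingerprint changes
	fmt.Println(Fingerprint(a) == Fingerprint(b)) // true
	fmt.Println(Fingerprint(a) == Fingerprint(c)) // false
}
```

Serializing a filtered tree instead of hashing field by field keeps the ordering concern in one place, at the cost of tying the fingerprint to the encoder's output format.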
61+
62+
#### 2. Actual File Content Hashes
63+
64+
Config references files by path. The fingerprint must hash their *content*, not their path:
65+
66+
| File | How to hash |
|------|-------------|
| Spec file (`Spec.Path` for local specs) | SHA256 of file content |
| Overlay source files (patches, added files) | SHA256 of each file referenced by overlay `Source` fields |
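Hashing content rather than path might look like the sketch below. `HashFile` and the `tempFile` demo helper are hypothetical names, and the `sha256:` prefix simply mirrors the format used in the CLI output examples.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// HashFile returns "sha256:<hex>" for the content of the file at path.
// The path itself never enters the hash.
func HashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil)), nil
}

// tempFile writes content to a new temporary file and returns its path (demo helper).
func tempFile(content string) string {
	f, err := os.CreateTemp("", "fingerprint-*.patch")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		panic(err)
	}
	return f.Name()
}

func main() {
	mustHash := func(path string) string {
		sum, err := HashFile(path)
		if err != nil {
			panic(err)
		}
		return sum
	}
	a, b, c := tempFile("fix"), tempFile("fix"), tempFile("fix v2")
	defer func() {
		for _, p := range []string{a, b, c} {
			os.Remove(p)
		}
	}()
	fmt.Println(mustHash(a) == mustHash(b)) // true: same content, different paths
	fmt.Println(mustHash(a) == mustHash(c)) // false: content differs
}
```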
70+
71+
#### 3. Resolved Distro Context
72+
73+
| Field | Included | Rationale |
|-------|----------|-----------|
| Effective distro name + version | Yes | Different distro = different build env |
| Distro-level `DefaultComponentConfig` | Yes | Already merged into component config; captured by #1 |
77+
78+
**Mock config is explicitly excluded** from the per-component fingerprint. A mock config change is a distro-level event (new dist tag, mass rebuild) — not something that should trigger individual component rebuilds via change detection. Mock config changes are handled by a separate coordinated mass-rebuild workflow.
79+
80+
#### 4. Upstream Spec Identity (for `SourceType=upstream`)
81+
82+
| Field | Included | Rationale |
|-------|----------|-----------|
| `UpstreamCommit` (pinned or resolved hash, included as-is) | Yes | Different commit = different spec |
| `UpstreamName` | Yes | Different upstream package name |
| `UpstreamDistro` (name + version) | Yes | Different distro source |
87+
88+
**Floating upstream specs:** For components without a pinned `UpstreamCommit`, the fingerprint resolves the upstream ref to a concrete commit at runtime and includes that commit hash directly as a string input (not double-hashed). If upstream moved between the base and head identity runs, the resolved commit differs → fingerprint changes → rebuild. This is correct behavior. The tool already requires network access for most operations, so online resolution is not a concern.
89+
90+
**Future: global snapshot time.** When a global snapshot timestamp is introduced, the fingerprint should include the **resolved upstream commit hash directly**, not the snapshot timestamp itself. This prevents a snapshot bump from marking all upstream components as changed when only a handful actually got new content.
91+
92+
#### 5. Affects Commit Count (for synthetic history / %autorelease)
93+
94+
| Field | Included | Rationale |
|-------|----------|-----------|
| Count of `Affects: <component>` commits in project repo | Yes | Each Affects commit = +1 release in %autorelease |
97+
98+
This count changes when someone adds a commit with `Affects: curl`, which is exactly when the component needs a rebuild.
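A rough sketch of the count, assuming the marker is a line of the form `Affects: <component>` somewhere in the commit message. The exact format comes from the synthetic history work and is not specified here, so `CountAffects` and its matching rule are illustrative only. Collecting the messages (for example from `git log`) is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// CountAffects returns how many commit messages contain a line
// "Affects: <component>". Each commit counts at most once.
// The line format is an assumption made for this sketch.
func CountAffects(messages []string, component string) int {
	count := 0
	for _, msg := range messages {
		for _, line := range strings.Split(msg, "\n") {
			line = strings.TrimSpace(line)
			if !strings.HasPrefix(line, "Affects:") {
				continue
			}
			if strings.TrimSpace(strings.TrimPrefix(line, "Affects:")) == component {
				count++
				break // do not count the same commit twice
			}
		}
	}
	return count
}

func main() {
	messages := []string{
		"Bump curl\n\nAffects: curl",
		"Fix wget overlay\n\nAffects: wget",
		"Rebuild curl for new openssl\n\nAffects: curl",
	}
	fmt.Println(CountAffects(messages, "curl")) // 2
}
```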
99+
100+
### What Is Explicitly Excluded
101+
102+
- Source archive contents — already represented by `SourceFiles[].Hash` in config
103+
- Log/work/output directory paths — runtime choice, not a build input
104+
- `ComponentGroupConfig.Description` — documentation
105+
- Image config — irrelevant to component builds
106+
- System architecture: determined by the mock config (platform-specific path already resolved), which is itself excluded from the per-component fingerprint
107+
108+
### Global Change Propagation
109+
110+
This is the critical correctness property. Changes to shared config must fan out to all affected components.
111+
112+
**How it works:** The fingerprint operates on the **fully-merged** `ProjectConfig`. Config merging happens at load time:
113+
114+
```
defaults.toml → project azldev.toml → included configs → CLI overrides
                              ↓ merge
DistroVersion.DefaultComponentConfig → ComponentGroup.DefaultComponentConfig → ComponentConfig
```
119+
120+
If `defaults.toml` changes a distro's `DefaultComponentConfig.Build.With`, every component inheriting from that distro gets a different resolved config → different fingerprint → queued for rebuild. No special propagation logic needed.
121+
122+
**Guard test:** A unit test verifies that mutating any shared config field (distro default, group default) changes the fingerprint of components that inherit from it.
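The propagation property can be illustrated with a toy merge. Everything here is an assumption for the sake of the example: the `Build` struct, the `resolve` helper, and especially its merge semantics (lists concatenated, map keys overridden by more specific layers) are not taken from the real config loader.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Build is a trimmed, hypothetical stand-in for the build section of a component config.
type Build struct {
	With    []string
	Defines map[string]string
}

// resolve layers configs from least to most specific. Assumed semantics:
// With lists are concatenated and later Defines keys override earlier ones.
func resolve(layers ...Build) Build {
	out := Build{Defines: map[string]string{}}
	for _, layer := range layers {
		out.With = append(out.With, layer.With...)
		for key, value := range layer.Defines {
			out.Defines[key] = value
		}
	}
	return out
}

// fingerprint hashes the resolved config, never the individual layers.
func fingerprint(b Build) string {
	data, err := json.Marshal(b)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	distro := Build{With: []string{"ssl"}}
	group := Build{Defines: map[string]string{"_lto": "1"}}
	curl := Build{With: []string{"http2"}}

	before := fingerprint(resolve(distro, group, curl))
	distro.With = []string{"ssl", "zstd"} // change only the distro-level default
	after := fingerprint(resolve(distro, group, curl))

	fmt.Println(before != after) // true: the component config itself never changed
}
```

Because the hash is taken after merging, no component needs to know which layer a value came from.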
123+
124+
### `fingerprint:"-"` Tag System
125+
126+
```go
type ComponentBuildConfig struct {
	// Build inputs: included in the fingerprint by default (no tag).
	With      []string          `toml:"with"`
	Without   []string          `toml:"without"`
	Defines   map[string]string `toml:"defines"`
	Undefines []string          `toml:"undefines"`
	Check     CheckConfig       `toml:"check"`
	// CI and scheduling metadata: excluded from the fingerprint.
	Failure ComponentBuildFailureConfig `toml:"failure" fingerprint:"-"`
	Hints   ComponentBuildHints         `toml:"hints" fingerprint:"-"`
}

type CheckConfig struct {
	Skip       bool   `toml:"skip"`
	SkipReason string `toml:"skip-reason" fingerprint:"-"`
}
```
142+
143+
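**Guard test:** Reflects over all fingerprinted structs and classifies each field as excluded (tagged `fingerprint:"-"`) or included (no tag, the default). Because untagged fields are included by default, forcing a conscious decision on every new field requires checking fields against an explicit per-field list rather than relying on tags alone.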
144+
145+
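```go
// Requires the "reflect" and "testing" imports.
func TestAllFieldsHaveFingerprintDecision(t *testing.T) {
	// Allowlist of excluded fields; catches accidental tag removal or addition.
	excluded := map[string]bool{"ComponentBuildConfig.Failure": true, "ComponentBuildConfig.Hints": true, "CheckConfig.SkipReason": true}
	// For each struct in the fingerprint set, walk its fields via reflect.
	for _, v := range []any{ComponentBuildConfig{}, CheckConfig{}} {
		typ := reflect.TypeOf(v)
		for i := 0; i < typ.NumField(); i++ {
			field := typ.Field(i)
			name := typ.Name() + "." + field.Name
			// fingerprint:"-" means excluded; no tag means included (the default),
			// so a new untagged field still passes.
			if tagged := field.Tag.Get("fingerprint") == "-"; tagged != excluded[name] {
				t.Errorf("%s: fingerprint tag does not match the allowlist", name)
			}
		}
	}
}
```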
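The Success Criteria ask that a new field added without a fingerprint decision cause a test failure, which a tag-only check cannot do because untagged fields are included by default. One possible way to get that behavior is to record a decision for every field and fail on any field missing from the record. The sketch below is an assumption, not the planned implementation: the `decisions` registry, the `Problems` helper, and the `Timeout` field are all hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// CheckConfig mirrors the struct above, plus a hypothetical new field
// (Timeout) that nobody has categorized yet.
type CheckConfig struct {
	Skip       bool   `toml:"skip"`
	SkipReason string `toml:"skip-reason" fingerprint:"-"`
	Timeout    int    `toml:"timeout"`
}

// decisions records the conscious choice for each field:
// true = included in the fingerprint, false = excluded.
var decisions = map[string]bool{
	"CheckConfig.Skip":       true,
	"CheckConfig.SkipReason": false,
}

// Problems lists fields of v's struct type that have no recorded decision,
// or whose fingerprint tag disagrees with the recorded one.
func Problems(v any, decisions map[string]bool) []string {
	typ := reflect.TypeOf(v)
	var problems []string
	for i := 0; i < typ.NumField(); i++ {
		field := typ.Field(i)
		name := typ.Name() + "." + field.Name
		want, recorded := decisions[name]
		tagIncluded := field.Tag.Get("fingerprint") != "-"
		switch {
		case !recorded:
			problems = append(problems, name+": no fingerprint decision recorded")
		case want != tagIncluded:
			problems = append(problems, name+": tag disagrees with recorded decision")
		}
	}
	return problems
}

func main() {
	for _, problem := range Problems(CheckConfig{}, decisions) {
		fmt.Println(problem) // CheckConfig.Timeout: no fingerprint decision recorded
	}
}
```

A guard test built on this would call `t.Errorf` for each returned problem. The trade-off is a second place to update whenever a struct changes, which is exactly the friction that forces the decision.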
155+
156+
### CLI Design
157+
158+
#### `azldev component identity`
159+
160+
Dumps the fingerprint for selected components.
161+
162+
```bash
# All components, JSON output for CI
azldev component identity -a -O json > identity.json

# Single component, table output for dev
azldev component identity -p curl

# Output:
# COMPONENT  FINGERPRINT
# curl       sha256:a1b2c3d4...
```
173+
174+
Uses standard component filter flags (`-p`, `-g`, `-a`, `-s`).
175+
176+
**JSON output structure:**
177+
```json
{
  "curl": {
    "fingerprint": "sha256:a1b2c3d4e5f6...",
    "inputs": {
      "config_hash": "sha256:...",
      "spec_content_hash": "sha256:...",
      "overlay_file_hashes": {
        "patches/fix.patch": "sha256:...",
        "patches/build.patch": "sha256:..."
      },
      "affects_commit_count": 3,
      "distro": "azl3",
      "distro_version": "3.0"
    }
  }
}
```
195+
196+
The `inputs` breakdown is for debugging — the top-level `fingerprint` is the single hash for comparison.
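For consumers that read the dump from Go, the JSON above maps onto types along these lines. The field names follow the example output; the type names `ComponentIdentity` and `Inputs` and the `ParseIdentity` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Inputs mirrors the "inputs" object in the example output.
type Inputs struct {
	ConfigHash         string            `json:"config_hash"`
	SpecContentHash    string            `json:"spec_content_hash"`
	OverlayFileHashes  map[string]string `json:"overlay_file_hashes"`
	AffectsCommitCount int               `json:"affects_commit_count"`
	Distro             string            `json:"distro"`
	DistroVersion      string            `json:"distro_version"`
}

// ComponentIdentity is one entry of the dump, keyed by component name.
type ComponentIdentity struct {
	Fingerprint string `json:"fingerprint"`
	Inputs      Inputs `json:"inputs"`
}

// ParseIdentity decodes an identity dump into a map of component name to identity.
func ParseIdentity(data []byte) (map[string]ComponentIdentity, error) {
	var out map[string]ComponentIdentity
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	dump := []byte(`{"curl": {"fingerprint": "sha256:aaa",
		"inputs": {"affects_commit_count": 3, "distro": "azl3", "distro_version": "3.0"}}}`)
	identities, err := ParseIdentity(dump)
	if err != nil {
		panic(err)
	}
	curl := identities["curl"]
	fmt.Println(curl.Fingerprint, curl.Inputs.AffectsCommitCount) // sha256:aaa 3
}
```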
197+
198+
#### `azldev component diff-identity`
199+
200+
Compares two identity dumps and outputs changed components.
201+
202+
```bash
azldev component diff-identity base.json head.json
```
205+
206+
**Output (table):**
207+
```
COMPONENT  STATUS     DETAIL
curl       changed    spec_content_hash, overlay_file_hashes
wget       added      (new component)
libfoo     removed    (component removed)
openssl    unchanged
```
214+
215+
**Output (JSON, for CI):**
216+
```json
{
  "changed": ["curl"],
  "added": ["wget"],
  "removed": ["libfoo"],
  "unchanged": ["openssl"]
}
```
223+
```
224+
225+
CI builds the queue from `changed` + `added`.
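The comparison itself reduces to a set difference over two name-to-fingerprint maps. The sketch below assumes the dumps have already been reduced to `map[string]string`; the `Diff` type and `DiffIdentity` function are illustrative names, not the tool's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff groups component names by status, matching the JSON output above.
type Diff struct {
	Changed, Added, Removed, Unchanged []string
}

// DiffIdentity compares component -> fingerprint maps from the base and head dumps.
func DiffIdentity(base, head map[string]string) Diff {
	var d Diff
	for name, fingerprint := range head {
		old, existed := base[name]
		switch {
		case !existed:
			d.Added = append(d.Added, name)
		case old != fingerprint:
			d.Changed = append(d.Changed, name)
		default:
			d.Unchanged = append(d.Unchanged, name)
		}
	}
	for name := range base {
		if _, stillThere := head[name]; !stillThere {
			d.Removed = append(d.Removed, name)
		}
	}
	// Map iteration order is random; sort so the output is deterministic.
	for _, names := range [][]string{d.Changed, d.Added, d.Removed, d.Unchanged} {
		sort.Strings(names)
	}
	return d
}

func main() {
	base := map[string]string{"curl": "sha256:aaa", "libfoo": "sha256:bbb", "openssl": "sha256:ccc"}
	head := map[string]string{"curl": "sha256:ddd", "wget": "sha256:eee", "openssl": "sha256:ccc"}
	d := DiffIdentity(base, head)
	fmt.Println(d.Changed, d.Added, d.Removed, d.Unchanged) // [curl] [wget] [libfoo] [openssl]

	queue := append(append([]string{}, d.Changed...), d.Added...)
	fmt.Println(queue) // [curl wget]
}
```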
226+
227+
### CI Workflow
228+
229+
```bash
# At PR gate:
git checkout "$BASE_REF"
azldev component identity -a -O json > /tmp/base-identity.json

git checkout "$HEAD_REF"
azldev component identity -a -O json > /tmp/head-identity.json

# Diff: the build queue is changed + added components
QUEUE=$(azldev component diff-identity /tmp/base-identity.json /tmp/head-identity.json -O json | jq -r '.changed[], .added[]')

# Queue builds
for component in $QUEUE; do
    koji-build queue "$component"
done
```
245+
246+
---
247+
248+
## Scope
249+
250+
### In Scope
251+
252+
1. `component identity` subcommand — dump deterministic fingerprints
253+
2. `component diff-identity` subcommand — compare two identity files
254+
3. `fingerprint:"-"` struct tag system with guard tests
255+
4. Fingerprint library function callable outside CLI
256+
5. Content hashing for spec files and overlay sources
257+
6. Affects commit count integration (from PR #17 synthetic history work)
258+
259+
### Out of Scope
260+
261+
- Git-aware diffing within the tool (Option B from earlier discussion) — CI does the checkout
262+
- Transitive build dependency detection (if B changed and A `BuildRequires: B`) — periodic checks handle this
263+
- MCP integration
264+
- NEVR computation (deprioritized, separate backlog item)
265+
- Global snapshot time resolution (future feature — PRD notes the design constraint)
266+
267+
---
268+
269+
## Implementation Plan
270+
271+
1. **Add `fingerprint:"-"` tags** to existing config structs + guard test — small, self-contained
272+
2. **Implement fingerprint computation** — walk resolved config, hash fields, hash referenced files
273+
3. **Add `component identity` subcommand** — standard CLI pattern, returns fingerprint struct
274+
4. **Add `component diff-identity` subcommand** — reads two JSON files, computes diff
275+
5. **Integration test** — scenario test that modifies config and verifies fingerprint changes
276+
6. **CI pipeline integration** — documentation/example for PR gate workflow
277+
278+
### Dependencies
279+
280+
- PR #17 (synthetic history / Affects commits) for `affects_commit_count` — can stub with 0 until merged
281+
- Existing config loading and component resolution infrastructure
282+
283+
---
284+
285+
## Success Criteria
286+
287+
- Changing any build-relevant config field changes the fingerprint of all affected components
288+
- Changing a distro default propagates to all inheriting components
289+
- Adding a new struct field without deciding on fingerprint inclusion causes a test failure
290+
- `diff-identity` correctly identifies added, removed, changed, and unchanged components
291+
- CI can determine the build queue in <30s for a typical project (no mock root, no source downloads)
292+
- No false negatives (missed rebuilds) for any config-driven change
293+
294+
## Risks
295+
296+
- **Floating upstream specs** (no pinned commit): upstream ref is resolved at runtime. If upstream moves between identity runs, the component is flagged as changed. This is correct but means the fingerprint is not reproducible without network access — acceptable since the tool is inherently online.
297+
- **External changes outside config** (e.g., upstream repo force-push changing a pinned commit's content): extremely unlikely and not worth defending against.

docs/user/reference/cli/azldev_component.md

Lines changed: 2 additions & 0 deletions

docs/user/reference/cli/azldev_component_diff-identity.md

Lines changed: 53 additions & 0 deletions
