
Phase 0 / Closure / Vertical slice + Phase 0 closure #24

Draft
guysenpai wants to merge 60 commits into main from phase-0/closure/vertical-slice-and-freeze

@guysenpai

Contributor

E1 in review — full description at closure.

guysenpai and others added 30 commits June 13, 2026 02:08
copyBufferToTexture is a Phase-0 no-op on the Vulkan GAL backend
(command_encoder.zig:85) — the only public GAL texture-upload path.
The texture-asset -> GPU-sampled seam is therefore unwired; fixing it
touches src/ (outside the E4 scope). Narrow block: mesh+camera+depth+
instancing+input and a mesh asset (copyBufferToBuffer is implemented)
are fully wired. Recommended Option A: pivot the cooked asset to a
mesh, render mesh+camera, document GPU texture upload as Phase-1.
Stopped before writing E4 code; awaiting ruling.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
copyBufferToTexture marked RESOLVED (ruling Option B / corrected model,
implemented as the copyTextureToBuffer mirror). New blocker E4 #2:
RenderPassEncoder.setBindGroup is a "Phase 1+" no-op and
cmdBindDescriptorSets is unwired in the GAL, so the slice's textured/
camera draw binds nothing to set 0 (VUID-08114 + blank frame). Same
class as copyBufferToTexture; proposed ruling: implement
cmdBindDescriptorSets + track the pipeline layout in setPipeline.
Push held to avoid burning a CI cycle. The trivial createBindGroup
[*]T->*const T type-typo was fixed inline (cardinal-rule carve-out).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The vertical slice (M0.9 E4) is the first GAL consumer to render with
real resources, surfacing two primitives the triangle (empty bind group,
no texture upload) never exercised — both stubbed "Phase 1+" in M0.4:

- copyBufferToTexture: implemented as the exact mirror of its neighbour
  copyTextureToBuffer (NOT vk_blit's raw path). The buffer->image
  asymmetry (destination born UNDEFINED, no render pass to carry a
  final_layout, no encoder-level barrier on the public surface) forces
  internal transitions undefined->transfer_dst then transfer_dst->
  shader_read; documented as a deliberate divergence. WebGPU-symmetric
  signature (zero callers existed); null stub matched.

- RenderPassEncoder.setBindGroup: was a no-op (cmdBindDescriptorSets was
  unwired GAL-wide). setPipeline now records the bound pipeline's layout
  in current_pipeline_layout; setBindGroup binds against it, and SKIPS
  cleanly when no pipeline was bound first (no unreachable, no implicit
  "last pipeline is current").

- bind_group createGroup: fix the [*]T->*const T type bug on p_pool_sizes
  (never Vulkan-compiled before), and harden the unused WriteDescriptorSet
  union pointers — point them at valid zero-initialized dummies instead of
  undefined, so no garbage address reaches a defensively-inspecting
  validation layer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
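
The pipeline-layout tracking described for setBindGroup above can be sketched in isolation. This is a minimal sketch, not the GAL's code: `PipelineLayout`, `Pipeline`, `binds_recorded`, and `last_bound_layout` are hypothetical stand-ins, and the real backend would call cmdBindDescriptorSets where the comment indicates.

```zig
const std = @import("std");

// Hypothetical stand-ins: the real GAL types are not shown in this PR text.
const PipelineLayout = u64;
const Pipeline = struct { layout: PipelineLayout };

const RenderPassEncoder = struct {
    // Stays null until setPipeline runs; there is no implicit "last pipeline".
    current_pipeline_layout: ?PipelineLayout = null,
    last_bound_layout: PipelineLayout = 0,
    binds_recorded: u32 = 0,

    fn setPipeline(self: *RenderPassEncoder, pipeline: Pipeline) void {
        self.current_pipeline_layout = pipeline.layout;
    }

    // Returns true when a bind was recorded, false when it was skipped.
    fn setBindGroup(self: *RenderPassEncoder, set_index: u32) bool {
        _ = set_index;
        const layout = self.current_pipeline_layout orelse return false;
        // A real backend would bind descriptor sets against `layout` here.
        self.last_bound_layout = layout;
        self.binds_recorded += 1;
        return true;
    }
};

test "bind is skipped until a pipeline is set" {
    var enc = RenderPassEncoder{};
    try std.testing.expect(!enc.setBindGroup(0));
    enc.setPipeline(.{ .layout = 42 });
    try std.testing.expect(enc.setBindGroup(0));
}
```

Modelling the layout as an optional makes the "no pipeline bound yet" state explicit, so the skip path needs no `unreachable`.
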
Extend the M0.9 headless slice (E3) into a Vulkan forward render of the
live ECS scene, exercising the render/asset/input bricks end-to-end:

- render.zig: GAL forward pass — one cube mesh INSTANCED once per entity
  at the entity's live Position, shaded by the cooked albedo texture
  (uploaded via copyBufferToTexture), under a perspective camera with
  depth. Generic over the device type (Vulkan / Null). runInteractive
  (window+swapchain) / runSmoke (offscreen+PPM, headless).
- sim.zig: pure 60 Hz sim — grid layout + per-entity velocities,
  readPosition, and SPACE-toggles-pause via the normalized KeyCode (the
  InputRawState array is raw-scancode-indexed in Phase 0).
- cook_assets.zig + assets/slice_albedo.png: cook the source PNG through
  the real M0.6 pipeline to a .texture.bin, loaded at runtime via Loader.
- shaders/slice.vert|frag (+ committed SPIR-V), math.zig (Vulkan MVP).
- build.zig: slice module (render+asset deps), asset cook+install,
  run-vertical-slice / cook-vertical-slice-assets steps.
- integration test (4 facets): sim 100/120, E2-B cross-file validation,
  M0.6 cook+load, input->pause. Render is not asserted headless (the Null
  backend leaves mapBuffer Unsupported); coverage = compile + lavapipe.
- ci.yml: vertical-slice-smoke job — offscreen render on lavapipe (Debug,
  validation layers active), asserts a frame composes + zero VUID.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The vertical-slice-smoke job failed at apt install: Ubuntu 24.04 names
the Khronos validation layers package `vulkan-validationlayers` (no
hyphen), not `vulkan-validation-layers`. Try both names and fail loudly
if the layer manifest is absent afterwards — the smoke is only
meaningful with validation active.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The slice smoke panicked with `integer overflow` at
`n * @sizeOf(Instance)` in setInstancesFromWorld on lavapipe, its first
ever execution (no macOS/headless or test path runs the render). With
n = @min(entity_count, max_instances) <= 100 the multiply cannot
overflow u32 from the source alone, so a runtime value differs from the
code. Derive the instance capacity from the REAL mapped buffer length
(out of the device's buffer registry, not self.max_instances), do the
arithmetic in usize, and clamp the write to that capacity — no u32
multiply, never writes past the buffer, robust regardless. A one-shot
log captures n / entity_count / max_instances / mapped_len / capacity to
pin the root cause in the same CI cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The one-shot diagnostic confirmed the values were correct (n=100,
max=100, mapped_len=1200) — not memory corruption — and the lavapipe
smoke is validation-clean. Drop the verbose log; keep a triple clamp
(entity_count, configured max, real buffer capacity) so max_instances
stays meaningful and the write can never overflow or exceed the buffer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
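
The triple clamp kept by this commit can be sketched as a pure function. This is an illustration under assumptions: the 12-byte `Instance` layout is hypothetical (it is merely consistent with the logged mapped_len=1200 for max=100, if the buffer was sized for exactly the maximum), and `clampedInstanceCount` / `writeLen` are made-up names.

```zig
const std = @import("std");

// Hypothetical instance layout (12 bytes); the real struct is not shown here.
const Instance = extern struct { x: f32, y: f32, z: f32 };

// Number of instances that is safe to write: clamped by the live entity
// count, the configured maximum, and the real mapped buffer capacity.
fn clampedInstanceCount(entity_count: usize, max_instances: usize, mapped_len: usize) usize {
    const capacity = mapped_len / @sizeOf(Instance);
    return @min(entity_count, @min(max_instances, capacity));
}

// Byte length of the write, computed in usize rather than u32.
fn writeLen(n: usize) usize {
    return n * @sizeOf(Instance);
}

test "values from the one-shot diagnostic" {
    const n = clampedInstanceCount(100, 100, 1200);
    try std.testing.expectEqual(@as(usize, 100), n);
    try std.testing.expectEqual(@as(usize, 1200), writeLen(n));
}
```

Because the capacity is derived from the mapped length, a buffer smaller than `max_instances * @sizeOf(Instance)` shrinks the write instead of overrunning it.
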
E4 done. setBindGroup implemented (cmdBindDescriptorSets + pipeline-
layout tracking with a null-layout guard); unused WriteDescriptorSet
pointers hardened to zero-init dummies; the slice integer-overflow was
diagnosed as not-corruption and fixed robustly. CI fully green incl.
vertical-slice-smoke on lavapipe (validation layers active, frame
composed, zero VUID). Blocker E4 #2 marked RESOLVED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
guysenpai and others added 30 commits June 13, 2026 19:40
E6 stage added to M0.9 (Claude.ai/Guy decision): rewrite the editor's
viewport blit (src/editor/vk_blit.zig) to render via gal.Device instead
of raw Vulkan, consuming the E4 copyBufferToTexture, and resorb the two
M0.5 sync bugs (BT709 colorspace without extension; binary present
semaphore reuse, VUID-vkQueueSubmit-pSignalSemaphores-00067) if they do
not vanish via the GAL's per-image sync. After E5, before the freeze.
2nd src/ write authorized in M0.9, bounded to vk_blit.zig + its test.
Renumber: E6 = vk_blit consolidation, E7 = freeze, E8 = closure.
(Title shortened from the suggested wording to fit weld_lint's 72-char
limit; the freeze→E7 closure→E8 mapping is preserved here + in the
Acted deviations entry.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E5/C0.8 (edit -> live world -> visible in viewport) rests on three
unwired seams in read-only src/runtime/src/editor: the runtime has no
ECS world and modify_component is a no-op echo ack (runtime/main.zig:272;
messages.zig:199); the runtime renders a stateless renderMire test
pattern, not its world (main.zig:186, :76), so no edit can change a
viewport pixel; the editor never sends ModifyComponent. M0.7/S6 shipped
the IPC as transport + echo stub. 4th same-pattern blocker; structurally
the heaviest (a runtime World + world-to-viewport renderer = Phase-0.6
scale). Recommended Option B: close the protocol + in-process loop,
defer the visual clause to Phase-0.6 (mirrors E3 Level-C). Stopped
before any code; awaiting ruling. vk_blit VUID observation (E6 input)
is independent + read-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Misrouting error, not a 4th blocker (Claude.ai ruling). C0.4 != C0.8:
the C0.4 IPC protocol round-trip was satisfied by M0.7 (renderMire +
echo-ack are its legitimate transport stubs); C0.8 is the vertical
slice, and the slice already IS the C0.8 runtime (live World from E3 +
world->viewport renderer from E4). RULING: the C0.8 loop lives in
examples/vertical_slice/, no src/ touched — an editor-stub thread sends
a real ModifyComponent over the M0.7 transport, the slice decodes +
applies via the field_offset/new_value (diff_runner) pattern, the E4
renderer reflects it. Real-blocker count corrected to 2 (#1 Phase-1
instantiation, #2 Phase-1 cross-file import). E6/vk_blit unaffected:
the C0.8 loop uses the slice's own GAL viewport, not src/editor/vk_blit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E5 closes C0.8 in the slice (the slice IS the C0.8 runtime — World from
E3 + renderer from E4), not in src/runtime (whose renderMire/echo-ack
are the C0.4 transport stubs). No src/ touched.

- ipc_loop.zig: an editor-stub thread sends a real ModifyComponent over
  the real M0.7 transport (AF_UNIX socket + framing); the slice's
  runtime-side client decodes it and applies it to the live World via
  the diff_runner write path (field_offset + new_value -> the component
  slot, field size resolved from the registry). Socket-only, so the
  semantic loop runs headless on every platform.
- integration test (5th facet): a ModifyComponent over M0.7 changes
  Position.x of entity 0 on the live World, leaving Position.y untouched
  — the C0.8 semantic loop asserted end-to-end.
- main.zig --ipc-edit: drive one edit then render the post-edit world
  (the C0.8 visual reflection on hardware/lavapipe).
- ci.yml: a C0.8 --ipc-edit lavapipe step (edit -> world -> frame
  composes, validation clean) + a non-failing vk_blit VUID observation
  via run-ipc-demo under weston+syncval (the E6 input: E5 observes, E6
  fixes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
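
The field_offset + new_value write described above can be sketched without the transport. This is a minimal sketch, not the slice's diff_runner: `applyFieldWrite` and `ApplyError` are hypothetical names, and the field size is taken from the payload length here, whereas the commit resolves it from the registry.

```zig
const std = @import("std");

// extern struct gives a defined field layout for the byte-level write.
const Position = extern struct { x: f32, y: f32 };

const ApplyError = error{OutOfBounds};

// Copy `new_value` into the component slot starting at `field_offset`.
fn applyFieldWrite(slot: []u8, field_offset: usize, new_value: []const u8) ApplyError!void {
    if (field_offset > slot.len or new_value.len > slot.len - field_offset) {
        return error.OutOfBounds;
    }
    @memcpy(slot[field_offset..][0..new_value.len], new_value);
}

test "edit Position.x, leave Position.y untouched" {
    var pos = Position{ .x = 1.0, .y = 2.0 };
    const new_x: f32 = 9.5;
    try applyFieldWrite(std.mem.asBytes(&pos), @offsetOf(Position, "x"), std.mem.asBytes(&new_x));
    try std.testing.expectEqual(@as(f32, 9.5), pos.x);
    try std.testing.expectEqual(@as(f32, 2.0), pos.y);
}
```

The bounds check runs before the copy, so a malformed message is rejected rather than writing past the component slot.
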
E5 done: the C0.8 component-edit-over-IPC -> live world -> visible loop
is closed in the slice (no src/ touched). Integration test 5/5; CI fully
green incl. the vertical-slice-smoke C0.8 --ipc-edit step on lavapipe.
E6 input captured: the vk_blit swapchain colorspace VUID
(VkSwapchainCreateInfoKHR-imageColorSpace-parameter) FIRES on lavapipe;
the present-semaphore reuse VUID-00067 was not observed under syncval
(latent, likely hardware-only). README updated for the E5 IPC loop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bug (a) colorspace is a GAL bug (swapchain.zig:113 blind-copies
fmt.color_space; format selection filters pixel format only; conv.zig has
no colorspace map) — latent until vk_blit's rewrite onto gal.createSwapchain
makes it the only owner. Authorized E6 file grid (extends the FROZEN
"Files to create or modify"): vk_blit.zig + test; gal/vulkan/swapchain.zig
+ conv.zig (colorspace at its source — prefer srgb_nonlinear core, else
force it); assets/shaders/viewport_blit.frag.glsl + .spv (separate
descriptors, the E4 convention); a 1-line gal getSwapchainImageCount
(for the per-image present-semaphore fix). The swapchain colorspace code
will be FROZEN by E7 (C0.5) — get the contract right, not just the VUID.
main.zig stays zero-diff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The GAL swapchain blind-copied the surface's first-reported colorspace
(swapchain.zig: .image_color_space = fmt.color_space; format selection
matched the pixel format only) — on drivers that report an extended
*_EXT colorspace (e.g. lavapipe, value 1000104013) this emits an
imageColorSpace the swapchain create-info rejects without
VK_EXT_swapchain_colorspace (VUID-VkSwapchainCreateInfoKHR-imageColorSpace-
parameter). It was latent because the E4 slice matched a core
srgb_nonlinear format. Now the swapchain always presents in the core
srgb_nonlinear colorspace (conv.colorSpace), selecting a surface
(format, colorspace) pair carrying it (preferred pixel format, then
BGRA8, then any srgb_nonlinear pair). Also add getSwapchainImageCount so
callers can size per-image present semaphores. Fixed at the source — every
GAL swapchain consumer benefits. (Frozen by E7 / C0.5: contract-level fix.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
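
The selection order described above (preferred pixel format, then BGRA8, then any srgb_nonlinear pair) can be sketched as follows. The enums and function names are hypothetical simplifications of the Vulkan surface-format types, and returning null when no srgb_nonlinear pair exists is an assumption of this sketch, not the GAL's documented behaviour.

```zig
const std = @import("std");

// Hypothetical enums; the real GAL/Vulkan types are richer.
const PixelFormat = enum { rgba8_unorm, bgra8_unorm, rgba16_float };
const ColorSpace = enum { srgb_nonlinear, extended_ext };
const SurfaceFormat = struct { format: PixelFormat, color_space: ColorSpace };

// First srgb_nonlinear pair, optionally restricted to one pixel format.
fn findSrgb(available: []const SurfaceFormat, wanted: ?PixelFormat) ?SurfaceFormat {
    for (available) |sf| {
        if (sf.color_space != .srgb_nonlinear) continue;
        if (wanted) |w| {
            if (sf.format != w) continue;
        }
        return sf;
    }
    return null;
}

// Preferred format first, then BGRA8, then any srgb_nonlinear pair.
fn pickSurfaceFormat(available: []const SurfaceFormat, preferred: PixelFormat) ?SurfaceFormat {
    if (findSrgb(available, preferred)) |sf| return sf;
    if (findSrgb(available, .bgra8_unorm)) |sf| return sf;
    return findSrgb(available, null);
}

test "an extended colorspace listed first is never chosen" {
    const available = [_]SurfaceFormat{
        .{ .format = .rgba8_unorm, .color_space = .extended_ext },
        .{ .format = .rgba8_unorm, .color_space = .srgb_nonlinear },
    };
    const picked = pickSurfaceFormat(&available, .rgba8_unorm).?;
    try std.testing.expectEqual(ColorSpace.srgb_nonlinear, picked.color_space);
}
```

Filtering on the (format, colorspace) pair rather than the pixel format alone is what prevents the first-reported extended colorspace from being copied through.
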
Rewrite src/editor/vk_blit.zig to drive the public GAL (gal.Device)
instead of raw Vulkan — the S6 hand-rolled instance/device/swapchain/
descriptor/command plumbing (vk.device_dispatch.*) is gone, replaced by
GAL calls; the per-frame shm upload uses copyBufferToTexture (E4), whose
internal barriers subsume the hand-issued layout transitions. The file
survives (the shared viewport stays a Phase-2 need); main.zig is
zero-diff (Renderer init/deinit/recreateSwapchain/stageViewport/drawFrame
+ last_known_size/swapchain_dirty preserved).

Resorbs the two M0.5 sync bugs:
- (a) colorspace: fixed at its source in the GAL swapchain (prior commit);
  routing through gal.createSwapchain inherits the fix. The blit frag
  shader moves to separate descriptors (texture2D + sampler, the E4 GAL
  convention) since the GAL has no combined sampler2D binding;
  viewport_blit.frag.spv regenerated, .vert reused, embed.zig unchanged.
- (b) present-semaphore reuse: one render_finished semaphore PER swapchain
  image (via getSwapchainImageCount), indexed by the acquired image_index
  — never re-signals a binary semaphore pending on a prior present. NOT
  observable on lavapipe (synchronous software present); verified by
  construction, hardware-confirmed (no false green).

ci.yml: the E5 vk_blit observation step becomes an E6 assertion — the
colorspace VUID must be ABSENT on lavapipe; (b) stays a non-failing
report. build.zig wires weld_render into the editor module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
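
The per-image semaphore indexing in (b) can be illustrated with plain data. This is a sketch under assumptions: `Semaphore`, `PresentSync`, and `max_images` are hypothetical, and no GPU object is created; it only shows that the semaphore is chosen by the acquired image_index rather than by a frame counter.

```zig
const std = @import("std");

// Hypothetical handle standing in for a binary semaphore.
const Semaphore = struct { id: u32 };

const max_images = 8; // hypothetical upper bound on swapchain images

const PresentSync = struct {
    render_finished: [max_images]Semaphore = undefined,
    image_count: u32 = 0,

    // One semaphore per swapchain image (count as reported by the swapchain).
    fn init(requested: u32) PresentSync {
        var sync = PresentSync{};
        sync.image_count = if (requested > max_images) max_images else requested;
        var i: u32 = 0;
        while (i < sync.image_count) : (i += 1) {
            sync.render_finished[i] = .{ .id = i };
        }
        return sync;
    }

    // Select by the acquired image index, never by frame-in-flight index.
    fn semaphoreFor(self: *const PresentSync, image_index: u32) ?Semaphore {
        if (image_index >= self.image_count) return null;
        return self.render_finished[image_index];
    }
};

test "each acquired image maps to its own semaphore" {
    const sync = PresentSync.init(3);
    try std.testing.expectEqual(@as(u32, 0), sync.semaphoreFor(0).?.id);
    try std.testing.expectEqual(@as(u32, 2), sync.semaphoreFor(2).?.id);
}
```

Since an image cannot be re-acquired while its previous present is still pending, keying the semaphore on the image index avoids re-signalling a semaphore that a prior present still waits on.
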
The GAL render pass set dependency_count=0 — no external subpass
dependency synchronizing the render pass's image-layout transition (a
color-attachment-output write) with a prior external access, notably a
swapchain image just returned by vkAcquireNextImageKHR (present-engine
read). Under synchronization validation this is a WRITE_AFTER_READ
hazard at vkQueueSubmit. The raw editor blit carried this dependency
(SUBPASS_EXTERNAL->0, color_attachment_output, color_attachment_write);
the E6 consolidation onto the GAL dropped it. Add it at the source — it
benefits every GAL swapchain consumer and is harmless for offscreen
targets. ci.yml: the E6 step now asserts SYNC-HAZARD absent under
syncval (VUID-00067 stays a hardware-only non-failing report).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The E6 syncval observation surfaced a SYNC-HAZARD-WRITE-AFTER-READ: the
GAL render pass omitted the external subpass dependency the raw vk_blit
had (dependency_count=0). Scope widened to gal/vulkan/render_pass.zig to
add it at the source (frozen by E7). Test: run-ipc-demo under syncval on
lavapipe asserts SYNC-HAZARD absent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adding the external subpass dependency to the executing render pass
(render_pass.zig) made it incompatible with the pipeline's template
render pass (dependencyCount 1 vs 0) → VUID-vkCmdDrawIndexed-renderPass-
02684. The GAL builds the render pass in two sites — the executing one
(render_pass.zig) and the pipeline-compatibility template (pipeline.zig)
— so the dependency must be identical in both. Add the same
SUBPASS_EXTERNAL->0 color-attachment-output dependency to the template
(used only for compatibility; harmless). Completes the render-pass
external-dependency fix across both sites.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E6 done: vk_blit consolidated onto the GAL (1133 -> ~280 lines, main.zig
zero-diff). Validated on lavapipe with validation + sync validation:
(a) swapchain colorspace VUID absent (fixed at source in the GAL
swapchain); (b) present-semaphore reuse VUID-00067 fixed by a per-image
render_finished semaphore (hardware-validated, not lavapipe-observable);
(c) render-pass WRITE_AFTER_READ SYNC-HAZARD absent (external subpass
dependency added to both GAL render-pass sites). CI fully green; the GAL
render path is consolidated + validation-clean ahead of the E7 freeze.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read-only cross-module audit found 11/12 interfaces carry fix_before_freeze contract defects (13 blocker + 6 minor), all verified against code. EventBus + IPC clean. No markers applied; fix decisions are Guy's per the E7 STOP rule. Awaiting rulings.
Guy ruled fix-then-freeze (3 blocks). Records the scope expansion for blocks 1+2, the 5 meta-decisions (C1 Allocator out -> 8 Tier-0, C2 rename, C3 unify resources, C4 logical KeyCode), the M0.2-freeze-nominal authorization, and the phase learning (freeze = audit + marker, never marker alone).
B1/B2: dispatch/dispatchBatch returned void and guarded with std.debug.assert (compiled out in ReleaseFast -> OOB write on the C0.1 1M path). Now SchedulerError!void returning error.TooManyChunks (the contract the doc already promised); start pinned to SchedulerError!void; SchedulerError doc corrected. 6 call-sites updated to try.
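
The B1/B2 change from an assert guard to an error return can be sketched as below. This is a minimal illustration, not the scheduler's code: `Scheduler`, `max_chunks`, and the chunk counter array are hypothetical; only the `SchedulerError!void` shape and `error.TooManyChunks` come from the commit text.

```zig
const std = @import("std");

const SchedulerError = error{TooManyChunks};

const max_chunks = 4; // hypothetical capacity

const Scheduler = struct {
    chunks: [max_chunks]u32 = [_]u32{0} ** max_chunks,

    // The bound is checked in every build mode; std.debug.assert would not
    // protect the write below in ReleaseFast.
    fn dispatch(self: *Scheduler, chunk_count: usize) SchedulerError!void {
        if (chunk_count > max_chunks) return error.TooManyChunks;
        for (self.chunks[0..chunk_count]) |*c| c.* += 1;
    }
};

test "call-sites use try" {
    var s = Scheduler{};
    try s.dispatch(2);
    try std.testing.expectEqual(@as(u32, 1), s.chunks[1]);
}
```
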
B8: loadPlugin returned &items[len-1] into an appending ArrayListUnmanaged; a later load reallocated and dangled prior *PluginHandle, contradicting the documented stability promise. Storage is now ArrayListUnmanaged(*PluginHandle) with heap-boxed handles.
B6/B7: ResourceError dropped dead RegistrationFailed, added StaleEntityHandle (removeResource propagates it via despawn). setResource/removeResource pinned to ResourceError!void; setResource maps World-internal errors (e.g. DuplicateComponent) to EcsError, OutOfMemory passes through.
B5: registry.zig claimed the gpa field is 'kept private' — Zig has no field-level visibility. Doc now states gpa/types/name_index are implementation detail, NOT part of the frozen contract (consumers use the Registry methods); internal container types may change Phase-1+.
B10/B13: destroyCommandEncoder (mandatory, every consumer calls it) and getSwapchainImageCount (E6, real on Vulkan) were absent from interface.required_methods, so checkBackend did not enforce cross-backend parity. Both added to required_methods + TestShape; getSwapchainImageCount mirrored on the Null backend.
M4: window.zig documented a non-existent getDimensions() and omitted close(). M5: Loader.load/reload error surfaces documented (inferred sets, version-bump policy). M6: ImageCopyTexture/ImageCopyBuffer docs made direction-neutral (E4 copyBufferToTexture reuses them with swapped roles).
Six BLOC 1 commits (B1/B2, B8, B6/B7, B5, B10/B13, M4/M5/M6). 693/711 tests pass; the lone failure is the pre-existing macOS-local bindgen-verify drift (CI/Linux is authority).
C2: get_mut was the only snake_case method on the otherwise-camelCase World surface (engine-zig-conventions.md:21). Surgical rename — def + the real Zig call-sites (resources/api.zig, tests/ecs/change_detection.zig) + doc refs. The Etch-language get_mut accessor and the C-ABI component_get_mut/resource_get_mut are separate-convention surfaces and stay snake_case.
B9 (folded into C2): spawn_process/wait_nonblock/is_alive were snake_case against the camelCase platform surface. Renamed to spawnProcess/waitNonblock/isAlive + all callers, via word-boundary rename to avoid the entity_is_alive C-ABI substring (which stays snake_case by C convention).
C1 Allocator out of scope (9->8 Tier-0, no code). C2 rename done (getMut + process camelCase). CI run 27509968374 green for BLOC 1. C2 gate evidence: C-API + Etch AST decoupled from the rename.
C3 (remove legacy ResourceStore, unify on M0.2) is infeasible as freeze-prep: the byte-keyed ResourceStore is the load-bearing Etch resource backend (codegen emits byte-keyed access, tree-walking interpreter needs runtime ComponentId access M0.2's comptime-typed API cannot serve). Cas-2 STOP; options A/B/C, A recommended (clarify two roles, defer true unification to Phase 1).
C3: the byte-keyed ResourceStore (Etch runtime backend) and the M0.2 singleton system are two models for two consumers, not duplication. Replaced world.zig's misleading 'independent until a later milestone unifies them' comment with the documented two-roles rationale; both freeze at block 3; true unification is Phase-1.
B11: pressed[] is raw-scancode-indexed, not KeyCode-indexed (the two don't share a codomain). Doc now says so; logical-key input is window.Event.code (the frozen KeyCode contract); logical steady-state querying is the Phase-1 Tier-1 mapping layer. The event already carries scancode beside code, so no struct change.
B12: linux_evdev.pollAllSlots gains a gpa parameter mirroring win32_xinput.pollAllSlots so a cross-OS mainloop binds one signature. No callers — frozen-signature alignment only; the Linux stub ignores gpa.
M5: load -> (std.Io.ConcurrentError || LoadError || FinishError)!AssetHandle; reload -> (LoadError || error{StaleHandle})!void. An inferred error set on a to-be-frozen interface can silently widen via a callee (the B6 defect); pinning makes the freeze meaningful.
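
A small sketch of what pinning buys, assuming made-up error members and a trivial `finish` callee (the real `LoadError` / `FinishError` contents are not shown in this PR text, and std.Io.ConcurrentError is left out):

```zig
const std = @import("std");

// Hypothetical members; only the set names come from the commit message.
const LoadError = error{ NotFound, Corrupt };
const FinishError = error{UploadFailed};
const AssetHandle = u32;

fn finish(id: u32) FinishError!AssetHandle {
    if (id == 7) return error.UploadFailed;
    return id;
}

// Pinned error set: if a callee starts returning an error outside this
// union, compilation fails here instead of the set silently widening.
fn load(id: u32) (LoadError || FinishError)!AssetHandle {
    if (id == 0) return error.NotFound;
    return try finish(id);
}

test "pinned load surfaces both error sets" {
    try std.testing.expectEqual(@as(AssetHandle, 1), try load(1));
    try std.testing.expectError(error.NotFound, load(0));
}
```

With an inferred set (`!AssetHandle`) the same change in `finish` would compile and alter the public surface without any diff at the declaration.
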
C3 M3-ruling retracted (Option A: clarify, freeze both). C4/B11 doc, B12 arity, M5 pin done. BLOC 2 complete.