Phase 0 / Closure / Vertical slice + Phase 0 closure #24
Draft
guysenpai wants to merge 60 commits into
copyBufferToTexture is a Phase-0 no-op on the Vulkan GAL backend (command_encoder.zig:85), and it is the only public GAL texture-upload path. The texture-asset -> GPU-sampled seam is therefore unwired; fixing it touches src/ (outside the E4 borne). The block is narrow: mesh+camera+depth+instancing+input and a mesh asset (copyBufferToBuffer is implemented) are fully wired. Recommended Option A: pivot the cooked asset to a mesh, render mesh+camera, document GPU texture upload as Phase-1. Stopped before writing E4 code; awaiting ruling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
copyBufferToTexture marked RESOLVED (ruling Option B / corrected model, implemented as the copyTextureToBuffer mirror). New blocker E4 #2: RenderPassEncoder.setBindGroup is a "Phase 1+" no-op and cmdBindDescriptorSets is unwired in the GAL, so the slice's textured/camera draw binds nothing to set 0 (VUID-08114 + blank frame). Same class as copyBufferToTexture; proposed ruling: implement cmdBindDescriptorSets + track the pipeline layout in setPipeline. Push held to avoid burning a CI cycle. The trivial createBindGroup [*]T->*const T type typo was fixed inline (cardinal-rule carve-out). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The vertical slice (M0.9 E4) is the first GAL consumer to render with real resources, surfacing two primitives the triangle (empty bind group, no texture upload) never exercised. Both were stubbed "Phase 1+" in M0.4:
- copyBufferToTexture: implemented as the exact mirror of its neighbour copyTextureToBuffer (NOT vk_blit's raw path). The buffer->image asymmetry (destination born UNDEFINED, no render pass to carry a final_layout, no encoder-level barrier on the public surface) forces internal transitions undefined->transfer_dst then transfer_dst->shader_read; documented as a deliberate divergence. WebGPU-symmetric signature (zero callers existed); null stub matched.
- RenderPassEncoder.setBindGroup: was a no-op (cmdBindDescriptorSets was unwired GAL-wide). setPipeline now records the bound pipeline's layout in current_pipeline_layout; setBindGroup binds against it, and SKIPS cleanly when no pipeline was bound first (no unreachable, no implicit "last pipeline is current").
- bind_group createGroup: fix the [*]T->*const T type bug on p_pool_sizes (never Vulkan-compiled before), and harden the unused WriteDescriptorSet union pointers by pointing them at valid zero-initialized dummies instead of undefined, so no garbage address reaches a defensively-inspecting validation layer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
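The setBindGroup guard described above can be sketched roughly as follows. This is a minimal illustration, not the GAL code: `PipelineLayout`, `BindGroup`, and the `binds_recorded` counter are hypothetical stand-ins, and the real encoder would call cmdBindDescriptorSets where the counter is bumped.

```zig
const std = @import("std");

// Hypothetical stand-ins for the Vulkan handles; the real GAL types differ.
const PipelineLayout = u64;
const BindGroup = u64;

const RenderPassEncoder = struct {
    // Null until setPipeline has run (named after the commit's field).
    current_pipeline_layout: ?PipelineLayout = null,
    // Illustration only: counts binds that would reach cmdBindDescriptorSets.
    binds_recorded: u32 = 0,

    fn setPipeline(self: *RenderPassEncoder, layout: PipelineLayout) void {
        self.current_pipeline_layout = layout;
    }

    fn setBindGroup(self: *RenderPassEncoder, set_index: u32, group: BindGroup) void {
        _ = set_index;
        _ = group;
        // Skip cleanly when no pipeline was bound first: no unreachable,
        // no guess about which layout is "current".
        if (self.current_pipeline_layout == null) return;
        self.binds_recorded += 1;
    }
};

test "setBindGroup is skipped until a pipeline is bound" {
    var enc = RenderPassEncoder{};
    enc.setBindGroup(0, 42);
    try std.testing.expectEqual(@as(u32, 0), enc.binds_recorded);
    enc.setPipeline(7);
    enc.setBindGroup(0, 42);
    try std.testing.expectEqual(@as(u32, 1), enc.binds_recorded);
}
```

Modelling the layout as an optional makes the "no pipeline yet" state explicit in the type, which is what lets the skip be a plain early return.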
Extend the M0.9 headless slice (E3) into a Vulkan forward render of the live ECS scene, exercising the render/asset/input bricks end-to-end:
- render.zig: GAL forward pass. One cube mesh INSTANCED once per entity at the entity's live Position, shaded by the cooked albedo texture (uploaded via copyBufferToTexture), under a perspective camera with depth. Generic over the device type (Vulkan / Null). runInteractive (window+swapchain) / runSmoke (offscreen+PPM, headless).
- sim.zig: pure 60 Hz sim. Grid layout + per-entity velocities, readPosition, and SPACE-toggles-pause via the normalized KeyCode (the InputRawState array is raw-scancode-indexed in Phase 0).
- cook_assets.zig + assets/slice_albedo.png: cook the source PNG through the real M0.6 pipeline to a .texture.bin, loaded at runtime via Loader.
- shaders/slice.vert|frag (+ committed SPIR-V), math.zig (Vulkan MVP).
- build.zig: slice module (render+asset deps), asset cook+install, run-vertical-slice / cook-vertical-slice-assets steps.
- integration test (4 facets): sim 100/120, E2-B cross-file validation, M0.6 cook+load, input->pause. Render is not asserted headless (the Null backend leaves mapBuffer Unsupported); coverage = compile + lavapipe.
- ci.yml: vertical-slice-smoke job. Offscreen render on lavapipe (Debug, validation layers active), asserts a frame composes + zero VUID.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The vertical-slice-smoke job failed at apt install: Ubuntu 24.04 names the Khronos validation layers package `vulkan-validationlayers` (no hyphen), not `vulkan-validation-layers`. Try both names and fail loudly if the layer manifest is absent afterwards — the smoke is only meaningful with validation active. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The slice smoke panicked with `integer overflow` at `n * @sizeOf(Instance)` in setInstancesFromWorld on lavapipe, on its first ever execution (no macOS/headless or test path runs the render). With n = @min(entity_count, max_instances) <= 100 the multiply cannot overflow u32 according to the source alone, so some runtime value must differ from what the code implies. Derive the instance capacity from the REAL mapped buffer length (out of the device's buffer registry, not self.max_instances), do the arithmetic in usize, and clamp the write to that capacity: no u32 multiply, never writes past the buffer, robust regardless. A one-shot log captures n / entity_count / max_instances / mapped_len / capacity to pin the root cause in the same CI cycle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The one-shot diagnostic confirmed the values were correct (n=100, max=100, mapped_len=1200) — not memory corruption — and the lavapipe smoke is validation-clean. Drop the verbose log; keep a triple clamp (entity_count, configured max, real buffer capacity) so max_instances stays meaningful and the write can never overflow or exceed the buffer. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
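A minimal sketch of the triple clamp, under stated assumptions: the `Instance` layout here (three f32, 12 bytes) is a guess chosen to be consistent with the logged mapped_len=1200 for max=100, and the function names are hypothetical rather than taken from render.zig.

```zig
const std = @import("std");

// Assumed layout: three f32 per instance (12 bytes). The real type may differ.
const Instance = extern struct { x: f32, y: f32, z: f32 };

/// Instances that may be written: the minimum of the live entity count,
/// the configured maximum, and what the mapped buffer can actually hold.
fn clampInstanceCount(entity_count: usize, max_instances: usize, mapped_len: usize) usize {
    const capacity = mapped_len / @sizeOf(Instance);
    return @min(entity_count, @min(max_instances, capacity));
}

/// Byte length of the write, computed in usize rather than u32.
/// For a clamped n this never exceeds mapped_len.
fn writeLen(n: usize) usize {
    return n * @sizeOf(Instance);
}

test "triple clamp with the logged values" {
    const n = clampInstanceCount(100, 100, 1200);
    try std.testing.expectEqual(@as(usize, 100), n);
    try std.testing.expectEqual(@as(usize, 1200), writeLen(n));
}
```

Because capacity comes from the mapped length, the write stays inside the buffer even if max_instances and the actual allocation ever drift apart.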
E4 done. setBindGroup implemented (cmdBindDescriptorSets + pipeline-layout tracking with a null-layout guard); unused WriteDescriptorSet pointers hardened to zero-init dummies; the slice integer-overflow was diagnosed as not-corruption and fixed robustly. CI fully green incl. vertical-slice-smoke on lavapipe (validation layers active, frame composed, zero VUID). Blocker E4 #2 marked RESOLVED. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E6 stage added to M0.9 (Claude.ai/Guy decision): rewrite the editor's viewport blit (src/editor/vk_blit.zig) to render via gal.Device instead of raw Vulkan, consuming the E4 copyBufferToTexture, and resorb the two M0.5 sync bugs (BT709 colorspace without extension; binary present semaphore reuse, VUID-vkQueueSubmit-pSignalSemaphores-00067) if they do not vanish via the GAL's per-image sync. After E5, before the freeze. 2nd src/ write authorized in M0.9, bounded to vk_blit.zig + its test. Renumber: E6 = vk_blit consolidation, E7 = freeze, E8 = closure. (Title shortened from the suggested wording to fit weld_lint's 72-char limit; the freeze→E7 closure→E8 mapping is preserved here + in the Acted deviations entry.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E5/C0.8 (edit -> live world -> visible in viewport) rests on three unwired seams in read-only src/runtime/src/editor: the runtime has no ECS world and modify_component is a no-op echo ack (runtime/main.zig:272; messages.zig:199); the runtime renders a stateless renderMire test pattern, not its world (main.zig:186, :76), so no edit can change a viewport pixel; the editor never sends ModifyComponent. M0.7/S6 shipped the IPC as transport + echo stub. 4th same-pattern blocker; structurally the heaviest (a runtime World + world-to-viewport renderer = Phase-0.6 scale). Recommended Option B: close the protocol + in-process loop, defer the visual clause to Phase-0.6 (mirrors E3 Level-C). Stopped before any code; awaiting ruling. vk_blit VUID observation (E6 input) is independent + read-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Misrouting error, not a 4th blocker (Claude.ai ruling). C0.4 != C0.8: the C0.4 IPC protocol round-trip was satisfied by M0.7 (renderMire + echo-ack are its legitimate transport stubs); C0.8 is the vertical slice, and the slice already IS the C0.8 runtime (live World from E3 + world->viewport renderer from E4). RULING: the C0.8 loop lives in examples/vertical_slice/, no src/ touched — an editor-stub thread sends a real ModifyComponent over the M0.7 transport, the slice decodes + applies via the field_offset/new_value (diff_runner) pattern, the E4 renderer reflects it. Real-blocker count corrected to 2 (#1 Phase-1 instantiation, #2 Phase-1 cross-file import). E6/vk_blit unaffected: the C0.8 loop uses the slice's own GAL viewport, not src/editor/vk_blit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E5 closes C0.8 in the slice (the slice IS the C0.8 runtime: World from E3 + renderer from E4), not in src/runtime (whose renderMire/echo-ack are the C0.4 transport stubs). No src/ touched.
- ipc_loop.zig: an editor-stub thread sends a real ModifyComponent over the real M0.7 transport (AF_UNIX socket + framing); the slice's runtime-side client decodes it and applies it to the live World via the diff_runner write path (field_offset + new_value -> the component slot, field size resolved from the registry). Socket-only, so the semantic loop runs headless on every platform.
- integration test (5th facet): a ModifyComponent over M0.7 changes Position.x of entity 0 on the live World, leaving Position.y untouched. The C0.8 semantic loop is asserted end-to-end.
- main.zig --ipc-edit: drive one edit then render the post-edit world (the C0.8 visual reflection on hardware/lavapipe).
- ci.yml: a C0.8 --ipc-edit lavapipe step (edit -> world -> frame composes, validation clean) + a non-failing vk_blit VUID observation via run-ipc-demo under weston+syncval (the E6 input: E5 observes, E6 fixes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
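The field_offset + new_value write path can be illustrated with a small sketch. Assumptions: `Position` here is a two-float stand-in, `applyFieldEdit` is a hypothetical name, and the field size is taken from the payload length instead of being resolved from the registry as the commit describes.

```zig
const std = @import("std");

// Stand-in component; the slice's real Position may have more fields.
const Position = extern struct { x: f32, y: f32 };

/// Overwrite `new_value.len` bytes of one component slot, starting at
/// `field_offset`. Returns false (and writes nothing) if the edit would
/// run past the end of the slot.
fn applyFieldEdit(slot: []u8, field_offset: usize, new_value: []const u8) bool {
    if (field_offset > slot.len) return false;
    if (new_value.len > slot.len - field_offset) return false;
    @memcpy(slot[field_offset..][0..new_value.len], new_value);
    return true;
}

test "editing Position.x leaves Position.y untouched" {
    var pos = Position{ .x = 1.0, .y = 2.0 };
    const new_x: f32 = 9.5;
    const ok = applyFieldEdit(
        std.mem.asBytes(&pos),
        @offsetOf(Position, "x"),
        std.mem.asBytes(&new_x),
    );
    try std.testing.expect(ok);
    try std.testing.expectEqual(@as(f32, 9.5), pos.x);
    try std.testing.expectEqual(@as(f32, 2.0), pos.y);
}
```

The in-block test mirrors the fifth integration facet: one field changes, its neighbour does not.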
E5 done: the C0.8 component-edit-over-IPC -> live world -> visible loop is closed in the slice (no src/ touched). Integration test 5/5; CI fully green incl. the vertical-slice-smoke C0.8 --ipc-edit step on lavapipe. E6 input captured: the vk_blit swapchain colorspace VUID (VkSwapchainCreateInfoKHR-imageColorSpace-parameter) FIRES on lavapipe; the present-semaphore reuse VUID-00067 was not observed under syncval (latent, likely hardware-only). README updated for the E5 IPC loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bug (a) colorspace is a GAL bug (swapchain.zig:113 blind-copies fmt.color_space; format selection filters pixel format only; conv.zig has no colorspace map) — latent until vk_blit's rewrite onto gal.createSwapchain makes it the only owner. Authorized E6 file grid (extends the FROZEN "Files to create or modify"): vk_blit.zig + test; gal/vulkan/swapchain.zig + conv.zig (colorspace at its source — prefer srgb_nonlinear core, else force it); assets/shaders/viewport_blit.frag.glsl + .spv (separate descriptors, the E4 convention); a 1-line gal getSwapchainImageCount (for the per-image present-semaphore fix). The swapchain colorspace code will be FROZEN by E7 (C0.5) — get the contract right, not just the VUID. main.zig stays zero-diff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The GAL swapchain blind-copied the surface's first-reported colorspace (swapchain.zig: .image_color_space = fmt.color_space; format selection matched the pixel format only). On drivers that report an extended *_EXT colorspace (e.g. lavapipe, value 1000104013) this emits an imageColorSpace the swapchain create-info rejects without VK_EXT_swapchain_colorspace (VUID-VkSwapchainCreateInfoKHR-imageColorSpace-parameter). It was latent because the E4 slice matched a core srgb_nonlinear format. Now the swapchain always presents in the core srgb_nonlinear colorspace (conv.colorSpace), selecting a surface (format, colorspace) pair carrying it (preferred pixel format, then BGRA8, then any srgb_nonlinear pair). Also add getSwapchainImageCount so callers can size per-image present semaphores. Fixed at the source, so every GAL swapchain consumer benefits. (Frozen by E7 / C0.5: contract-level fix.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
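The selection order described above (preferred pixel format, then BGRA8, then any srgb_nonlinear pair) could look roughly like this. The enums are hypothetical miniatures of the Vulkan ones, and returning null when no srgb_nonlinear pair exists is a choice made for this sketch, not a statement about what swapchain.zig does in that case.

```zig
const std = @import("std");

// Hypothetical miniatures of the Vulkan enums; only what the sketch needs.
const PixelFormat = enum { bgra8_unorm, rgba8_unorm, rgba16_sfloat };
const ColorSpace = enum { srgb_nonlinear, extended_ext };
const SurfaceFormat = struct { format: PixelFormat, color_space: ColorSpace };

/// First srgb_nonlinear pair matching `want`; any pixel format if `want` is null.
fn findSrgb(available: []const SurfaceFormat, want: ?PixelFormat) ?SurfaceFormat {
    for (available) |sf| {
        if (sf.color_space != .srgb_nonlinear) continue;
        if (want == null or sf.format == want.?) return sf;
    }
    return null;
}

/// Preferred format, then BGRA8, then any pair, always in srgb_nonlinear.
fn chooseSurfaceFormat(available: []const SurfaceFormat, preferred: PixelFormat) ?SurfaceFormat {
    if (findSrgb(available, preferred)) |sf| return sf;
    if (findSrgb(available, PixelFormat.bgra8_unorm)) |sf| return sf;
    return findSrgb(available, null);
}

test "an extended colorspace listed first is never selected" {
    const available = [_]SurfaceFormat{
        .{ .format = .rgba16_sfloat, .color_space = .extended_ext },
        .{ .format = .rgba8_unorm, .color_space = .srgb_nonlinear },
        .{ .format = .bgra8_unorm, .color_space = .srgb_nonlinear },
    };
    const chosen = chooseSurfaceFormat(&available, .rgba16_sfloat).?;
    try std.testing.expectEqual(PixelFormat.bgra8_unorm, chosen.format);
    try std.testing.expectEqual(ColorSpace.srgb_nonlinear, chosen.color_space);
}
```

Filtering on the colorspace first is what keeps the driver's reporting order from leaking into the create-info.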
Rewrite src/editor/vk_blit.zig to drive the public GAL (gal.Device) instead of raw Vulkan. The S6 hand-rolled instance/device/swapchain/descriptor/command plumbing (vk.device_dispatch.*) is gone, replaced by GAL calls; the per-frame shm upload uses copyBufferToTexture (E4), whose internal barriers subsume the hand-issued layout transitions. The file survives (the shared viewport stays a Phase-2 need); main.zig is zero-diff (Renderer init/deinit/recreateSwapchain/stageViewport/drawFrame + last_known_size/swapchain_dirty preserved). Resorbs the two M0.5 sync bugs:
- (a) colorspace: fixed at its source in the GAL swapchain (prior commit); routing through gal.createSwapchain inherits the fix. The blit frag shader moves to separate descriptors (texture2D + sampler, the E4 GAL convention) since the GAL has no combined sampler2D binding; viewport_blit.frag.spv regenerated, .vert reused, embed.zig unchanged.
- (b) present-semaphore reuse: one render_finished semaphore PER swapchain image (via getSwapchainImageCount), indexed by the acquired image_index, so a binary semaphore still pending on a prior present is never re-signaled. NOT observable on lavapipe (synchronous software present); verified by construction, hardware-confirmed (no false green).

ci.yml: the E5 vk_blit observation step becomes an E6 assertion: the colorspace VUID must be ABSENT on lavapipe; (b) stays a non-failing report. build.zig wires weld_render into the editor module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
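The per-image semaphore scheme in (b) can be sketched as below. Everything here is illustrative: `Semaphore` is a plain integer standing in for a GAL handle, `max_images` is an arbitrary cap, and the `1000 + i` values replace real semaphore creation.

```zig
const std = @import("std");

const Semaphore = usize; // hypothetical handle type
const max_images = 8; // arbitrary cap for the sketch

const PresentSync = struct {
    render_finished: [max_images]Semaphore = undefined,
    image_count: usize = 0,

    /// One semaphore per swapchain image; `image_count` would come from
    /// getSwapchainImageCount in the real renderer.
    fn init(image_count: usize) PresentSync {
        std.debug.assert(image_count <= max_images);
        var s = PresentSync{ .image_count = image_count };
        for (0..image_count) |i| {
            s.render_finished[i] = 1000 + i; // stand-in for creating a semaphore
        }
        return s;
    }

    /// The semaphore to signal for the frame rendering into `image_index`.
    fn semaphoreFor(self: *const PresentSync, image_index: u32) Semaphore {
        std.debug.assert(image_index < self.image_count);
        return self.render_finished[image_index];
    }
};

test "distinct images never share a render_finished semaphore" {
    const sync = PresentSync.init(3);
    try std.testing.expect(sync.semaphoreFor(0) != sync.semaphoreFor(1));
    try std.testing.expect(sync.semaphoreFor(1) != sync.semaphoreFor(2));
}
```

Indexing by the acquired image rather than by a frame-in-flight counter ties each semaphore's reuse to that image being acquired again, which is the property the commit relies on.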
The GAL render pass set dependency_count=0 — no external subpass dependency synchronizing the render pass's image-layout transition (a color-attachment-output write) with a prior external access, notably a swapchain image just returned by vkAcquireNextImageKHR (present-engine read). Under synchronization validation this is a WRITE_AFTER_READ hazard at vkQueueSubmit. The raw editor blit carried this dependency (SUBPASS_EXTERNAL->0, color_attachment_output, color_attachment_write); the E6 consolidation onto the GAL dropped it. Add it at the source — it benefits every GAL swapchain consumer and is harmless for offscreen targets. ci.yml: the E6 step now asserts SYNC-HAZARD absent under syncval (VUID-00067 stays a hardware-only non-failing report). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The E6 syncval observation surfaced a SYNC-HAZARD-WRITE-AFTER-READ: the GAL render pass omitted the external subpass dependency the raw vk_blit had (dependency_count=0). Borne widened to gal/vulkan/render_pass.zig to add it at the source (frozen by E7). Test: run-ipc-demo under syncval on lavapipe asserts SYNC-HAZARD absent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adding the external subpass dependency to the executing render pass (render_pass.zig) made it incompatible with the pipeline's template render pass (dependencyCount 1 vs 0) → VUID-vkCmdDrawIndexed-renderPass-02684. The GAL builds the render pass in two sites, the executing one (render_pass.zig) and the pipeline-compatibility template (pipeline.zig), so the dependency must be identical in both. Add the same SUBPASS_EXTERNAL->0 color-attachment-output dependency to the template (used only for compatibility; harmless). Completes the render-pass external-dependency fix across both sites. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
E6 done: vk_blit consolidated onto the GAL (1133 -> ~280 lines, main.zig zero-diff). Validated on lavapipe with validation + sync validation: (a) swapchain colorspace VUID absent (fixed at source in the GAL swapchain); (b) present-semaphore reuse VUID-00067 fixed by a per-image render_finished semaphore (hardware-validated, not lavapipe-observable); (c) render-pass WRITE_AFTER_READ SYNC-HAZARD absent (external subpass dependency added to both GAL render-pass sites). CI fully green; the GAL render path is consolidated + validation-clean ahead of the E7 freeze. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read-only cross-module audit found 11/12 interfaces carry fix_before_freeze contract defects (13 blocker + 6 minor), all verified against code. EventBus + IPC clean. No markers applied; fix decisions are Guy's per the E7 STOP rule. Awaiting rulings.
Guy ruled fix-then-freeze (3 blocks). Records the borne expansion for blocks 1+2, the 5 meta-decisions (C1 Allocator out -> 8 Tier-0, C2 rename, C3 unify resources, C4 logical KeyCode), the M0.2-freeze-nominal authorization, and the phase learning (freeze = audit + marker, never marker alone).
B1/B2: dispatch/dispatchBatch returned void and guarded with std.debug.assert (compiled out in ReleaseFast -> OOB write on the C0.1 1M path). Now SchedulerError!void returning error.TooManyChunks (the contract the doc already promised); start pinned to SchedulerError!void; SchedulerError doc corrected. 6 call-sites updated to try.
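To illustrate the B1/B2 change, here is a minimal sketch of a bounds check that returns an error instead of asserting. The `Scheduler` shape, `max_chunks`, and the chunk payload are hypothetical; only the `SchedulerError!void` return type and `error.TooManyChunks` come from the commit.

```zig
const std = @import("std");

const SchedulerError = error{TooManyChunks};
const max_chunks = 4; // tiny cap so the test can hit it

const Scheduler = struct {
    chunks: [max_chunks]u32 = [_]u32{0} ** max_chunks,
    len: usize = 0,

    /// A std.debug.assert(self.len < max_chunks) here would vanish in
    /// ReleaseFast, leaving the store below unchecked. Returning an error
    /// keeps the check in every build mode.
    fn dispatch(self: *Scheduler, chunk: u32) SchedulerError!void {
        if (self.len >= max_chunks) return error.TooManyChunks;
        self.chunks[self.len] = chunk;
        self.len += 1;
    }
};

test "dispatch reports overflow instead of writing out of bounds" {
    var s = Scheduler{};
    for (0..max_chunks) |i| try s.dispatch(@intCast(i));
    try std.testing.expectError(error.TooManyChunks, s.dispatch(99));
    try std.testing.expectEqual(@as(usize, max_chunks), s.len);
}
```

Call sites then propagate with `try`, which matches the six call-site updates the commit mentions.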
B8: loadPlugin returned &items[len-1] into an appending ArrayListUnmanaged; a later load reallocated and dangled prior *PluginHandle, contradicting the documented stability promise. Storage is now ArrayListUnmanaged(*PluginHandle) with heap-boxed handles.
B6/B7: ResourceError dropped dead RegistrationFailed, added StaleEntityHandle (removeResource propagates it via despawn). setResource/removeResource pinned to ResourceError!void; setResource maps World-internal errors (e.g. DuplicateComponent) to EcsError, OutOfMemory passes through.
B5: registry.zig claimed the gpa field is 'kept private' — Zig has no field-level visibility. Doc now states gpa/types/name_index are implementation detail, NOT part of the frozen contract (consumers use the Registry methods); internal container types may change Phase-1+.
B10/B13: destroyCommandEncoder (mandatory, every consumer calls it) and getSwapchainImageCount (E6, real on Vulkan) were absent from interface.required_methods, so checkBackend did not enforce cross-backend parity. Both added to required_methods + TestShape; getSwapchainImageCount mirrored on the Null backend.
M4: window.zig documented a non-existent getDimensions() and omitted close(). M5: Loader.load/reload error surfaces documented (inferred sets, version-bump policy). M6: ImageCopyTexture/ImageCopyBuffer docs made direction-neutral (E4 copyBufferToTexture reuses them with swapped roles).
Six BLOC 1 commits (B1/B2, B8, B6/B7, B5, B10/B13, M4/M5/M6). 693/711 tests pass; the lone failure is the pre-existing macOS-local bindgen-verify drift (CI/Linux is authority).
C2: get_mut was the only snake_case method on the otherwise-camelCase World surface (engine-zig-conventions.md:21). Surgical rename — def + the real Zig call-sites (resources/api.zig, tests/ecs/change_detection.zig) + doc refs. The Etch-language get_mut accessor and the C-ABI component_get_mut/resource_get_mut are separate-convention surfaces and stay snake_case.
B9 (folded into C2): spawn_process/wait_nonblock/is_alive were snake_case against the camelCase platform surface. Renamed to spawnProcess/waitNonblock/isAlive + all callers, via word-boundary rename to avoid the entity_is_alive C-ABI substring (which stays snake_case by C convention).
C1 Allocator out of scope (9->8 Tier-0, no code). C2 rename done (getMut + process camelCase). CI run 27509968374 green for BLOC 1. C2 gate evidence: C-API + Etch AST decoupled from the rename.
C3 (remove legacy ResourceStore, unify on M0.2) is infeasible as freeze-prep: the byte-keyed ResourceStore is the load-bearing Etch resource backend (codegen emits byte-keyed access, tree-walking interpreter needs runtime ComponentId access M0.2's comptime-typed API cannot serve). Cas-2 STOP; options A/B/C, A recommended (clarify two roles, defer true unification to Phase 1).
C3: the byte-keyed ResourceStore (Etch runtime backend) and the M0.2 singleton system are two models for two consumers, not duplication. Replaced world.zig's misleading 'independent until a later milestone unifies them' comment with the documented two-roles rationale; both freeze at block 3; true unification is Phase-1.
B11: pressed[] is raw-scancode-indexed, not KeyCode-indexed (the two don't share a codomain). Doc now says so; logical-key input is window.Event.code (the frozen KeyCode contract); logical steady-state querying is the Phase-1 Tier-1 mapping layer. The event already carries scancode beside code, so no struct change.
B12: linux_evdev.pollAllSlots gains a gpa parameter mirroring win32_xinput.pollAllSlots so a cross-OS mainloop binds one signature. No callers — frozen-signature alignment only; the Linux stub ignores gpa.
M5: load -> (std.Io.ConcurrentError || LoadError || FinishError)!AssetHandle; reload -> (LoadError || error{StaleHandle})!void. An inferred error set on a to-be-frozen interface can silently widen via a callee (the B6 defect); pinning makes the freeze meaningful.
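A small sketch of what pinning buys, using made-up names: `LoadError` here has invented members and `parseHeader` is a hypothetical callee. With an inferred `!void`, a new error added to the callee would widen `reload` silently; with the pinned set, any error outside it becomes a compile error at the `try`.

```zig
const std = @import("std");

// Invented members; the real LoadError set is not shown in the commit.
const LoadError = error{ NotFound, BadFormat };

// Hypothetical callee standing in for the loader internals.
fn parseHeader(first_byte: u8) LoadError!u8 {
    if (first_byte == 0) return error.BadFormat;
    return first_byte;
}

/// Pinned error set, in the shape of the commit's reload signature.
/// If parseHeader ever returned an error outside LoadError, the `try`
/// below would stop compiling instead of widening this contract.
fn reload(handle_alive: bool, first_byte: u8) (LoadError || error{StaleHandle})!void {
    if (!handle_alive) return error.StaleHandle;
    _ = try parseHeader(first_byte);
}

test "reload surfaces exactly the pinned errors" {
    try std.testing.expectError(error.StaleHandle, reload(false, 1));
    try std.testing.expectError(error.BadFormat, reload(true, 0));
    try reload(true, 1);
}
```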
C3 M3-ruling retracted (Option A: clarify, freeze both). C4/B11 doc, B12 arity, M5 pin done. BLOC 2 complete.
E1 in review — full description at closure.