
issue: close scoreboard issue-stage timing at NT16 #372

Merged
tinebp merged 3 commits into master from scoreboard-issue-stage-timing on Jun 18, 2026

tinebp commented Jun 18, 2026

Closes the scoreboard issue-stage critical path at NT=NW=16 on the U55C @ 300 MHz.

What

  • Shortens the scoreboard issue-stage critical path by keeping operands_ready a clean flop, with FU back-pressure and credit checks folded into it via look-ahead.
  • Selects the issue arbiter by requester count via VX_generic_arbiter (matrix above 8 requesters, round-robin at or below), so grant logic stays shallow at NW=16.
  • Serializes multi-uop FU sequences with a single registered lock mask gating the arbiter requests, replacing the per-FU locked state.
  • Drops the now-unused suppress port from VX_gto_arbiter and its tie-off in VX_generic_arbiter.
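To make the requester-count rule concrete, here is a small Python behavioral model (not SystemVerilog). Only the 8-requester threshold comes from this PR; the class names, the hypothetical `make_arbiter` helper, and the textbook round-robin and least-recently-granted matrix update rules are assumptions, not the actual VX_generic_arbiter RTL.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Rotating-priority arbiter: the search starts just after the last grant."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # requester 0 has top priority on the first grant

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None  # no requests: state is unchanged


class MatrixArbiter:
    """Least-recently-granted arbiter; prio[i][j] is True when i beats j."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Initial total order: a lower index has higher priority.
        self.prio = [[i < j for j in range(n)] for i in range(n)]

    def grant(self, requests: List[bool]) -> Optional[int]:
        for i in range(self.n):
            # i wins if it requests and beats every other active requester.
            if requests[i] and all(
                self.prio[i][j] or not requests[j]
                for j in range(self.n)
                if j != i
            ):
                # The winner drops to lowest priority: all others now beat it.
                for j in range(self.n):
                    if j != i:
                        self.prio[i][j] = False
                        self.prio[j][i] = True
                return i
        return None


def make_arbiter(num_reqs: int, matrix_threshold: int = 8):
    """Hypothetical selector: matrix above the threshold, round-robin at or below."""
    if num_reqs > matrix_threshold:
        return MatrixArbiter(num_reqs)
    return RoundRobinArbiter(num_reqs)


if __name__ == "__main__":
    arb = make_arbiter(16)
    print(type(arb).__name__)                          # MatrixArbiter
    print([arb.grant([True] * 16) for _ in range(3)])  # [0, 1, 2]
```

In hardware, the matrix grant for requester i reduces to one flat AND over `prio[i][j] | !req[j]`, so its depth does not grow with a rotating priority search. The price is roughly n(n-1)/2 priority flops, which is one plausible reason to reserve it for larger requester counts.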

Validation

  • Functional: sgemmx, sgemm_tcu, packld, and vecadd pass on rtlsim at NT=NW=8 and NT=NW=16.
  • Timing: with the matrix arbiter, the issue-stage binding path leaves the scoreboard entirely; IPC is neutral to better vs. the prior arbiter.

🤖 Generated with Claude Code

tinebp and others added 3 commits June 18, 2026 15:09
Fold per-FU dispatch-queue back-pressure (fu_goingfull) and the FU-lock
serialization into the registered operands_ready, so the arbiter request is
a pure flop; fu_locked_n consults the next state, so it stays bit-exact.
Share in_use_mask between the busy check and operand trace, and replace the
selection with a cyclic arbiter. Add the VX_CFG_DISPATCH_QUEUE_SIZE credit
knob (default 4) with dispatcher/issue-slice fu_release plumbing.

Also: DUT synth-flow fixes (project.tcl/xdc clock knob, scoped
async_bram_patch, xrt Makefile) and coding-guideline doc tweaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
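The first commit's idea of folding dispatch-queue back-pressure into a registered operands_ready can be sketched as a Python cycle model of one warp feeding one FU. The queue depth of 4 mirrors the VX_CFG_DISPATCH_QUEUE_SIZE default; `FuCredits`, `simulate`, the fixed release latency, and the single-requester setup are hypothetical simplifications. The arbiter sees only `ready_q`, whose next value is computed from the look-ahead credit count (`going_full`).

```python
from dataclasses import dataclass
from typing import List

DISPATCH_QUEUE_SIZE = 4  # mirrors the VX_CFG_DISPATCH_QUEUE_SIZE default


@dataclass
class FuCredits:
    """Hypothetical credit counter for one FU's dispatch queue."""

    size: int = DISPATCH_QUEUE_SIZE
    credits: int = DISPATCH_QUEUE_SIZE

    def next_credits(self, issue: bool, release: bool) -> int:
        # Look-ahead: the count this counter will hold after the current cycle.
        return self.credits - int(issue) + int(release)

    def step(self, issue: bool, release: bool) -> None:
        nxt = self.next_credits(issue, release)
        assert 0 <= nxt <= self.size, "credit underflow/overflow"
        self.credits = nxt


def simulate(cycles: int, release_latency: int) -> List[int]:
    """Return the cycles at which one always-eligible warp issues to one FU."""
    assert release_latency >= 1
    fu = FuCredits()
    ready_q = False             # registered operands_ready: the arbiter's only input
    in_flight: List[int] = []   # cycle at which each queued op returns its credit
    issued: List[int] = []
    for t in range(cycles):
        issue = ready_q         # request is a pure flop; a lone requester always wins
        release = bool(in_flight) and in_flight[0] == t
        if release:
            in_flight.pop(0)    # fu_release: the FU drained one queue entry
        if issue:
            in_flight.append(t + release_latency)
            issued.append(t)
        # Back-pressure is folded in before the flop, not after it.
        going_full = fu.next_credits(issue, release) == 0
        fu.step(issue, release)
        ready_q = not going_full
    return issued


if __name__ == "__main__":
    print(simulate(14, 10))  # [1, 2, 3, 4, 12, 13]
```

In this model, a 10-cycle release latency lets the warp issue at cycles 1 through 4, stall while all four credits are outstanding, and resume at cycle 12: one cycle after the first release at cycle 11, because the returned credit only reaches the arbiter through the flop.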
Select the issue arbiter by requester count via VX_generic_arbiter (matrix above 8, round-robin at or below) so grant logic stays shallow at NW16, and serialize multi-uop FU sequences with a single registered lock mask gating the arbiter requests instead of the per-FU locked state. Remove the unused suppress port from VX_gto_arbiter and its tie-off in VX_generic_arbiter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
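The single registered lock mask from the second commit can be modeled the same way. In this hypothetical sketch, a warp's instruction may expand into several uops; after a uop issues with more still pending, a registered one-hot `lock_mask` restricts the round-robin arbiter's requests to that warp until its last uop issues. Treating the mask as per-warp and the arbiter as round-robin are assumptions made for illustration, not details taken from the RTL.

```python
from typing import List, Optional


def rr_pick(reqs: int, n: int, last: int) -> Optional[int]:
    """Round-robin pick over a request bitmask, starting after `last`."""
    for off in range(1, n + 1):
        i = (last + off) % n
        if (reqs >> i) & 1:
            return i
    return None


def simulate(programs: List[List[int]], cycles: int,
             use_lock: bool = True) -> List[Optional[int]]:
    """programs[w] lists the uop count of each instruction for warp w.
    Returns the warp granted on each cycle (None when idle)."""
    n = len(programs)
    remaining = [list(p) for p in programs]  # uops left per pending instruction
    lock_mask = 0                            # registered; 0 means unlocked
    last = n - 1
    trace: List[Optional[int]] = []
    for _ in range(cycles):
        raw_req = sum(1 << w for w in range(n) if remaining[w])
        # Gate the arbiter requests with the lock mask while a sequence is open.
        req = raw_req & lock_mask if (use_lock and lock_mask) else raw_req
        w = rr_pick(req, n, last)
        trace.append(w)
        if w is None:
            continue
        last = w
        remaining[w][0] -= 1
        if remaining[w][0] > 0:
            lock_mask = 1 << w   # more uops pending: hold the lock next cycle
        else:
            remaining[w].pop(0)
            lock_mask = 0        # last uop issued: release the lock
    return trace


if __name__ == "__main__":
    print(simulate([[3], [1], [1]], 6))                  # [0, 0, 0, 1, 2, None]
    print(simulate([[3], [1], [1]], 6, use_lock=False))  # [0, 1, 2, 0, 0, None]
```

With the lock, warp 0's three uops stay contiguous; without it, the round-robin arbiter interleaves warps 1 and 2 into the middle of the sequence.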
tinebp merged commit 4dd8d4e into master on Jun 18, 2026