CI 2.0: catalog-driven, driver-sliceable test architecture (pytest) #375
Open
tinebp wants to merge 14 commits into
c2daeed to 2d04263
Vortex tests become declarative YAML data run by pytest, replacing the
imperative, driver-pinned bash in ci/regression.sh (1382 lines, 401 driver-
pinned invocations). blackbox.sh stays the untouched executor. The driver
slice the whole effort was chasing is now just `pytest ci -m "simx"`.

Engine (ci/), the conventional pytest layout with no config file:
  - testcase.py        model + planner CLI (lint | matrix | select); no pytest dep
  - conftest.py        hooks/fixtures: markers registered dynamically from the data,
                       parametrize + ambient-XLEN filter, build-once sim_build fixture
  - test_runner.py     the single test_case (shells out; needs-provisioning skip)
  - testcases/*.yaml   all 29 categories, 388 cases (22 transcribed from the bash;
                       7 script/build categories via via:script -> legacy)

Workflow (.github/):
  - workflows/ci.yml             catalog-driven: plan(testcase.py matrix by event)
                                 -> build(per xlen) -> tests(pytest ci -m per cell,
                                 JUnit) -> complete
  - actions/setup-vortex         composite action (profile-scoped cache + deps + pip)
  - workflows/apptainer-ci.yml   separate minimal env-smoke: composite action +
                                 in-container `pytest ci -m "regression and simx"`,
                                 weekly offset (Wed) + container-path triggers

configure: copies ci/testcases/ into the build tree (harness .py + conftest
ride the existing ci/ copy). Design: docs/designs/continuous_integration.md.

Markers register dynamically in conftest.py (--strict-markers catches -m typos);
test_runner.py is auto-discovered by the test_ prefix; the run passes `ci` as
the path — so no pyproject.toml/pytest.ini is needed.
Validated (structural, no toolchain): lint OK (388 cases / 29 categories);
pytest collects 387 (cupbop xlen64-only); marker slicing correct (push 40 /
PR 74 / schedule 119 cells); all workflow YAML valid. Real sim execution /
parity runs on CI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
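
The build-once idea behind the sim_build fixture can be illustrated without pytest. The sketch below assumes that two cases sharing the same driver and the same configs string can reuse one simulator build, which is how the PR description characterises its `(driver,configs)` build key. `Case`, `build_key`, `plan_builds`, and the sample case names and flag are hypothetical and are not taken from the harness.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical stand-in for one catalog entry."""
    name: str
    driver: str
    configs: str = ""


def build_key(case):
    # Collapse whitespace so cosmetically different config strings match.
    return (case.driver, " ".join(case.configs.split()))


def plan_builds(cases):
    """Group case names by build key, preserving first-seen order."""
    groups = OrderedDict()
    for case in cases:
        groups.setdefault(build_key(case), []).append(case.name)
    return groups


# Illustrative data only; the flag name is an assumption.
cases = [
    Case("vecadd", "simx"),
    Case("sgemm", "simx"),
    Case("sgemm_l2", "simx", "-DL2_ENABLE"),
    Case("vecadd_rtl", "rtlsim"),
]
builds = plan_builds(cases)
print(len(cases), "cases need", len(builds), "builds")
```

In a pytest harness the same key would typically index a session-scoped cache, so that the first case to ask for a key pays for the build and later cases reuse it.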
The divide-free per-thread coordinate ripple (lane 0 = warp base, each later lane steps +1 along X with single wrap into Y/Z) was a single combinational chain feeding the cta_warp_ram write -- 37 logic levels that cannot close at 300 MHz once the launch grid is a real runtime value (it was only hidden in the core unit-test DUT, which constant-folds the grid). Pipeline the ripple TID_STEP lanes per cycle. CTA dispatch is infrequent and cta_warp_ram is read many cycles after a warp launches, so the added write latency is hidden. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
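
A software model of the ripple described above may help when reading the RTL change. It is a minimal sketch, assuming the wrap limits are the X and Y extents of the dimensions being expanded and that a base coordinate is already in range; `step`, `ripple_in_stages`, and `reference_coords` are hypothetical names, and the "cycle" here is just a batch of `tid_step` lanes, not a timing-accurate model.

```python
def step(coord, dims):
    """Advance one lane: +1 along X with a single wrap into Y, then Z."""
    x, y, z = coord
    dim_x, dim_y, _dim_z = dims
    x += 1
    if x == dim_x:
        x = 0
        y += 1
        if y == dim_y:
            y = 0
            z += 1
    return (x, y, z)


def ripple_in_stages(base, dims, num_lanes, tid_step):
    """Yield one batch of lane coordinates per modelled cycle."""
    coord = base
    lane = 0
    while lane < num_lanes:
        batch = []
        for _ in range(min(tid_step, num_lanes - lane)):
            batch.append(coord)
            coord = step(coord, dims)
            lane += 1
        yield batch


def reference_coords(base, dims, num_lanes):
    """Divide/modulo reference, used only to check the ripple."""
    dim_x, dim_y, _dim_z = dims
    bx, by, bz = base
    linear0 = (bz * dim_y + by) * dim_x + bx
    out = []
    for lane in range(num_lanes):
        lin = linear0 + lane
        out.append((lin % dim_x, (lin // dim_x) % dim_y, lin // (dim_x * dim_y)))
    return out


stages = list(ripple_in_stages((1, 1, 0), (3, 2, 4), num_lanes=8, tid_step=4))
flat = [c for batch in stages for c in batch]
assert flat == reference_coords((1, 1, 0), (3, 2, 4), 8)
print(len(stages), "stages,", len(flat), "lanes")
```

Splitting the loop into batches mirrors the commit's trade: each batch carries only `tid_step` dependent increments, at the cost of more batches before the last lane's coordinate is written.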
Move the per-CTA/per-warp context tables (cta_ctx_ram, cta_warp_ram), the divide-free thread-coordinate expansion pipeline, the wid->cta_id map, and the CTA-CSR read-back out of VX_scheduler and into VX_cta_dispatch, so all CTA launch and context state has a single owner. VX_scheduler keeps only the launch handshake (warp activation + mscratch latch via cta_param) and wires the dispatcher read-back into sched_csr_if. cta_csrs is demoted to an internal signal; a narrow cta_param output replaces it on the module boundary. Pure structural change: same RAMs, same pipeline depth, same launch handshake; no functional/timing/IPC change. rtlsim vecadd/sgemm/sgemm_tcu_wg pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop needs:-based auto-skip from the pytest runner: a missing dependency is now a real (red) failure, not a silent skip, and every build warning escalated to an error stays a failure. Clean the sim build dir before each new CONFIGS so a stale Verilator obj_dir can't produce spurious lint errors. Update the CI 2.0 design doc to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit 36956fc3f3e453b2543295e69685ad6fd8900d27)
Add VX_fanout_buffer (a combinational counterpart to VX_reset_relay) plus FANOUT_BUFFER/FANOUT_BUFFER_EX macros, and use them to give each FMA/div/sqrt IP its own preserved clock-enable copy across every backend (Quartus en, Vivado aclken, RTL enable). This keeps the high-fanout enable as local distributed routing instead of being merged onto a single global buffer, which on the U55C was stretched into a congested cross-die path at NT16. Also refactors the FMA is_d selector in VX_fpu_std to use VX_shift_register. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit 4572fa5f452769c7c40777f500fb9f3aa79b6851)
Model per-FU dispatch-queue back-pressure with credits: a credit is spent when a uop issues into operand collection and returned when the FU accepts it, so warp suppression now counts in-flight ops still in operand collection (matching the RTL scoreboard) instead of only what already reached the queue. Size the per-FU dispatch queues by VX_CFG_DISPATCH_QUEUE_SIZE rather than a hardcoded 2, and wire that depth into the dispatcher's output channels (the buf_size arg was stored but unused; the dead member is removed). Also includes comment-only clarifications in cache.cpp (dirty_mask) and in opae_sim.cpp and xrt_sim.cpp (host-priority backoff); those files have no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit 21359c6f55842beea757ffeda194a4cee9e2c20c)
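
The credit scheme can be sketched as a small counter. This is an illustration only, assuming one counter per functional unit initialised to the queue depth; `DispatchCredits` and its methods are hypothetical names and do not reflect the SimX classes.

```python
class DispatchCredits:
    """Credit counter for one functional unit's dispatch queue (sketch)."""

    def __init__(self, queue_size):
        if queue_size < 1:
            raise ValueError("queue_size must be at least 1")
        self.capacity = queue_size
        self.available = queue_size

    def can_issue(self):
        # No credit left means the warp should be held back.
        return self.available > 0

    def issue(self):
        """Spend a credit when a uop enters operand collection."""
        if not self.can_issue():
            raise RuntimeError("no credit available")
        self.available -= 1

    def accept(self):
        """Return a credit when the functional unit accepts the uop."""
        if self.available == self.capacity:
            raise RuntimeError("credit returned without a matching issue")
        self.available += 1

    @property
    def in_flight(self):
        # Counts uops in operand collection as well as those already queued.
        return self.capacity - self.available


credits = DispatchCredits(queue_size=2)
credits.issue()
credits.issue()
print("can issue:", credits.can_issue(), "in flight:", credits.in_flight)
credits.accept()
print("can issue:", credits.can_issue(), "in flight:", credits.in_flight)
```

Because the credit is taken at issue rather than at queue entry, uops still in operand collection already count against the limit, which is the behavior the commit describes as matching the RTL scoreboard.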
The K-major (transposing) DXA load enumerated one GMEM read PER ELEMENT, re-reading the same cache line up to 8x (the model even counted the waste as gmem_dedup). Coalesce the read span to the cache line, matching the RTL addr_gen which reads per line: one cache-line read fans out to its scattered SMEM destinations. On the write side, gather the scattered K-major elements that land in the same LMEM block into one byte-masked block write per beat (the per-core LMEM port accepts a full block/cycle, banked), so the engine drains at SMEM bandwidth instead of one element per beat. This models achievable write bandwidth ahead of the current RTL smem_wr (1 elem/beat) -- a known SimX-ahead-of-RTL gap to be matched on the RTL side. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit ae97cef300dfd5845c258f5e3589c1983399039f)
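
Both halves of this change are grouping problems, which the following sketch illustrates. It assumes aligned elements that never straddle a line or block, and it groups a whole list at once rather than per beat; `coalesce_reads`, `gather_writes`, and the tile geometry are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def coalesce_reads(element_addrs, line_bytes):
    """Map each cache-line base address to the element addresses it serves."""
    lines = OrderedDict()
    for addr in element_addrs:
        base = addr - (addr % line_bytes)
        lines.setdefault(base, []).append(addr)
    return lines


def gather_writes(dst_addrs, elem_bytes, block_bytes):
    """Merge element writes landing in one block into a single byte mask."""
    blocks = OrderedDict()
    for addr in dst_addrs:
        base = addr - (addr % block_bytes)
        offset = addr - base
        mask = ((1 << elem_bytes) - 1) << offset  # one bit per byte written
        blocks[base] = blocks.get(base, 0) | mask
    return blocks


# An 8x8 tile of 4-byte elements with a 32-byte row pitch, visited
# column by column as a transposing load would.
LINE_BYTES = 64
ELEM_BYTES = 4
ROW_PITCH = 32
addrs = [row * ROW_PITCH + col * ELEM_BYTES
         for col in range(8) for row in range(8)]

lines = coalesce_reads(addrs, LINE_BYTES)
print(len(addrs), "element reads become", len(lines), "line reads")

masks = gather_writes([0, 8, 16, 64], ELEM_BYTES, block_bytes=64)
print({hex(base): hex(mask) for base, mask in masks.items()})
```

In this toy tile each line serves 16 elements, so a per-element model would fetch every line 16 times; the commit's "up to 8x" figure comes from the real geometry, which this sketch does not reproduce.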
…platforms

Migrate the OpenCL stack from PoCL-as-libOpenCL (direct-linked) to PoCL built ICD-only, so the system ocl-icd loader discovers the Vortex platform via a vendor .icd and Vortex can run alongside other OpenCL platforms (resolves the ICD-mode request). Device drivers are linked statically into libpocl and the install tree is relocatable.

- docs/building_toolchain.md: PoCL recipe now ENABLE_ICD=ON, POCL_ICD_ABSOLUTE_PATH=OFF, INSTALL_OPENCL_HEADERS=ON (keeps ENABLE_LOADABLE_DRIVERS=OFF). Drops the manual CL-header copy; documents the ICD layout, static driver, relocatable kernel-lib lookup, and OCL_ICD_VENDORS.
- ci/toolchain_install.sh.in: after extracting the PoCL bundle, regenerate the vendor .icd to the relocated libpocl path.
- tests/opencl/common.mk: link the system ocl-icd loader (-lOpenCL) and set OCL_ICD_VENDORS at run time; pin OCL_ICD_LIB_DIR ahead of any other vendor loader (e.g. CUDA). Validated end-to-end: vecadd run-simx PASSED through the loader against a static ICD install.
- tests/hip/common.mk: remove the LD_PRELOAD=libOpenCL.so shim (no longer needed now that the loader sees PoCL via the .icd); discover Vortex via OCL_ICD_VENDORS. chipStar already links the system loader.

Note: shipping this requires rebuilding/re-hosting the prebuilt PoCL bundle ICD-only; the local changes take effect once that bundle is in place.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit e98d908ef1053cae78cdb7c2e1f84ee60d23afa0)
…sudo

Support the portable, cross-loader registration path for real deployments while keeping the test harness/CI sudo-free.

- ci/register_icd.sh: optional helper (run by the user with sudo) that installs/removes /etc/OpenCL/vendors/pocl-vortex.icd pointing at the relocated libpocl. Standard /etc/OpenCL/vendors convention -> works with both ocl-icd and the Khronos loader, and lets any app discover Vortex alongside other platforms with no per-process env var. Not invoked by CI.
- docs/building_toolchain.md: document the two paths -- per-user OCL_ICD_VENDORS (no sudo, ocl-icd-specific, used by the harness/CI) vs. system-wide sudo registration (portable, recommended for deployment). Notes that OCL_ICD_VENDORS is an ocl-icd extension, not OpenCL-spec, and replaces the system vendor scan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit ac16321ced84e8a9ae3a8b2e07cf5efa54883ae9)
detect_osversion() only recognized Ubuntu and CentOS 7, so RHEL-family hosts (e.g. RHEL 8.10 on CRNCH Rogues-Gallery FPGA nodes) fell through to "unsupported" and configure aborted before generating config.mk. Map the RHEL family (rhel/redhat/rocky/almalinux) and CentOS Stream 8/9 to the centos/7 prebuilt bundle, whose glibc 2.17 binaries run on these newer glibc releases. The --osversion override remains available. Verified detect_osversion against synthetic os-release files for RHEL 8.10, Rocky 9.3, AlmaLinux 8.9 and CentOS Stream 8 (all -> centos/7), with Ubuntu/CentOS-7 detection and the unsupported fallback unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit ee6147ba4a388ba6accb73d13f0bbe3f1b9b37b5)
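
The new mapping can be sketched in Python against the standard os-release fields. This is not the project's detect_osversion(); it is a hypothetical illustration that covers only the hosts this commit maps to centos/7, and it returns None for everything else (including Ubuntu, whose bundle names are not stated here). `parse_os_release` and `map_rhel_like` are made-up names.

```python
RHEL_FAMILY = {"rhel", "redhat", "rocky", "almalinux"}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def map_rhel_like(text):
    """Return "centos/7" for hosts mapped to that bundle, else None."""
    fields = parse_os_release(text)
    os_id = fields.get("ID", "").lower()
    major = fields.get("VERSION_ID", "").split(".")[0]
    if os_id in RHEL_FAMILY:
        return "centos/7"
    # CentOS 7 itself plus CentOS Stream 8/9 report ID=centos.
    if os_id == "centos" and major in {"7", "8", "9"}:
        return "centos/7"
    return None  # not covered by this sketch


rhel = 'NAME="Red Hat Enterprise Linux"\nID="rhel"\nVERSION_ID="8.10"\n'
rocky = 'ID="rocky"\nVERSION_ID="9.3"\n'
stream = 'ID="centos"\nVERSION_ID="8"\n'
for sample in (rhel, rocky, stream):
    print(map_rhel_like(sample))
```

Feeding synthetic os-release text through the function, as the loop does, follows the same verification approach the commit message describes.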
The OPAE flow targets discontinued Intel PAC cards (Arria 10 / Stratix 10), depends on Intel-supplied platform files (e.g. platform_if.vh from the OPAE PIM), and is no longer maintained or CI-tested, so its platform/memory config can be broken on current toolchains. Add a deprecation banner pointing users to the supported Xilinx Alveo / XRT flow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit 5e2ee808a5b544e6e6a6c01c738f73fde745d63a)
The hardcoded per-core PERF/IPC examples in simulation.md were taken from an older microarchitecture and could not be reproduced by users (reported IPC was ~2x lower), and recent simx revisions print a single aggregate PERF line rather than the per-core breakdown shown. Add a note that the instruction/cycle/IPC figures are illustrative and depend on configuration, input size, and revision, and document the current single-line format, so they are not treated as fixed targets. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> (cherry picked from commit e88f96ba6266acf1b6082cd5f68aa599a5e2f49e)
What

Introduces CI 2.0: Vortex tests become declarative YAML data run by pytest, replacing the imperative, driver-pinned bash in `ci/regression.sh` (1382 lines, 401 driver-pinned invocations). `blackbox.sh` stays the untouched executor. Lands alongside the legacy `ci.yml` (manual-dispatch only) so nothing breaks during migration.

Design docs: `docs/proposals/ci_2.0_architecture.md` (workflow layer) + `docs/proposals/regression_sh_2.0.md` (engine).

Why

The driver (`simx`/`rtlsim`/`xrtsim`/`opaesim`) is hard-coded into every line, so you can't run "simx only" without editing 401 lines, yet `rtlsim` (~168 runs, the Verilator long pole) dominates cost. With CI 2.0 the driver slice is just `pytest -m "simx"`; push runs simx, PR adds rtlsim, nightly runs everything.

What's here

Engine

- `ci/vxcatalog.py`: catalog core (load/expand/filter/render, `(driver,configs)` build-key dedup)
- `ci/catalog/*.yaml`: all 29 categories, 388 specs (22 extracted from the bash; 7 script/build categories via `via: script`)
- `ci/conftest.py` / `ci/test_vortex.py`: pytest harness (one marker per value, ambient-XLEN filter, build-once `sim_build` fixture, needs-provisioning skip)
- `ci/catalog_query.py`: planner (`matrix`/`select`/`lint`)
- `ci/extract_catalog.py`: drafts catalogs from `regression.sh.in`
- `ci/run-tests`: friendly wrapper → pytest marker expressions

Workflow

- `.github/workflows/ci-v2.yml`: plan(catalog_query) → build(per xlen) → tests(pytest -m per cell, JUnit) → complete
- `.github/actions/setup-vortex`: composite action (profile-scoped cache + deps)
- `configure`: copies the catalog + pytest config into the build tree

Validation

Structural (no toolchain): catalog lint OK (388 specs / 29 categories); pytest collects 387 (cupbop is xlen-64-only); marker slicing correct (push 40 / PR 74 / schedule 119 cells). One smoke run executed 4 simx `amo` specs end-to-end green (both blackbox + make-run styles). Real per-category sim execution / parity-vs-legacy runs on CI.

Migration status

22 categories are extractor-drafts (faithful to the bash, worth a review); 7 script/build categories (`unittest`, `synthesis`, `vector`, `dtm`, `sst`, `gem5`, `cupbop`) currently delegate to legacy via `via: script`; native per-spec migration is Phase D. `ci-v2.yml` is manual-dispatch until validated on a runner.

🤖 Generated with Claude Code
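
The per-event driver slicing described under "Why" can be sketched as a small lookup that produces a pytest marker expression. This is a hypothetical illustration, not the planner's code: it assumes "nightly runs everything" means selecting all four drivers, uses the GitHub Actions event names `push`, `pull_request`, and `schedule`, and invents `EVENT_DRIVERS`, `marker_expression`, and `pytest_command`.

```python
import shlex

DRIVERS = ("simx", "rtlsim", "xrtsim", "opaesim")

# Assumed policy: push runs simx, PR adds rtlsim, schedule runs all drivers.
EVENT_DRIVERS = {
    "push": ("simx",),
    "pull_request": ("simx", "rtlsim"),
    "schedule": DRIVERS,
}


def marker_expression(event):
    """Build the -m expression selecting the drivers for one event."""
    return " or ".join(EVENT_DRIVERS[event])


def pytest_command(event):
    """Return the argv list for running that slice."""
    return ["pytest", "-m", marker_expression(event)]


for event in EVENT_DRIVERS:
    print(event, "->", shlex.join(pytest_command(event)))
```

Keeping the policy in one table means a driver can be moved between tiers by editing a single entry rather than touching individual test invocations.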