Navigate to the HW folder and compile the hardware. Running make automatically triggers the Vitis HLS/v++ flow and generates the required .xclbin deployment package.
cd HW/
make
Once the hardware has finished compiling, move to the target Kria board and deploy the bitstream using xmutil:
# List the available accelerator apps and the currently active slot
sudo xmutil listapps
# Unload the current active app to free up the slot
sudo xmutil unloadapp
# Create the app directory (if it does not exist yet) and copy the compiled hardware package into it
sudo mkdir -p /lib/firmware/xilinx/mult-sys-array/
sudo cp HW/package.hw/* /lib/firmware/xilinx/mult-sys-array/
# Load our custom systolic array application into the FPGA
sudo xmutil loadapp mult-sys-array
Navigate to the SW folder, compile the C++ host application, and execute it:
cd SW/
make
./mmult
Important!
The baseline code was taken from the Systolic Array example in the Xilinx Vitis_Accel_Examples repository, 2020.2 branch (here: https://github.com/Xilinx/Vitis_Accel_Examples/tree/2020.2/cpp_kernels/systolic_array).
AI was used in the following way (for build environment and Makefile configuration):
Tool: Gemini
Type of use: Code adaptation
Degree of dependence: Moderate
Justification and validation performed: It was used to help adapt and customize the existing Makefiles to match our project hierarchy and target FPGA platform. The generated build instructions were reviewed and validated manually by launching the build process and confirming that all cross-compilation and packaging steps completed successfully.
AI was used in the following way (for C++ syntax debugging and header corrections):
Tool: Gemini
Type of use: Debugging
Degree of dependence: Low
Justification and validation performed: It was used as a quick troubleshooting aid to identify missing header inclusions and to fix minor syntax errors and typos encountered during the High-Level Synthesis (HLS) compilation phase. All suggested corrections were verified manually by running the HLS compiler to confirm that the codebase compiled without errors and that the core logic remained intact.
AI was used in the following way (for hierarchical scaling strategy implementation):
Tool: Claude Code
Type of use: Code adaptation
Degree of dependence: Medium
Justification and validation performed: It was used to design and implement a PARALLELISM_FACTOR parameter that replaces the fixed complete array partitioning of the systolic array with tunable, factor-based partitioning across both dimensions of localC, localA, and localB. The suggested changes were reviewed manually to check the pragma syntax and its consistency across all affected arrays and loop unroll directives. Functional equivalence was validated by running C-simulation and confirming that the results matched the software reference implementation.
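To illustrate what factor-based partitioning means in practice, here is a minimal sketch loosely modelled on the loop structure of the Xilinx systolic array example. It is not this project's kernel: MAX_SIZE, the value 4 for PARALLELISM_FACTOR, the choice of cyclic (rather than block) partitioning, the square-matrix interface, plain int data, and the function names mmult_pf and mmult_sw are all assumptions, and the AXI interface pragmas are omitted.

```cpp
#include <cassert>

// Hypothetical compile-time bounds for this sketch.
#define MAX_SIZE 16
#define PARALLELISM_FACTOR 4

static_assert(MAX_SIZE % PARALLELISM_FACTOR == 0,
              "PARALLELISM_FACTOR must divide MAX_SIZE");

// Sketch kernel (hypothetical name): c = a * b for size x size row-major
// matrices, size <= MAX_SIZE.
void mmult_pf(const int* a, const int* b, int* c, int size) {
    int localA[MAX_SIZE][MAX_SIZE];
    int localB[MAX_SIZE][MAX_SIZE];
    int localC[MAX_SIZE][MAX_SIZE];
    // Cyclic partitioning by PARALLELISM_FACTOR on both dimensions instead of
    // "complete" partitioning. A regular C++ compiler ignores these pragmas.
#pragma HLS ARRAY_PARTITION variable=localA cyclic factor=PARALLELISM_FACTOR dim=1
#pragma HLS ARRAY_PARTITION variable=localA cyclic factor=PARALLELISM_FACTOR dim=2
#pragma HLS ARRAY_PARTITION variable=localB cyclic factor=PARALLELISM_FACTOR dim=1
#pragma HLS ARRAY_PARTITION variable=localB cyclic factor=PARALLELISM_FACTOR dim=2
#pragma HLS ARRAY_PARTITION variable=localC cyclic factor=PARALLELISM_FACTOR dim=1
#pragma HLS ARRAY_PARTITION variable=localC cyclic factor=PARALLELISM_FACTOR dim=2

    assert(size >= 0 && size <= MAX_SIZE);

    // Copy the inputs into the local buffers.
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++) {
            localA[i][j] = a[i * size + j];
            localB[i][j] = b[i * size + j];
        }

    // Systolic-style accumulation. Unrolling i and j by PARALLELISM_FACTOR
    // (instead of fully) aims for roughly PARALLELISM_FACTOR x
    // PARALLELISM_FACTOR parallel multiply-accumulates per step.
    for (int k = 0; k < size; k++) {
        for (int i = 0; i < MAX_SIZE; i++) {
#pragma HLS UNROLL factor=PARALLELISM_FACTOR
            for (int j = 0; j < MAX_SIZE; j++) {
#pragma HLS UNROLL factor=PARALLELISM_FACTOR
                int last = (k == 0) ? 0 : localC[i][j];
                int a_val = (i < size) ? localA[i][k] : 0;
                int b_val = (j < size) ? localB[k][j] : 0;
                localC[i][j] = last + a_val * b_val;
            }
        }
    }

    // Write the result back.
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            c[i * size + j] = localC[i][j];
}

// Plain software reference (hypothetical name) used for the C-simulation check.
void mmult_sw(const int* a, const int* b, int* c, int size) {
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++) {
            int acc = 0;
            for (int k = 0; k < size; k++)
                acc += a[i * size + k] * b[k * size + j];
            c[i * size + j] = acc;
        }
}
```

Because g++ ignores the HLS pragmas, the same source can be compiled natively to compare mmult_pf against mmult_sw, while Vitis HLS applies the partitioning and unroll factors. Cyclic partitioning fits this access pattern: with factor P, index i lands in bank i mod P, so the P consecutive indices touched by one unrolled step fall in distinct banks.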
AI was used in the following way (for C++ HLS testbench creation and a general understanding of tiling for matrix multiplication):
Tool: Copilot
Type of use: Debugging and code adaptation
Degree of dependence: Low
Justification and validation performed: It was used to speed up the creation of a simple C++ HLS testbench skeleton suitable for matrix multiplication. It was also used as an additional reference for understanding what tiling is and how it applies to matrix multiplication. In addition, it helped debug and explain the syntax errors reported at the testbench level, since, as expected, the testbench did not compile on the first attempt. Finally, the generated testbench served only as a starting point; all subsequent changes (for example, the ap_int data types, random initialization, and parameterized values and dimensions) were implemented and verified manually.
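The tiling idea and the testbench structure mentioned above (random initialization, parameterized dimensions, comparison against a reference) can be illustrated with a small, portable C++ sketch. It is not this project's testbench: the tile size TILE, the functions mmult_tiled, mmult_ref and run_testbench, and the value range are hypothetical, and plain int stands in for the ap_int types, which require the Xilinx ap_int.h header.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical tile edge for this sketch.
constexpr int TILE = 4;

// Tiled (blocked) matrix multiply: C = A * B for N x N row-major matrices.
// C is accumulated one TILE x TILE block at a time, so each inner step only
// touches a small block of A, B and C (the working set tiling keeps small).
void mmult_tiled(const std::vector<int>& A, const std::vector<int>& B,
                 std::vector<int>& C, int N) {
    std::fill(C.begin(), C.end(), 0);
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int kk = 0; kk < N; kk += TILE)
                // Edge tiles are clipped with std::min when N % TILE != 0.
                for (int i = ii; i < std::min(ii + TILE, N); i++)
                    for (int j = jj; j < std::min(jj + TILE, N); j++) {
                        int acc = C[i * N + j];
                        for (int k = kk; k < std::min(kk + TILE, N); k++)
                            acc += A[i * N + k] * B[k * N + j];
                        C[i * N + j] = acc;
                    }
}

// Untiled reference implementation.
void mmult_ref(const std::vector<int>& A, const std::vector<int>& B,
               std::vector<int>& C, int N) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

// Testbench-style check: random inputs, tiled vs reference, returns mismatches.
int run_testbench(int N, unsigned seed) {
    std::mt19937 gen(seed);
    // 8-bit value range as a stand-in for a narrow ap_int input (assumption).
    std::uniform_int_distribution<int> dist(-128, 127);
    std::vector<int> A(N * N), B(N * N), C_tiled(N * N), C_ref(N * N);
    for (int& x : A) x = dist(gen);
    for (int& x : B) x = dist(gen);
    mmult_tiled(A, B, C_tiled, N);
    mmult_ref(A, B, C_ref, N);
    int errors = 0;
    for (int i = 0; i < N * N; i++)
        if (C_tiled[i] != C_ref[i]) errors++;
    return errors;
}
```

For example, run_testbench(10, 42) exercises clipped edge tiles because 10 is not a multiple of TILE; a return value of 0 means the tiled result matched the reference for every element.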