Add fixed_capacity_map to cudax by srinivasyadav18 · Pull Request #7705 · NVIDIA/cccl

srinivasyadav18 · 2026-02-18T04:14:14Z

Description

This PR migrates cuCollections static_map into cudax as cuda::experimental::cuco::static_map.

Minimal scope: implements insert, contains, clear, and trivial accessors, with capacity validation provided by make_valid_capacity and is_valid_capacity. Tests mirror the cuCollections layout and use a parameterized matrix covering key type, probing scheme, CG size, and bucket size.
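For readers unfamiliar with capacity validation, here is a minimal host-only sketch of the idea. It assumes that a valid capacity is a whole number of probe strides for linear probing, and a prime number of strides for double hashing. Every `sketch_` name is hypothetical and is not the signature of `make_valid_capacity` or `is_valid_capacity` in this PR; the overflow guards are also omitted.

```cpp
#include <cstddef>

// Illustrative helpers only; these names are NOT the cudax API.
constexpr bool sketch_is_prime(std::size_t n)
{
  if (n < 2) return false;
  if (n < 4) return true; // 2 and 3
  if (n % 2 == 0 || n % 3 == 0) return false;
  // Trial division over candidates of the form 6k +/- 1.
  for (std::size_t i = 5; i <= n / i; i += 6)
    if (n % i == 0 || n % (i + 2) == 0) return false;
  return true;
}

constexpr std::size_t sketch_ceil_div(std::size_t a, std::size_t b)
{
  return (a + b - 1) / b; // assumes a + b - 1 does not overflow
}

// Linear probing (assumed rule): round up to a whole number of strides.
constexpr std::size_t sketch_valid_capacity_linear(std::size_t requested, std::size_t stride)
{
  return sketch_ceil_div(requested, stride) * stride;
}

// Double hashing (assumed rule): the number of strides must be prime.
constexpr std::size_t sketch_valid_capacity_double(std::size_t requested, std::size_t stride)
{
  std::size_t strides = sketch_ceil_div(requested, stride);
  while (!sketch_is_prime(strides)) ++strides;
  return strides * stride;
}

constexpr bool sketch_is_valid_capacity_double(std::size_t capacity, std::size_t stride)
{
  return capacity % stride == 0 && sketch_is_prime(capacity / stride);
}
```

Under these assumptions, a request for 16 slots with a stride of 2 becomes 16 for linear probing but 22 for double hashing, because 8 strides is not prime and the next prime is 11.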

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-02-18T04:14:18Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

copy-pr-bot · 2026-05-21T20:46:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-03T20:50:32Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Adds a complete open-addressing hash table infrastructure for CUDA Experimental, comprising device reference operations, grid kernels, host orchestration, and a public static_map container with static/dynamic capacity modes and optional key erasure, plus comprehensive test coverage.

Changes

Open-addressing and static_map port

| Layer / File(s) | Summary |
| --- | --- |
| Type traits and bitwise comparison `cudax/include/cuda/experimental/__cuco/traits.hpp`, `cudax/include/cuda/experimental/__cuco/__detail/bitwise_compare.cuh` | `is_bitwise_comparable`, `is_tuple_like` traits and aligned `__bitwise_compare` template support bitwise-safe type detection and fast equality paths (4/8-byte specializations via reinterpretation, general `memcmp` fallback). |
| Prime utilities and capacity rounding `cudax/include/cuda/experimental/__cuco/__detail/prime.hpp`, `cudax/include/cuda/experimental/__cuco/capacity.cuh` | Deterministic 64-bit primality testing via trial division + Miller–Rabin, modular arithmetic with `__int128` fast path, and `make_valid_capacity` rounding for linear/double-hashing with overflow guards. |
| Probing schemes and iterator base `cudax/include/cuda/experimental/__cuco/__detail/probing_scheme_base.cuh`, `cudax/include/cuda/experimental/__cuco/probing_scheme.cuh` | `__probing_scheme_base<CgSize>` and `__probing_iterator` for bucket traversal; public `linear_probing` and `double_hashing` templates with cooperative-group tile-rank stride distribution. |
| Sentinel types and kernel utilities `cudax/include/cuda/experimental/__cuco/types.cuh`, `cudax/include/cuda/experimental/__cuco/__detail/types.cuh`, `cudax/include/cuda/experimental/__cuco/__detail/utils.cuh`, `cudax/include/cuda/experimental/__cuco/__detail/utils.hpp` | Strong-type sentinel wrappers (`empty_key`, `empty_value`, `erased_key`), mdspan extent aliases, and grid-launch helpers (global thread ID, grid stride, occupancy sizing, tile-size traits). |
| Equality wrapper for probing `cudax/include/cuda/experimental/__cuco/__detail/equal_wrapper.cuh` | Combines `__bitwise_compare` sentinel checks with key equality, returning three-way results and branching on insert vs. query mode for duplicate control. |
| Slot storage and device reference core `cudax/include/cuda/experimental/__cuco/__open_addressing/slot_storage_ref.cuh`, `cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_ref_impl.cuh` | `__slot_storage_ref` non-owning bucket view and `__open_addressing_ref_impl` device-side operations (probing, CAS-based insert with `packed_cas`/`back_to_back_cas`/`cas_dependent_write` dispatch, contains, cooperative-group variants). |
| Grid kernels for bulk operations `cudax/include/cuda/experimental/__cuco/__open_addressing/kernels.cuh` | Grid-stride conditional `__insert_if_n`, `__fill`, and `__contains_if_n` kernels with `_CgSize==1` direct vs. `_CgSize!=1` tiled cooperative execution paths. |
| Host orchestration and memory `cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_impl.cuh` | Device-allocated slot buffer, async/sync clear/insert/contains with stream refs, device counter for success counting, bucket-count computation from capacity or load factor. |
| Public static_map container `cudax/include/cuda/experimental/__cuco/static_map.cuh`, `cudax/include/cuda/experimental/__cuco/static_map_ref.cuh` | SFINAE-selected constructors for static/dynamic capacity and erasure modes; `clear`, `insert`, `contains` forwarding; device-side `static_map_ref` with trivially-copyable ref semantics. |
| Capacity, insert, and sentinel tests `cudax/test/cuco/static_map/test_capacity.cu`, `cudax/test/cuco/static_map/test_insert_and_contains.cu`, `cudax/test/cuco/static_map/test_key_sentinel.cu`, `cudax/test/cuco/static_map/test_shared_memory.cu`, `cudax/test/cuco/utility/test_capacity.cu`, `cudax/test/CMakeLists.txt` | Validates dynamic/static capacity computation, insert/contains workflows, shared-memory sizing via `capacity_v`, sentinel handling, and load-factor rounding; updates `strong_type.cuh` documentation. |
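To illustrate the "fast equality paths" row above, here is a host-only sketch of a bitwise comparison with 4-byte and 8-byte fast paths and a `memcmp` fallback. `sketch_bitwise_equal` and `sketch_triple` are hypothetical names, and the code uses `std::memcpy` instead of whatever reinterpretation the PR's `__bitwise_compare` uses (a later commit mentions `bit_cast`). Treat it as an approximation of the idea, not the PR's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical helper: compares object representations, not values.
template <class T>
bool sketch_bitwise_equal(const T& lhs, const T& rhs)
{
  static_assert(std::is_trivially_copyable<T>::value, "bitwise comparison needs trivially copyable types");
  if constexpr (sizeof(T) == 4)
  {
    std::uint32_t a, b; // fast path: one 32-bit integer compare
    std::memcpy(&a, &lhs, 4);
    std::memcpy(&b, &rhs, 4);
    return a == b;
  }
  else if constexpr (sizeof(T) == 8)
  {
    std::uint64_t a, b; // fast path: one 64-bit integer compare
    std::memcpy(&a, &lhs, 8);
    std::memcpy(&b, &rhs, 8);
    return a == b;
  }
  else
  {
    return std::memcmp(&lhs, &rhs, sizeof(T)) == 0; // general fallback
  }
}

// 12-byte type without padding, so the fallback path is well defined.
struct sketch_triple
{
  std::int32_t a, b, c;
};

inline void sketch_bitwise_demo()
{
  const float qnan = std::numeric_limits<float>::quiet_NaN();
  assert(sketch_bitwise_equal(qnan, qnan));    // same bits, although qnan != qnan
  assert(!sketch_bitwise_equal(0.0f, -0.0f));  // different bits, although 0.0f == -0.0f
  assert(sketch_bitwise_equal(sketch_triple{1, 2, 3}, sketch_triple{1, 2, 3}));
}
```

The NaN and signed-zero cases show why bitwise equality suits sentinel checks: a sentinel is matched by its exact bit pattern. Types with padding bytes are not safe to compare this way, which is presumably what a trait such as `is_bitwise_comparable` is meant to rule out.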

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Port OpenAddressing [`#7463`] | ✅ | |
| Port `static_map` [`#7463`] | ✅ | |

Suggested labels

cudax

Suggested reviewers

andralex
pciolkosz
gevtushenko

Comment `@coderabbitai help` to get the list of available commands.

coderabbitai

Actionable comments posted: 18

🧹 Nitpick comments (3)

cudax/include/cuda/experimental/__cuco/__detail/extent.cuh (1)

131-148: ⚡ Quick win

suggestion: Mark these header variable templates inline. They are namespace-scope constexpr definitions in a header, and the local CCCL rule requires the explicit inline spelling for this pattern.

As per coding guidelines, "All constexpr variables at namespace/global scope must use inline, including template variables."
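A small sketch of the requested spelling, with hypothetical names that do not come from `extent.cuh`:

```cpp
#include <cstddef>

// Hypothetical header-level constants; the names are illustrative only.

// Variable template: `inline` is spelled out explicitly, as the quoted rule asks.
template <class T>
inline constexpr std::size_t sketch_slot_bytes_v = sizeof(T);

// Plain namespace-scope constant: without `inline`, a constexpr variable has
// internal linkage, so every translation unit would get its own copy.
inline constexpr int sketch_default_block_size = 128;
```

Both forms require C++17, where `inline` variables were introduced.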

cudax/include/cuda/experimental/__cuco/probing_scheme.cuh (1)

24-31: ⚡ Quick win

suggestion: Wrap this header with the standard CCCL prologue/epilogue pair. The file enters code directly after its includes and never closes with #include <cuda/std/__cccl/epilogue.h>, unlike the other new headers in this cohort.

As per coding guidelines, "The last included header before code must be #include <cuda/std/__cccl/prologue.h>, and #include <cuda/std/__cccl/epilogue.h> must be at the end of a file."

Also applies to: 264-264

cudax/include/cuda/experimental/__cuco/__static_map/kernels.cuh (1)

119-235: suggestion: Please attach benchmark results for this fast path before merge. This shared-memory kernel adds a new execution path and tuning heuristic, so we need the perf numbers that justify it on the supported toolchains and architectures.

As per coding guidelines, "Do not commit SASS code changes without running benchmarks to check for performance regressions."

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d2eb011c-f333-4929-a09b-f09102640ec3

📥 Commits

Reviewing files that changed from the base of the PR and between 75c7b14 and 134736e.

📒 Files selected for processing (22)

cudax/include/cuda/experimental/__cuco/__detail/bitwise_compare.cuh
cudax/include/cuda/experimental/__cuco/__detail/equal_wrapper.cuh
cudax/include/cuda/experimental/__cuco/__detail/extent.cuh
cudax/include/cuda/experimental/__cuco/__detail/prime.hpp
cudax/include/cuda/experimental/__cuco/__detail/probing_scheme_base.cuh
cudax/include/cuda/experimental/__cuco/__detail/types.cuh
cudax/include/cuda/experimental/__cuco/__detail/utils.cuh
cudax/include/cuda/experimental/__cuco/__detail/utils.hpp
cudax/include/cuda/experimental/__cuco/__open_addressing/functors.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/kernels.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_impl.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_ref_impl.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/slot_storage_ref.cuh
cudax/include/cuda/experimental/__cuco/__open_addressing/types.cuh
cudax/include/cuda/experimental/__cuco/__static_map/kernels.cuh
cudax/include/cuda/experimental/__cuco/__utility/strong_type.cuh
cudax/include/cuda/experimental/__cuco/probing_scheme.cuh
cudax/include/cuda/experimental/__cuco/static_map.cuh
cudax/include/cuda/experimental/__cuco/static_map_ref.cuh
cudax/include/cuda/experimental/__cuco/traits.hpp
cudax/test/CMakeLists.txt
cudax/test/cuco/static_map/test_static_map.cu

PointKernel

Ready for another round of review

PointKernel · 2026-06-23T23:04:53Z

+        const auto __src_lane = __ffs(__group_contains_available) - 1;
+        auto __status         = __insert_result::__continue;
+        if (__group.thread_rank() == __src_lane)
+        {
+          __status =
+            __attempt_insert(__get_slot_ptr(*__probing_iter, __intra_bucket_index), empty_slot_sentinel(), __val);
+        }


Good to know. I ran some tests locally with Claude Code and didn't observe any noticeable impact, but I'm happy to revisit this if needed.

PointKernel · 2026-06-23T23:38:44Z

+  using __impl_type = ::cuda::experimental::cuco::__open_addressing::
+    __open_addressing_impl<_Key, value_type, _Scope, _KeyEqual, _ProbingScheme, _BucketSize, _MemoryResource>;
+
+  ::cuda::std::unique_ptr<__impl_type> __impl;


PImpl is intentional here. While the Core Guidelines primarily discuss PImpl in the context of ABI stability, it is also a well-established technique for reducing header dependencies and isolating implementation details.
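For context, a generic host-only sketch of the PImpl pattern with `std::unique_ptr`. `sketch_map` and its members are hypothetical, and the backing store is a plain `std::unordered_set` rather than anything from this PR. The point is only that the public class holds a pointer to a type whose definition can live elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>

// Public interface: users see only a pointer to an incomplete type.
class sketch_map
{
public:
  explicit sketch_map(std::size_t capacity);
  ~sketch_map(); // must be defined where `impl` is complete
  sketch_map(sketch_map&&) noexcept;
  sketch_map& operator=(sketch_map&&) noexcept;

  bool insert(int key);
  bool contains(int key) const;

private:
  struct impl; // hypothetical implementation type, defined below
  std::unique_ptr<impl> impl_;
};

// In a classic PImpl layout, everything below lives in a separate source file.
struct sketch_map::impl
{
  std::size_t capacity;
  std::unordered_set<int> keys;
};

sketch_map::sketch_map(std::size_t capacity) : impl_(new impl{capacity, {}}) {}
sketch_map::~sketch_map()                                = default;
sketch_map::sketch_map(sketch_map&&) noexcept            = default;
sketch_map& sketch_map::operator=(sketch_map&&) noexcept = default;

bool sketch_map::insert(int key)
{
  assert(impl_ != nullptr); // a moved-from object has no implementation
  if (impl_->keys.size() >= impl_->capacity) return false; // fixed capacity reached
  return impl_->keys.insert(key).second;                   // false for duplicates
}

bool sketch_map::contains(int key) const
{
  assert(impl_ != nullptr);
  return impl_->keys.count(key) != 0;
}
```

The destructor and move operations are declared in the class and defaulted out of line because `std::unique_ptr` needs the complete `impl` type at the point where it is destroyed.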

PointKernel · 2026-06-24T00:26:51Z

+  {
+    using __size_type        = typename _Capacity::index_type;
+    using __step_extent      = ::cuda::std::extents<__size_type, _BucketSize>;
+    const __size_type __init = __hash(__probe_key) % (__cap.extent(0) / _BucketSize) * _BucketSize;


round_down(extent, B) isn't equivalent. The current (__hash % (extent / _BucketSize)) * _BucketSize picks a bucket (hash % num_buckets) then scales to its first slot, so the start is bucket-aligned.
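A tiny host-side check of the difference. It assumes the suggested alternative was `hash % round_down(extent, B)`, which is a guess, since the original suggestion is not shown here. All function names are hypothetical.

```cpp
#include <cstddef>

// Current form: pick a bucket, then scale to that bucket's first slot.
constexpr std::size_t sketch_initial_slot(std::size_t hash, std::size_t extent, std::size_t bucket_size)
{
  return hash % (extent / bucket_size) * bucket_size;
}

constexpr std::size_t sketch_round_down(std::size_t value, std::size_t multiple)
{
  return value / multiple * multiple;
}

// Assumed alternative: reduce the hash modulo the rounded-down extent.
constexpr std::size_t sketch_alternative_slot(std::size_t hash, std::size_t extent, std::size_t bucket_size)
{
  return hash % sketch_round_down(extent, bucket_size);
}

// extent = 12 slots, bucket size = 4, hash = 7:
static_assert(sketch_initial_slot(7, 12, 4) == 4, "bucket 1 starts at slot 4");
static_assert(sketch_alternative_slot(7, 12, 4) == 7, "slot 7 is in the middle of bucket 1");
```

With three buckets of four slots, the current expression lands on slot 4, the start of bucket 1, while the assumed alternative lands on slot 7, which is not a multiple of the bucket size.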

PointKernel · 2026-06-24T00:26:52Z

+    using __step_extent = ::cuda::std::extents<__size_type, ::cuda::std::dynamic_extent>;
+    return ::cuda::experimental::cuco::__detail::__probing_iterator<_Capacity, __step_extent>{
+      __size_type{__hash1(__probe_key)} % (__cap.extent(0) / _BucketSize) * _BucketSize,
+      __step_extent{__size_type{(__hash2(__probe_key) % (__cap.extent(0) / _BucketSize - 1) + 1) * _BucketSize}},


Not needed: extent(0) (the slot capacity) is always a multiple of the probe stride (cg_size * _BucketSize), so extent / _BucketSize is exact.
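To make the arithmetic concrete, here is a host-only sketch that walks the double-hashing probe sequence using the start and step expressions quoted above. It ignores the cooperative-group size and assumes the iterator wraps around with `% extent`; `sketch_probe_sequence` is a hypothetical helper, not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first slot of each bucket visited, one entry per bucket.
inline std::vector<std::size_t>
sketch_probe_sequence(std::size_t hash1, std::size_t hash2, std::size_t extent, std::size_t bucket_size)
{
  assert(extent % bucket_size == 0); // the division below must be exact
  const std::size_t num_buckets = extent / bucket_size;
  assert(num_buckets > 1); // otherwise the step modulus would be zero

  std::size_t slot = hash1 % num_buckets * bucket_size;
  // The step is a whole number of buckets in [1, num_buckets - 1], so it is never zero.
  const std::size_t step = (hash2 % (num_buckets - 1) + 1) * bucket_size;

  std::vector<std::size_t> sequence;
  for (std::size_t i = 0; i < num_buckets; ++i)
  {
    sequence.push_back(slot);
    slot = (slot + step) % extent; // assumed wrap-around rule
  }
  return sequence;
}
```

For example, with `extent = 20`, `bucket_size = 4`, `hash1 = 7` and `hash2 = 9`, there are five buckets, the start is slot 8 and the step is 8 slots, giving the sequence 8, 16, 4, 12, 0. Each of the five buckets is visited exactly once, which relies on the bucket count being prime.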

PointKernel

Looks good from my end

github-project-automation Bot added this to CCCL Feb 18, 2026

github-project-automation Bot moved this to Todo in CCCL Feb 18, 2026

cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL Feb 18, 2026

PointKernel self-requested a review February 18, 2026 20:10

PointKernel requested changes Feb 18, 2026

srinivasyadav18 force-pushed the cuco_static_map branch from 0edb761 to 65368db Compare April 15, 2026 22:53

srinivasyadav18 added 5 commits April 22, 2026 17:30

initial migration of OA and static_map

9a6f714

cleanups

874285c

temporary WAR for call to ~buffer from cuda::counting_iterator[]

a1337e9

use shared memory buffer flushing in kernels;cleanups

f2820a6

refactor extent's to use size_type like std::span

b0d0702

srinivasyadav18 force-pushed the cuco_static_map branch from 65368db to b0d0702 Compare April 23, 2026 00:32

srinivasyadav18 added 3 commits April 22, 2026 17:46

simlify primes usage

1194f02

docs and cleanups

9d38c38

more refactorings

875a47c

PointKernel marked this pull request as ready for review June 3, 2026 18:11

PointKernel requested a review from a team as a code owner June 3, 2026 18:11

PointKernel requested a review from andralex June 3, 2026 18:11

cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL Jun 3, 2026

PointKernel added 7 commits June 3, 2026 18:27

Merge upstream/main into cuco_static_map; align _CCCL_API -> _CCCL_HO…

f169d17

…ST_DEVICE_API

Code formatting

3d2dd24

Update docs for probing scheme

f96351f

Remove outer logic and get rid of count and retrieve APIs for map

edf4fd3

Replace thrust fancy iters with cuda:: ones

aa6b827

Inclusion cleanups + remove a circular inclusion

a8bc527

Fix outdated docs + comments

134736e

coderabbitai Bot reviewed Jun 3, 2026

PointKernel added 26 commits June 23, 2026 19:12

Reorganize cuco hash table implementation headers under detail/

476043e

Simplify cuco bitwise_compare with bit_cast

21df677

Use granular libcudacxx headers in cuco hash table headers

f0ef272

Drop explicit uint64_t literal wrappers in cuco prime

43b8092

Use raw arrays in cuco prime to drop array header

ce68198

Guard device-only cuco thread-index helpers for host-only compilation

e73a819

Use inline constexpr variables for cuco launch constants

90e5338

Address review comments in cuco detail/utils.hpp

2b1f1a9

Use granular cuda/__atomic header in cuco hash table headers

98434a6

Guard device-only cuco APIs for host-only compilation

0237dc9

Use CUB_NS_QUALIFIER for CUB references in cuco hash table

7f232ff

Use local value_type alias in cuco open-addressing kernels

29126f9

Guard host-only open-addressing impl against NVRTC compilation

639c91f

Address open-addressing impl review nits (driver memcpy, unsigned cas…

e687939

…t, drop stray constexpr)

Use && and ! operators instead of alternative tokens in cuco open-add…

0072ed5

…ressing ref

Simplify duplicate-case fallthrough with a guard clause in cuco open-…

420790e

…addressing ref

Drop redundant ::cuda::experimental::cuco:: qualifier in cuco headers

63cdddf

Use for loop for cuco mod_pow binary exponentiation

c25ed32

Drop unused cstddef include in cuco capacity

aec97ae

Guard fixed_capacity_map for NVRTC, use driver API, lighter dynamic_e…

314cc7f

…xtent include

Simplify cuco probing_scheme: drop redundant private and is_double_ha…

5fd60a9

…shing struct

Reuse cuda::std __tuple_like/__pair_like in cuco traits; drop __detai…

1564b8f

…l wrapper

Rename cuco __detail namespace to detail; move internal traits to det…

1ebbd0c

…ail/utility

Move cuco detail/utils.hpp to detail/utility/cuda.cuh

8f7ea08

Prune unused and add missing includes in cuco hash table headers

f9d445c

Merge remote-tracking branch 'upstream/main' into cuco_static_map

6a243a5

PointKernel reviewed Jun 24, 2026

PointKernel approved these changes Jun 24, 2026

PointKernel requested review from fbusato and sleeepyjack June 24, 2026 00:32
