Add fixed_capacity_map to cudax#7705
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
0edb761 to
65368db
Compare
65368db to
b0d0702
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a complete open-addressing hash table infrastructure for CUDA Experimental, comprising device reference operations, grid kernels, host orchestration, and a public ChangesOpen-addressing and static_map port
Assessment against linked issues
Suggested labels
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 18
🧹 Nitpick comments (3)
cudax/include/cuda/experimental/__cuco/__detail/extent.cuh (1)
131-148: ⚡ Quick winsuggestion: Mark these header variable templates
inline. They are namespace-scopeconstexprdefinitions in a header, and the local CCCL rule requires the explicitinlinespelling for this pattern.As per coding guidelines, "All
constexprvariables at namespace/global scope must useinline, includingtemplatevariables."cudax/include/cuda/experimental/__cuco/probing_scheme.cuh (1)
24-31: ⚡ Quick winsuggestion: Wrap this header with the standard CCCL prologue/epilogue pair. The file enters code directly after its includes and never closes with
#include <cuda/std/__cccl/epilogue.h>, unlike the other new headers in this cohort.As per coding guidelines, "The last included header before code must be
#include <cuda/std/__cccl/prologue.h>, and#include <cuda/std/__cccl/epilogue.h>must be at the end of a file."Also applies to: 264-264
cudax/include/cuda/experimental/__cuco/__static_map/kernels.cuh (1)
119-235: suggestion: Please attach benchmark results for this fast path before merge. This shared-memory kernel adds a new execution path and tuning heuristic, so we need the perf numbers that justify it on the supported toolchains and architectures. As per coding guidelines, "Do not commit SASS code changes without running benchmarks to check for performance regressions."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d2eb011c-f333-4929-a09b-f09102640ec3
📒 Files selected for processing (22)
cudax/include/cuda/experimental/__cuco/__detail/bitwise_compare.cuhcudax/include/cuda/experimental/__cuco/__detail/equal_wrapper.cuhcudax/include/cuda/experimental/__cuco/__detail/extent.cuhcudax/include/cuda/experimental/__cuco/__detail/prime.hppcudax/include/cuda/experimental/__cuco/__detail/probing_scheme_base.cuhcudax/include/cuda/experimental/__cuco/__detail/types.cuhcudax/include/cuda/experimental/__cuco/__detail/utils.cuhcudax/include/cuda/experimental/__cuco/__detail/utils.hppcudax/include/cuda/experimental/__cuco/__open_addressing/functors.cuhcudax/include/cuda/experimental/__cuco/__open_addressing/kernels.cuhcudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_impl.cuhcudax/include/cuda/experimental/__cuco/__open_addressing/open_addressing_ref_impl.cuhcudax/include/cuda/experimental/__cuco/__open_addressing/slot_storage_ref.cuhcudax/include/cuda/experimental/__cuco/__open_addressing/types.cuhcudax/include/cuda/experimental/__cuco/__static_map/kernels.cuhcudax/include/cuda/experimental/__cuco/__utility/strong_type.cuhcudax/include/cuda/experimental/__cuco/probing_scheme.cuhcudax/include/cuda/experimental/__cuco/static_map.cuhcudax/include/cuda/experimental/__cuco/static_map_ref.cuhcudax/include/cuda/experimental/__cuco/traits.hppcudax/test/CMakeLists.txtcudax/test/cuco/static_map/test_static_map.cu
…t, drop stray constexpr)
PointKernel
left a comment
There was a problem hiding this comment.
Ready for another round of review
| const auto __src_lane = __ffs(__group_contains_available) - 1; | ||
| auto __status = __insert_result::__continue; | ||
| if (__group.thread_rank() == __src_lane) | ||
| { | ||
| __status = | ||
| __attempt_insert(__get_slot_ptr(*__probing_iter, __intra_bucket_index), empty_slot_sentinel(), __val); | ||
| } |
There was a problem hiding this comment.
Good to know. I ran some tests locally with Claude Code and didn't observe any noticeable impact, but I'm happy to revisit this if needed.
| using __impl_type = ::cuda::experimental::cuco::__open_addressing:: | ||
| __open_addressing_impl<_Key, value_type, _Scope, _KeyEqual, _ProbingScheme, _BucketSize, _MemoryResource>; | ||
|
|
||
| ::cuda::std::unique_ptr<__impl_type> __impl; |
There was a problem hiding this comment.
PImpl is intentional here. While the Core Guidelines primarily discuss PImpl in the context of ABI stability, it is also a well-established technique for reducing header dependencies and isolating implementation details
| { | ||
| using __size_type = typename _Capacity::index_type; | ||
| using __step_extent = ::cuda::std::extents<__size_type, _BucketSize>; | ||
| const __size_type __init = __hash(__probe_key) % (__cap.extent(0) / _BucketSize) * _BucketSize; |
There was a problem hiding this comment.
round_down(extent, B) isn't equivalent. The current (__hash % (extent / _BucketSize)) * _BucketSize picks a bucket (hash % num_buckets) then scales to its first slot, so the start is bucket-aligned.
| using __step_extent = ::cuda::std::extents<__size_type, ::cuda::std::dynamic_extent>; | ||
| return ::cuda::experimental::cuco::__detail::__probing_iterator<_Capacity, __step_extent>{ | ||
| __size_type{__hash1(__probe_key)} % (__cap.extent(0) / _BucketSize) * _BucketSize, | ||
| __step_extent{__size_type{(__hash2(__probe_key) % (__cap.extent(0) / _BucketSize - 1) + 1) * _BucketSize}}, |
There was a problem hiding this comment.
Not needed: extent(0) (the slot capacity) is always a multiple of the probe stride (cg_size * _BucketSize), so extent / _BucketSize is exact.
Description
closes #7463
This PR migrates cuCollections
static_mapinto cudax ascuda::experimental::cuco::static_map.Minimal scope: implements
insert,contains,clear, and trivial accessors, with capacity validation provided bymake_valid_capacityandis_valid_capacity. Tests mirror the cuCollections layout and use a parameterized matrix covering key type, probing scheme, CG size, and bucket size.Checklist