perf: batch hash join chain-traversal probe lookups for memory-level parallelism by Dandandan · Pull Request #22677 · apache/datafusion

Dandandan · 2026-05-31T19:39:28Z

Which issue does this PR close?

N/A (hash join probe-side micro-optimization).

Rationale for this change

In the hash join probe path, JoinHashMap::get_matched_indices_with_limit_offset handles the non-unique-key (chained) case as follows: for each probe row, it looks up the head of the collision chain (map.find) and immediately walks that chain. Because each map.find result feeds straight into the chain walk that consumes it, the hash-table probes are effectively serialized: each row's hash-table cache miss must resolve before the next row's lookup begins.

The map.find miss is the dominant cost in this path, and these lookups are independent across probe rows — an ideal candidate for memory-level parallelism.
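
For illustration, here is a minimal Rust sketch of the row-at-a-time shape described above. It is not DataFusion's code: `ChainedMap`, `build`, and `probe_serial` are hypothetical names, and a `std::collections::HashMap` stands in for the real hash table. It assumes a common chained layout in which the table stores the 1-based index of the newest build row for each hash, and a `next` vector links each build row to the previous row with the same hash (0 ends a chain).

```rust
use std::collections::HashMap;

/// Hypothetical chained join map: `heads` maps a hash to the 1-based index of
/// the newest build row with that hash; `next[i]` is the 1-based index of the
/// previous build row with the same hash (0 ends the chain).
pub struct ChainedMap {
    pub heads: HashMap<u64, u64>,
    pub next: Vec<u64>,
}

impl ChainedMap {
    pub fn build(build_hashes: &[u64]) -> Self {
        let mut heads = HashMap::new();
        let mut next = vec![0u64; build_hashes.len()];
        for (row, &h) in build_hashes.iter().enumerate() {
            // 1-based so that 0 can mean "end of chain".
            if let Some(prev) = heads.insert(h, row as u64 + 1) {
                next[row] = prev; // link the new head to the old one
            }
        }
        ChainedMap { heads, next }
    }
}

/// Row-at-a-time probe: look up one row's chain head, then walk that chain to
/// completion before the next row's lookup is issued.
/// Returns (probe row, build row) pairs.
pub fn probe_serial(map: &ChainedMap, probe_hashes: &[u64]) -> Vec<(u32, u64)> {
    let mut out = Vec::new();
    for (probe_row, h) in probe_hashes.iter().enumerate() {
        // Lookup: in a large table this is the likely cache miss.
        let mut idx = map.heads.get(h).copied().unwrap_or(0);
        // Traversal: consumes the lookup result immediately.
        while idx != 0 {
            out.push((probe_row as u32, idx - 1));
            idx = map.next[(idx - 1) as usize];
        }
    }
    out
}
```

In this shape, each row's chain walk consumes the value its own lookup just returned before the loop moves on to the next row's lookup.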

What changes are included in this PR?

Process the probe rows in windows of 16 with two phases per window:

Lookup phase — resolve the head-of-chain index (map.find) for every row in the window into a small stack array. These probes are independent, so their cache misses overlap (several outstanding at once) instead of each stalling the next row.
Traversal phase — walk the chains in probe-row order, exactly as before.
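
A minimal sketch of the two-phase window, under the same assumed chained layout (the `build` and `probe_windowed` names are hypothetical, and `HashMap` stands in for the real table). The lookup loop fills a fixed-size stack array, and no iteration depends on another's result; the traversal loop then walks the chains in probe-row order, so the output order matches a row-at-a-time loop.

```rust
use std::collections::HashMap;

const WINDOW: usize = 16; // window size quoted in the PR description

/// Hypothetical build helper: `heads` maps a hash to the 1-based index of the
/// newest build row with that hash; `next[i]` links to the previous row (0 = end).
pub fn build(build_hashes: &[u64]) -> (HashMap<u64, u64>, Vec<u64>) {
    let mut heads = HashMap::new();
    let mut next = vec![0u64; build_hashes.len()];
    for (row, &h) in build_hashes.iter().enumerate() {
        if let Some(prev) = heads.insert(h, row as u64 + 1) {
            next[row] = prev;
        }
    }
    (heads, next)
}

/// Two-phase probe: for each window, first resolve every row's chain head,
/// then walk the chains in probe-row order.
pub fn probe_windowed(
    heads: &HashMap<u64, u64>,
    next: &[u64],
    probe_hashes: &[u64],
) -> Vec<(u32, u64)> {
    let mut out = Vec::new();
    for (w, window) in probe_hashes.chunks(WINDOW).enumerate() {
        // Lookup phase: independent lookups into a small stack array.
        let mut window_heads = [0u64; WINDOW];
        for (i, h) in window.iter().enumerate() {
            window_heads[i] = heads.get(h).copied().unwrap_or(0);
        }
        // Traversal phase: walk each chain in probe-row order.
        for (i, &head) in window_heads[..window.len()].iter().enumerate() {
            let probe_row = (w * WINDOW + i) as u32;
            let mut idx = head;
            while idx != 0 {
                out.push((probe_row, idx - 1));
                idx = next[(idx - 1) as usize];
            }
        }
    }
    out
}
```

Whether the independent lookups actually overlap their cache misses depends on the hardware and the compiler; the sketch only shows the restructuring that makes that overlap possible.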

traverse_chain remains the sole authority for the per-call output limit and resume offset, and is still invoked in probe-row order, so the limit/offset resume protocol is unchanged. Heads looked up for rows past the output limit are simply discarded and recomputed on the next call (the lookup is a pure function of the hash). The unique-key fast path and the mid-chain resume handling are untouched.
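
The limit/offset behavior described above can be sketched as follows. This is a hypothetical model, not the PR's code: `Offset` is assumed to be (next probe row, optional 1-based chain index to resume from), and `build`, `probe_limited`, and this `traverse_chain` are illustrative. As in the description, `traverse_chain` is the only place the limit is checked, and heads computed for rows past the stopping point are dropped with the window and looked up again on the next call.

```rust
use std::collections::HashMap;

const WINDOW: usize = 16; // window size from the PR description

/// Hypothetical resume point: (probe row to continue from, Some(1-based chain
/// index) when the previous call stopped inside that row's chain).
pub type Offset = (usize, Option<u64>);

/// Hypothetical build helper for a chained layout (0 ends a chain).
pub fn build(build_hashes: &[u64]) -> (HashMap<u64, u64>, Vec<u64>) {
    let mut heads = HashMap::new();
    let mut next = vec![0u64; build_hashes.len()];
    for (row, &h) in build_hashes.iter().enumerate() {
        if let Some(prev) = heads.insert(h, row as u64 + 1) {
            next[row] = prev;
        }
    }
    (heads, next)
}

/// Walk one chain from `idx`, appending matches until `limit` is reached.
/// This is the only place the limit is checked. Returns the chain index to
/// resume from if the limit stopped the walk early.
fn traverse_chain(
    next: &[u64],
    probe_row: usize,
    mut idx: u64,
    out: &mut Vec<(u32, u64)>,
    limit: usize,
) -> Option<u64> {
    while idx != 0 {
        if out.len() == limit {
            return Some(idx);
        }
        out.push((probe_row as u32, idx - 1));
        idx = next[(idx - 1) as usize];
    }
    None
}

/// Emit at most `limit` matches starting at `offset`; returns the matches and
/// the offset for the next call (None when every probe row is done).
pub fn probe_limited(
    heads: &HashMap<u64, u64>,
    next: &[u64],
    probe: &[u64],
    limit: usize,
    offset: Offset,
) -> (Vec<(u32, u64)>, Option<Offset>) {
    assert!(limit > 0, "a zero limit would never make progress");
    let mut out = Vec::new();
    let (mut row, resume) = offset;
    // Mid-chain resume: finish the interrupted chain before any windowing.
    if let Some(idx) = resume {
        if let Some(idx) = traverse_chain(next, row, idx, &mut out, limit) {
            return (out, Some((row, Some(idx))));
        }
        row += 1;
    }
    while row < probe.len() {
        let end = (row + WINDOW).min(probe.len());
        // Lookup phase for the whole window; unused heads are simply dropped.
        let mut window_heads = [0u64; WINDOW];
        for (i, h) in probe[row..end].iter().enumerate() {
            window_heads[i] = heads.get(h).copied().unwrap_or(0);
        }
        // Traversal phase in probe-row order.
        for (i, &head) in window_heads[..end - row].iter().enumerate() {
            if let Some(idx) = traverse_chain(next, row + i, head, &mut out, limit) {
                return (out, Some((row + i, Some(idx))));
            }
        }
        row = end;
    }
    (out, None)
}
```

Because a call that stops early always returns exactly `limit` matches, every call with `limit > 0` makes progress, so feeding the returned offset back in repeatedly terminates.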

Are these changes tested?

Yes:

New unit test test_limit_offset_window_boundary_matches_unbounded asserts the windowed lookup yields output identical to a single unbounded call, across more than one window (20 probe rows) with chains/singletons/misses, for several limits — including ones that split mid-chain and don't divide the window size.
cargo test -p datafusion-physical-plan --lib joins:: — 968 tests pass.
cargo test -p datafusion-sqllogictest --test sqllogictests -- joins — passes.
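
The test strategy described above (identical output to a single unbounded call, across several limits) can be expressed as a small generic harness. This is a hypothetical sketch, not the PR's test: `drain_pages` and `matches_unbounded` are illustrative names, and the paged function is assumed to be any closure taking `(limit, offset)` and returning a page plus the next offset, or `None` when done.

```rust
/// Hypothetical helper: call a limit/offset-style function repeatedly, feeding
/// each returned offset back in, and concatenate the pages until it reports
/// completion with `None`.
pub fn drain_pages<O, T>(
    mut next_page: impl FnMut(O) -> (Vec<T>, Option<O>),
    start: O,
) -> Vec<T> {
    let mut all = Vec::new();
    let mut offset = Some(start);
    while let Some(o) = offset {
        let (page, next) = next_page(o);
        all.extend(page);
        offset = next;
    }
    all
}

/// Hypothetical check in the spirit of the PR's test: for every limit, paging
/// must reproduce the single unbounded call exactly, and no page may exceed
/// its limit.
pub fn matches_unbounded<O: Clone, T: PartialEq>(
    paged: impl Fn(usize, O) -> (Vec<T>, Option<O>),
    start: O,
    limits: &[usize],
) -> bool {
    let (unbounded, rest) = paged(usize::MAX, start.clone());
    if rest.is_some() {
        return false; // an unbounded call must finish in one go
    }
    limits.iter().all(|&limit| {
        let mut within_limit = true;
        let pages = drain_pages(
            |o| {
                let (page, next) = paged(limit, o);
                within_limit &= page.len() <= limit;
                (page, next)
            },
            start.clone(),
        );
        within_limit && pages == unbounded
    })
}
```

In a real test the closure would wrap the join map's limit/offset probe over fixed build and probe data; limits that do not divide the window size, and limits that stop mid-chain, are the interesting cases.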

Are there any user-facing changes?

No. Internal performance optimization with identical results.

…parallelism

In the chained (non-unique-key) path of `get_matched_indices_with_limit_offset`, each probe row's `map.find` cache miss fed directly into the chain walk that consumed it, so the hash-table probes were serialized one row at a time.

Process probe rows in windows of 16: first resolve every row's head-of-chain index (`map.find`), then traverse the chains. Separating lookup from traversal lets the independent hash-table probes (the dominant cache miss here) have several misses outstanding at once (memory-level parallelism) instead of each one stalling the row that follows it.

`traverse_chain` remains the sole authority for the output limit and resume offset and is still called in probe-row order, so the resume protocol is unchanged. Heads looked up for rows past the limit are discarded and recomputed on the next call.

Adds a unit test asserting the windowed lookup produces identical output to a single unbounded call across the window boundary, for several limits including ones that split mid-chain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Dandandan · 2026-05-31T19:41:28Z

run benchmarks

adriangbot · 2026-05-31T19:44:13Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4587871102-384-g8225 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing perf/batch-chain-traversal (8abebd8) to 0da8961 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-31T19:44:24Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4587871102-386-4zt8h 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing perf/batch-chain-traversal (8abebd8) to 0da8961 (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-05-31T19:44:26Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4587871102-385-pw8n2 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing perf/batch-chain-traversal (8abebd8) to 0da8961 (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-05-31T19:58:13Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and perf_batch-chain-traversal
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃     perf_batch-chain-traversal ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 38.34 / 39.50 ±1.01 / 41.13 ms │ 38.30 / 38.90 ±1.03 / 40.95 ms │    no change │
│ QQuery 2  │ 19.70 / 19.85 ±0.13 / 20.09 ms │ 19.60 / 19.97 ±0.29 / 20.44 ms │    no change │
│ QQuery 3  │ 32.90 / 34.11 ±1.36 / 36.01 ms │ 33.08 / 36.07 ±1.85 / 38.89 ms │ 1.06x slower │
│ QQuery 4  │ 17.47 / 17.88 ±0.65 / 19.18 ms │ 17.33 / 17.52 ±0.14 / 17.72 ms │    no change │
│ QQuery 5  │ 39.57 / 41.18 ±1.51 / 43.45 ms │ 39.61 / 42.10 ±1.36 / 43.66 ms │    no change │
│ QQuery 6  │ 16.46 / 16.54 ±0.06 / 16.65 ms │ 16.16 / 16.33 ±0.12 / 16.50 ms │    no change │
│ QQuery 7  │ 45.69 / 49.28 ±3.24 / 55.03 ms │ 46.61 / 49.29 ±1.99 / 51.55 ms │    no change │
│ QQuery 8  │ 44.79 / 44.92 ±0.07 / 44.98 ms │ 44.77 / 45.43 ±0.87 / 47.13 ms │    no change │
│ QQuery 9  │ 49.67 / 50.26 ±0.69 / 51.56 ms │ 49.35 / 50.66 ±0.74 / 51.55 ms │    no change │
│ QQuery 10 │ 63.82 / 64.01 ±0.10 / 64.14 ms │ 63.55 / 64.03 ±0.80 / 65.62 ms │    no change │
│ QQuery 11 │ 13.28 / 13.52 ±0.18 / 13.74 ms │ 13.38 / 13.61 ±0.19 / 13.88 ms │    no change │
│ QQuery 12 │ 24.11 / 24.42 ±0.33 / 24.88 ms │ 24.47 / 25.19 ±0.97 / 27.05 ms │    no change │
│ QQuery 13 │ 34.90 / 36.24 ±1.49 / 39.01 ms │ 34.72 / 35.97 ±1.14 / 37.83 ms │    no change │
│ QQuery 14 │ 25.40 / 25.55 ±0.09 / 25.68 ms │ 25.40 / 25.51 ±0.11 / 25.71 ms │    no change │
│ QQuery 15 │ 31.52 / 32.25 ±0.92 / 34.06 ms │ 31.49 / 32.18 ±0.78 / 33.61 ms │    no change │
│ QQuery 16 │ 14.72 / 15.00 ±0.27 / 15.35 ms │ 14.84 / 15.06 ±0.14 / 15.24 ms │    no change │
│ QQuery 17 │ 73.58 / 75.01 ±1.55 / 77.96 ms │ 73.91 / 76.08 ±2.05 / 79.99 ms │    no change │
│ QQuery 18 │ 62.39 / 64.71 ±2.59 / 69.76 ms │ 61.86 / 63.58 ±1.35 / 65.23 ms │    no change │
│ QQuery 19 │ 33.97 / 34.17 ±0.22 / 34.59 ms │ 34.34 / 34.54 ±0.21 / 34.91 ms │    no change │
│ QQuery 20 │ 37.27 / 37.85 ±0.35 / 38.32 ms │ 37.56 / 38.32 ±0.42 / 38.84 ms │    no change │
│ QQuery 21 │ 54.92 / 57.16 ±1.37 / 58.80 ms │ 56.92 / 60.09 ±2.60 / 64.32 ms │ 1.05x slower │
│ QQuery 22 │ 23.52 / 24.65 ±1.46 / 27.53 ms │ 23.55 / 23.90 ±0.44 / 24.75 ms │    no change │
└───────────┴────────────────────────────────┴────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                         ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 818.05ms │
│ Total Time (perf_batch-chain-traversal)   │ 824.32ms │
│ Average Time (HEAD)                       │  37.18ms │
│ Average Time (perf_batch-chain-traversal) │  37.47ms │
│ Queries Faster                            │        0 │
│ Queries Slower                            │        2 │
│ Queries with No Change                    │       20 │
│ Queries with Failure                      │        0 │
└───────────────────────────────────────────┴──────────┘
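
One way to read the Change column is as the ratio of mean times with a noise band. The Rust sketch below assumes that rule with a 5% threshold; the threshold and label format are inferred from the rows in these tables, not taken from the benchmark runner's source, and the runner may compute it differently (for example, from unrounded means or with a variance check). `change_label` is a hypothetical name.

```rust
/// Hypothetical reconstruction of the "Change" column: compare mean times and
/// report a ratio only when it falls outside an assumed noise band.
pub fn change_label(head_mean_ms: f64, branch_mean_ms: f64, threshold: f64) -> String {
    let ratio = branch_mean_ms / head_mean_ms;
    if ratio > 1.0 + threshold {
        // Branch takes longer than HEAD by more than the band.
        format!("{:.2}x slower", ratio)
    } else if ratio < 1.0 / (1.0 + threshold) {
        // Branch is faster; report the speedup as HEAD / branch.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        "no change".to_string()
    }
}
```

For example, QQuery 3 gives 36.07 / 34.11 ≈ 1.057, which prints as "1.06x slower". By the same ratio, total TPC-H time on the branch is about 0.8% higher than HEAD (824.32 ms vs 818.05 ms), inside the assumed band.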

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	5.5 GiB
Avg memory	5.0 GiB
CPU user	29.7s
CPU sys	2.4s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	5.6 GiB
Avg memory	5.1 GiB
CPU user	30.0s
CPU sys	2.2s
Peak spill	0 B

adriangbot · 2026-05-31T20:00:13Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and perf_batch-chain-traversal
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃            perf_batch-chain-traversal ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           5.73 / 6.24 ±0.91 / 8.06 ms │           5.77 / 6.28 ±0.86 / 8.00 ms │     no change │
│ QQuery 2  │        80.24 / 80.99 ±0.40 / 81.41 ms │        80.76 / 81.18 ±0.51 / 82.03 ms │     no change │
│ QQuery 3  │        29.39 / 29.96 ±0.52 / 30.93 ms │        29.17 / 29.43 ±0.20 / 29.69 ms │     no change │
│ QQuery 4  │     539.39 / 544.28 ±4.94 / 553.66 ms │     531.75 / 537.66 ±3.75 / 543.11 ms │     no change │
│ QQuery 5  │        51.42 / 51.96 ±0.45 / 52.72 ms │        51.61 / 52.06 ±0.38 / 52.69 ms │     no change │
│ QQuery 6  │        36.13 / 36.92 ±0.54 / 37.54 ms │        35.80 / 36.33 ±0.36 / 36.82 ms │     no change │
│ QQuery 7  │     108.82 / 110.78 ±1.74 / 113.09 ms │     108.52 / 110.20 ±1.35 / 112.44 ms │     no change │
│ QQuery 8  │        36.82 / 37.63 ±0.43 / 38.06 ms │        37.33 / 37.63 ±0.29 / 38.10 ms │     no change │
│ QQuery 9  │        54.25 / 55.94 ±1.28 / 58.21 ms │        54.69 / 56.20 ±1.08 / 57.76 ms │     no change │
│ QQuery 10 │        80.93 / 82.25 ±1.67 / 85.53 ms │        81.84 / 82.90 ±1.14 / 85.08 ms │     no change │
│ QQuery 11 │     337.87 / 341.19 ±3.29 / 345.65 ms │     341.20 / 346.10 ±2.82 / 349.11 ms │     no change │
│ QQuery 12 │        29.38 / 30.02 ±0.74 / 31.46 ms │        29.65 / 29.76 ±0.08 / 29.85 ms │     no change │
│ QQuery 13 │     127.41 / 128.17 ±0.72 / 129.51 ms │     129.07 / 129.91 ±0.70 / 130.86 ms │     no change │
│ QQuery 14 │     509.19 / 512.50 ±3.02 / 518.12 ms │     515.60 / 517.01 ±0.91 / 518.18 ms │     no change │
│ QQuery 15 │        63.10 / 63.59 ±0.31 / 63.93 ms │        62.01 / 63.16 ±0.61 / 63.81 ms │     no change │
│ QQuery 16 │           6.46 / 6.60 ±0.23 / 7.07 ms │           6.65 / 6.80 ±0.21 / 7.21 ms │     no change │
│ QQuery 17 │        81.98 / 82.74 ±0.61 / 83.39 ms │        82.46 / 82.89 ±0.41 / 83.64 ms │     no change │
│ QQuery 18 │     153.26 / 155.81 ±1.39 / 157.37 ms │     154.33 / 155.31 ±0.66 / 156.25 ms │     no change │
│ QQuery 19 │        41.64 / 42.01 ±0.34 / 42.63 ms │        42.21 / 42.42 ±0.21 / 42.79 ms │     no change │
│ QQuery 20 │        35.83 / 36.47 ±0.41 / 37.02 ms │        36.07 / 37.04 ±0.52 / 37.43 ms │     no change │
│ QQuery 21 │        17.32 / 17.59 ±0.21 / 17.94 ms │        17.73 / 18.00 ±0.18 / 18.28 ms │     no change │
│ QQuery 22 │        63.91 / 65.08 ±0.93 / 66.18 ms │        65.23 / 66.15 ±1.01 / 67.93 ms │     no change │
│ QQuery 23 │     494.31 / 499.01 ±3.65 / 505.04 ms │     490.17 / 497.95 ±4.72 / 503.94 ms │     no change │
│ QQuery 24 │     238.01 / 240.63 ±1.98 / 243.82 ms │     235.99 / 239.59 ±3.69 / 245.04 ms │     no change │
│ QQuery 25 │     115.11 / 116.28 ±1.58 / 119.35 ms │     115.16 / 116.66 ±1.48 / 119.18 ms │     no change │
│ QQuery 26 │        71.22 / 71.95 ±0.72 / 73.21 ms │        71.29 / 72.09 ±0.94 / 73.73 ms │     no change │
│ QQuery 27 │           6.49 / 6.66 ±0.17 / 6.98 ms │          6.67 / 7.92 ±1.92 / 11.74 ms │  1.19x slower │
│ QQuery 28 │        56.93 / 60.49 ±2.78 / 63.15 ms │        57.41 / 59.12 ±1.80 / 62.23 ms │     no change │
│ QQuery 29 │      99.42 / 100.58 ±1.15 / 102.36 ms │      98.97 / 100.15 ±0.89 / 101.59 ms │     no change │
│ QQuery 30 │        30.82 / 31.26 ±0.51 / 32.25 ms │        30.48 / 30.90 ±0.26 / 31.15 ms │     no change │
│ QQuery 31 │     113.71 / 115.20 ±1.86 / 118.74 ms │     113.89 / 115.16 ±1.59 / 118.22 ms │     no change │
│ QQuery 32 │        20.42 / 20.87 ±0.24 / 21.13 ms │        20.26 / 20.83 ±0.38 / 21.19 ms │     no change │
│ QQuery 33 │        39.12 / 40.18 ±1.55 / 43.25 ms │        38.84 / 39.86 ±1.16 / 42.11 ms │     no change │
│ QQuery 34 │          9.41 / 9.72 ±0.39 / 10.45 ms │          9.26 / 9.79 ±0.56 / 10.83 ms │     no change │
│ QQuery 35 │        81.76 / 82.99 ±1.33 / 85.54 ms │        81.52 / 82.71 ±1.27 / 85.01 ms │     no change │
│ QQuery 36 │           5.77 / 5.90 ±0.17 / 6.23 ms │           5.88 / 6.01 ±0.13 / 6.24 ms │     no change │
│ QQuery 37 │           6.92 / 7.03 ±0.12 / 7.25 ms │           6.83 / 7.07 ±0.16 / 7.30 ms │     no change │
│ QQuery 38 │        70.61 / 70.87 ±0.24 / 71.24 ms │        69.51 / 69.96 ±0.32 / 70.41 ms │     no change │
│ QQuery 39 │     101.66 / 102.31 ±0.82 / 103.93 ms │      99.72 / 100.57 ±0.46 / 100.91 ms │     no change │
│ QQuery 40 │        23.54 / 23.97 ±0.56 / 25.04 ms │        22.96 / 23.52 ±0.41 / 24.07 ms │     no change │
│ QQuery 41 │        11.70 / 11.93 ±0.14 / 12.13 ms │        11.48 / 11.76 ±0.19 / 12.06 ms │     no change │
│ QQuery 42 │        24.27 / 24.74 ±0.34 / 25.25 ms │        24.30 / 24.60 ±0.19 / 24.87 ms │     no change │
│ QQuery 43 │           4.80 / 4.93 ±0.16 / 5.24 ms │           4.77 / 4.91 ±0.21 / 5.33 ms │     no change │
│ QQuery 44 │        10.92 / 11.02 ±0.09 / 11.19 ms │        10.86 / 11.02 ±0.08 / 11.10 ms │     no change │
│ QQuery 45 │        41.56 / 43.29 ±1.01 / 44.30 ms │        41.72 / 42.66 ±1.03 / 44.63 ms │     no change │
│ QQuery 46 │        13.49 / 13.88 ±0.37 / 14.56 ms │        12.75 / 13.31 ±0.50 / 14.16 ms │     no change │
│ QQuery 47 │     238.68 / 243.88 ±4.85 / 252.64 ms │     238.71 / 242.37 ±3.36 / 246.88 ms │     no change │
│ QQuery 48 │     104.10 / 104.28 ±0.22 / 104.69 ms │     104.16 / 104.46 ±0.27 / 104.93 ms │     no change │
│ QQuery 49 │        79.41 / 80.35 ±0.70 / 81.50 ms │        80.39 / 82.11 ±2.15 / 86.15 ms │     no change │
│ QQuery 50 │        60.46 / 61.70 ±1.85 / 65.39 ms │        60.38 / 60.79 ±0.42 / 61.56 ms │     no change │
│ QQuery 51 │        92.81 / 94.91 ±1.14 / 96.04 ms │        92.85 / 94.45 ±1.25 / 96.01 ms │     no change │
│ QQuery 52 │        24.68 / 24.89 ±0.28 / 25.42 ms │        24.60 / 24.97 ±0.31 / 25.50 ms │     no change │
│ QQuery 53 │        30.11 / 31.80 ±2.72 / 37.21 ms │        30.26 / 31.68 ±2.75 / 37.17 ms │     no change │
│ QQuery 54 │        55.38 / 56.01 ±0.51 / 56.61 ms │        55.35 / 55.68 ±0.27 / 56.06 ms │     no change │
│ QQuery 55 │        24.15 / 24.71 ±0.39 / 25.36 ms │        24.14 / 24.45 ±0.31 / 25.01 ms │     no change │
│ QQuery 56 │        39.06 / 39.63 ±0.40 / 40.17 ms │        39.73 / 40.08 ±0.32 / 40.65 ms │     no change │
│ QQuery 57 │     180.09 / 180.48 ±0.44 / 181.16 ms │     178.29 / 180.59 ±1.97 / 183.55 ms │     no change │
│ QQuery 58 │     119.45 / 120.60 ±0.63 / 121.27 ms │     118.26 / 118.64 ±0.45 / 119.48 ms │     no change │
│ QQuery 59 │     118.48 / 119.59 ±1.13 / 121.68 ms │     117.22 / 117.72 ±0.27 / 118.01 ms │     no change │
│ QQuery 60 │        40.08 / 40.80 ±0.50 / 41.64 ms │        39.72 / 40.17 ±0.52 / 41.14 ms │     no change │
│ QQuery 61 │        13.23 / 14.19 ±1.59 / 17.35 ms │        12.91 / 13.01 ±0.11 / 13.19 ms │ +1.09x faster │
│ QQuery 62 │        47.20 / 47.50 ±0.21 / 47.80 ms │        46.65 / 47.94 ±1.08 / 49.36 ms │     no change │
│ QQuery 63 │        30.32 / 30.59 ±0.27 / 30.94 ms │        30.10 / 30.60 ±0.38 / 31.05 ms │     no change │
│ QQuery 64 │     464.65 / 470.10 ±5.71 / 480.22 ms │     465.27 / 470.37 ±4.44 / 477.46 ms │     no change │
│ QQuery 65 │     149.78 / 152.23 ±2.52 / 155.33 ms │     145.59 / 151.52 ±5.53 / 161.05 ms │     no change │
│ QQuery 66 │        79.39 / 80.52 ±1.28 / 82.86 ms │        78.91 / 80.17 ±1.31 / 82.60 ms │     no change │
│ QQuery 67 │     253.65 / 260.97 ±4.54 / 267.53 ms │     258.01 / 262.89 ±6.11 / 273.04 ms │     no change │
│ QQuery 68 │        13.16 / 13.45 ±0.24 / 13.73 ms │        13.16 / 13.50 ±0.20 / 13.76 ms │     no change │
│ QQuery 69 │        76.80 / 77.13 ±0.25 / 77.52 ms │        76.89 / 77.16 ±0.17 / 77.33 ms │     no change │
│ QQuery 70 │     107.72 / 116.83 ±8.09 / 130.56 ms │     106.38 / 114.74 ±7.92 / 129.05 ms │     no change │
│ QQuery 71 │        36.16 / 36.44 ±0.25 / 36.74 ms │        36.15 / 36.60 ±0.41 / 37.33 ms │     no change │
│ QQuery 72 │ 2145.50 / 2248.02 ±64.14 / 2324.96 ms │ 2244.51 / 2317.34 ±69.53 / 2418.40 ms │     no change │
│ QQuery 73 │          9.25 / 9.61 ±0.31 / 10.17 ms │         9.21 / 11.49 ±3.65 / 18.75 ms │  1.19x slower │
│ QQuery 74 │     192.65 / 193.48 ±0.72 / 194.33 ms │     189.32 / 195.55 ±6.11 / 203.31 ms │     no change │
│ QQuery 75 │     147.83 / 149.10 ±0.87 / 150.27 ms │     153.75 / 157.41 ±2.71 / 161.12 ms │  1.06x slower │
│ QQuery 76 │        35.97 / 36.15 ±0.17 / 36.40 ms │        38.08 / 38.41 ±0.26 / 38.80 ms │  1.06x slower │
│ QQuery 77 │        60.56 / 61.67 ±1.12 / 63.79 ms │        63.41 / 65.55 ±2.10 / 69.41 ms │  1.06x slower │
│ QQuery 78 │     191.64 / 194.74 ±1.87 / 197.23 ms │     203.44 / 206.40 ±3.27 / 210.91 ms │  1.06x slower │
│ QQuery 79 │        68.11 / 68.44 ±0.29 / 68.93 ms │        68.96 / 70.86 ±1.08 / 71.95 ms │     no change │
│ QQuery 80 │     101.13 / 101.85 ±0.44 / 102.39 ms │     105.17 / 109.71 ±4.02 / 115.72 ms │  1.08x slower │
│ QQuery 81 │        24.51 / 24.75 ±0.15 / 24.88 ms │        25.46 / 26.07 ±0.32 / 26.35 ms │  1.05x slower │
│ QQuery 82 │        16.59 / 17.07 ±0.28 / 17.45 ms │        17.48 / 18.28 ±0.73 / 19.37 ms │  1.07x slower │
│ QQuery 83 │        37.56 / 39.26 ±2.62 / 44.47 ms │        37.90 / 38.68 ±0.50 / 39.45 ms │     no change │
│ QQuery 84 │        44.33 / 44.52 ±0.29 / 45.09 ms │        44.29 / 45.64 ±1.77 / 49.06 ms │     no change │
│ QQuery 85 │     136.91 / 139.30 ±2.18 / 142.92 ms │     140.70 / 142.52 ±2.00 / 145.93 ms │     no change │
│ QQuery 86 │        25.42 / 26.34 ±0.63 / 27.33 ms │        26.02 / 26.83 ±0.47 / 27.30 ms │     no change │
│ QQuery 87 │        69.92 / 71.67 ±1.37 / 73.16 ms │        72.19 / 74.63 ±2.54 / 79.33 ms │     no change │
│ QQuery 88 │        63.17 / 64.15 ±0.68 / 65.17 ms │        63.68 / 64.73 ±0.77 / 65.71 ms │     no change │
│ QQuery 89 │        36.71 / 37.29 ±0.74 / 38.69 ms │        37.42 / 37.95 ±0.34 / 38.48 ms │     no change │
│ QQuery 90 │        17.14 / 17.58 ±0.25 / 17.82 ms │        17.40 / 17.76 ±0.39 / 18.51 ms │     no change │
│ QQuery 91 │        52.52 / 53.01 ±0.54 / 54.03 ms │        52.20 / 53.33 ±0.64 / 54.14 ms │     no change │
│ QQuery 92 │        30.81 / 31.17 ±0.30 / 31.59 ms │        31.78 / 32.54 ±0.60 / 33.36 ms │     no change │
│ QQuery 93 │        50.69 / 51.59 ±0.65 / 52.48 ms │        52.57 / 53.33 ±0.52 / 53.80 ms │     no change │
│ QQuery 94 │        38.91 / 39.38 ±0.30 / 39.80 ms │        39.39 / 40.24 ±0.64 / 41.24 ms │     no change │
│ QQuery 95 │        86.28 / 87.19 ±0.76 / 88.49 ms │        86.01 / 89.66 ±2.38 / 93.53 ms │     no change │
│ QQuery 96 │        24.63 / 24.83 ±0.16 / 25.03 ms │        25.12 / 25.82 ±1.02 / 27.84 ms │     no change │
│ QQuery 97 │        46.38 / 46.88 ±0.33 / 47.38 ms │        48.14 / 48.34 ±0.12 / 48.49 ms │     no change │
│ QQuery 98 │        43.17 / 43.55 ±0.30 / 43.96 ms │        46.81 / 47.77 ±0.58 / 48.37 ms │  1.10x slower │
│ QQuery 99 │        70.50 / 71.53 ±1.56 / 74.62 ms │        72.71 / 73.46 ±0.61 / 74.47 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 10752.14ms │
│ Total Time (perf_batch-chain-traversal)   │ 10879.48ms │
│ Average Time (HEAD)                       │   108.61ms │
│ Average Time (perf_batch-chain-traversal) │   109.89ms │
│ Queries Faster                            │          1 │
│ Queries Slower                            │         10 │
│ Queries with No Change                    │         88 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘
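The Average Time rows are consistent with Total Time divided by the number of queries. That count can be read off the summary as Faster + Slower + No Change, which is 1 + 10 + 88 = 99 tpcds queries. The following is a minimal Rust sketch of that arithmetic. `average_ms` is a hypothetical helper written for illustration and is not part of the benchmark runner.

```rust
/// Hypothetical helper: mean per-query time, where the query count is the sum
/// of the Faster / Slower / No Change rows of a Benchmark Summary table.
fn average_ms(total_ms: f64, faster: u32, slower: u32, no_change: u32) -> f64 {
    let queries = faster + slower + no_change;
    total_ms / f64::from(queries)
}

fn main() {
    // tpcds summary values from the table above.
    let (head_total, branch_total) = (10752.14, 10879.48);
    let (faster, slower, no_change) = (1, 10, 88);

    let head_avg = average_ms(head_total, faster, slower, no_change);
    let branch_avg = average_ms(branch_total, faster, slower, no_change);
    // Positive means the branch took longer over the whole suite.
    let total_change_pct = (branch_total - head_total) / head_total * 100.0;

    println!("avg: {head_avg:.2} ms (HEAD) vs {branch_avg:.2} ms (branch)");
    println!("total time change: {total_change_pct:+.2}%");
}
```

Under these assumptions the averages come out as 108.61 ms and 109.89 ms, matching the table. The total-time change is about +1.18%, so on this run the branch was roughly 1% slower over the whole tpcds suite.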

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	55.0s
Peak memory	7.0 GiB
Avg memory	6.2 GiB
CPU user	242.4s
CPU sys	5.9s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	55.0s
Peak memory	6.9 GiB
Avg memory	6.2 GiB
CPU user	243.1s
CPU sys	6.0s
Peak spill	0 B


adriangbot · 2026-05-31T20:03:24Z

🤖 Benchmark completed (GKE)

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected


Comparing HEAD and perf_batch-chain-traversal
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃            perf_batch-chain-traversal ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.15 / 4.77 ±7.11 / 19.00 ms │          1.17 / 4.78 ±7.06 / 18.91 ms │     no change │
│ QQuery 1  │        12.35 / 12.88 ±0.27 / 13.12 ms │        12.69 / 13.19 ±0.35 / 13.57 ms │     no change │
│ QQuery 2  │        36.69 / 37.10 ±0.26 / 37.40 ms │        35.64 / 36.13 ±0.41 / 36.86 ms │     no change │
│ QQuery 3  │        31.27 / 32.36 ±1.29 / 34.89 ms │        30.71 / 31.04 ±0.33 / 31.67 ms │     no change │
│ QQuery 4  │     230.19 / 235.41 ±3.65 / 239.45 ms │     224.32 / 231.02 ±3.55 / 234.86 ms │     no change │
│ QQuery 5  │     278.09 / 280.37 ±1.50 / 282.27 ms │     270.81 / 273.64 ±3.28 / 279.81 ms │     no change │
│ QQuery 6  │           1.18 / 1.33 ±0.23 / 1.77 ms │           1.18 / 1.31 ±0.21 / 1.72 ms │     no change │
│ QQuery 7  │        13.74 / 14.00 ±0.13 / 14.08 ms │        14.29 / 14.52 ±0.11 / 14.58 ms │     no change │
│ QQuery 8  │     322.13 / 329.65 ±4.85 / 335.19 ms │     319.80 / 322.38 ±2.46 / 326.78 ms │     no change │
│ QQuery 9  │     457.06 / 465.87 ±6.75 / 476.72 ms │     453.66 / 458.96 ±3.78 / 463.78 ms │     no change │
│ QQuery 10 │        69.99 / 71.93 ±2.16 / 76.13 ms │        68.63 / 69.51 ±0.82 / 70.85 ms │     no change │
│ QQuery 11 │        80.93 / 82.03 ±0.79 / 82.89 ms │        80.75 / 82.27 ±1.24 / 84.20 ms │     no change │
│ QQuery 12 │     272.16 / 276.04 ±4.17 / 281.30 ms │     268.66 / 271.34 ±2.77 / 276.03 ms │     no change │
│ QQuery 13 │     368.00 / 373.98 ±5.09 / 382.20 ms │    370.49 / 385.97 ±12.25 / 401.83 ms │     no change │
│ QQuery 14 │     283.58 / 286.61 ±2.03 / 289.69 ms │     280.13 / 285.20 ±4.92 / 292.77 ms │     no change │
│ QQuery 15 │     269.80 / 276.11 ±7.50 / 288.73 ms │    269.17 / 277.96 ±11.93 / 301.20 ms │     no change │
│ QQuery 16 │     609.86 / 620.81 ±8.03 / 632.49 ms │     611.72 / 621.03 ±7.49 / 632.44 ms │     no change │
│ QQuery 17 │     625.93 / 628.41 ±2.50 / 632.97 ms │     618.39 / 627.02 ±6.13 / 633.67 ms │     no change │
│ QQuery 18 │  1261.59 / 1274.87 ±9.50 / 1287.08 ms │  1263.75 / 1275.84 ±8.30 / 1284.05 ms │     no change │
│ QQuery 19 │        27.99 / 29.65 ±3.16 / 35.97 ms │       27.82 / 36.85 ±14.18 / 64.60 ms │  1.24x slower │
│ QQuery 20 │     513.50 / 521.82 ±7.52 / 534.01 ms │     514.86 / 521.25 ±3.88 / 526.90 ms │     no change │
│ QQuery 21 │     586.87 / 594.64 ±5.15 / 599.75 ms │     593.71 / 597.51 ±3.43 / 601.82 ms │     no change │
│ QQuery 22 │ 1057.85 / 1072.70 ±18.25 / 1105.65 ms │ 1055.19 / 1067.19 ±11.29 / 1083.75 ms │     no change │
│ QQuery 23 │ 3149.28 / 3181.24 ±23.19 / 3220.68 ms │ 3156.72 / 3176.62 ±15.74 / 3196.36 ms │     no change │
│ QQuery 24 │        41.71 / 44.47 ±3.84 / 52.05 ms │        41.37 / 43.33 ±3.08 / 49.43 ms │     no change │
│ QQuery 25 │     111.24 / 113.80 ±3.69 / 120.95 ms │     110.81 / 113.38 ±3.15 / 119.54 ms │     no change │
│ QQuery 26 │        42.37 / 43.82 ±2.24 / 48.24 ms │        41.88 / 46.18 ±7.07 / 60.27 ms │  1.05x slower │
│ QQuery 27 │     663.68 / 672.62 ±7.18 / 680.80 ms │     670.50 / 675.46 ±5.27 / 682.76 ms │     no change │
│ QQuery 28 │ 3022.88 / 3046.01 ±17.10 / 3069.22 ms │ 3022.85 / 3037.10 ±20.32 / 3077.50 ms │     no change │
│ QQuery 29 │        40.24 / 45.49 ±6.41 / 56.18 ms │        40.23 / 43.73 ±4.32 / 50.59 ms │     no change │
│ QQuery 30 │     299.32 / 304.55 ±3.09 / 307.04 ms │     300.15 / 308.54 ±6.80 / 318.09 ms │     no change │
│ QQuery 31 │     281.42 / 287.16 ±8.58 / 304.15 ms │    280.15 / 290.09 ±10.10 / 303.43 ms │     no change │
│ QQuery 32 │     921.24 / 932.54 ±9.22 / 946.68 ms │    923.57 / 947.54 ±27.43 / 999.67 ms │     no change │
│ QQuery 33 │  1432.90 / 1445.13 ±9.16 / 1460.17 ms │ 1414.73 / 1436.78 ±18.02 / 1467.60 ms │     no change │
│ QQuery 34 │ 1439.32 / 1464.52 ±19.95 / 1494.37 ms │ 1442.05 / 1468.56 ±19.38 / 1488.75 ms │     no change │
│ QQuery 35 │    276.75 / 307.45 ±39.46 / 382.70 ms │     277.92 / 285.44 ±7.17 / 297.99 ms │ +1.08x faster │
│ QQuery 36 │        67.12 / 70.90 ±2.92 / 74.78 ms │        66.18 / 71.53 ±4.92 / 78.60 ms │     no change │
│ QQuery 37 │        35.07 / 40.57 ±9.88 / 60.32 ms │        35.32 / 36.95 ±1.92 / 40.61 ms │ +1.10x faster │
│ QQuery 38 │        42.32 / 45.00 ±2.81 / 49.34 ms │        42.02 / 47.46 ±4.51 / 53.66 ms │  1.05x slower │
│ QQuery 39 │     142.85 / 146.98 ±3.19 / 150.87 ms │     141.79 / 151.29 ±6.69 / 160.14 ms │     no change │
│ QQuery 40 │        13.78 / 14.15 ±0.22 / 14.37 ms │        14.06 / 14.24 ±0.16 / 14.48 ms │     no change │
│ QQuery 41 │        13.64 / 16.48 ±3.29 / 21.95 ms │        13.80 / 17.88 ±5.60 / 28.20 ms │  1.08x slower │
│ QQuery 42 │        13.13 / 15.12 ±3.51 / 22.14 ms │        13.16 / 17.52 ±7.99 / 33.49 ms │  1.16x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                         │ 19761.38ms │
│ Total Time (perf_batch-chain-traversal)   │ 19735.52ms │
│ Average Time (HEAD)                       │   459.57ms │
│ Average Time (perf_batch-chain-traversal) │   458.97ms │
│ Queries Faster                            │          2 │
│ Queries Slower                            │          5 │
│ Queries with No Change                    │         36 │
│ Queries with Failure                      │          0 │
└───────────────────────────────────────────┴────────────┘
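Each cell in the comparison tables appears to read min / mean ±stddev / max over the benchmark iterations, and the Change column appears to compare the mean values. The sketch below reconstructs that column under two assumptions. Both are inferred from the rows rather than taken from the runner's source:

- The ratio is HEAD mean divided by branch mean.
- Any ratio within ±5% is reported as "no change".

Every row shown here fits this rule. For instance, QQuery 26 (43.82 vs 46.18 ms, about 5.4% slower) is flagged, while tpcds QQuery 87 (about 4.1% slower) is not. `change_label` is a hypothetical name.

```rust
/// Hypothetical reconstruction of the "Change" column. Assumes the ratio is
/// HEAD mean / branch mean and a ±5% band counts as "no change".
fn change_label(head_mean_ms: f64, branch_mean_ms: f64) -> String {
    // ratio > 1.0 means the branch ran faster than HEAD.
    let ratio = head_mean_ms / branch_mean_ms;
    if ratio > 1.05 {
        format!("+{:.2}x faster", ratio)
    } else if ratio < 0.95 {
        // Report slowdowns as branch / HEAD so the factor is >= 1.
        format!("{:.2}x slower", 1.0 / ratio)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // (query, HEAD mean ms, branch mean ms) from the clickbench_partitioned table above.
    let rows = [
        ("QQuery 13", 373.98, 385.97),
        ("QQuery 19", 29.65, 36.85),
        ("QQuery 26", 43.82, 46.18),
        ("QQuery 35", 307.45, 285.44),
        ("QQuery 37", 40.57, 36.95),
    ];
    for (query, head, branch) in rows {
        println!("{query}: {}", change_label(head, branch));
    }
}
```

The rule ignores the ±stddev values, so a noisy row can still be flagged. QQuery 19 is an example: its branch stddev is ±14.18 ms with a 64.60 ms max, yet its branch minimum (27.82 ms) is slightly lower than HEAD's (27.99 ms).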

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	100.0s
Peak memory	29.6 GiB
Avg memory	23.0 GiB
CPU user	1029.1s
CPU sys	62.9s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	31.2 GiB
Avg memory	23.4 GiB
CPU user	1025.8s
CPU sys	63.1s
Peak spill	0 B


github-actions bot added the physical-plan label (Changes to the physical-plan crate) May 31, 2026

Dandandan closed this May 31, 2026


perf: batch hash join chain-traversal probe lookups for memory-level parallelism #22677

Dandandan wants to merge 1 commit into apache:main from Dandandan:perf/batch-chain-traversal

