Hash-based dedup in BoxScanner instead of `std::set<pair>` by nikosavola · Pull Request #2377 · KLayout/klayout

nikosavola · 2026-06-23T07:16:27Z

Previously, a std::set<std::pair<Obj*, Obj*>> held every reported pair. This PR replaces it with a hash table keyed on one pointer of the pair, whose value is a hash set of that pointer's partners.

This turns the membership test for a candidate pair into hash lookups (first the key, then the partner) instead of an ordered-tree search. Removing the entries keyed on X also becomes a simple seen.erase(X).
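To make the shape of the structure described above concrete, here is a minimal sketch. It is not the code from the patch: `Obj`, `SeenPairs`, and its member names are hypothetical, and it assumes the pair is keyed on its first pointer exactly as passed in (the actual change may order or normalize the pair differently).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for the scanner's object type.
struct Obj { int id; };

// Sketch of a "seen" structure: hash map keyed on one pointer,
// value is the hash set of partners already reported with that key.
class SeenPairs
{
public:
  // Records (a, b). Returns true if the pair had not been seen before.
  bool insert (const Obj *a, const Obj *b)
  {
    return m_seen[a].insert (b).second;
  }

  // Membership test: one lookup for the key, one for the partner.
  bool contains (const Obj *a, const Obj *b) const
  {
    auto it = m_seen.find (a);
    return it != m_seen.end () && it->second.count (b) != 0;
  }

  // Drops every pair keyed on x in one call. Returns the number of keys removed (0 or 1).
  std::size_t erase (const Obj *x)
  {
    return m_seen.erase (x);
  }

private:
  std::unordered_map<const Obj *, std::unordered_set<const Obj *>> m_seen;
};

// Small usage example for the sketch above.
void seen_pairs_example ()
{
  Obj a{1}, b{2}, c{3};
  SeenPairs seen;
  assert (seen.insert (&a, &b));      // first report of (a, b)
  assert (! seen.insert (&a, &b));    // duplicate is rejected
  assert (seen.insert (&a, &c));
  assert (seen.erase (&a) == 1);      // removes (a, b) and (a, c) together
  assert (! seen.contains (&a, &b));
}
```

With a `std::set` of pairs, the same removal needs a range erase over all pairs that start with X; here it is a single erase on the outer map.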

Results

Best of 9 runs, single core (taskset -c 0), g++ 14.2 -O2 -std=c++17, glibc 2.41. Time is the full process() call (ms):

scenario	boxes	pairs reported	before (ms)	after (ms)	speedup
sparse (low overlap)	20 000	13	1.74	1.65	1.05×
medium overlap	20 000	3 971	3.23	3.00	1.08×
dense (high overlap)	20 000	177 809	45.5	16.6	2.7×
very dense	12 000	1 053 355	320	116	2.8×

The win scales with how large the seen set grows: light/sparse inputs (the common case) are neutral-to-slightly-better, while high-overlap inputs, where the old std::set spent more time walking its tree, gain the most (about 2.7×).

Reproduce

//
//  Microbenchmark for db::box_scanner pair-reporting (the cross-band dedup path).
//
//  Build (run before and after change):
//    g++ -std=c++17 -O2 -I<hdr> -Isrc/db/db -Isrc/tl/tl -Isrc/gsi/gsi \
//        bench_boxscanner.cc -o bench -Lbin -lklayout_db -lklayout_tl
//

#include "dbBoxScanner.h"
#include "dbBox.h"

#include <cstdio>
#include <cstdlib>   //  atoi
#include <cstdint>
#include <string>
#include <vector>
#include <algorithm>
#include <chrono>

//  Provide the no-Qt translation fallback (the prebuilt libs use QObject::tr).
namespace tl { std::string tr_fallback (const char *s) { return std::string (s); } }

//  Minimal receiver: only counts reported pairs (and acts as a behaviour check).
struct CountingRecorder
{
  uint64_t pairs = 0;
  void initialize () {}
  void finalize (bool) {}
  void finish (const db::Box *, size_t) {}
  bool stop () const { return false; }
  void add (const db::Box *, size_t, const db::Box *, size_t) { ++pairs; }
};

//  Deterministic 64-bit LCG so layouts are identical across builds/runs.
struct Rng
{
  uint64_t s;
  explicit Rng (uint64_t seed) : s (seed) {}
  uint64_t next () { s = s * 6364136223846793005ULL + 1442695040888963407ULL; return s >> 17; }
  int range (int lo, int hi) { return lo + int (next () % uint64_t (hi - lo + 1)); }
};

//  Generate `n` axis-aligned boxes of side in [smin,smax] with lower-left placed
//  uniformly in a square of side `extent`. The ratio side/extent sets the overlap
//  density: large boxes in a small extent -> many mutually overlapping pairs that
//  recur across scan bands, which is exactly what the "seen" set tracks.
static std::vector<db::Box> make_boxes (int n, int extent, int smin, int smax, uint64_t seed)
{
  Rng rng (seed);
  std::vector<db::Box> bb;
  bb.reserve (n);
  for (int i = 0; i < n; ++i) {
    int x = rng.range (0, extent);
    int y = rng.range (0, extent);
    int w = rng.range (smin, smax);
    int h = rng.range (smin, smax);
    bb.push_back (db::Box (x, y, x + w, y + h));
  }
  return bb;
}

struct Scenario { const char *name; int n; int extent; int smin; int smax; };

static double run_once (const std::vector<db::Box> &bb, uint64_t &pairs_out)
{
  db::box_scanner<db::Box, size_t> bs;
  for (size_t i = 0; i < bb.size (); ++i) {
    bs.insert (&bb[i], i);
  }
  CountingRecorder rec;
  db::box_convert<db::Box> bc;
  auto t0 = std::chrono::steady_clock::now ();
  bs.process (rec, 1 /*enl: touching counts*/, bc);
  auto t1 = std::chrono::steady_clock::now ();
  pairs_out = rec.pairs;
  return std::chrono::duration<double, std::milli> (t1 - t0).count ();
}

int main (int argc, char **argv)
{
  //  At least one trial is needed: ts.front () below requires a non-empty vector.
  int trials = (argc > 1) ? std::max (1, atoi (argv[1])) : 7;

  Scenario scns[] = {
    //  name                  n      extent  smin  smax
    { "sparse (low overlap)",  20000, 2000000, 200,  400 },
    { "medium overlap",        20000,  200000, 300,  600 },
    { "dense (high overlap)",  20000,   40000, 400,  800 },
    { "very dense",            12000,   12000, 500, 1000 },
  };

  printf ("%-24s %8s %12s %11s %11s\n", "scenario", "boxes", "pairs", "best ms", "median ms");
  printf ("%s\n", "------------------------------------------------------------------------");

  for (const Scenario &sc : scns) {
    std::vector<db::Box> bb = make_boxes (sc.n, sc.extent, sc.smin, sc.smax, 0x9e3779b97f4a7c15ULL);
    std::vector<double> ts;
    uint64_t pairs = 0, p0 = 0;
    for (int t = 0; t < trials; ++t) {
      ts.push_back (run_once (bb, pairs));
      if (t == 0) { p0 = pairs; }
      if (pairs != p0) { fprintf (stderr, "NONDETERMINISTIC pair count!\n"); return 1; }
    }
    std::sort (ts.begin (), ts.end ());
    double best = ts.front ();
    double median = ts[ts.size () / 2];
    printf ("%-24s %8d %12llu %11.2f %11.2f\n",
            sc.name, sc.n, (unsigned long long) pairs, best, median);
  }
  return 0;
}

klayoutmatthias · 2026-06-24T21:29:11Z

Hi @nikosavola,

thanks for the patch. The reason I used the pairs was - as far as I recall - memory concerns. When you use a map of sets you allocate two pointers for a "lonely" pair, while you allocate a pointer plus memory for a set in the other case. With hash maps there always is a bigger overhead due to capacity pre-allocation, so I wonder what the memory effects are.

Do you have some numbers from your benchmarks?

Another thing about unordered sets and maps - specifically pointers - is reproducibility of the order, but I don't think that counts here.

Thanks,

Matthias

nikosavola · 2026-06-25T10:13:10Z

You're right, the lonely pairs will be more expensive. Here's a table of the peak memory used by seen for various workloads, comparing the old set with the new map<set> implementation:

workload	peak live pairs	old `set`	new `map<set>`	new/old
sparse (ov~1)	273	13 KiB	40 KiB	3.13×
medium (ov~8)	5.3 k	251 KiB	311 KiB	1.24×
dense (ov~25)	28 k	1.29 MiB	0.96 MiB	0.74×
high (ov~60)	102 k	4.66 MiB	3.01 MiB	0.65×
very-high (ov~120)	203 k	9.28 MiB	5.68 MiB	0.61×
lonely-pairs*	200 k	9.16 MiB	39.3 MiB	4.29×

The high-overlap cases this PR should accelerate use roughly 25-40% less memory (0.74× to 0.61×), while the worst case, lonely pairs, ends up with 4.29× more memory usage. For the typical low-overlap cases the ratio is higher (3.13× for sparse), but I think the absolute overhead is minimal in the end: tens of KiB.
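The thread does not say how the figures above were collected. One way to get comparable numbers is a counting allocator plugged into both container types; the sketch below does that for the lonely-pairs case only. Everything in it (`MemStats`, `CountingAlloc`, `lonely_peak_old`, `lonely_peak_new`) is hypothetical and not part of KLayout, and the absolute byte counts depend on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <set>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical global byte counters for everything the containers allocate.
struct MemStats { std::size_t live = 0; std::size_t peak = 0; };
static MemStats g_stats;

// Minimal stateless allocator that counts requested bytes.
template <class T>
struct CountingAlloc
{
  using value_type = T;
  CountingAlloc () = default;
  template <class U> CountingAlloc (const CountingAlloc<U> &) {}

  T *allocate (std::size_t n)
  {
    g_stats.live += n * sizeof (T);
    if (g_stats.live > g_stats.peak) { g_stats.peak = g_stats.live; }
    return static_cast<T *> (::operator new (n * sizeof (T)));
  }
  void deallocate (T *p, std::size_t n)
  {
    assert (g_stats.live >= n * sizeof (T));
    g_stats.live -= n * sizeof (T);
    ::operator delete (p);
  }
};
template <class T, class U>
bool operator== (const CountingAlloc<T> &, const CountingAlloc<U> &) { return true; }
template <class T, class U>
bool operator!= (const CountingAlloc<T> &, const CountingAlloc<U> &) { return false; }

using Ptr = const int *;
using Pair = std::pair<Ptr, Ptr>;
using OldSeen = std::set<Pair, std::less<Pair>, CountingAlloc<Pair>>;
using Partners = std::unordered_set<Ptr, std::hash<Ptr>, std::equal_to<Ptr>, CountingAlloc<Ptr>>;
using NewSeen = std::unordered_map<Ptr, Partners, std::hash<Ptr>, std::equal_to<Ptr>,
                                   CountingAlloc<std::pair<const Ptr, Partners>>>;

// Peak bytes for n "lonely" pairs (every key has exactly one partner), old layout.
std::size_t lonely_peak_old (std::size_t n)
{
  std::vector<int> objs (2 * n);   // stand-ins for scanned objects, not counted
  g_stats = MemStats ();
  {
    OldSeen seen;
    for (std::size_t i = 0; i < n; ++i) {
      Ptr a = &objs[2 * i], b = &objs[2 * i + 1];
      seen.insert (Pair (a, b));
    }
  }
  return g_stats.peak;
}

// Same workload with the map-of-sets layout.
std::size_t lonely_peak_new (std::size_t n)
{
  std::vector<int> objs (2 * n);
  g_stats = MemStats ();
  {
    NewSeen seen;
    for (std::size_t i = 0; i < n; ++i) {
      Ptr a = &objs[2 * i], b = &objs[2 * i + 1];
      seen[a].insert (b);
    }
  }
  return g_stats.peak;
}
```

Calling `lonely_peak_old (200000)` and `lonely_peak_new (200000)` and dividing the results gives a new/old ratio for this workload. The map-of-sets side is expected to be larger because each key pays for a whole inner set object plus its bucket array to hold a single partner.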

Hash-based dedup in BoxScanner instead of std::set<pair>

7c59fc0

nikosavola wants to merge 1 commit into KLayout:master from nikosavola:nikosavola/push-onqzsyonyllo