Hash-based dedup in BoxScanner instead of std::set<pair> #2377

Open: nikosavola wants to merge 1 commit into KLayout:master from nikosavola:nikosavola/push-onqzsyonyllo
nikosavola (Contributor) commented Jun 23, 2026

Previously, a std::set<std::pair<Obj*, Obj*>> held every reported pair. This PR replaces it with a hash table keyed on one pointer, whose value is a hash set of that pointer's partners.

This turns the membership test for a candidate pair into a hash lookup instead of an ordered-tree search. Removing an object also becomes a single seen.erase(X).
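
As a rough illustration of the two layouts, here is a minimal sketch. It is not the code from this PR: `Obj`, `PairSetSeen` and `PartnerMapSeen` are hypothetical names, and filing each pair under its first pointer only is an assumption made for the example.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <unordered_set>
#include <utility>

//  Hypothetical stand-in for the scanned object type.
struct Obj { int id; };

//  Old layout: one ordered set holding every (a, b) pointer pair.
struct PairSetSeen
{
  std::set<std::pair<const Obj *, const Obj *> > pairs;

  //  Returns true if the pair is new, i.e. it has not been reported yet.
  bool insert (const Obj *a, const Obj *b)
  {
    return pairs.insert (std::make_pair (a, b)).second;
  }
};

//  New layout: hash map from one pointer to the hash set of its partners.
//  Assumption: each pair is filed under its first member only.
struct PartnerMapSeen
{
  std::unordered_map<const Obj *, std::unordered_set<const Obj *> > partners;

  //  Returns true if the pair is new, i.e. it has not been reported yet.
  bool insert (const Obj *a, const Obj *b)
  {
    return partners[a].insert (b).second;
  }

  //  Membership test: one lookup for the key, one for the partner.
  bool contains (const Obj *a, const Obj *b) const
  {
    auto it = partners.find (a);
    return it != partners.end () && it->second.count (b) > 0;
  }

  //  Drops the key together with all partners recorded for it.
  void erase (const Obj *x)
  {
    partners.erase (x);
  }
};
```

In this sketch, erasing a key discards its whole partner set in one call, whereas the set-of-pairs layout has to locate and remove each pair individually.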

Results

Best of 9 runs, single core (taskset -c 0), g++ 14.2 -O2 -std=c++17, glibc 2.41. Time is the full process() call (ms):

| scenario | boxes | pairs reported | before (ms) | after (ms) | speedup |
|---|---|---|---|---|---|
| sparse (low overlap) | 20 000 | 13 | 1.74 | 1.65 | 1.05× |
| medium overlap | 20 000 | 3 971 | 3.23 | 3.00 | 1.08× |
| dense (high overlap) | 20 000 | 177 809 | 45.5 | 16.6 | 2.7× |
| very dense | 12 000 | 1 053 355 | 320 | 116 | 2.8× |

The win scales with how large the seen set grows: light/sparse inputs (the common case) are neutral to slightly better, while high-overlap inputs, where the old std::set had to walk a much larger tree for every lookup, gain up to 2.8×.

Reproduce

//
//  Microbenchmark for db::box_scanner pair-reporting (the cross-band dedup path).
//
//  Build (run before and after change):
//    g++ -std=c++17 -O2 -I<hdr> -Isrc/db/db -Isrc/tl/tl -Isrc/gsi/gsi \
//        bench_boxscanner.cc -o bench -Lbin -lklayout_db -lklayout_tl
//

#include "dbBoxScanner.h"
#include "dbBox.h"

#include <cstdio>
#include <cstdlib>   //  atoi
#include <cstdint>
#include <string>
#include <vector>
#include <algorithm>
#include <chrono>

//  Provide the no-Qt translation fallback (the prebuilt libs use QObject::tr).
namespace tl { std::string tr_fallback (const char *s) { return std::string (s); } }

//  Minimal receiver: only counts reported pairs (and acts as a behaviour check).
struct CountingRecorder
{
  uint64_t pairs = 0;
  void initialize () {}
  void finalize (bool) {}
  void finish (const db::Box *, size_t) {}
  bool stop () const { return false; }
  void add (const db::Box *, size_t, const db::Box *, size_t) { ++pairs; }
};

//  Deterministic 64-bit LCG so layouts are identical across builds/runs.
struct Rng
{
  uint64_t s;
  explicit Rng (uint64_t seed) : s (seed) {}
  uint64_t next () { s = s * 6364136223846793005ULL + 1442695040888963407ULL; return s >> 17; }
  int range (int lo, int hi) { return lo + int (next () % uint64_t (hi - lo + 1)); }
};

//  Generate `n` axis-aligned boxes of side in [smin,smax] with lower-left placed
//  uniformly in a square of side `extent`. The ratio side/extent sets the overlap
//  density: large boxes in a small extent -> many mutually overlapping pairs that
//  recur across scan bands, which is exactly what the "seen" set tracks.
static std::vector<db::Box> make_boxes (int n, int extent, int smin, int smax, uint64_t seed)
{
  Rng rng (seed);
  std::vector<db::Box> bb;
  bb.reserve (n);
  for (int i = 0; i < n; ++i) {
    int x = rng.range (0, extent);
    int y = rng.range (0, extent);
    int w = rng.range (smin, smax);
    int h = rng.range (smin, smax);
    bb.push_back (db::Box (x, y, x + w, y + h));
  }
  return bb;
}

struct Scenario { const char *name; int n; int extent; int smin; int smax; };

static double run_once (const std::vector<db::Box> &bb, uint64_t &pairs_out)
{
  db::box_scanner<db::Box, size_t> bs;
  for (size_t i = 0; i < bb.size (); ++i) {
    bs.insert (&bb[i], i);
  }
  CountingRecorder rec;
  db::box_convert<db::Box> bc;
  auto t0 = std::chrono::steady_clock::now ();
  bs.process (rec, 1 /*enl: touching counts*/, bc);
  auto t1 = std::chrono::steady_clock::now ();
  pairs_out = rec.pairs;
  return std::chrono::duration<double, std::milli> (t1 - t0).count ();
}

int main (int argc, char **argv)
{
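  int trials = (argc > 1) ? atoi (argv[1]) : 7;
  //  Guard against zero or negative trial counts (ts.front () needs one sample).
  if (trials < 1) { trials = 1; }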

  Scenario scns[] = {
    //  name                  n      extent  smin  smax
    { "sparse (low overlap)",  20000, 2000000, 200,  400 },
    { "medium overlap",        20000,  200000, 300,  600 },
    { "dense (high overlap)",  20000,   40000, 400,  800 },
    { "very dense",            12000,   12000, 500, 1000 },
  };

  printf ("%-24s %8s %12s %11s %11s\n", "scenario", "boxes", "pairs", "best ms", "median ms");
  printf ("%s\n", "------------------------------------------------------------------------");

  for (const Scenario &sc : scns) {
    std::vector<db::Box> bb = make_boxes (sc.n, sc.extent, sc.smin, sc.smax, 0x9e3779b97f4a7c15ULL);
    std::vector<double> ts;
    uint64_t pairs = 0, p0 = 0;
    for (int t = 0; t < trials; ++t) {
      ts.push_back (run_once (bb, pairs));
      if (t == 0) { p0 = pairs; }
      if (pairs != p0) { fprintf (stderr, "NONDETERMINISTIC pair count!\n"); return 1; }
    }
    std::sort (ts.begin (), ts.end ());
    double best = ts.front ();
    double median = ts[ts.size () / 2];
    printf ("%-24s %8d %12llu %11.2f %11.2f\n",
            sc.name, sc.n, (unsigned long long) pairs, best, median);
  }
  return 0;
}

klayoutmatthias (Collaborator) commented Jun 24, 2026

Hi @nikosavola,

thanks for the patch. The reason I used the pairs was, as far as I recall, memory concerns. With a set of pairs you allocate two pointers for a "lonely" pair, while with a map of sets you allocate a pointer plus the memory for a set. With hash maps there is always a bigger overhead due to capacity pre-allocation, so I wonder what the memory effects are.

Do you have some numbers from your benchmarks?

Another concern with unordered sets and maps, specifically when keyed on pointers, is reproducibility of the iteration order, but I don't think that matters here.

Thanks,

Matthias

nikosavola (Contributor, Author) commented

You're right, lonely pairs become more expensive. Here is the peak memory usage of the seen container for various workloads, with the old set and the new map<set> implementations:

| workload | peak live pairs | old set | new map<set> | new/old |
|---|---|---|---|---|
| sparse (ov~1) | 273 | 13 KiB | 40 KiB | 3.13× |
| medium (ov~8) | 5.3 k | 251 KiB | 311 KiB | 1.24× |
| dense (ov~25) | 28 k | 1.29 MiB | 0.96 MiB | 0.74× |
| high (ov~60) | 102 k | 4.66 MiB | 3.01 MiB | 0.65× |
| very-high (ov~120) | 203 k | 9.28 MiB | 5.68 MiB | 0.61× |
| lonely-pairs* | 200 k | 9.16 MiB | 39.3 MiB | 4.29× |

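The high-overlap cases this PR is meant to accelerate use roughly 25-40% less memory (0.74× to 0.61×), while the worst case, lonely pairs, ends up using 4.29× as much. For the typical low-overlap cases the overhead is larger in relative terms (3.13× for the sparse workload) but small in absolute terms (40 KiB vs. 13 KiB), so I think it is negligible in the end.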
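
For anyone who wants to reproduce the lonely-pairs comparison, the following is a minimal sketch of one way to measure it. It is not necessarily how the numbers above were obtained: `Tally`, `CountingAlloc`, `lonely_old` and `lonely_new` are hypothetical helpers, and the allocator only counts the bytes the containers request, so malloc bookkeeping is not included. The absolute figures depend on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <set>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

//  Byte tally shared by all CountingAlloc instantiations (hypothetical helper).
struct Tally { static std::size_t live; static std::size_t peak; };
std::size_t Tally::live = 0;
std::size_t Tally::peak = 0;

//  Minimal stateless allocator recording requested bytes (live and peak).
template <class T>
struct CountingAlloc
{
  typedef T value_type;
  CountingAlloc () {}
  template <class U> CountingAlloc (const CountingAlloc<U> &) {}

  T *allocate (std::size_t n)
  {
    Tally::live += n * sizeof (T);
    if (Tally::live > Tally::peak) { Tally::peak = Tally::live; }
    return static_cast<T *> (::operator new (n * sizeof (T)));
  }

  void deallocate (T *p, std::size_t n)
  {
    Tally::live -= n * sizeof (T);
    ::operator delete (p);
  }
};

template <class T, class U>
bool operator== (const CountingAlloc<T> &, const CountingAlloc<U> &) { return true; }
template <class T, class U>
bool operator!= (const CountingAlloc<T> &, const CountingAlloc<U> &) { return false; }

typedef const void *Ptr;
typedef std::pair<Ptr, Ptr> PtrPair;

//  Old layout: ordered set of pointer pairs.
typedef std::set<PtrPair, std::less<PtrPair>, CountingAlloc<PtrPair> > OldSeen;

//  New layout: hash map from a pointer to the hash set of its partners.
typedef std::unordered_set<Ptr, std::hash<Ptr>, std::equal_to<Ptr>,
                           CountingAlloc<Ptr> > Partners;
typedef std::unordered_map<Ptr, Partners, std::hash<Ptr>, std::equal_to<Ptr>,
                           CountingAlloc<std::pair<const Ptr, Partners> > > NewSeen;

//  Peak requested bytes for n lonely pairs (each key has exactly one partner).
static std::size_t lonely_old (std::size_t n)
{
  Tally::live = 0; Tally::peak = 0;
  std::vector<char> objs (2 * n);   //  used for distinct addresses only
  {
    OldSeen seen;
    for (std::size_t i = 0; i < n; ++i) {
      Ptr a = &objs[2 * i], b = &objs[2 * i + 1];
      seen.insert (PtrPair (a, b));
    }
  }
  return Tally::peak;
}

static std::size_t lonely_new (std::size_t n)
{
  Tally::live = 0; Tally::peak = 0;
  std::vector<char> objs (2 * n);   //  used for distinct addresses only
  {
    NewSeen seen;
    for (std::size_t i = 0; i < n; ++i) {
      Ptr a = &objs[2 * i], b = &objs[2 * i + 1];
      seen[a].insert (b);
    }
  }
  return Tally::peak;
}
```

Calling `lonely_old (n)` and `lonely_new (n)` with the same `n` gives the two peak byte counts to compare. In the new layout every lonely pair pays for a map node, a full partner-set object and that set's own bucket array, which is where the extra memory in the lonely-pairs row comes from.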
