Leverage execution policies for LVS equivalence #2366

Open
nikosavola wants to merge 1 commit into KLayout:parallelize from nikosavola:nikosavola/push-lszwqumryknw

@nikosavola nikosavola commented Jun 3, 2026

Similar to #2364 but for LVS: annotates the std::sort calls in the netlist comparer (dbNetlistCompare.cc, dbNetlistCompareCore.cc) with std::execution::par so the large per-circuit sorts can run in parallel.
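As a rough illustration of the kind of annotation described above, here is a minimal sketch and not the actual diff. The macro `USE_PARALLEL_SORT` and the helpers `sort_maybe_parallel` and `sorted_copy` are hypothetical names chosen for this example; they are not taken from the KLayout sources. The sketch assumes a build-time switch that falls back to the serial `std::sort` when parallel algorithms are unavailable.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical build-time switch (not a KLayout macro): define it only when
// the toolchain provides C++17 parallel algorithms. With libstdc++ this also
// means linking against oneTBB.
#if defined(USE_PARALLEL_SORT)
#  include <execution>
#endif

// Sorts [first, last) with the parallel policy when enabled and with the
// plain serial overload otherwise.
template <class Iter, class Compare>
void sort_maybe_parallel(Iter first, Iter last, Compare cmp)
{
#if defined(USE_PARALLEL_SORT)
  std::sort(std::execution::par, first, last, cmp);
#else
  std::sort(first, last, cmp);
#endif
}

// Small helper for demonstration: returns an ascending copy of the input.
std::vector<int> sorted_copy(std::vector<int> values)
{
  sort_maybe_parallel(values.begin(), values.end(), std::less<int>());
  return values;
}
```

With `std::execution::par` the comparator may be invoked concurrently from several threads, so it must not introduce data races (for example by writing to shared caches or counters).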

Benchmark

A flat, highly symmetric resistor mesh (G=500 → 250k nets, 499k resistors) is built via the Ruby API and compared against an identical copy (self-compare; all runs report matched=true). Only compare() is timed. The baseline is a separately built serial library (= master). Toolchain: g++ 14.2, oneTBB 2022.1.

| Build | Speedup (compare(), vs serial) |
|---|---|
| serial (master) | 1.00× |
| par, 1 core | 1.66× |
| par, 2 cores | 1.97× |
| par, 4 cores | 2.19× |
| par, 8 cores | 2.23× |
[Figure: benchmark_plot]

Even when pinned to one core (taskset -c 0), the par build is still 1.66× faster than serial. I don't know why yet; I would need to check TBB's sorting implementation.
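One way to look into the single-core result would be a standalone microbenchmark that times the serial and the policy-taking `std::sort` overloads on the same data, outside of KLayout. The following is only a sketch under assumptions: `USE_PARALLEL_SORT`, `make_input`, and `time_sort_seconds` are hypothetical names for this example, and random integers are a stand-in that may not reflect the comparer's real sort keys.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

#if defined(USE_PARALLEL_SORT)   // hypothetical switch, not a KLayout macro
#  include <execution>
#endif

// Builds n pseudo-random integers from a fixed seed so runs are repeatable.
std::vector<int> make_input(std::size_t n, unsigned seed)
{
  std::mt19937 rng(seed);
  std::vector<int> values(n);
  for (auto &v : values) {
    v = static_cast<int>(rng() % 1000003u);
  }
  return values;
}

// Sorts data in place and returns the elapsed wall-clock time in seconds.
// use_par selects the std::execution::par overload when the switch is
// defined; without the switch the serial overload is always used.
double time_sort_seconds(std::vector<int> &data, bool use_par)
{
  auto t0 = std::chrono::steady_clock::now();
#if defined(USE_PARALLEL_SORT)
  if (use_par) {
    std::sort(std::execution::par, data.begin(), data.end());
  } else {
    std::sort(data.begin(), data.end());
  }
#else
  (void) use_par;
  std::sort(data.begin(), data.end());
#endif
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling `time_sort_seconds` on two copies of the same `make_input` result, once with `use_par` set to `false` and once with `true`, under `taskset -c 0`, would indicate whether the difference comes from the sort routine itself or from somewhere else in compare().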

Reproduction

Build bin-release with the C++17 + TBB flags and the corrected guard, then uv run benchmark.py. Both scripts are below.

test_lvs.rb — netlist generator + self-compare (timed)
# Builds a large, flat, highly-symmetric 2-D resistor mesh as a single circuit
# and compares it against an identical copy (self-compare). The regular mesh
# maximises the number of nets per circuit (-> large per-circuit node-set sorts)
# and drives the symmetry / ambiguity paths -- the std::sort call sites this PR
# parallelises.
g = (ENV["MESH"] || "500").to_i

def build_netlist(g)
  nl = RBA::Netlist::new
  rcls = RBA::DeviceClassResistor::new
  rcls.name = "RES"
  nl.add(rcls)
  circuit = RBA::Circuit::new
  circuit.name = "MESH"
  nl.add(circuit)

  nets = []
  g.times do |y|
    row = []
    g.times { |x| row << circuit.create_net("n_#{x}_#{y}") }
    nets << row
  end

  idx = 0
  g.times do |y|
    g.times do |x|
      if x + 1 < g
        d = circuit.create_device(rcls, "RH_#{x}_#{y}")
        d.set_parameter("R", 100.0 + ((x + y) % 16))
        d.connect_terminal("A", nets[y][x]); d.connect_terminal("B", nets[y][x + 1])
        idx += 1
      end
      if y + 1 < g
        d = circuit.create_device(rcls, "RV_#{x}_#{y}")
        d.set_parameter("R", 100.0 + ((x + y) % 16))
        d.connect_terminal("A", nets[y][x]); d.connect_terminal("B", nets[y + 1][x])
        idx += 1
      end
    end
  end
  [nl, circuit, idx]
end

t0 = Time.now
nl1, _c1, ndev = build_netlist(g)
nl2, _c2, _    = build_netlist(g)
t1 = Time.now
$stderr.puts "mesh G=#{g}  nets/circuit=#{g*g}  resistors/circuit=#{ndev}  build=#{'%.3f' % (t1 - t0)}s"

comp = RBA::NetlistComparer::new
tc0 = Time.now
ok = comp.compare(nl1, nl2)
tc1 = Time.now
$stderr.puts "compare matched=#{ok}  compare_time=#{'%.3f' % (tc1 - tc0)}s"
benchmark.py — taskset thread sweep (isolated compare() + end-to-end hyperfine)
# /// script
# requires-python = ">=3.11"
# dependencies = ["matplotlib", "pandas", "tabulate"]
# ///
import json, os, re, statistics, subprocess
import matplotlib; matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

REPO = "."                       # run from the klayout checkout root
SCRIPT = "test_lvs.rb"
THREADS = [1, 2, 4, 6, 8]
MESH = os.environ.get("MESH", "500")
RUNS = 6

def cmd(t):
    return (f"taskset -c 0-{t-1} env MESH={MESH} LD_LIBRARY_PATH={REPO}/bin-release "
            f"{REPO}/bin-release/klayout -b -r {SCRIPT}")

samples = {t: [] for t in THREADS}
for _ in range(RUNS):
    for t in THREADS:
        out = subprocess.run(cmd(t), shell=True, capture_output=True, text=True)
        m = re.search(r"compare_time=([0-9.]+)s", out.stderr + out.stdout)
        if m: samples[t].append(float(m.group(1)))
base = statistics.median(samples[THREADS[0]])
iso = pd.DataFrame([{"Threads": t, "Median (s)": round(statistics.median(samples[t]), 3),
                     "Std (s)": round(statistics.pstdev(samples[t]), 3),
                     "Speedup": round(base / statistics.median(samples[t]), 2)} for t in THREADS])
print(iso.to_markdown(index=False))


plt.plot(iso["Threads"], iso["Speedup"], marker="o", label="Isolated compare()")
plt.plot(THREADS, THREADS, ":", color="gray", label="Ideal")
plt.xlabel("Threads"); plt.ylabel("Speedup"); plt.legend(); plt.grid(True)
plt.savefig("benchmark_plot.png", dpi=120, bbox_inches="tight")

@nikosavola nikosavola changed the base branch from master to parallelize June 22, 2026 14:00
@nikosavola nikosavola force-pushed the nikosavola/push-lszwqumryknw branch from a66e434 to f6a36a4 Compare June 23, 2026 06:06
@nikosavola nikosavola marked this pull request as ready for review June 23, 2026 06:29