Modal math an UI fixes #1065

Draft
padenot wants to merge 8 commits into mozilla:main from padenot:modal-math-an-ui-fixes

padenot commented Jul 3, 2026

Collaborator

No description provided.

padenot added 8 commits June 22, 2026 16:54
Fit a 1-D Gaussian mixture to the raw samples and pick the component count
by BIC, instead of carving the KDE at valley floors. Each mode carries a
weight (its probability mass), a diffuse slow path becomes one wide
component, and boundaries sit at the Bayes crossing between components.
Deterministic dual init (equal-count chunks + k-means) and a
resolution-aware variance floor keep it stable on small and quantised perf
samples; cross-checked against scikit-learn's GaussianMixture.
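
To make the BIC selection step concrete, here is a minimal sketch of scoring already-fitted 1-D mixtures and keeping the lowest-BIC candidate. It is not the PR's `fitGmmModes`: the `Component` shape, the function names, and the `penaltyScale` knob (a guess at how a sensitivity control could scale the penalty) are all assumptions, and the EM fit itself is left out.

```typescript
// Hypothetical component shape; the PR's actual types may differ.
interface Component {
  weight: number; // mixing proportion, expected to sum to 1 across components
  mean: number;
  variance: number;
}

// Log-density of a single Gaussian at x.
function logNormalPdf(x: number, mean: number, variance: number): number {
  const d = x - mean;
  return -0.5 * (Math.log(2 * Math.PI * variance) + (d * d) / variance);
}

// Log-likelihood of the samples under the mixture (log-sum-exp for stability).
function logLikelihood(samples: number[], mix: Component[]): number {
  let total = 0;
  for (const x of samples) {
    const terms = mix.map(
      (c) => Math.log(c.weight) + logNormalPdf(x, c.mean, c.variance),
    );
    const max = Math.max(...terms);
    total += max + Math.log(terms.reduce((s, t) => s + Math.exp(t - max), 0));
  }
  return total;
}

// BIC = p * ln(n) - 2 * ln(L). A k-component 1-D mixture has
// k means + k variances + (k - 1) free weights = 3k - 1 parameters.
// penaltyScale is an assumed knob: values above 1 favour fewer modes.
function bic(samples: number[], mix: Component[], penaltyScale = 1): number {
  const freeParams = 3 * mix.length - 1;
  return (
    penaltyScale * freeParams * Math.log(samples.length) -
    2 * logLikelihood(samples, mix)
  );
}

// Keep the candidate with the lowest BIC (candidates must be non-empty).
function pickByBic(
  samples: number[],
  candidates: Component[][],
  penaltyScale = 1,
): Component[] {
  return candidates.reduce((best, cand) =>
    bic(samples, cand, penaltyScale) < bic(samples, best, penaltyScale)
      ? cand
      : best,
  );
}
```

With two well-separated clusters, the two-component candidate wins despite its larger parameter penalty, because the single wide Gaussian pays far more in likelihood.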

Also gate the existing fitModesFromKde on integrated mass (sample count)
rather than valley depth alone, and add an exploratory adaptiveKde
(Abramson sample-point) estimator, currently unused.
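
For readers unfamiliar with the "Bayes crossing" mentioned in this commit, the sketch below solves for the point where two weighted Gaussian components have equal density, which is where the more likely component switches. The `GaussianMode` shape and `bayesCrossing` name are hypothetical and not taken from the PR.

```typescript
// Hypothetical mode shape for illustration only.
interface GaussianMode {
  weight: number;
  mean: number;
  sigma: number;
}

// Solve w_a * N(x; a) = w_b * N(x; b). Taking logs gives a quadratic
// A*x^2 + B*x + C = 0; return the root lying between the two means,
// or null when no such crossing exists.
function bayesCrossing(a: GaussianMode, b: GaussianMode): number | null {
  const ia = 1 / (2 * a.sigma * a.sigma);
  const ib = 1 / (2 * b.sigma * b.sigma);
  const A = ib - ia;
  const B = 2 * (a.mean * ia - b.mean * ib);
  const C =
    b.mean * b.mean * ib -
    a.mean * a.mean * ia +
    Math.log((a.weight * b.sigma) / (b.weight * a.sigma));

  let roots: number[];
  if (Math.abs(A) < 1e-12) {
    // Equal variances: the quadratic degenerates to a line.
    if (B === 0) return null; // identical means as well: no single crossing
    roots = [-C / B];
  } else {
    const disc = B * B - 4 * A * C;
    if (disc < 0) return null; // one component dominates everywhere
    const r = Math.sqrt(disc);
    roots = [(-B - r) / (2 * A), (-B + r) / (2 * A)];
  }
  const lo = Math.min(a.mean, b.mean);
  const hi = Math.max(a.mean, b.mean);
  const between = roots.filter((x) => x >= lo && x <= hi);
  return between.length > 0 ? between[0] : null;
}
```

For equal weights and equal widths the boundary is the midpoint of the two means; giving one component more weight pushes the boundary toward the lighter one.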
Run fitGmmModes on the raw runs rather than valley-carving the KDE, and
overlay the fitted mixture density (dashed) over the KDE so each mode lines
up with a visible bump — including diffuse slow components the KDE renders
as a flat tail. Replace the valley-depth slider with a 'Mode sensitivity'
control mapped to the BIC penalty, and rename 'Show modes' to 'Modal
analysis'; when it is off the chart is just the KDE, with no overlay or
slider. Tests and the ResultsView snapshot updated accordingly.

BCa's acceleration uses a leave-one-out jackknife, which is undefined for a
single observation (leaving it out gives an empty sample). A subtest with one
run per side therefore produced a [NaN, NaN] median-difference interval.
Return null below two runs per group and omit the interval in the
Mann-Whitney blurb. Adds a regression test.
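
The guard can be illustrated with a single-sample jackknife. This is a sketch under assumptions, not the PR's code: `jackknifeAcceleration` and `median` are hypothetical helpers, and the real fix applies the check per group for the median difference rather than to one sample.

```typescript
// Median of a numeric array (input is copied before sorting).
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// BCa acceleration from leave-one-out estimates theta_i:
//   a = sum((mean - theta_i)^3) / (6 * sum((mean - theta_i)^2)^(3/2))
// Returns null when fewer than two observations exist, because leaving
// out the only observation would leave an empty sample.
function jackknifeAcceleration(
  samples: number[],
  stat: (xs: number[]) => number,
): number | null {
  const n = samples.length;
  if (n < 2) return null;
  const loo = samples.map((_, i) => stat(samples.filter((_v, j) => j !== i)));
  const mean = loo.reduce((s, v) => s + v, 0) / n;
  let num = 0;
  let den = 0;
  for (const v of loo) {
    const d = mean - v;
    num += d * d * d;
    den += d * d;
  }
  // All leave-one-out estimates identical: treat as no skew instead of 0/0.
  if (den === 0) return 0;
  return num / (6 * Math.pow(den, 1.5));
}
```

A caller would skip the interval entirely when this returns null, rather than propagating NaN bounds into the UI.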
Add docs/mode-detection.md: how to read the graph (modes, the solid KDE vs
dashed mixture curves, the sensitivity slider) and the maths behind it
(GMM/EM/BIC, the BCa median-diff CI), plus a section on reconciling noisy
benchmarks with the precise statistics.

…lays

gaussianPracticalSupport now enforces a 3σ floor — the atol-based formula
finds where the kernel *value* drops below tolerance, but for wide
bandwidths the low peak height means that happens at only ~1-2σ, truncating
over 20% of the probability mass and producing convolution ringing that
looked like aliasing on the chart. initKmeans no longer seeds cluster
variance with varFloor before any points are assigned, and computes weights
from actual counts instead of counting empty clusters as one member.
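
A small sketch of why the value-based cutoff misbehaves for wide kernels, and what a 3σ floor changes. The function names and signatures here are assumptions for illustration; the actual `gaussianPracticalSupport` may take different parameters.

```typescript
const SQRT_2PI = Math.sqrt(2 * Math.PI);

// Radius at which a Gaussian kernel's value drops below atol.
// Peak height is 1 / (sigma * sqrt(2*pi)), so a wide kernel starts
// close to atol already and crosses it after only a few sigma, or fewer.
function valueBasedRadius(sigma: number, atol: number): number {
  const ratio = atol * sigma * SQRT_2PI; // atol relative to the peak height
  if (ratio >= 1) return 0; // the kernel never rises above atol
  return sigma * Math.sqrt(-2 * Math.log(ratio));
}

// Same radius, but never narrower than minSigmas standard deviations.
function supportWithFloor(sigma: number, atol: number, minSigmas = 3): number {
  return Math.max(valueBasedRadius(sigma, atol), minSigmas * sigma);
}
```

With `sigma = 100` and `atol = 1e-3`, the value-based radius is about 1.66σ, which cuts off a sizeable share of the kernel's mass; the floor widens it to 300. For a narrow kernel (`sigma = 1`, `atol = 1e-8`) the value-based radius is already near 5.9σ, so the floor has no effect.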

CommonGraph's mode overlays are reworked for readability and accessibility:
- Each mode is one unlabeled horizontal span plus a vertical tick carrying a
  single combined label (series, letter, value, fraction) above its peak,
  replacing the old span-only label that clipped against the right axis.
- Labels anchor to the leftmost of a matched Base/New peak pair with a small
  gap, flipping to the right of their own tick instead of crossing the left
  axis.
- Shift arrows between matched Base/New peaks are suppressed when the shift
  is smaller than the KDE bandwidth, since that's within smoothing noise.
- The mode-letter palette (A-E) is darkened so label text clears WCAG AA
  4.5:1 contrast against its background; New's label is a further-darkened
  variant of Base's so the two are distinguishable by color, not just
  font-weight. Label opacity is pinned to 1 so it doesn't inherit the guide
  line's reduced opacity.
- The chart is taller (340px -> 440px) with the scatter strip and legend
  repositioned to match, and the KDE density axis is rounded to 2
  significant figures instead of showing raw floating-point noise.
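
The 4.5:1 figure in the palette bullet is the WCAG AA contrast threshold for normal-size text. As a hedged illustration of how such a check can be computed (the helper names are hypothetical and the PR may verify contrast differently), the WCAG 2.x relative-luminance formula gives:

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels

// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearised channels.
function relativeLuminance([r, g, b]: Rgb): number {
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(fg: Rgb, bg: Rgb): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA for normal-size text requires at least 4.5:1.
function meetsAA(fg: Rgb, bg: Rgb): boolean {
  return contrastRatio(fg, bg) >= 4.5;
}
```

On a white background, `#767676` passes at roughly 4.54:1 while the slightly lighter `#777777` falls just short, which is why a small darkening of a palette colour can be enough to clear the threshold.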

Horizontal spans are now clamped to each series' actual min/max sample
value instead of the padded KDE grid extent. The debug JSON dump of mode
peaks/boundaries is removed. Tests and the ResultsView snapshot (chart
height) updated accordingly.

…aling

Mean/Median/StdDev/Min/Max were rendered with raw .toFixed(2) and no unit,
so a subtest measured in seconds or bytes showed a bare number with no way
to tell what it meant. Route them through getDisplayScale (already used for
the CommonGraph axes) so the table picks one consistent scale from the
values present and shows it once in the "Metric" column header instead of
repeating it, or omitting it, per cell.

Two related display bugs in utils/format.ts:

- getDisplayScale switched ms to seconds at >= 1000ms, so 6300ms rendered
  as "6.3s" even though the millisecond form is more readable at that
  scale. Raise the threshold to 10000ms (5 digits) before switching units.

- formatNumber wrapped Intl.NumberFormat with no fixed fraction digits, so
  it trimmed trailing zeros independently per value: comparing 586.27ms
  against 587.00ms rendered as "586.27 ms < 587 ms", which reads as if the
  two values have different precision when they don't. Fix
  minimumFractionDigits/maximumFractionDigits to 2 so paired values always
  render with the same number of decimals.
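
Both fixes can be reproduced in isolation. The sketch below is not the code in utils/format.ts: `pickTimeUnit` is a hypothetical stand-in for the unit decision inside `getDisplayScale`, the `en-US` locale is an assumption, and treating exactly 10000ms as seconds is a guess at the boundary behaviour.

```typescript
// Default Intl.NumberFormat trims trailing zeros independently per value.
const trimmed = new Intl.NumberFormat("en-US");

// Pinning both bounds to 2 forces the same number of decimals everywhere.
const fixed2 = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
});

// Hypothetical unit picker: stay in milliseconds until 5 digits.
function pickTimeUnit(maxMs: number): "ms" | "s" {
  return maxMs >= 10000 ? "s" : "ms";
}

console.log(trimmed.format(586.27), "<", trimmed.format(587)); // mismatched decimals
console.log(fixed2.format(586.27), "<", fixed2.format(587)); // aligned decimals
console.log(pickTimeUnit(6300)); // stays in milliseconds
```

Note that the fixed formatter still applies locale grouping, so a value such as 1234.5 renders as "1,234.50" under `en-US`.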

Updates the ResultsTable/SubtestsResultsView hardcoded row expectations and
snapshots across four suites to the corrected, consistent decimal output.

Switch *word* to _word_ for consistency with markdownlint's default
emphasis style. No content changes.

netlify Bot commented Jul 3, 2026

Deploy Preview for mozilla-perfcompare ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5f2a4c2 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/mozilla-perfcompare/deploys/6a47d8db89d1f10008a7cd44 |
| 😎 Deploy Preview | https://deploy-preview-1065--mozilla-perfcompare.netlify.app |