Modal math and UI fixes #1065
Draft
padenot wants to merge 8 commits into
Fit a 1-D Gaussian mixture to the raw samples and pick the component count by BIC, instead of carving the KDE at valley floors. Each mode carries a weight (its probability mass), a diffuse slow path becomes one wide component, and boundaries sit at the Bayes crossing between components. Deterministic dual init (equal-count chunks + k-means) and a resolution-aware variance floor keep it stable on small and quantised perf samples; cross-checked against scikit-learn's GaussianMixture. Also gate the existing fitModesFromKde on integrated mass (sample count) rather than valley depth alone, and add an exploratory adaptiveKde (Abramson sample-point) estimator, currently unused.
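For readers unfamiliar with BIC-based selection, here is a minimal sketch of how a component count might be scored for a 1-D mixture. It is not the code in this PR: the `Component` shape, the `bic` helper and the `penaltyScale` knob (standing in for whatever the 'Mode sensitivity' control maps to) are all assumptions made for illustration.

```typescript
// Hypothetical shape for one fitted component; not the PR's actual types.
interface Component {
  weight: number; // mixing proportion; weights sum to 1 across the mixture
  mean: number;
  variance: number;
}

function gaussianPdf(x: number, mean: number, variance: number): number {
  const d = x - mean;
  return Math.exp(-(d * d) / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
}

// Log-likelihood of the samples under the mixture density.
function logLikelihood(samples: number[], mixture: Component[]): number {
  let total = 0;
  for (const x of samples) {
    let density = 0;
    for (const c of mixture) {
      density += c.weight * gaussianPdf(x, c.mean, c.variance);
    }
    total += Math.log(density);
  }
  return total;
}

// BIC = p * ln(n) - 2 * ln(L). A 1-D mixture with k components has
// p = 3k - 1 free parameters (k means, k variances, k - 1 weights).
// penaltyScale is an assumed knob: values above 1 favour fewer components.
function bic(samples: number[], mixture: Component[], penaltyScale = 1): number {
  const freeParams = 3 * mixture.length - 1;
  return (
    penaltyScale * freeParams * Math.log(samples.length) -
    2 * logLikelihood(samples, mixture)
  );
}
```

The candidate with the lowest score wins. On clearly bimodal data a two-component fit should score lower than a one-component fit, and a large enough penalty scale reverses that preference.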
Run fitGmmModes on the raw runs rather than valley-carving the KDE, and overlay the fitted mixture density (dashed) over the KDE so each mode lines up with a visible bump — including diffuse slow components the KDE renders as a flat tail. Replace the valley-depth slider with a 'Mode sensitivity' control mapped to the BIC penalty, and rename 'Show modes' to 'Modal analysis'; when it is off the chart is just the KDE, with no overlay or slider. Tests and the ResultsView snapshot updated accordingly.
BCa's acceleration uses a leave-one-out jackknife, which is undefined for a single observation (leaving it out gives an empty sample). A subtest with one run per side therefore produced a [NaN, NaN] median-difference interval. Return null below two runs per group and omit the interval in the Mann-Whitney blurb. Adds a regression test.
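A small sketch of why the guard is needed, assuming a single-sample statistic for brevity (the PR's interval is for a two-group median difference). The function names and the choice to return 0 when all leave-one-out values are identical are illustrative assumptions, not the PR's implementation.

```typescript
// Median that reports NaN for an empty sample instead of throwing.
function median(values: number[]): number {
  const n = values.length;
  if (n === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  return n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Leave-one-out jackknife estimate of the BCa acceleration constant:
//   a = sum((mean - t_i)^3) / (6 * sum((mean - t_i)^2)^(3/2))
// Returns null below two observations, where leaving one out yields an
// empty sample and the statistic is undefined.
function jackknifeAcceleration(
  sample: number[],
  stat: (values: number[]) => number,
): number | null {
  if (sample.length < 2) return null;
  const leaveOneOut = sample.map((_, i) => stat(sample.filter((_, j) => j !== i)));
  const mean = leaveOneOut.reduce((sum, t) => sum + t, 0) / leaveOneOut.length;
  let cubed = 0;
  let squared = 0;
  for (const t of leaveOneOut) {
    const d = mean - t;
    cubed += d * d * d;
    squared += d * d;
  }
  // Assumption for this sketch: treat a degenerate jackknife (all values
  // equal) as zero acceleration rather than dividing 0 by 0.
  if (squared === 0) return 0;
  return cubed / (6 * Math.pow(squared, 1.5));
}
```

A caller can then skip the interval entirely when either group yields `null`, rather than propagating NaN into the rendered text.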
Add docs/mode-detection.md: how to read the graph (modes, the solid KDE vs dashed mixture curves, the sensitivity slider) and the maths behind it (GMM/EM/BIC, the BCa median-diff CI), plus a section on reconciling noisy benchmarks with the precise statistics.
…lays

gaussianPracticalSupport now enforces a 3σ floor. The atol-based formula finds where the kernel *value* drops below tolerance, but for wide bandwidths the low peak height means that happens at only ~1-2σ, truncating over 20% of the probability mass and producing convolution ringing that looked like aliasing on the chart.

initKmeans no longer seeds cluster variance with varFloor before any points are assigned, and computes weights from actual counts instead of counting empty clusters as one member.

CommonGraph's mode overlays are reworked for readability and accessibility:

- Each mode is one unlabeled horizontal span plus a vertical tick carrying a single combined label (series, letter, value, fraction) above its peak, replacing the old span-only label that clipped against the right axis.
- Labels anchor to the leftmost of a matched Base/New peak pair with a small gap, flipping to the right of their own tick instead of crossing the left axis.
- Shift arrows between matched Base/New peaks are suppressed when the shift is smaller than the KDE bandwidth, since that's within smoothing noise.
- The mode-letter palette (A-E) is darkened so label text clears WCAG AA 4.5:1 contrast against its background; New's label is a further-darkened variant of Base's so the two are distinguishable by color, not just font-weight. Label opacity is pinned to 1 so it doesn't inherit the guide line's reduced opacity.
- The chart is taller (340px -> 440px) with the scatter strip and legend repositioned to match, and the KDE density axis is rounded to 2 significant figures instead of showing raw floating-point noise.

Horizontal spans are now clamped to each series' actual min/max sample value instead of the padded KDE grid extent. The debug JSON dump of mode peaks/boundaries is removed.

Tests and the ResultsView snapshot (chart height) updated accordingly.
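To make the 3σ floor concrete, here is a rough sketch of the idea. It assumes the tolerance is compared against the normalised Gaussian density; the function name `practicalSupportSketch`, its signature and the default `atol` are hypothetical and may not match gaussianPracticalSupport in the PR.

```typescript
// Half-width (in data units) beyond which a Gaussian kernel is truncated.
// Hypothetical signature; atol default chosen only for illustration.
function practicalSupportSketch(sigma: number, atol = 1e-4): number {
  const floor = 3 * sigma; // always keep roughly 99.7% of the mass
  const peak = 1 / (sigma * Math.sqrt(2 * Math.PI));
  // If even the peak is below tolerance, the value-based radius is undefined.
  if (peak <= atol) return floor;
  // Solve peak * exp(-x^2 / (2 sigma^2)) = atol for x.
  const atolRadius = sigma * Math.sqrt(2 * Math.log(peak / atol));
  return Math.max(atolRadius, floor);
}
```

With these assumed values, a narrow kernel (σ = 1) gets a value-based radius of a little over 4σ, so the floor has no effect. A wide kernel (σ = 1000) would be cut at roughly 1.7σ by the value test alone, and the floor lifts it to 3σ.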
…aling

Mean/Median/StdDev/Min/Max were rendered with raw .toFixed(2) and no unit, so a subtest measured in seconds or bytes showed a bare number with no way to tell what it meant. Route them through getDisplayScale (already used for the CommonGraph axes) so the table picks one consistent scale from the values present and shows it once in the "Metric" column header instead of repeating it, or omitting it, per cell.
Two related display bugs in utils/format.ts:

- getDisplayScale switched ms to seconds at >= 1000ms, so 6300ms rendered as "6.3s" even though the millisecond form is more readable at that scale. Raise the threshold to 10000ms (5 digits) before switching units.
- formatNumber wrapped Intl.NumberFormat with no fixed fraction digits, so it trimmed trailing zeros independently per value: comparing 586.27ms against 587.00ms rendered as "586.27 ms < 587 ms", which reads as if the two values have different precision when they don't. Fix minimumFractionDigits/maximumFractionDigits to 2 so paired values always render with the same number of decimals.

Updates the ResultsTable/SubtestsResultsView hardcoded row expectations and snapshots across four suites to the corrected, consistent decimal output.
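The two behaviours can be reproduced in isolation with the standard Intl.NumberFormat API. This is only an illustration: `pickTimeUnit` and `formatFixed2` are hypothetical helpers, the "en-US" locale is an assumption, and neither function is the actual getDisplayScale or formatNumber from utils/format.ts.

```typescript
// Assumed threshold from the commit message: stay in ms below 10000.
const SECONDS_THRESHOLD_MS = 10000;

function pickTimeUnit(maxValueMs: number): "ms" | "s" {
  return maxValueMs >= SECONDS_THRESHOLD_MS ? "s" : "ms";
}

// Default options trim trailing zeros per value (minimumFractionDigits is 0).
const looseFormatter = new Intl.NumberFormat("en-US");

// Pinning both bounds to 2 makes paired values line up.
const fixedFormatter = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
});

function formatLoose(value: number): string {
  return looseFormatter.format(value);
}

function formatFixed2(value: number): string {
  return fixedFormatter.format(value);
}
```

Here `formatLoose(587)` drops the decimals while `formatLoose(586.27)` keeps them, which is the mismatch described above; `formatFixed2` renders both with two decimals.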
Switch *word* to _word_ for consistency with markdownlint's default emphasis style. No content changes.
✅ Deploy Preview for mozilla-perfcompare ready!
No description provided.