+ "details": "`isRepeatedSingleCharRun()` in `src/analysis.ts` (line 285) re-scans the entire accumulated segment on every merge iteration during text analysis, producing O(n²) total work for input consisting of repeated identical punctuation characters. An attacker who controls text passed to `prepare()` can block the main thread for ~20 seconds with 80KB of input (e.g., `\"(\".repeat(80_000)`).\n\nTested against commit 9364741d3562fcc65aacc50953e867a5cb9fdb23 (v0.0.4) on Node.js v24.12.0, Windows x64.\n\nA standalone PoC and detailed write-up are attached below.\n\n---\n\n## Root Cause\n\nThe `buildMergedSegmentation()` function (line 795) processes text segments produced by `Intl.Segmenter`. When consecutive non-word-like segments consist of the same single character (e.g., `(`, `[`, `!`, `#`), the code merges them into one growing segment (line 859):\n\n```typescript\n// analysis.ts:849-859 - the merge branch inside the build loop\n} else if (\n isText &&\n !piece.isWordLike &&\n mergedLen > 0 &&\n mergedKinds[mergedLen - 1] === 'text' &&\n piece.text.length === 1 &&\n piece.text !== '-' &&\n piece.text !== '—' &&\n isRepeatedSingleCharRun(mergedTexts[mergedLen - 1]!, piece.text) // <- O(n) per call\n) {\n mergedTexts[mergedLen - 1] += piece.text // append to accumulator\n```\n\nBefore each merge, it calls `isRepeatedSingleCharRun()` (line 857) to verify that ALL characters in the accumulated segment match the new character:\n\n```typescript\n// analysis.ts:285-291\nfunction isRepeatedSingleCharRun(segment: string, ch: string): boolean {\n if (segment.length === 0) return false\n for (const part of segment) { // <- Iterates ENTIRE accumulated string\n if (part !== ch) return false\n }\n return true\n}\n```\n\n`Intl.Segmenter` with `granularity: 'word'` produces individual non-word segments for each punctuation character. For a string of N identical punctuation characters, the merge check is called N times. 
On the k-th call, the accumulated segment is k characters long, so `isRepeatedSingleCharRun` performs k comparisons.\n\nTotal work: `1 + 2 + 3 + ... + (N-1) = N(N-1)/2 = O(N^2)` (the first character only starts the run, so strictly N - 1 checks occur).\n\n### Call chain\n\n```\nprepare(text, font) // layout.ts:472\n  -> prepareInternal(text, font, ...) // layout.ts:424\n    -> analyzeText(text, profile, whiteSpace='normal') // layout.ts:430 -> analysis.ts:993\n      -> buildMergedSegmentation(normalized, profile, ...) // analysis.ts:1013 -> analysis.ts:795\n        -> for each Intl.Segmenter segment:\n          -> isRepeatedSingleCharRun(accumulated, newChar) // line 857 -> line 285\n            -> iterates entire accumulated string // O(k) per call, k growing\n```\n\n## Proof of Concept\n\nThe simplest payload is a string of repeated `(` characters:\n\n```typescript\nimport { prepare } from '@chenglou/pretext'\n\n// 80,000 characters -> ~20 seconds of main-thread blocking\nconst payload = '('.repeat(80_000)\nprepare(payload, '16px Arial') // Blocks for ~20 seconds\n```\n\nAny character that meets all three of these criteria works:\n1. Classified as `'text'` by `classifySegmentBreakChar` (analysis.ts:321) - i.e., not a space, NBSP, ZWSP, soft hyphen, tab, or newline\n2. Emitted by `Intl.Segmenter` (word granularity) as its own non-word-like segment\n3. 
Not `-` or the em dash (both explicitly excluded at lines 855-856)\n\nWorking payload characters include `(`, `[`, `{`, `#`, `@`, `!`, `%`, `^`, `~`, `<`, `>`, and others.\n\n---\n\n## Impact\n\n- **Chat/messaging applications:** A user sends an 80KB message of `(` characters;\n the receiving client's UI thread freezes for ~20 seconds while rendering it.\n- **Comment/form systems:** Any text field that passes user-supplied text to\n `pretext` for layout measurement can be made to block the main thread.\n- **Server-side rendering:** If `prepare()` is called server-side (Node.js/Bun),\n a single 80KB request can consume ~20 seconds of CPU time, and because the cost\n grows quadratically, doubling the payload roughly quadruples the merge work.\n\nThe attack requires no authentication, special characters, or encoding tricks -\njust repeated ASCII punctuation. 80KB is well within the input limits of many applications.\n\nAs an application-level mitigation, callers can cap the length of text passed to\n`prepare()` until a library-level fix is available.\n\n## Suggested Fix\n\nReplace the O(n) full-scan verification with an O(1) check. Since this branch only ever appends the same character to an existing repeated-character run, the invariant can be maintained structurally instead of being re-verified on every merge:\n\n**Option A - Check only endpoints (O(1)):**\n```typescript\nfunction isRepeatedSingleCharRun(segment: string, ch: string): boolean {\n return segment.length > 0 && segment[0] === ch && segment[segment.length - 1] === ch\n}\n```\nThis works for the current code because this branch only fires after earlier merge branches (CJK, Myanmar, Arabic) have been skipped, and those branches produce segments that would not start and end with the same ASCII punctuation character. However, the safety relies on an emergent property of the branch ordering and of what the other merge branches produce. Future refactors that add new merge branches or reorder the existing ones could silently break the invariant.\n\n**Option B - Track with metadata (O(1)):**\nAdd a boolean `lastMergeWasSingleCharRun` alongside the accumulator arrays. 
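Set it to `true` when a single-character non-word text segment is pushed or a single-char merge succeeds, and to `false` whenever any other branch pushes or appends to the last segment. The merge check then reduces to the flag plus a comparison of the new character against the run's last character, with no re-scan of the string. Unlike Option A, correctness no longer depends on what the other merge branches happen to produce.",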
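A minimal sketch of Option B, assuming a heavily simplified model of `buildMergedSegmentation()`: only the repeated-single-character branch is modeled (no CJK/Myanmar/Arabic branches, no `mergedKinds` bookkeeping), pieces are supplied directly instead of coming from `Intl.Segmenter`, and the names `Piece`, `isRunCandidate`, `mergeRunsScanning`, and `mergeRunsWithFlag` are hypothetical, not part of pretext. Both functions count character comparisons so the quadratic and linear behaviors can be compared directly.

```typescript
// Hypothetical, simplified model of the repeated-single-character merge branch.
// Piece, isRunCandidate, mergeRunsScanning and mergeRunsWithFlag are
// illustrative names, not pretext's API; other merge branches are omitted.
interface Piece {
  text: string
  isWordLike: boolean
}

interface MergeResult {
  texts: string[] // merged segments
  comparisons: number // character comparisons spent deciding whether to merge
}

// Mirrors the per-piece conditions of the merge branch (minus the 'text' kind check).
function isRunCandidate(piece: Piece): boolean {
  return (
    !piece.isWordLike &&
    piece.text.length === 1 &&
    piece.text !== '-' &&
    piece.text !== '\u2014' // em dash
  )
}

// Current behavior: re-scan the whole accumulated segment before each merge.
function mergeRunsScanning(pieces: Piece[]): MergeResult {
  const texts: string[] = []
  let comparisons = 0
  for (const piece of pieces) {
    const last = texts.length - 1
    const seg = last >= 0 ? texts[last]! : ''
    let isRun = false
    if (seg.length > 0 && isRunCandidate(piece)) {
      isRun = true
      for (let i = 0; i < seg.length; i++) {
        comparisons++
        if (seg[i] !== piece.text) {
          isRun = false
          break
        }
      }
    }
    if (isRun) texts[last] = seg + piece.text
    else texts.push(piece.text)
  }
  return { texts, comparisons }
}

// Option B: remember whether the last segment is a single-character run.
function mergeRunsWithFlag(pieces: Piece[]): MergeResult {
  const texts: string[] = []
  let comparisons = 0
  let lastMergeWasSingleCharRun = false
  for (const piece of pieces) {
    const candidate = isRunCandidate(piece)
    const last = texts.length - 1
    if (candidate && last >= 0 && lastMergeWasSingleCharRun) {
      const seg = texts[last]!
      comparisons++ // one comparison against the run's last character
      if (seg[seg.length - 1] === piece.text) {
        texts[last] = seg + piece.text // the run invariant still holds
        continue
      }
    }
    texts.push(piece.text)
    // A fresh single-character candidate starts a new run; anything else ends it.
    lastMergeWasSingleCharRun = candidate
  }
  return { texts, comparisons }
}
```

For 1,000 identical `(` pieces, the scanning version performs 499,500 comparisons (N(N-1)/2) and the flag version 999 (N - 1), while both return the same single merged segment. In the real `buildMergedSegmentation()`, every other branch that pushes or appends to the last segment would also need to reset the flag.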