fix: avoid splitting surrogate pairs when truncating wide characters #360

Open
greymoth-jp wants to merge 1 commit into cli-table:master from greymoth-jp:fix-surrogate-truncate

@greymoth-jp

truncateWidth takes a fast path when str.length === strlen(str):

if (str.length === strlen(str)) {
  return str.substr(0, desiredLength);
}

The idea is "every character is one code unit and one column, so cutting by code unit is safe." That reasoning holds for ASCII, but the length check also passes for surrogate-pair characters such as CJK Extension B (e.g. 𠮷, used in the surname 𠮷田) or emoji: each is two code units and two columns wide, so length and strlen stay equal even though the string is not one code unit per character. substr (and the slice(0, -1) loop underneath) cuts by code unit, so a cut that falls inside such a character leaves a lone surrogate:

const { truncate } = require('cli-table3/src/utils');
truncate('a𠮷bc', 3); // => "a\uD842…"  — lone high surrogate, renders as the replacement char

In a table this shows up as a replacement character (�) in the cell whenever a wide character lands on the truncation point.
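
The coincidence can be reproduced without cli-table3. The sketch below uses a hypothetical roughWidth helper as a stand-in for strlen; it assumes only that a surrogate-pair code point occupies two columns and everything else one, which is far cruder than a real width table. It shows the fast-path condition passing while substr splits the pair:

```javascript
// Hypothetical stand-in for strlen(): a surrogate-pair code point counts as
// 2 columns, anything else as 1. Real width tables are more detailed.
function roughWidth(str) {
  let width = 0;
  for (const ch of str) {            // for...of iterates by code point
    width += ch.codePointAt(0) > 0xffff ? 2 : 1;
  }
  return width;
}

const input = 'a𠮷bc';               // 𠮷 is U+20BB7, stored as \uD842\uDFB7

// 5 code units and 5 columns, so the fast-path check passes.
const fastPathTaken = input.length === roughWidth(input);

// substr counts code units, so the cut lands between the two surrogates.
const cut = input.substr(0, 2);

// Without the `u` flag the character class matches individual code units.
const endsWithLoneHigh = /[\uD800-\uDBFF]$/.test(cut);

console.log(fastPathTaken, endsWithLoneHigh, cut.length);
```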

The change keeps the fast path for plain strings, skips it when a surrogate is present, and trims by code point in the slow path so a wide character is never split. BMP input (including the existing full-width CJK cases) is unaffected. Added tests for a CJK Extension B character and an emoji.
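
As a rough illustration of that approach (not the code in this PR), the sketch below guards the fast path with a surrogate check and trims whole code points in the slow path. The names columns and truncateWidthSketch are hypothetical, and columns assumes two columns per surrogate pair and one column otherwise:

```javascript
// Hypothetical width helper: 2 columns for a surrogate pair, 1 otherwise.
function columns(str) {
  let width = 0;
  for (const ch of str) {
    width += ch.length === 2 ? 2 : 1; // a pair spans two code units
  }
  return width;
}

// Sketch of a truncation that never splits a surrogate pair.
function truncateWidthSketch(str, desiredWidth) {
  const hasSurrogate = /[\uD800-\uDFFF]/.test(str);

  // Fast path: only when no surrogate is present and length equals width.
  // (With this simplified columns() the second check is redundant; it is
  // kept to mirror the condition described above.)
  if (!hasSurrogate && str.length === columns(str)) {
    return str.substr(0, desiredWidth);
  }

  // Slow path: drop whole code points from the end until the string fits.
  const points = Array.from(str);     // Array.from splits by code point
  while (points.length > 0 && columns(points.join('')) > desiredWidth) {
    points.pop();
  }
  return points.join('');
}

console.log(truncateWidthSketch('a𠮷bc', 2)); // 'a'
console.log(truncateWidthSketch('a𠮷bc', 3)); // 'a𠮷'
```

With a target width of 2 the result is one column short ('a') rather than ending in half of 𠮷, which is the trade-off the description implies: a wide character that does not fit is dropped whole.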

truncateWidth() takes a substr/slice fast path when str.length === strlen(str),
which is also true for surrogate-pair characters such as CJK Extension B or
emoji (2 code units, 2 columns). Cutting by code unit on the truncation
boundary can leave a lone surrogate. Exclude surrogate pairs from the fast
path and trim by code point so a wide character is never split.