Skip to content

Collapse nested character markup to single level (BL-16387)#8019

Open
StephenMcConnel wants to merge 1 commit into
masterfrom
BL-16387-NestedCharMarkup
Open

Collapse nested character markup to single level (BL-16387)#8019
StephenMcConnel wants to merge 1 commit into
masterfrom
BL-16387-NestedCharMarkup

Conversation

@StephenMcConnel

@StephenMcConnel StephenMcConnel commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

For example, ... goes to ....

Devin review


This change is Reviewable

@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR extends UpdateCharacterStyleMarkup to collapse same-tag nested character markup (e.g. <em><em>…</em></em><em>…</em>) that can appear in the wild due to pasting. A second pass of the existing <b><strong> / <i><em> substitution handles nested legacy tags, and a new deduplication regex then flattens any remaining double-wrapped semantic tags; the empty-markup cleanup also gains IgnoreCase for correctness.

  • Book.cs: Adds conditional second-pass <b>/<i> conversions (checked via IndexOf before running) and a new back-reference deduplication regex that strips one level of same-tag nesting across <strong>, <em>, <sup>, and <u>.
  • BookTests.cs: New test covers <b><b>, <i><i>, <sup><sup>, and <u><u> nesting, including partial-text cases; inner style attributes exercise the existing attribute-stripping path.

Important Files Changed

Filename Overview
src/BloomExe/Book/Book.cs Adds two-pass <b>/<i> re-conversion to catch nested legacy tags, plus a deduplication regex that collapses double-wrapped semantic tags; also adds IgnoreCase to the empty-markup cleanup regex.
src/BloomTests/Book/BookTests.cs Adds UpdateCharacterStyleMarkup_ReducesNestingOfSameTags covering all four tag types; inner style attributes exercise the existing attribute-stripping path.

Reviews (4): Last reviewed commit: "Collapse nested character markup to sing..." | Re-trigger Greptile

Comment thread src/BloomTests/Book/BookTests.cs
Comment thread src/BloomExe/Book/Book.cs
@StephenMcConnel StephenMcConnel force-pushed the BL-16387-NestedCharMarkup branch 2 times, most recently from b857247 to 6c5d409 Compare June 30, 2026 16:29

@JohnThomson JohnThomson left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:, but a couple of ideas to consider. If you don't think it's worth it you can merge as-is.

@JohnThomson reviewed 2 files and all commit messages, and made 3 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on StephenMcConnel).


src/BloomExe/Book/Book.cs line 2631 at r2 (raw file):

        /// <summary>
        /// Convert old &lt;b&gt; and &lt;i&gt; to &lt;strong&gt; and &lt;em&gt; respectively.
        /// Also remove instances like &lt;/b&gt;&lt;b&gt; altogether since such markup is redundant.

Comment is rather specific, so needs update.


src/BloomExe/Book/Book.cs line 2695 at r2 (raw file):

                inner = Regex.Replace(
                    inner,
                    @"<(strong|em|sup|u)>(<\1>.*?</\1>)</\1>",

This also only handles things that are doubled right at the boundary. Maybe that's enough, but I'm wondering if we should handle the case that there is text inside the outer element but outside the inner one.

For example, <em><em>...</em></em> goes to <em>...</em>.
@StephenMcConnel StephenMcConnel force-pushed the BL-16387-NestedCharMarkup branch from 6c5d409 to a3f63e4 Compare July 1, 2026 22:45

@StephenMcConnel StephenMcConnel left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StephenMcConnel made 2 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on JohnThomson).


src/BloomExe/Book/Book.cs line 2631 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…

Comment is rather specific, so needs update.

Done. I added to the comment.


src/BloomExe/Book/Book.cs line 2695 at r2 (raw file):

Previously, JohnThomson (John Thomson) wrote…

This also only handles things that are doubled right at the boundary. Maybe that's enough, but I'm wondering if we should handle the case that there is text inside the outer element but outside the inner one.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants