@@ -14,13 +14,15 @@ Instead, this script post-processes the line-oriented diff, finds pairs
  of lines, and highlights the differing segments. It's currently very
  simple and stupid about doing these tasks. In particular:

- 1. It will only highlight a pair of lines if they are the only two
- lines in a hunk. It could instead try to match up "before" and
- "after" lines for a given hunk into pairs of similar lines.
- However, this may end up visually distracting, as the paired
- lines would have other highlighted lines in between them. And in
- practice, the lines which most need attention called to their
- small, hard-to-see changes are touching only a single line.
+ 1. It will only highlight hunks in which the number of removed and
+ added lines is the same, and it will pair lines within the hunk by
+ position (so the first removed line is compared to the first added
+ line, and so forth). This is simple and tends to work well in
+ practice. More complex changes don't highlight well, so we tend to
+ exclude them due to the "same number of removed and added lines"
+ restriction. Or even if we do try to highlight them, they end up
+ not highlighting because of our "don't highlight if the whole line
+ would be highlighted" rule.

  2. It will find the common prefix and suffix of two lines, and
  consider everything in the middle to be "different". It could
@@ -55,3 +57,96 @@ following in your git configuration:
  show = diff-highlight | less
  diff = diff-highlight | less
  ---------------------------------------------
+
+ Bugs
+ ----
+
+ Because diff-highlight relies on heuristics to guess which parts of
+ changes are important, there are some cases where the highlighting is
+ more distracting than useful. Fortunately, these cases are rare in
+ practice, and when they do occur, the worst case is simply a little
+ extra highlighting. This section documents some cases known to be
+ sub-optimal, in case somebody feels like working on improving the
+ heuristics.
+
+ 1. Two changes on the same line get highlighted in a blob. For example,
+ highlighting:
+
+ ----------------------------------------------
+ -foo(buf, size);
+ +foo(obj->buf, obj->size);
+ ----------------------------------------------
+
+ yields (where the inside of "+{}" would be highlighted):
+
+ ----------------------------------------------
+ -foo(buf, size);
+ +foo(+{obj->buf, obj->}size);
+ ----------------------------------------------
+
+ whereas a more semantically meaningful output would be:
+
+ ----------------------------------------------
+ -foo(buf, size);
+ +foo(+{obj->}buf, +{obj->}size);
+ ----------------------------------------------
+
+ Note that doing this right would probably involve a set of
+ content-specific boundary patterns, similar to word-diff. Otherwise
+ you get junk like:
+
+ -----------------------------------------------------
+ -this line has some -{i}nt-{ere}sti-{ng} text on it
+ +this line has some +{fa}nt+{a}sti+{c} text on it
+ -----------------------------------------------------
+
+ which is less readable than the current output.
+
+ 2. The multi-line matching assumes that lines in the pre- and post-image
+ match by position. This is often the case, but can be fooled when a
+ line is removed from the top and a new one added at the bottom (or
+ vice versa). Unless the lines in the middle are also changed, diffs
+ will show this as two hunks, and it will not get highlighted at all
+ (which is good). But if the lines in the middle are changed, the
+ highlighting can be misleading. Here's a pathological case:
+
+ -----------------------------------------------------
+ -one
+ -two
+ -three
+ -four
+ +two 2
+ +three 3
+ +four 4
+ +five 5
+ -----------------------------------------------------
+
+ which gets highlighted as:
+
+ -----------------------------------------------------
+ -one
+ -t-{wo}
+ -three
+ -f-{our}
+ +two 2
+ +t+{hree 3}
+ +four 4
+ +f+{ive 5}
+ -----------------------------------------------------
+
+ because it matches "two" to "three 3", and so forth. It would be
+ nicer as:
+
+ -----------------------------------------------------
+ -one
+ -two
+ -three
+ -four
+ +two +{2}
+ +three +{3}
+ +four +{4}
+ +five 5
+ -----------------------------------------------------
+
+ which would probably involve pre-matching the lines into pairs
+ according to some heuristic.
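
The rules described in this README change (pair lines by position when the
removed and added counts match, mark everything between the common prefix
and the common suffix, and skip a pair when the whole line would be marked)
can be sketched in a few lines of Python. This is only an illustration under
simplifying assumptions, not the script's actual code: it ignores color
codes, takes line contents without their leading "-" or "+", and the names
highlight_pair, highlight_hunk and mark are hypothetical. The "-{}" and
"+{}" notation is borrowed from the examples above.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def common_suffix_len(a: str, b: str, prefix: int) -> int:
    """Number of trailing characters shared by a and b, never
    reaching back into the already-matched prefix."""
    n = 0
    while (n < len(a) - prefix and n < len(b) - prefix
           and a[len(a) - 1 - n] == b[len(b) - 1 - n]):
        n += 1
    return n


def mark(text: str, prefix: int, suffix: int, sign: str) -> str:
    """Wrap the differing middle of text in sign + "{...}"."""
    end = len(text) - suffix
    head, middle, tail = text[:prefix], text[prefix:end], text[end:]
    if not middle:
        return sign + text  # nothing differs on this side
    return sign + head + sign + "{" + middle + "}" + tail


def highlight_pair(old: str, new: str) -> Tuple[str, str]:
    """Highlight one removed/added pair by common prefix and suffix."""
    p = common_prefix_len(old, new)
    s = common_suffix_len(old, new, p)
    if p == 0 and s == 0:
        # Assumed form of the "whole line would be highlighted" rule:
        # nothing in common, so leave both lines unmarked.
        return "-" + old, "+" + new
    return mark(old, p, s, "-"), mark(new, p, s, "+")


def highlight_hunk(removed: List[str], added: List[str]) -> List[str]:
    """Pair removed and added lines by position; hunks with unequal
    counts are returned without any highlighting."""
    if len(removed) != len(added):
        return ["-" + r for r in removed] + ["+" + a for a in added]
    pairs = [highlight_pair(o, n) for o, n in zip(removed, added)]
    return [o for o, _ in pairs] + [n for _, n in pairs]


if __name__ == "__main__":
    for line in highlight_hunk(["one", "two", "three", "four"],
                               ["two 2", "three 3", "four 4", "five 5"]):
        print(line)
```

Run on the pathological hunk from item 2, the sketch prints the same
misleading result the README shows ("-t-{wo}" paired with "+t+{hree 3}",
and so on), which makes the cost of position-based pairing easy to see. For
the foo() example in item 1 it yields "+foo(+{obj->buf, obj->}size);" on the
added line; it also marks "buf, " on the removed line, which the README's
illustration leaves unmarked.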