Commit 5ac7af4

Update docs
1 parent bc0fb83 commit 5ac7af4

2 files changed

Lines changed: 94 additions & 5 deletions

src/ImageSharp.Drawing.WebGPU/WEBGPU_BACKEND_PROCESS.md

Lines changed: 33 additions & 5 deletions
@@ -67,6 +67,30 @@ DrawingCanvasBatcher.Flush()
6767
-> on any GPU failure: scene-scoped fallback (DefaultDrawingBackend)
6868
```
6969

70+
## Stroke Processing
71+
72+
For stroke definitions (`CompositionCoverageDefinition.IsStroke`), the backend
73+
performs stroke expansion on the GPU using `StrokeExpandComputeShader`:
74+
75+
1. **Dash splitting** (CPU): If the definition has a dash pattern, `DashPathSplitter.SplitDashes()`
76+
(shared with `DefaultDrawingBackend` in the core project) segments the centerline into
77+
open dash sub-paths before edge building.
78+
79+
2. **Centerline edge building** (CPU): `path.Flatten()` produces contour vertices.
80+
Centerline edges are built as `GpuEdge` structs with `StrokeEdgeFlags` indicating
81+
the edge type (`None` for side edges, `Join`, `CapStart`, `CapEnd`). Join edges
82+
carry adjacent vertex coordinates in `AdjX`/`AdjY`. Centerline edges are band-sorted
83+
with Y expansion of `halfWidth * max(miterLimit, 1)`.
84+
85+
3. **GPU stroke expansion**: One `StrokeExpandCommand` per band dispatches the compute
86+
shader. Each thread expands one centerline edge into outline edges written to
87+
per-band output slots via atomic counters. Output buffer size is computed by
88+
`ComputeOutlineEdgesPerCenterline()` which accounts for join/cap type and arc
89+
step count for round joins/caps.
90+
91+
4. **Rasterization**: The generated outline edges are band-sorted and rasterized
92+
by the composite shader's fill path (same fixed-point scanline rasterizer).
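
As a rough illustration of the band-sort expansion in step 2, the sketch below computes which bands a centerline edge would be assigned to once its Y extent is padded by `halfWidth * max(miterLimit, 1)`. The band height, the function names, and the clamping to `band_count` are assumptions made for illustration; they are not taken from the backend source.

```python
import math


def stroke_y_expansion(half_width: float, miter_limit: float) -> float:
    """Padding applied above and below a centerline edge before band sorting."""
    return half_width * max(miter_limit, 1.0)


def overlapping_bands(y0, y1, half_width, miter_limit, band_height, band_count):
    """Return the range of band indices the padded edge touches.

    band_height and band_count are hypothetical parameters; the real
    backend derives its band layout from the scene.
    """
    expansion = stroke_y_expansion(half_width, miter_limit)
    top = min(y0, y1) - expansion
    bottom = max(y0, y1) + expansion
    first = max(0, math.floor(top / band_height))
    last = min(band_count - 1, math.floor(bottom / band_height))
    # An empty range means the padded edge lies outside every band.
    return range(first, last + 1)
```

With a half width of 1 and a miter limit of 4, a horizontal edge at y = 17 is padded to the interval [13, 21], so with 16-pixel bands it lands in both band 0 and band 1 even though the centerline itself sits only in band 1.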
93+
7094
## GPU Buffer Layout
7195

7296
### Edge Buffer (`coverage-aggregated-edges`)
@@ -77,10 +101,8 @@ Each edge is a 32-byte `GpuEdge` struct (sequential layout):
77101
|---|---|---|
78102
| X0, Y0 | i32 | Start point in 24.8 fixed-point |
79103
| X1, Y1 | i32 | End point in 24.8 fixed-point |
80-
| MinRow | i32 | First pixel row touched (clamped to interest) |
81-
| MaxRow | i32 | Last pixel row touched (clamped to interest) |
82-
| CsrBandOffset | u32 | Start index into CSR offsets for this definition |
83-
| DefinitionEdgeStart | u32 | Edge index offset for this definition in merged buffer |
104+
| Flags | StrokeEdgeFlags | Stroke edge type (None/Join/CapStart/CapEnd) |
105+
| AdjX, AdjY | i32 | Auxiliary coords (join adjacent vertex) |
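
To make the updated layout concrete, here is a hedged Python sketch that packs one edge into bytes. Several details are assumptions rather than facts from the source: the numeric values of `StrokeEdgeFlags`, the 4-byte width of `Flags`, storing `AdjX`/`AdjY` in 24.8 fixed-point, and the 4 trailing padding bytes. The seven listed fields total 28 bytes at 4 bytes each, so the padding is added only to match the stated 32-byte size.

```python
import struct
from enum import IntEnum


class StrokeEdgeFlags(IntEnum):
    # Member names come from the table; the numeric values are assumed.
    NONE = 0
    JOIN = 1
    CAP_START = 2
    CAP_END = 3


# X0, Y0, X1, Y1, Flags, AdjX, AdjY as little-endian 32-bit ints,
# followed by 4 padding bytes (assumed) to reach 32 bytes.
GPU_EDGE = struct.Struct("<7i4x")


def to_fixed_24_8(value: float) -> int:
    """Convert to 24.8 fixed-point: 8 fractional bits, so scale by 256."""
    return int(round(value * 256.0))


def pack_edge(x0, y0, x1, y1, flags=StrokeEdgeFlags.NONE, adj_x=0.0, adj_y=0.0):
    """Pack one edge in the field order listed in the table."""
    return GPU_EDGE.pack(
        to_fixed_24_8(x0), to_fixed_24_8(y0),
        to_fixed_24_8(x1), to_fixed_24_8(y1),
        int(flags),
        to_fixed_24_8(adj_x), to_fixed_24_8(adj_y),
    )
```

A coordinate of 1.5 becomes 384 in this encoding, and every packed edge is 32 bytes long under the padding assumption above.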
84106

85107
### CSR Buffers
86108

@@ -149,4 +171,10 @@ Coverage rasterization and compositing are fused into a single compute dispatch.
149171

150172
Edge preparation (path flattening, fixed-point conversion, CSR construction) runs on the CPU. The `path.Flatten()` cost is shared with the CPU rasterizer pipeline. CSR construction is three passes over the edge set: count, prefix sum, scatter.
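
The three CSR passes (count, prefix sum, scatter) can be sketched as follows. This is a generic CSR build written for illustration; the input shape (`edge_bands`, a list of band indices per edge) and the function name are assumptions, not the backend's actual data structures.

```python
def build_csr(edge_bands, band_count):
    """Build CSR offsets and indices mapping each band to its edges.

    edge_bands[i] is the list of band indices that edge i overlaps
    (a hypothetical input shape chosen for this sketch).
    """
    # Pass 1: count how many edges fall in each band.
    counts = [0] * band_count
    for bands in edge_bands:
        for band in bands:
            counts[band] += 1

    # Pass 2: exclusive prefix sum turns counts into start offsets.
    offsets = [0] * (band_count + 1)
    for band in range(band_count):
        offsets[band + 1] = offsets[band] + counts[band]

    # Pass 3: scatter edge indices into their band's slot range.
    indices = [0] * offsets[band_count]
    cursor = offsets[:band_count]  # per-band write position (a copy)
    for edge_index, bands in enumerate(edge_bands):
        for band in bands:
            indices[cursor[band]] = edge_index
            cursor[band] += 1

    return offsets, indices
```

After the build, the edges for band `b` are `indices[offsets[b]:offsets[b + 1]]`, which is what lets each band look up its edges without scanning the whole edge set.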
151173

152-
For the benchmark workload (7200x4800 US states GeoJSON polygon, 2px stroke, ~262K edges), NativeSurface performance is at parity with the CPU rasterizer (~28ms).
174+
Both the CPU and GPU backends use per-band parallel stroke expansion — the CPU
175+
via `DefaultRasterizer.RasterizeStrokeRows` and the GPU via
176+
`StrokeExpandComputeShader`. Both share the same `StrokeEdgeFlags` enum and
177+
`DashPathSplitter` (in the core project). The CPU backend fuses stroke expansion
178+
directly into the rasterizer's band loop, while the GPU backend uses a separate
179+
compute dispatch that writes outline edges into pre-allocated per-band output
180+
slots sized by `ComputeOutlineEdgesPerCenterline()`.

src/ImageSharp.Drawing/Processing/Backends/PolygonScanning.MD

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@ https://github.com/aurimasg/blaze (MIT-Licensed)
1717

1818
## High-Level Pipeline
1919

20+
### Fill Path (`RasterizeRows`)
21+
2022
```
2123
IPath
2224
|
@@ -43,6 +45,65 @@ Choose execution mode:
4345
+--> Invoke rasterizer callback per dirty row
4446
```
4547

48+
### Stroke Path (`RasterizeStrokeRows`)
49+
50+
Stroke rasterization fuses stroke expansion with coverage rasterization so that
51+
each parallel band only expands the centerline edges that overlap it. This avoids
52+
the cost of a serial full-path `GenerateOutline()` call and eliminates the
53+
intermediate `IPath` allocation for the expanded outline.
54+
55+
For dashed strokes, `DashPathSplitter` splits the centerline into dash segments
56+
on the CPU before passing the result through the same per-band stroke expansion
57+
pipeline.
58+
59+
```
60+
IPath (centerline)
61+
|
62+
+--> [if dashed] DashPathSplitter.SplitDashes(path, strokeWidth, pattern)
63+
|
64+
v
65+
path.Flatten() -> List<ISimplePath> (preserving open/closed state)
66+
|
67+
v
68+
BuildStrokeEdgeTable(contours) -> StrokeEdgeData[]
69+
| For each contour:
70+
| - Closed: N side edges + N join descriptors = 2N descriptors
71+
| - Open: (N-1) side edges + (N-2) joins + 2 caps = 2N-1 descriptors
72+
| Each descriptor carries StrokeEdgeFlags (None/Join/CapStart/CapEnd)
73+
|
74+
v
75+
TryBuildBandSortedStrokeEdges(edges, expansion)
76+
| Band-sort with Y expansion = halfWidth * max(miterLimit, 1)
77+
| to ensure join/cap geometry reaches all overlapping bands
78+
|
79+
v
80+
Choose execution mode:
81+
|
82+
+--> Parallel row-tiles
83+
| |
84+
| +--> Per tile: ExpandStrokeEdges -> EmitOutlineEdge -> RasterizeLine
85+
| +--> EmitCoverageRows -> ordered emit via output buffer
86+
|
87+
+--> Sequential band loop
88+
|
89+
+--> Per band: ExpandStrokeEdges -> EmitOutlineEdge -> RasterizeLine
90+
+--> EmitCoverageRows -> direct callback
91+
```
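
The descriptor counts in the diagram above can be checked with a small sketch. It emits one descriptor per side edge, join, and cap for a contour of N vertices. The emission order, the tuple shape, and the names `Flag` and `build_descriptors` are assumptions for illustration; only the counts and the four flag names come from the text.

```python
from enum import Enum


class Flag(Enum):
    NONE = "side"
    JOIN = "join"
    CAP_START = "cap_start"
    CAP_END = "cap_end"


def build_descriptors(vertex_count, closed):
    """Return (flag, vertex_index) descriptors for one contour.

    Assumes at least 2 vertices; the ordering is a guess, not the
    order used by BuildStrokeEdgeTable.
    """
    n = vertex_count
    out = []
    if closed:
        # N side edges, each followed by the join at its end vertex.
        for i in range(n):
            out.append((Flag.NONE, i))
            out.append((Flag.JOIN, (i + 1) % n))
    else:
        out.append((Flag.CAP_START, 0))
        for i in range(n - 1):
            out.append((Flag.NONE, i))
            if i + 1 < n - 1:  # interior vertices only get joins
                out.append((Flag.JOIN, i + 1))
        out.append((Flag.CAP_END, n - 1))
    return out
```

For a 4-vertex contour this yields 8 descriptors when closed (2N) and 7 when open (2N-1), matching the counts stated in the diagram.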
92+
93+
#### Stroke Edge Expansion (`ExpandStrokeEdges`)
94+
95+
Each `StrokeEdgeData` descriptor is expanded into outline edges based on its
96+
`StrokeEdgeFlags`, mirroring the GPU `StrokeExpandComputeShader`:
97+
98+
| Flag | Expansion | Outline edges |
99+
|------|-----------|---------------|
100+
| `None` (side) | Two edges offset by stroke normal | 2 |
101+
| `Join` | Inner bevel + outer join (miter/round/bevel) | 2-N (round scales with width) |
102+
| `CapStart`/`CapEnd` | Cap geometry (butt/square/round) | 1-N (round scales with width) |
103+
104+
Outline edges are converted from float to 24.8 fixed-point, clipped to band
105+
bounds, and fed directly to `RasterizeLine` — no intermediate edge buffer.
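
As a minimal sketch of the `None` (side) row and the fixed-point step, the code below offsets a centerline segment by the stroke normal on both sides and converts the two resulting outline edges to 24.8 fixed-point. The function names and the winding direction of the second edge are assumptions, and clipping to band bounds is omitted for brevity.

```python
import math


def to_fixed_24_8(value: float) -> int:
    """Convert to 24.8 fixed-point: 8 fractional bits, so scale by 256."""
    return int(round(value * 256.0))


def expand_side_edge(x0, y0, x1, y1, half_width):
    """Offset a centerline segment by +/- the stroke normal.

    Returns two float edges (x0, y0, x1, y1), or an empty list for a
    zero-length segment. The second edge is reversed so the pair winds
    consistently; that direction is an assumption of this sketch.
    """
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0.0:
        return []
    # Unit normal scaled to half the stroke width.
    nx = -dy / length * half_width
    ny = dx / length * half_width
    return [
        (x0 + nx, y0 + ny, x1 + nx, y1 + ny),
        (x1 - nx, y1 - ny, x0 - nx, y0 - ny),
    ]


def side_edge_to_fixed(x0, y0, x1, y1, half_width):
    """Expand a side edge and convert each outline edge to 24.8 integers."""
    return [
        tuple(to_fixed_24_8(c) for c in edge)
        for edge in expand_side_edge(x0, y0, x1, y1, half_width)
    ]
```

A horizontal centerline from (0, 0) to (10, 0) with a half width of 1 produces one edge along y = 1 and a reversed edge along y = -1, which in 24.8 fixed-point are y = 256 and y = -256.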
106+
46107
## Coordinate System and Precision
47108

48109
- Geometry is transformed to scanner-local coordinates:
