DrawingCanvas API: Replace imperative extension methods with stateful canvas-based drawing model #377
JimBobSquarePants wants to merge 230 commits into main
Conversation
Fixed. Daft mistake; I was caching.
I agree that I did not go with. I also considered your. I've also updated the IntelliSense docs for the renamed extensions/processors and the related markdown/readme content to reflect the new API.
Thanks @antonfirsov for adding the benchmarks, they're VERY useful! I dug into this a bit further and reran the throughput benchmark with the two concurrency dimensions separated:
That changes the picture quite a lot. When I hold
So I do not think the previous result supports the conclusion that our internal parallel execution is slower than serial. For the small-image case, I also would not expect MP/s to behave like a typical core pixel processor. Tiger rendering has a substantial amount of per-scene work that does not scale with image area in the same way:
That work still exists even when the target is small, so lower MP/s on small images is not by itself evidence of a regression. In absolute terms the small images are still much faster; they just amortize the fixed scene cost less effectively. The earlier throughput result appears to have been confounded by mixing inner parallelism with outer request concurrency. I also tested the service-throughput shape explicitly on my machine:
That suggests the best throughput here comes from a balanced split between outer concurrency and inner parallelism, not from maximizing either one in isolation.

Taken together, I think these results support keeping the current defaults for the ordinary non-concurrent case. For concurrent hosts, the optimal split between per-request parallelism and request-level concurrency is workload-dependent, so I think the right answer is exactly what we have now: sensible defaults that the user can override.
Paint sounds awesome!
Single-user "slowness" wasn't the conclusion I wanted to imply. The data shows that given high concurrency, parallelism hurts throughput, which is something I've only seen happen with.

I will try to help by running some profiling.
Is there any more information that could be cached when the same path/stroke is being used?
In the core library, threading overhead is being reduced or sometimes eliminated for small images.
We memoise the result of I just ran some experiments using Using
Not using
I agree that high-concurrency service throughput is worth understanding, but I don't think it should automatically define the default optimization target for ImageSharp.Drawing. I don't see this library primarily as a web-server drawing component in the way some ImageSharp image-processing workloads are. The scenarios I have in mind are things like CAD-style rendering, charts and graphs, UI generation, tooling, and other programmatic rendering workloads where single-render performance and overall rendering capability matter more than maximizing throughput under heavily concurrent request load. That makes the current single-request results directly relevant, and it also means I'm comfortable with the current defaults.

I did explore whether the library could self-throttle under concurrent load without user tuning. The honest answer is that any in-library mechanism needs either a shared scheduler across requests or a runtime pool-pressure signal, neither of which exists in a form we can consume without significant upstream changes.
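For the concurrent-host case, the balancing can also live entirely in the host today. A minimal sketch, assuming the host caps outer request concurrency itself — `RenderGate`, `RenderRequest`, and `RenderSceneAsync` are hypothetical names, not library API:

```csharp
// Hypothetical host-side throttle: bound outer request concurrency so that
// (outer slots) x (per-render parallelism) stays near the machine's core count.
private static readonly SemaphoreSlim RenderGate = new(initialCount: 4, maxCount: 4);

public static async Task<byte[]> RenderThrottledAsync(RenderRequest request)
{
    await RenderGate.WaitAsync();
    try
    {
        // Per-render parallelism (e.g. 4) is configured separately by the caller,
        // giving the balanced outer/inner split discussed above.
        return await RenderSceneAsync(request);
    }
    finally
    {
        RenderGate.Release();
    }
}
```

Whether the sweet spot is 4x4 or something else is workload-dependent, which is why a tracking issue rather than a hard-coded in-library policy seems right.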
CPU rasterizer parallel performance
Scenarios that involve immediate, on-screen rendering are better addressed by the WebGPU renderer, whatever the current primary use-case of the CPU rasterizer may be.

That said, I just ran the same benchmarks with the V2 rasterizer, and this PR significantly improves every aspect perf-wise (🚀), regardless of whether parallelism is used or not, so I'm no longer trying to figure this out in this PR.
I don't know if self-throttling is the only possible answer here; I would recommend opening a tracking issue once this is merged. I'll move on to reviewing other aspects now.
antonfirsov
left a comment
Plenty of notes although the review is still incomplete. Most importantly, there seems to be a bug in the GPU renderer, see the comment on the lines demo.
```csharp
/// <summary>
/// Small offscreen WebGPU host used by the sample so the benchmark can drive the real backend without manual WebGPU bootstrap code.
/// </summary>
internal sealed class WebGpuBenchmarkBackend : IBenchmarkBackend
```
With 100k lines the output differs from the CPU output and from the Skia output: lines seem to stack in the wrong order.
```csharp
/// without carrying a separate success flag beside the handle. Disposing it releases the
/// reference acquired by <see cref="AcquireReference"/>.
/// </remarks>
internal sealed class HandleReference : IDisposable
```
It is a more common pattern to make such types structs so they are cheap to instantiate; see `Memory<T>.Pin()` for an example.
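For illustration, the struct shape being suggested, loosely mirroring `System.Buffers.MemoryHandle` (member names here are hypothetical):

```csharp
// Sketch: a non-allocating, disposable handle reference.
// Construction takes a reference on the SafeHandle; Dispose releases it.
internal readonly struct HandleReference : IDisposable
{
    private readonly SafeHandle handle;

    public HandleReference(SafeHandle handle)
    {
        bool success = false;
        handle.DangerousAddRef(ref success); // throws if the handle is already closed
        this.handle = handle;
    }

    public nint Value => this.handle.DangerousGetHandle();

    public void Dispose() => this.handle.DangerousRelease();
}
```

The usual caveat applies: the struct can be copied and disposed twice, the same trade-off `MemoryHandle` makes in exchange for zero allocation.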
```csharp
this.deviceReference = this.deviceHandle.AcquireReference();
this.queueReference = this.queueHandle.AcquireReference();
```
We should be extremely careful persisting these. Forgetting a dispose means leaking a handle. I would only do it when P/Invoking happens on such a burning-hot path that AddRef/Release becomes measurably expensive.

Otherwise, passing around the SafeHandle until the point where the native call happens should be far safer, and aligns with existing SafeHandle practices.
[OFF] I wish Silk.NET exposed interop APIs working with SafeHandles directly, but I guess they wanted to stay hardcore for gaming perf.
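The scoped pattern referred to above looks roughly like this sketch (the `NativeMethods.wgpuQueueSubmit` import is a stand-in, not the actual Silk.NET signature):

```csharp
// Sketch: hold the SafeHandle until the native call, and only take a
// reference for the duration of that call.
internal static void Submit(SafeHandle queue, nint commandBuffer)
{
    bool addedRef = false;
    try
    {
        queue.DangerousAddRef(ref addedRef);
        NativeMethods.wgpuQueueSubmit(queue.DangerousGetHandle(), commandBuffer);
    }
    finally
    {
        if (addedRef)
        {
            queue.DangerousRelease();
        }
    }
}
```

When a P/Invoke signature declares the parameter as `SafeHandle`, the marshaller performs this AddRef/Release dance automatically, which is the cheapest way to get the safety.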
> I wish Silk.NET exposed interop APIs working with SafeHandles directly, but I guess they wanted to stay hardcore for gaming perf.
Now I know about them. Agreed.
```csharp
texture = flushContext.Api.DeviceCreateTexture(flushContext.Device, in textureDescriptor);
if (texture is null)
{
    error = "Failed to create WebGPU composition texture.";
```
Nit: passing around these error strings is odd; it would be cleaner to emit these messages to some sort of `IErrorLogger`.
```csharp
toolbar.Controls.Add(this.CreateRunButton("1k", 1_000));
toolbar.Controls.Add(this.CreateRunButton("10k", 10_000));
toolbar.Controls.Add(this.CreateRunButton("100k", 100_000));
toolbar.Controls.Add(this.CreateRunButton("200k", 200_000));
```
I actually wanted to make this 1M, but that makes the implementation hit buffer limits:
```
WebGPU (Failed: The staged-scene path tiles buffer requires 141001808 bytes, exceeding the current WebGPU binding limit of 134217728 bytes.)
```
Definitely not a show stopper, but some users may be limited by this, so it's worth noting down.
It might be possible to work around this. There are hard limits to what you can send though.
```csharp
    canvas.Flush();
}

stopwatch.Stop();
```
Yeah, at 200K I'm seeing about 4x. That's annoying. I'll see what I can do.
```csharp
public WebGPUWindow(Configuration configuration, WebGPUWindowOptions options)
    : this(CreateConstruction(configuration, options))
{
}

private WebGPUWindow(WindowConstruction construction)
    : this(construction.Window, construction.Configuration, construction.Format, construction.PresentMode)
{
}

private WebGPUWindow(
```
Nit: merging these constructors and deleting the WindowConstruction class would simplify things.
```csharp
public WebGPURenderTarget(Configuration configuration, int width, int height)
    : this(AllocateOwnedTarget(configuration, width, height))
{
}

private WebGPURenderTarget(OwnedTarget ownedTarget)
```
Nit: merging constructors and deleting OwnedTarget would simplify things. The type name is especially confusing.
```csharp
private readonly bool ownsGraphics;
private bool isDisposed;

private WebGPURenderTarget(
```
What happened to the "public things go first" stylecop rule?
I think that was dumped from StyleCop years ago.
```csharp
/// Use <see cref="Run(Action{DrawingCanvas{TPixel}})"/> when you want the window to drive rendering for you, or <see cref="TryAcquireFrame(TimeSpan, out WebGPUWindowFrame{TPixel}?)"/> when you need to drive the frame loop yourself.
/// </summary>
/// <typeparam name="TPixel">The canvas pixel format.</typeparam>
public sealed unsafe class WebGPUWindow<TPixel> : IDisposable
```
This type has limited usability since it needs to create a brand-new popup window and doesn't allow rendering to a control hosted by popular UI frameworks. I can hardly see anyone using it for serious things except gamedevs who want to experiment with our API.

In #377 (review) I suggested exploring ways to create a canvas around an existing window handle that could be an entry point for WinForms/WPF/WinUI integration (maybe even Avalonia or MAUI). IMO such a feature would radically help adoption.
I can explore this, but I can't promise to do it fast.
`WebGPUWindow<TPixel>` is a managed presentation surface intended for tools where we own the window. The embedding entry point you're looking for is `WebGPUDeviceContext<TPixel>`:
- `WebGPUDeviceContext(nint deviceHandle, nint queueHandle)` wraps externally-owned device/queue handles without taking ownership.
- `CreateCanvas(nint textureHandle, nint textureViewHandle, WebGPUTextureFormatId format, int width, int height, DrawingOptions options)` wraps an externally-owned texture/view into a `DrawingCanvas<TPixel>` for one frame.
The integration shape is the same on every platform:
1. The host owns the WebGPU surface, swap-chain, device, and queue.
2. The render loop acquires the current texture and view from the surface.
3. The four handles go into `WebGPUDeviceContext.CreateCanvas(...)`; draw, dispose, then present through the host's loop.
So the primitive for "render to an existing window/control" is already there. Per-framework wrappers (WinForms control, WPF host, Avalonia control, etc.) that hide steps 1 and 2 would just need framework-specific swap-chain plumbing utilizing that type.
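Assuming the shape described above, a host's per-frame loop would look roughly like this sketch. The `swapChain` object and its members stand in for framework-specific plumbing; only the `WebGPUDeviceContext` calls come from this PR, and their exact signatures should be checked against the source:

```csharp
// Sketch: one frame rendered into an externally-owned WebGPU texture.
WebGPUDeviceContext<Rgba32> context = new(deviceHandle, queueHandle); // host-owned handles

// Steps 1-2: the host's swap-chain yields the current texture and view (placeholder API).
(nint texture, nint view, int width, int height) = swapChain.AcquireCurrentTexture();
WebGPUTextureFormatId format = swapChain.Format; // placeholder

// Step 3: wrap the handles in a one-frame canvas, draw, dispose.
using (DrawingCanvas<Rgba32> canvas = context.CreateCanvas(
    texture, view, format, width, height, new DrawingOptions()))
{
    canvas.Fill(Brushes.Solid(Color.CornflowerBlue), background);
    canvas.Flush();
}

swapChain.Present(); // presentation stays with the host
```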


Prerequisites
Breaking Changes: DrawingCanvas API
Fix #106
Fix #244
Fix #344
Fix #367
This is a major breaking change. The library's public API has been completely redesigned around a canvas-based drawing model, replacing the previous collection of imperative extension methods.
What changed
The old API surface — dozens of `IImageProcessingContext` extension methods like `DrawLine()`, `DrawPolygon()`, `FillPolygon()`, `DrawBeziers()`, `DrawImage()`, `DrawText()`, etc. — has been removed entirely. These methods were individually simple but suffered from several architectural limitations.

The new model: DrawingCanvas

All drawing now goes through `IDrawingCanvas` / `DrawingCanvas<TPixel>`, a stateful canvas that queues draw commands and flushes them as a batch.

Via `Image.Mutate()` (most common)

Standalone usage (without `Image.Mutate`)

`DrawingCanvas<TPixel>` can be created directly from an image or frame using the `CreateCanvas(...)` extensions:

Canvas state management
The canvas supports a save/restore stack (similar to HTML Canvas or SkCanvas):
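Assuming HTML-Canvas-like semantics for the stack (a `Save`/`Restore` pair; exact member names should be checked against the API), usage would look roughly like:

```csharp
canvas.Save();                                 // push DrawingOptions + clip state
canvas.Clip(circle);                           // hypothetical clip call; clip paths are part of state
canvas.Fill(Brushes.Solid(Color.Red), shape);  // drawn clipped
canvas.Restore();                              // pop: clip and options revert

canvas.SaveLayer();                            // subsequent draws go to an offscreen layer
canvas.Fill(Brushes.Solid(Color.Blue), shape);
canvas.Restore();                              // the layer composites back here
```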
State includes `DrawingOptions` (graphics options, shape options, transform) and clip paths. `SaveLayer` creates an offscreen layer that composites back on `Restore`.

IDrawingBackend — bring your own renderer

The library's rasterization and composition pipeline is abstracted behind `IDrawingBackend`. This interface has the following methods: `FlushCompositions<TPixel>` and `TryReadRegion<TPixel>` (used by `Process()` and `DrawImage()`).

The library ships with `DefaultDrawingBackend` (CPU, tiled fixed-point rasterizer). An experimental WebGPU compute-shader backend (`ImageSharp.Drawing.WebGPU`) is also available, demonstrating how alternate backends plug in. Users can provide their own implementations — for example, GPU-accelerated backends, SVG emitters, or recording/replay layers.

Backends are registered on `Configuration`:

Migration guide
| Old API | New API |
| --- | --- |
| `ctx.Fill(color, path)` | `ctx.ProcessWithCanvas(c => c.Fill(Brushes.Solid(color), path))` |
| `ctx.Fill(brush, path)` | `ctx.ProcessWithCanvas(c => c.Fill(brush, path))` |
| `ctx.Draw(pen, path)` | `ctx.ProcessWithCanvas(c => c.Draw(pen, path))` |
| `ctx.DrawLine(pen, points)` | `ctx.ProcessWithCanvas(c => c.DrawLine(pen, points))` |
| `ctx.DrawPolygon(pen, points)` | `ctx.ProcessWithCanvas(c => c.Draw(pen, new Polygon(new LinearLineSegment(points))))` |
| `ctx.FillPolygon(brush, points)` | `ctx.ProcessWithCanvas(c => c.Fill(brush, new Polygon(new LinearLineSegment(points))))` |
| `ctx.DrawText(text, font, color, origin)` | `ctx.ProcessWithCanvas(c => c.DrawText(new RichTextOptions(font) { Origin = origin }, text, Brushes.Solid(color), null))` |
| `ctx.DrawImage(overlay, opacity)` | `ctx.ProcessWithCanvas(c => c.DrawImage(overlay, sourceRect, destRect))` |

Multiple operations can share a single `ProcessWithCanvas` block — commands are batched and flushed together.

Other breaking changes in this PR
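Batching several operations in one block then follows naturally; a sketch based on the migration examples, assuming `ProcessWithCanvas` accepts an ordinary lambda:

```csharp
image.Mutate(ctx => ctx.ProcessWithCanvas(c =>
{
    // Both commands are queued and flushed together at the end of the block.
    c.Fill(brush, new Polygon(new LinearLineSegment(points)));
    c.DrawLine(pen, points);
}));
```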
- `AntialiasSubpixelDepth` removed — The rasterizer now uses a fixed 256-step (8-bit) subpixel depth. The old `AntialiasSubpixelDepth` property (default: 16) controlled how many vertical subpixel steps the rasterizer used per pixel row. The new fixed-point scanline rasterizer integrates area/cover analytically per cell rather than sampling at discrete subpixel rows, so the "depth" is a property of the coordinate precision (24.8 fixed-point), not a tunable sample count. 256 steps gives ~0.4% coverage granularity — more than sufficient for all practical use cases. The old default of 16 (~6.25% granularity) could produce visible banding on gentle slopes.
- `GraphicsOptions.Antialias` — now controls `RasterizationMode` (antialiased vs aliased). When `false`, coverage is snapped to binary using `AntialiasThreshold`.
- `GraphicsOptions.AntialiasThreshold` — new property (0–1, default 0.5) controlling the coverage cutoff in aliased mode. Pixels with coverage at or above this value become fully opaque; pixels below are discarded.
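For example, opting into aliased output with a stricter cutoff would look like this sketch (property placement on `GraphicsOptions` follows the notes above; the exact numeric type of the threshold is assumed):

```csharp
DrawingOptions options = new()
{
    GraphicsOptions = new GraphicsOptions
    {
        Antialias = false,          // aliased mode: coverage snapped to binary
        AntialiasThreshold = 0.75f, // pixels with coverage >= 0.75 become opaque
    },
};
```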
Benchmarks

All benchmarks run under the following environment.
DrawPolygonAll - Renders a 7200x4800px path of the state of Mississippi with a 2px stroke.
FillParis - Renders a 1096x1060px scene containing 50K fill paths.