Commit 852dc33: WebAudio OfflineAudioContext streaming API (#1183)

Added explainer for WebAudio/web-audio-api#2445. 2 files changed.

OfflineAudioContext/explainer.md

# WebAudio OfflineAudioContext.startRendering() streaming output

## Authors:

- [Matt Birman](mailto:mattbirman@microsoft.com)
- [Gabriel Brito](mailto:gabrielbrito@microsoft.com)
- [Steve Becker](mailto:stevebe@microsoft.com)

## Participate

- [Issue tracker](https://github.com/MicrosoftEdge/MSEdgeExplainers/labels/OfflineAudioContextStreaming)
- [Discussion forum](https://github.com/WebAudio/web-audio-api/issues/2445)

## Introduction

[WebAudio](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API) provides a powerful and versatile API for audio-processing workflows in the browser. It supports complex node-based audio graphs that can be piped to the system output (speakers) or to an in-memory AudioBuffer for further processing, such as writing to a file. WebAudio can be used for many different workloads in the browser. An example relevant to this discussion is web-based video editors, like [clipchamp.com](https://clipchamp.com), which can use WebAudio to build up complex audio graphs based on multiple input files. These input files are composed, trimmed and processed according to a linear project timeline. The project can be previewed in realtime in the browser or exported faster than realtime as an .mp4.

WebAudio works well in a realtime playback context, but it is not suitable for offline (faster-than-realtime) processing due to a limitation in the design of WebAudio's [OfflineAudioContext API](https://developer.mozilla.org/en-US/docs/Web/API/OfflineAudioContext). The API requires allocating memory for the whole audio graph's rendered output up-front, which can reach gigabytes of AudioBuffer data.

This document proposes adding a streaming offline-context rendering function so that the audio graph's output can be processed incrementally rather than allocated as a single whole audio buffer up-front.

## User-Facing Problem

The [OfflineAudioContext API](https://developer.mozilla.org/en-US/docs/Web/API/OfflineAudioContext) works well for rendering small audio graphs, but it does not scale to larger projects because it allocates the full graph's AudioBuffer up-front. For example, rendering a 2-hour video composition project in clipchamp.com in an offline context would require an extremely large AudioBuffer allocation. `OfflineAudioContext.startRendering()` allocates an `AudioBuffer` large enough to hold the entire rendered WebAudio graph before returning. Two hours of audio at 48 kHz with 4 channels results in gigabytes of in-memory float32 data in the `AudioBuffer`. This makes the API unsuitable for very long offline renders or very large channel/length combinations, and there is no simple way to chunk the output or consume it as a stream.
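
For a sense of scale, the allocation for that example can be computed directly (a back-of-envelope calculation using the figures from the paragraph above):

```javascript
// Back-of-envelope size of the AudioBuffer that startRendering() must
// allocate for 2 hours of 4-channel audio at 48 kHz (float32 samples).
const seconds = 2 * 60 * 60;   // 2 hours
const sampleRate = 48000;      // Hz
const channels = 4;
const bytesPerSample = 4;      // float32

const totalBytes = seconds * sampleRate * channels * bytesPerSample;
console.log(`${(totalBytes / 1024 ** 3).toFixed(2)} GiB`); // ≈ 5.15 GiB
```

All of this memory must be resident before the render result is returned, regardless of whether the caller only intends to write it to a file.
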

The implication of this API is that a user's computer must have enough available memory to export the project, even though the in-memory audio buffer will eventually be discarded after it is written to a file. On low-powered devices or machines with limited memory, this makes WebAudio unusable as an offline processor. If memory capacity is exceeded, processing stops and the browser may terminate the tab/window, leading to potential data loss and a poor user experience.

Another implication is that the audio buffer cannot easily be interleaved with video data streamed out of WebCodecs. To use clipchamp.com again as an example, the video and audio are combined into a .mp4 file during the export process. The video and audio streams need to be interleaved/muxed in the correct order before writing to the file; the audio data cannot simply be appended at the end. Even ignoring the memory implications of the current API, it is difficult to interleave video and audio when all the audio data is delivered as a single chunk at the end of processing. If the audio data were streamed out at the same time as the video data from WebCodecs, the interleaving process would be much simpler.

A workaround for these limitations is for developers to build custom WASM audio-processing libraries that stream data out incrementally, so that the full AudioBuffer is never allocated and no memory pressure is applied to the user's machine. While this works around the API constraint, these third-party libraries require complex integration and increase the maintenance burden for developers. Such libraries duplicate features that already exist in WebAudio, with streaming output as their only added benefit.

### Goals

- Allow streaming data out of a WebAudio offline context so that large audio graphs can be rendered incrementally

### Non-goals

- Change the existing `startRendering()` behavior; this API change is additive

## Proposed Approach - Add `startRenderingStream()` function

The preferred approach is to add a new method, `startRenderingStream()`, that yields buffers of interleaved audio samples in a Float32Array, or another format as outlined in the Output format section below. The consumer reads chunks as they arrive and uses them for storage, transcoding via WebCodecs, sending to a server, etc.

Usage example:

```js
const context = new OfflineAudioContext({ numberOfChannels: 2, length: 44100, sampleRate: 44100 });

// Add some nodes to build a graph...

if ("startRenderingStream" in context) {
  const reader = (await context.startRenderingStream()).getReader();
  while (true) {
    // Get the next chunk of data from the stream.
    const result = await reader.read();

    // The reader returns done = true when there are no more chunks to consume.
    if (result.done) {
      break;
    }

    // result.value contains interleaved Float32Array samples.
    const buffers = result.value;
  }
} else {
  // Fall back to the existing API; the full AudioBuffer is allocated.
  const audioBuffer = await context.startRendering();
}
```

Proposed interface:

```js
partial interface OfflineAudioContext {
  // Returns a stream that yields buffers of interleaved audio samples
  // in Float32Array, or whichever format is ultimately specified.
  Promise<ReadableStream> startRenderingStream();
};
```

### Pros

- The new capability is feature detectable because it is a new function, unlike Alternative 1, which cannot be easily detected
- Aligns well with other web streaming APIs built on [ReadableStream](https://streams.spec.whatwg.org/#readablestream), such as WebCodecs
- Works with very large durations; no upper limit on WebAudio graph duration

### Cons

- None of note

### Output format

There is an open question about which data format `startRenderingStream()` should return. The options under consideration are `AudioBuffer`, planar `Float32Array` and interleaved `Float32Array`.

#### `AudioBuffer`

**Pros**

- Semantically closest to the `startRendering()` API

**Cons**

- Does not allow developers to BYOB (bring your own buffer); BYOB helps developers manage memory usage, so `AudioBuffer` removes a degree of control

#### Planar Float32Array

**Pros**

- The `f32-planar` format already exists in the WebCodecs spec

**Cons**

- Requires `startRenderingStream()` to return an array of `Float32Array`s in planar format, one per output channel
- This raises the question of what to do if the consumer reads only one channel, i.e. what should happen to the other channels' data?

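
To make the layout difference concrete, here is a small illustrative helper (not part of the proposal) that converts per-channel planar buffers into a single interleaved Float32Array:

```javascript
// Convert planar channel data (one Float32Array per channel) into one
// interleaved Float32Array: [ch0[0], ch1[0], ch0[1], ch1[1], ...].
// Illustrative helper only; not part of the proposed API surface.
function interleave(channels) {
  const frames = channels[0].length;
  const out = new Float32Array(frames * channels.length);
  for (let frame = 0; frame < frames; frame++) {
    for (let ch = 0; ch < channels.length; ch++) {
      out[frame * channels.length + ch] = channels[ch][frame];
    }
  }
  return out;
}

// Two channels, three frames each.
const interleaved = interleave([
  Float32Array.from([0, 1, 2]), // left
  Float32Array.from([3, 4, 5]), // right
]);
console.log(interleaved); // Float32Array(6) [0, 3, 1, 4, 2, 5]
```
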
#### Interleaved Float32Array

**Pros**

- Allows streaming a single stream of data, rather than one per channel
- Enables BYOB reading

**Cons**

- None of note

## Alternative 1 - Modify existing `startRendering` method to allow streaming output

An alternative approach is to add an options dictionary to the existing `startRendering()` that configures its operating mode. Setting the mode to `"stream"` produces streaming output. This is similar to the proposed approach, but rather than adding a new function it reuses an existing one.

Usage example:

```js
const context = new OfflineAudioContext({ numberOfChannels: 2, length: 44100, sampleRate: 44100 });

// Add some nodes to build a graph...

const reader = (await context.startRendering({ mode: "stream" })).getReader();
while (true) {
  // Get the next chunk of data from the stream.
  const result = await reader.read();

  // The reader returns done = true when there are no more chunks to consume.
  if (result.done) {
    break;
  }

  const buffers = result.value;
}
```

The existing API remains unchanged for backwards compatibility:

```js
/**
 * Existing API unchanged
 */
const context = new OfflineAudioContext({
  numberOfChannels: 2,
  length: 44100,
  sampleRate: 44100,
});

// Add some nodes to build a graph...

// The full AudioBuffer is allocated.
const renderedBuffer = await context.startRendering();
```

Proposed interface:

```js
dictionary OfflineAudioRenderingOptions {
  mode: "audiobuffer" | "stream"
};

partial interface OfflineAudioContext {
  Promise<AudioBuffer | ReadableStream> startRendering(optional OfflineAudioRenderingOptions options);
};
```

### Pros

- The same pros as the proposed approach

### Cons

- The same cons as the proposed approach
- It is not feature detectable, unlike the Proposed Approach, because it only adds an options dictionary to an existing function
- Less explicit than the proposed approach, as it overloads an existing public API function; it is safer and simpler to add a new function than to change the behavior of an existing one

## Alternative 2 - emit `ondataavailable` events

Keep the current `startRendering()` API but do not allocate the full `AudioBuffer`. After rendering starts, periodically emit events on the context or on a new interface, such as `ondataavailable(chunk: AudioBuffer)`.

The consumer subscribes and collects chunks for processing.

At the end, the API may optionally still provide a full `AudioBuffer`.

### Pros

- Simple to integrate with existing event-driven patterns

### Cons

- Lacks support in the community discussion

## Stakeholder Feedback / Opposition

- Web community: Positive

The participants in the [GitHub discussion](https://github.com/WebAudio/web-audio-api/issues/2445) agree that incremental delivery of data is necessary: either streaming chunks of rendered audio or dispatching data in pieces rather than as a single AudioBuffer, so that memory usage is bounded and the data can be processed/consumed as it is produced.

## References & acknowledgements

Many thanks for valuable feedback and advice from:

- [Hongchan Choi](https://github.com/hoch)
- [Paul Adenot](https://github.com/padenot)
- [John Weisz](https://github.com/JohnWeisz)
- [Nishitha Dey](https://github.com/nishitha-burman)
- [Gabriel Brito](https://github.com/gabrielsanbrito)
- [Steve Becker](https://github.com/SteveBeckerMSFT)
- [Jasmine Minter](https://github.com/matanui159)
- [Hayden Warmington](https://github.com/dosatross)

README.md

Lines changed: 1 addition & 0 deletions

| [Offline Audio Context Streaming](OfflineAudioContext/explainer.md) | <a href="https://github.com/MicrosoftEdge/MSEdgeExplainers/labels/OfflineAudioContextStreaming">![GitHub issues by-label](https://img.shields.io/github/issues/MicrosoftEdge/MSEdgeExplainers/OfflineAudioContextStreaming?label=issues)</a> | [New Issue...](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/new?assignees=nishitha-burman&labels=OfflineAudioContextStreaming&title=%5BOfflineAudioContextStreaming%5D+%3CTITLE+HERE%3E) | Audio |