
Commit cd12f3f

readme updates
1 parent efdd809 commit cd12f3f

7 files changed

Lines changed: 500 additions & 39 deletions

File tree

sdk/runanywhere-web/packages/core/README.md

Lines changed: 85 additions & 34 deletions
@@ -100,25 +100,36 @@ On-device AI for the browser. Run LLMs, Speech-to-Text, Text-to-Speech, Vision,
 
 ## Package Structure
 
-The Web SDK is a single npm package. Unlike native SDKs (iOS, Android, React Native, Flutter) which use separate packages per backend, the Web SDK compiles all inference backends into a single WebAssembly binary. Backend selection happens at WASM build time, not at the package level.
+The Web SDK is split into three npm packages so you only ship the backends you need:
 
-```
-@runanywhere/web -- TypeScript API + pre-built WASM (all backends)
-```
+| Package | Description | Includes |
+|---------|-------------|----------|
+| [`@runanywhere/web`](https://www.npmjs.com/package/@runanywhere/web) | Core SDK — lifecycle, logging, events, model management, storage | TypeScript only (no WASM) |
+| [`@runanywhere/web-llamacpp`](https://www.npmjs.com/package/@runanywhere/web-llamacpp) | LLM, VLM, tool calling, structured output, embeddings, diffusion | llama.cpp WASM (~3.7 MB CPU, ~3.9 MB WebGPU) |
+| [`@runanywhere/web-onnx`](https://www.npmjs.com/package/@runanywhere/web-onnx) | STT, TTS, VAD, audio capture/playback | [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) WASM (~12 MB, lazy-loaded) |
 
-The pre-built WASM includes llama.cpp (LLM/VLM), whisper.cpp (STT), and sherpa-onnx (TTS/VAD). Developers who need a smaller WASM binary with specific backends can [build from source](#building-from-source) with selective flags.
+Install only what you need — `@runanywhere/web` is always required as the core.
 
 ---
 
 ## Installation
 
 ```bash
-npm install @runanywhere/web
+# Core + all backends
+npm install @runanywhere/web @runanywhere/web-llamacpp @runanywhere/web-onnx
+
+# LLM/VLM only (no speech)
+npm install @runanywhere/web @runanywhere/web-llamacpp
+
+# Speech only (no LLM)
+npm install @runanywhere/web @runanywhere/web-onnx
 ```
 
-### Serve WASM Files
+### Serve WASM Files + Cross-Origin Isolation
 
-The package includes pre-built WASM files in `node_modules/@runanywhere/web/wasm/`. Configure your bundler to serve these as static assets.
+WASM files are included in `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx`. Configure your bundler to serve them as static assets.
+
+> **Important:** Your server **must** set Cross-Origin Isolation headers for `SharedArrayBuffer` and multi-threaded WASM to work. Without these headers the SDK falls back to single-threaded mode, which is significantly slower. See [Cross-Origin Isolation Headers](#cross-origin-isolation-headers) for all platforms (Nginx, Vercel, Netlify, Cloudflare, AWS, Apache).
 
 **Vite:**
 
@@ -132,9 +143,15 @@ export default defineConfig({
       'Cross-Origin-Embedder-Policy': 'credentialless',
     },
   },
+  worker: { format: 'es' },
+  optimizeDeps: {
+    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
+  },
 });
 ```
 
+> **Warning (Vite users):** You **must** add `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx` to `optimizeDeps.exclude`. Vite's dependency pre-bundling flattens packages into `.vite/deps/`, which breaks the relative `import.meta.url` paths the SDK uses to locate its WASM files. Without this exclusion, WASM loading will fail with a "Failed to fetch dynamically imported module" error. This is a known Vite limitation with npm packages that resolve static assets via `import.meta.url`.
+
 **Webpack:**
 
 ```javascript
@@ -145,9 +162,17 @@ module.exports = {
       { test: /\.wasm$/, type: 'asset/resource' },
     ],
   },
+  devServer: {
+    headers: {
+      'Cross-Origin-Opener-Policy': 'same-origin',
+      'Cross-Origin-Embedder-Policy': 'credentialless',
+    },
+  },
 };
 ```
 
+> **Safari/iOS:** Safari does not support `credentialless` COEP. Use the COI service worker pattern shown in the [demo app](../../examples/web/RunAnywhereAI/) — it intercepts responses and injects `require-corp` headers at runtime.
+
 ---
 
 ## Quick Start
@@ -156,14 +181,20 @@ module.exports = {
 
 ```typescript
 import { RunAnywhere } from '@runanywhere/web';
+import { LlamaCPP, TextGeneration } from '@runanywhere/web-llamacpp';
+import { ONNX, STT, STTModelType, TTS, VAD, SpeechActivity } from '@runanywhere/web-onnx';
 
 await RunAnywhere.initialize({ environment: 'development', debug: true });
+
+// Register backends
+await LlamaCPP.register();
+await ONNX.register();
 ```
 
 ### 2. Text Generation (LLM)
 
 ```typescript
-import { TextGeneration } from '@runanywhere/web';
+import { TextGeneration } from '@runanywhere/web-llamacpp';
 
 // Load a GGUF model
 await TextGeneration.loadModel('/models/qwen2.5-0.5b-instruct-q4_0.gguf', 'qwen2.5-0.5b');
@@ -181,7 +212,7 @@ for await (const token of TextGeneration.generateStream('Write a haiku about cod
 ### 3. Speech-to-Text (STT)
 
 ```typescript
-import { STT } from '@runanywhere/web';
+import { STT, STTModelType } from '@runanywhere/web-onnx';
 
 await STT.loadModel({
   modelId: 'whisper-tiny',
@@ -197,7 +228,7 @@ console.log(result.text);
 ### 4. Text-to-Speech (TTS)
 
 ```typescript
-import { TTS } from '@runanywhere/web';
+import { TTS } from '@runanywhere/web-onnx';
 
 await TTS.loadVoice({
   voiceId: 'piper-en',
@@ -213,7 +244,7 @@ const result = await TTS.synthesize('Hello from RunAnywhere!');
 ### 5. Voice Activity Detection (VAD)
 
 ```typescript
-import { VAD, SpeechActivity } from '@runanywhere/web';
+import { VAD, SpeechActivity } from '@runanywhere/web-onnx';
 
 await VAD.initialize({ modelPath: '/models/silero_vad.onnx' });
 
@@ -231,7 +262,7 @@ VAD.processSamples(audioChunk);
 ### 6. Vision Language Model (VLM)
 
 ```typescript
-import { VLM, VLMImageFormat } from '@runanywhere/web';
+import { VLM, VLMImageFormat } from '@runanywhere/web-llamacpp';
 
 await VLM.loadModel('/models/qwen2-vl.gguf', '/models/mmproj.gguf', 'qwen2-vl');
 
@@ -650,38 +681,46 @@ The demo app runs on Vite with Cross-Origin Isolation headers pre-configured.
 
 ---
 
-## npm Package
-
-```
-@runanywhere/web
-```
+## npm Packages
 
-### Published Exports
+### `@runanywhere/web` (core)
 
 | Export | Description |
 |--------|-------------|
 | `RunAnywhere` | SDK lifecycle (initialize, shutdown, capabilities) |
+| `ModelManager` | Model download, storage, and loading |
+| `OPFSStorage` | Persistent storage via OPFS |
+| `SDKLogger` | Structured logging |
+| `SDKError` | Typed error hierarchy |
+| `EventBus` | SDK event system |
+| `detectCapabilities` | Browser feature detection |
+
+### `@runanywhere/web-llamacpp`
+
+| Export | Description |
+|--------|-------------|
+| `LlamaCPP` | Backend registration |
 | `TextGeneration` | LLM text generation and streaming |
-| `STT` | Speech-to-text transcription |
-| `TTS` | Text-to-speech synthesis |
-| `VAD` | Voice activity detection |
 | `VLM` | Vision-language model inference |
-| `VoicePipeline` | STT -> LLM -> TTS orchestration |
-| `VoiceAgent` | Complete voice agent with C API pipeline |
-| `ToolCalling` | Function calling with typed tool definitions |
+| `ToolCalling` | Function calling with typed definitions |
 | `StructuredOutput` | JSON schema-guided generation |
 | `Embeddings` | Vector embedding generation |
-| `Diffusion` | Image generation (WebGPU, scaffold) |
+| `Diffusion` | Image generation (WebGPU) |
+| `VLMWorkerBridge` | Web Worker bridge for VLM inference |
+| `VideoCapture` | Camera capture and frame extraction |
+| `TelemetryService` | Telemetry and analytics |
+
+### `@runanywhere/web-onnx`
+
+| Export | Description |
+|--------|-------------|
+| `ONNX` | Backend registration |
+| `STT` | Speech-to-text transcription |
+| `TTS` | Text-to-speech synthesis |
+| `VAD` | Voice activity detection |
 | `AudioCapture` | Microphone capture via Web Audio API |
 | `AudioPlayback` | Audio playback via Web Audio API |
-| `VideoCapture` | Camera capture and frame extraction |
-| `ModelManager` | Advanced model download/storage/loading |
-| `OPFSStorage` | Low-level OPFS persistence |
-| `VLMWorkerBridge` | Web Worker bridge for VLM inference |
-| `SDKLogger` | Structured logging |
-| `SDKError` | Typed error hierarchy |
-| `EventBus` | SDK event system |
-| `detectCapabilities` | Browser feature detection |
+| `AudioFileLoader` | Audio file loading and decoding |
 
 ---
 
@@ -715,6 +754,18 @@ Yes. Any GGUF-format model compatible with llama.cpp works for LLM/VLM. STT mode
 
 ## Troubleshooting
 
+### "Failed to fetch dynamically imported module" / WASM not loading (Vite)
+
+**Cause:** Vite pre-bundles npm dependencies into `.vite/deps/`, which breaks the relative `import.meta.url` paths used by `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx` to locate their WASM files.
+
+**Fix:** Add both packages to `optimizeDeps.exclude` in your `vite.config.ts`:
+
+```typescript
+optimizeDeps: {
+  exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
+},
+```
+
 ### "SharedArrayBuffer is not defined"
 
 **Cause:** Missing Cross-Origin Isolation headers.
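A quick way to confirm whether this troubleshooting entry applies is to probe the environment at startup. The helper below is a hypothetical sketch, not an SDK API — it only reads standard browser globals:

```typescript
// Sketch: runtime probe for Cross-Origin Isolation (not part of the SDK).
// `crossOriginIsolated` is a standard global that is true only when the
// COOP/COEP headers described above were served correctly.
function checkCrossOriginIsolation(): {
  isolated: boolean;
  hasSharedArrayBuffer: boolean;
} {
  // Read via globalThis so the check also compiles outside DOM typings.
  const isolated = (globalThis as any).crossOriginIsolated === true;
  const hasSharedArrayBuffer = typeof SharedArrayBuffer !== 'undefined';
  if (!isolated) {
    console.warn(
      'Not cross-origin isolated: multi-threaded WASM will be unavailable.',
    );
  }
  return { isolated, hasSharedArrayBuffer };
}
```

If it logs the warning in your deployed app, the headers are missing or being stripped by a proxy/CDN in front of your server.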

sdk/runanywhere-web/packages/core/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@runanywhere/web",
-  "version": "0.1.0-beta.8",
+  "version": "0.1.0-beta.9",
   "description": "RunAnywhere Web SDK - Core infrastructure for on-device AI in the browser (pure TypeScript)",
   "type": "module",
   "main": "./dist/index.js",
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+# @runanywhere/web-llamacpp
+
+LLM, VLM, tool calling, structured output, embeddings, and diffusion backend for the [RunAnywhere Web SDK](https://www.npmjs.com/package/@runanywhere/web) — powered by [llama.cpp](https://github.com/ggerganov/llama.cpp) compiled to WebAssembly.
+
+> **Peer dependency:** Requires [`@runanywhere/web`](https://www.npmjs.com/package/@runanywhere/web) `>=0.1.0-beta.0`
+
+## Installation
+
+```bash
+npm install @runanywhere/web @runanywhere/web-llamacpp
+```
+
+## Quick Start
+
+```typescript
+import { RunAnywhere } from '@runanywhere/web';
+import { LlamaCPP, TextGeneration } from '@runanywhere/web-llamacpp';
+
+// 1. Initialize core SDK
+await RunAnywhere.initialize({ environment: 'development' });
+
+// 2. Register the llama.cpp backend
+await LlamaCPP.register();
+
+// 3. Load a GGUF model and generate
+await TextGeneration.loadModel('/models/qwen2.5-0.5b-instruct-q4_0.gguf', 'qwen2.5-0.5b');
+const result = await TextGeneration.generate('Explain quantum computing briefly.');
+console.log(result.text);
+
+// Stream tokens
+for await (const token of TextGeneration.generateStream('Write a haiku.')) {
+  process.stdout.write(token);
+}
+```
+
+## Capabilities
+
+| Feature | Class | Description |
+|---------|-------|-------------|
+| **Text Generation** | `TextGeneration` | LLM inference with streaming, system prompts, temperature, top-k/top-p |
+| **Vision Language Models** | `VLM` | Multimodal inference (image + text) via llama.cpp mtmd — runs in a Web Worker |
+| **Tool Calling** | `ToolCalling` | Function calling with typed definitions (Hermes-style and generic) |
+| **Structured Output** | `StructuredOutput` | JSON schema-guided generation |
+| **Embeddings** | `Embeddings` | Vector embedding generation with configurable normalization/pooling |
+| **Diffusion** | `Diffusion` | Image generation (WebGPU, scaffold) |
+
+## WASM Files
+
+This package includes pre-built WASM binaries:
+
+| File | Description |
+|------|-------------|
+| `wasm/racommons-llamacpp.wasm` | CPU variant (~3.7 MB) |
+| `wasm/racommons-llamacpp-webgpu.wasm` | WebGPU-accelerated variant (~3.9 MB) |
+
+The SDK automatically selects the WebGPU variant when available, falling back to CPU.
+
+Configure your bundler to serve these as static assets — see the [main SDK README](https://www.npmjs.com/package/@runanywhere/web) for Vite/Webpack examples.
+
+> **Warning (Vite):** You must add `@runanywhere/web-llamacpp` to [`optimizeDeps.exclude`](https://vite.dev/config/dep-optimization-options#optimizedeps-exclude) in your `vite.config.ts`. Vite's pre-bundling breaks the `import.meta.url` paths the SDK uses to locate WASM files. See the [main SDK README](https://www.npmjs.com/package/@runanywhere/web#serve-wasm-files--cross-origin-isolation) for the full Vite config.
+
+## Cross-Origin Isolation
+
+Multi-threaded WASM requires `SharedArrayBuffer`, which needs Cross-Origin Isolation headers:
+
+```
+Cross-Origin-Opener-Policy: same-origin
+Cross-Origin-Embedder-Policy: credentialless
+```
+
+See the [main SDK docs](https://www.npmjs.com/package/@runanywhere/web#cross-origin-isolation-headers) for platform-specific configuration.
+
+## License
+
+Apache 2.0
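For local testing outside a bundler, the two isolation headers above can be served by a minimal static file server. The sketch below assumes a `dist/` directory, port 8080, and a tiny MIME map — all illustrative, not part of the SDK:

```typescript
import { createServer } from 'node:http';
import { readFile } from 'node:fs/promises';
import { extname, join } from 'node:path';

// Minimal MIME map; .wasm needs the correct type for streaming compilation.
const MIME: Record<string, string> = {
  '.html': 'text/html',
  '.js': 'text/javascript',
  '.wasm': 'application/wasm',
};

const server = createServer(async (req, res) => {
  // The two headers that enable cross-origin isolation (SharedArrayBuffer).
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'credentialless');
  try {
    const path = join('dist', req.url === '/' ? 'index.html' : (req.url ?? ''));
    const body = await readFile(path);
    res.setHeader('Content-Type', MIME[extname(path)] ?? 'application/octet-stream');
    res.end(body);
  } catch {
    res.statusCode = 404;
    res.end('Not found');
  }
});

server.listen(8080);
```

Every response, including 404s, carries the headers, so the page stays isolated even while you iterate on the build output.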

sdk/runanywhere-web/packages/llamacpp/package.json

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
   "name": "@runanywhere/web-llamacpp",
-  "version": "0.1.0-beta.8",
+  "version": "0.1.0-beta.9",
   "description": "RunAnywhere Web SDK - LlamaCpp backend for on-device LLM/VLM inference",
   "type": "module",
   "main": "./dist/index.js",
@@ -28,7 +28,8 @@
     "dev": "tsc --watch",
     "lint": "tsc --noEmit",
     "typecheck": "tsc --noEmit",
-    "clean": "rm -rf dist"
+    "clean": "rm -rf dist",
+    "prepublishOnly": "test -d dist || (echo 'ERROR: dist/ missing. Run npm run build first.' && exit 1) && test -f wasm/racommons-llamacpp.wasm || (echo 'ERROR: wasm/racommons-llamacpp.wasm missing. Run build-web.sh first.' && exit 1)"
   },
   "keywords": [
     "runanywhere",
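The `prepublishOnly` guard above can equivalently be written as a small Node script, which is easier to extend as more artifacts are added. This is a sketch; only the two paths from the diff are checked, and the function name is hypothetical:

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Mirrors the prepublishOnly shell guard: collect an error per missing
// build artifact so `npm publish` can be blocked with a clear message.
function checkPublishArtifacts(root = '.'): string[] {
  const errors: string[] = [];
  if (!existsSync(join(root, 'dist'))) {
    errors.push('dist/ missing. Run npm run build first.');
  }
  if (!existsSync(join(root, 'wasm', 'racommons-llamacpp.wasm'))) {
    errors.push('wasm/racommons-llamacpp.wasm missing. Run build-web.sh first.');
  }
  return errors;
}
```

Wired into `prepublishOnly`, the caller would print each error and set a non-zero exit code when the array is non-empty.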
