
Commit cd12f3f

readme updates
1 parent efdd809 commit cd12f3f

7 files changed

Lines changed: 500 additions & 39 deletions

File tree

sdk/runanywhere-web/packages/core/README.md

Lines changed: 85 additions & 34 deletions
@@ -100,25 +100,36 @@ On-device AI for the browser. Run LLMs, Speech-to-Text, Text-to-Speech, Vision,
 
 ## Package Structure
 
-The Web SDK is a single npm package. Unlike native SDKs (iOS, Android, React Native, Flutter) which use separate packages per backend, the Web SDK compiles all inference backends into a single WebAssembly binary. Backend selection happens at WASM build time, not at the package level.
+The Web SDK is split into three npm packages so you only ship the backends you need:
 
-```
-@runanywhere/web -- TypeScript API + pre-built WASM (all backends)
-```
+| Package | Description | Includes |
+|---------|-------------|----------|
+| [`@runanywhere/web`](https://www.npmjs.com/package/@runanywhere/web) | Core SDK — lifecycle, logging, events, model management, storage | TypeScript only (no WASM) |
+| [`@runanywhere/web-llamacpp`](https://www.npmjs.com/package/@runanywhere/web-llamacpp) | LLM, VLM, tool calling, structured output, embeddings, diffusion | llama.cpp WASM (~3.7 MB CPU, ~3.9 MB WebGPU) |
+| [`@runanywhere/web-onnx`](https://www.npmjs.com/package/@runanywhere/web-onnx) | STT, TTS, VAD, audio capture/playback | [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) WASM (~12 MB, lazy-loaded) |
 
-The pre-built WASM includes llama.cpp (LLM/VLM), whisper.cpp (STT), and sherpa-onnx (TTS/VAD). Developers who need a smaller WASM binary with specific backends can [build from source](#building-from-source) with selective flags.
+Install only what you need — `@runanywhere/web` is always required as the core.
 
 ---
 
 ## Installation
 
 ```bash
-npm install @runanywhere/web
+# Core + all backends
+npm install @runanywhere/web @runanywhere/web-llamacpp @runanywhere/web-onnx
+
+# LLM/VLM only (no speech)
+npm install @runanywhere/web @runanywhere/web-llamacpp
+
+# Speech only (no LLM)
+npm install @runanywhere/web @runanywhere/web-onnx
 ```
 
-### Serve WASM Files
+### Serve WASM Files + Cross-Origin Isolation
 
-The package includes pre-built WASM files in `node_modules/@runanywhere/web/wasm/`. Configure your bundler to serve these as static assets.
+WASM files are included in `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx`. Configure your bundler to serve them as static assets.
+
+> **Important:** Your server **must** set Cross-Origin Isolation headers for `SharedArrayBuffer` and multi-threaded WASM to work. Without these headers the SDK falls back to single-threaded mode, which is significantly slower. See [Cross-Origin Isolation Headers](#cross-origin-isolation-headers) for all platforms (Nginx, Vercel, Netlify, Cloudflare, AWS, Apache).
 
 **Vite:**
 
@@ -132,9 +143,15 @@ export default defineConfig({
       'Cross-Origin-Embedder-Policy': 'credentialless',
     },
   },
+  worker: { format: 'es' },
+  optimizeDeps: {
+    exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
+  },
 });
 ```
 
+> **Warning (Vite users):** You **must** add `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx` to `optimizeDeps.exclude`. Vite's dependency pre-bundling flattens packages into `.vite/deps/`, which breaks the relative `import.meta.url` paths the SDK uses to locate its WASM files. Without this exclusion, WASM loading will fail with a "Failed to fetch dynamically imported module" error. This is a known Vite limitation with npm packages that resolve static assets via `import.meta.url`.
+
 **Webpack:**
 
 ```javascript
@@ -145,9 +162,17 @@ module.exports = {
       { test: /\.wasm$/, type: 'asset/resource' },
     ],
   },
+  devServer: {
+    headers: {
+      'Cross-Origin-Opener-Policy': 'same-origin',
+      'Cross-Origin-Embedder-Policy': 'credentialless',
+    },
+  },
 };
 ```
 
+> **Safari/iOS:** Safari does not support `credentialless` COEP. Use the COI service worker pattern shown in the [demo app](../../examples/web/RunAnywhereAI/) — it intercepts responses and injects `require-corp` headers at runtime.
+
 ---
 
 ## Quick Start
@@ -156,14 +181,20 @@ module.exports = {
 
 ```typescript
 import { RunAnywhere } from '@runanywhere/web';
+import { LlamaCPP, TextGeneration } from '@runanywhere/web-llamacpp';
+import { ONNX, STT, STTModelType, TTS, VAD, SpeechActivity } from '@runanywhere/web-onnx';
 
 await RunAnywhere.initialize({ environment: 'development', debug: true });
+
+// Register backends
+await LlamaCPP.register();
+await ONNX.register();
 ```
 
 ### 2. Text Generation (LLM)
 
 ```typescript
-import { TextGeneration } from '@runanywhere/web';
+import { TextGeneration } from '@runanywhere/web-llamacpp';
 
 // Load a GGUF model
 await TextGeneration.loadModel('/models/qwen2.5-0.5b-instruct-q4_0.gguf', 'qwen2.5-0.5b');
@@ -181,7 +212,7 @@ for await (const token of TextGeneration.generateStream('Write a haiku about cod
 ### 3. Speech-to-Text (STT)
 
 ```typescript
-import { STT } from '@runanywhere/web';
+import { STT, STTModelType } from '@runanywhere/web-onnx';
 
 await STT.loadModel({
   modelId: 'whisper-tiny',
@@ -197,7 +228,7 @@ console.log(result.text);
 ### 4. Text-to-Speech (TTS)
 
 ```typescript
-import { TTS } from '@runanywhere/web';
+import { TTS } from '@runanywhere/web-onnx';
 
 await TTS.loadVoice({
   voiceId: 'piper-en',
@@ -213,7 +244,7 @@ const result = await TTS.synthesize('Hello from RunAnywhere!');
 ### 5. Voice Activity Detection (VAD)
 
 ```typescript
-import { VAD, SpeechActivity } from '@runanywhere/web';
+import { VAD, SpeechActivity } from '@runanywhere/web-onnx';
 
 await VAD.initialize({ modelPath: '/models/silero_vad.onnx' });
 
@@ -231,7 +262,7 @@ VAD.processSamples(audioChunk);
 ### 6. Vision Language Model (VLM)
 
 ```typescript
-import { VLM, VLMImageFormat } from '@runanywhere/web';
+import { VLM, VLMImageFormat } from '@runanywhere/web-llamacpp';
 
 await VLM.loadModel('/models/qwen2-vl.gguf', '/models/mmproj.gguf', 'qwen2-vl');
 
@@ -650,38 +681,46 @@ The demo app runs on Vite with Cross-Origin Isolation headers pre-configured.
 
 ---
 
-## npm Package
-
-```
-@runanywhere/web
-```
+## npm Packages
 
-### Published Exports
+### `@runanywhere/web` (core)
 
 | Export | Description |
 |--------|-------------|
 | `RunAnywhere` | SDK lifecycle (initialize, shutdown, capabilities) |
+| `ModelManager` | Model download, storage, and loading |
+| `OPFSStorage` | Persistent storage via OPFS |
+| `SDKLogger` | Structured logging |
+| `SDKError` | Typed error hierarchy |
+| `EventBus` | SDK event system |
+| `detectCapabilities` | Browser feature detection |
+
+### `@runanywhere/web-llamacpp`
+
+| Export | Description |
+|--------|-------------|
+| `LlamaCPP` | Backend registration |
 | `TextGeneration` | LLM text generation and streaming |
-| `STT` | Speech-to-text transcription |
-| `TTS` | Text-to-speech synthesis |
-| `VAD` | Voice activity detection |
 | `VLM` | Vision-language model inference |
-| `VoicePipeline` | STT -> LLM -> TTS orchestration |
-| `VoiceAgent` | Complete voice agent with C API pipeline |
-| `ToolCalling` | Function calling with typed tool definitions |
+| `ToolCalling` | Function calling with typed definitions |
 | `StructuredOutput` | JSON schema-guided generation |
 | `Embeddings` | Vector embedding generation |
-| `Diffusion` | Image generation (WebGPU, scaffold) |
+| `Diffusion` | Image generation (WebGPU) |
+| `VLMWorkerBridge` | Web Worker bridge for VLM inference |
+| `VideoCapture` | Camera capture and frame extraction |
+| `TelemetryService` | Telemetry and analytics |
+
+### `@runanywhere/web-onnx`
+
+| Export | Description |
+|--------|-------------|
+| `ONNX` | Backend registration |
+| `STT` | Speech-to-text transcription |
+| `TTS` | Text-to-speech synthesis |
+| `VAD` | Voice activity detection |
 | `AudioCapture` | Microphone capture via Web Audio API |
 | `AudioPlayback` | Audio playback via Web Audio API |
-| `VideoCapture` | Camera capture and frame extraction |
-| `ModelManager` | Advanced model download/storage/loading |
-| `OPFSStorage` | Low-level OPFS persistence |
-| `VLMWorkerBridge` | Web Worker bridge for VLM inference |
-| `SDKLogger` | Structured logging |
-| `SDKError` | Typed error hierarchy |
-| `EventBus` | SDK event system |
-| `detectCapabilities` | Browser feature detection |
+| `AudioFileLoader` | Audio file loading and decoding |
 
 ---
 
@@ -715,6 +754,18 @@ Yes. Any GGUF-format model compatible with llama.cpp works for LLM/VLM. STT mode
 
 ## Troubleshooting
 
+### "Failed to fetch dynamically imported module" / WASM not loading (Vite)
+
+**Cause:** Vite pre-bundles npm dependencies into `.vite/deps/`, which breaks the relative `import.meta.url` paths used by `@runanywhere/web-llamacpp` and `@runanywhere/web-onnx` to locate their WASM files.
+
+**Fix:** Add both packages to `optimizeDeps.exclude` in your `vite.config.ts`:
+
+```typescript
+optimizeDeps: {
+  exclude: ['@runanywhere/web-llamacpp', '@runanywhere/web-onnx'],
+},
+```
+
 ### "SharedArrayBuffer is not defined"
 
 **Cause:** Missing Cross-Origin Isolation headers.
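A quick way to confirm whether this troubleshooting entry applies is to probe the environment at startup. The helper below is a hypothetical sketch, not an SDK API — it only reads standard browser globals:

```typescript
// Sketch: runtime probe for Cross-Origin Isolation (not part of the SDK).
// `crossOriginIsolated` is a standard global that is true only when the
// COOP/COEP headers described above were served correctly.
function checkCrossOriginIsolation(): {
  isolated: boolean;
  hasSharedArrayBuffer: boolean;
} {
  // Read via globalThis so the check also compiles outside DOM typings.
  const isolated = (globalThis as any).crossOriginIsolated === true;
  const hasSharedArrayBuffer = typeof SharedArrayBuffer !== 'undefined';
  if (!isolated) {
    console.warn(
      'Not cross-origin isolated: multi-threaded WASM will be unavailable.',
    );
  }
  return { isolated, hasSharedArrayBuffer };
}
```

If it logs the warning in your deployed app, the headers are missing or being stripped by a proxy/CDN in front of your server.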

sdk/runanywhere-web/packages/core/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@runanywhere/web",
-  "version": "0.1.0-beta.8",
+  "version": "0.1.0-beta.9",
   "description": "RunAnywhere Web SDK - Core infrastructure for on-device AI in the browser (pure TypeScript)",
   "type": "module",
   "main": "./dist/index.js",
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+# @runanywhere/web-llamacpp
+
+LLM, VLM, tool calling, structured output, embeddings, and diffusion backend for the [RunAnywhere Web SDK](https://www.npmjs.com/package/@runanywhere/web) — powered by [llama.cpp](https://github.com/ggerganov/llama.cpp) compiled to WebAssembly.
+
+> **Peer dependency:** Requires [`@runanywhere/web`](https://www.npmjs.com/package/@runanywhere/web) `>=0.1.0-beta.0`
+
+## Installation
+
+```bash
+npm install @runanywhere/web @runanywhere/web-llamacpp
+```
+
+## Quick Start
+
+```typescript
+import { RunAnywhere } from '@runanywhere/web';
+import { LlamaCPP, TextGeneration } from '@runanywhere/web-llamacpp';
+
+// 1. Initialize core SDK
+await RunAnywhere.initialize({ environment: 'development' });
+
+// 2. Register the llama.cpp backend
+await LlamaCPP.register();
+
+// 3. Load a GGUF model and generate
+await TextGeneration.loadModel('/models/qwen2.5-0.5b-instruct-q4_0.gguf', 'qwen2.5-0.5b');
+const result = await TextGeneration.generate('Explain quantum computing briefly.');
+console.log(result.text);
+
+// Stream tokens
+for await (const token of TextGeneration.generateStream('Write a haiku.')) {
+  process.stdout.write(token);
+}
+```
+
+## Capabilities
+
+| Feature | Class | Description |
+|---------|-------|-------------|
+| **Text Generation** | `TextGeneration` | LLM inference with streaming, system prompts, temperature, top-k/top-p |
+| **Vision Language Models** | `VLM` | Multimodal inference (image + text) via llama.cpp mtmd — runs in a Web Worker |
+| **Tool Calling** | `ToolCalling` | Function calling with typed definitions (Hermes-style and generic) |
+| **Structured Output** | `StructuredOutput` | JSON schema-guided generation |
+| **Embeddings** | `Embeddings` | Vector embedding generation with configurable normalization/pooling |
+| **Diffusion** | `Diffusion` | Image generation (WebGPU, scaffold) |
+
+## WASM Files
+
+This package includes pre-built WASM binaries:
+
+| File | Description |
+|------|-------------|
+| `wasm/racommons-llamacpp.wasm` | CPU variant (~3.7 MB) |
+| `wasm/racommons-llamacpp-webgpu.wasm` | WebGPU-accelerated variant (~3.9 MB) |
+
+The SDK automatically selects the WebGPU variant when available, falling back to CPU.
+
+Configure your bundler to serve these as static assets — see the [main SDK README](https://www.npmjs.com/package/@runanywhere/web) for Vite/Webpack examples.
+
+> **Warning (Vite):** You must add `@runanywhere/web-llamacpp` to [`optimizeDeps.exclude`](https://vite.dev/config/dep-optimization-options#optimizedeps-exclude) in your `vite.config.ts`. Vite's pre-bundling breaks the `import.meta.url` paths the SDK uses to locate WASM files. See the [main SDK README](https://www.npmjs.com/package/@runanywhere/web#serve-wasm-files--cross-origin-isolation) for the full Vite config.
+
+## Cross-Origin Isolation
+
+Multi-threaded WASM requires `SharedArrayBuffer`, which needs Cross-Origin Isolation headers:
+
+```
+Cross-Origin-Opener-Policy: same-origin
+Cross-Origin-Embedder-Policy: credentialless
+```
+
+See the [main SDK docs](https://www.npmjs.com/package/@runanywhere/web#cross-origin-isolation-headers) for platform-specific configuration.
+
+## License
+
+Apache 2.0
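For local testing outside a bundler, the two isolation headers above can be served by a minimal static file server. The sketch below assumes a `dist/` directory, port 8080, and a tiny MIME map — all illustrative, not part of the SDK:

```typescript
import { createServer } from 'node:http';
import { readFile } from 'node:fs/promises';
import { extname, join } from 'node:path';

// Minimal MIME map; .wasm needs the correct type for streaming compilation.
const MIME: Record<string, string> = {
  '.html': 'text/html',
  '.js': 'text/javascript',
  '.wasm': 'application/wasm',
};

const server = createServer(async (req, res) => {
  // The two headers that enable cross-origin isolation (SharedArrayBuffer).
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'credentialless');
  try {
    const path = join('dist', req.url === '/' ? 'index.html' : (req.url ?? ''));
    const body = await readFile(path);
    res.setHeader('Content-Type', MIME[extname(path)] ?? 'application/octet-stream');
    res.end(body);
  } catch {
    res.statusCode = 404;
    res.end('Not found');
  }
});

server.listen(8080);
```

Every response, including 404s, carries the headers, so the page stays isolated even while you iterate on the build output.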

sdk/runanywhere-web/packages/llamacpp/package.json

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
   "name": "@runanywhere/web-llamacpp",
-  "version": "0.1.0-beta.8",
+  "version": "0.1.0-beta.9",
   "description": "RunAnywhere Web SDK - LlamaCpp backend for on-device LLM/VLM inference",
   "type": "module",
   "main": "./dist/index.js",
@@ -28,7 +28,8 @@
     "dev": "tsc --watch",
     "lint": "tsc --noEmit",
     "typecheck": "tsc --noEmit",
-    "clean": "rm -rf dist"
+    "clean": "rm -rf dist",
+    "prepublishOnly": "test -d dist || (echo 'ERROR: dist/ missing. Run npm run build first.' && exit 1) && test -f wasm/racommons-llamacpp.wasm || (echo 'ERROR: wasm/racommons-llamacpp.wasm missing. Run build-web.sh first.' && exit 1)"
   },
   "keywords": [
     "runanywhere",
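The `prepublishOnly` guard above can equivalently be written as a small Node script, which is easier to extend as more artifacts are added. This is a sketch; only the two paths from the diff are checked, and the function name is hypothetical:

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Mirrors the prepublishOnly shell guard: collect an error per missing
// build artifact so `npm publish` can be blocked with a clear message.
function checkPublishArtifacts(root = '.'): string[] {
  const errors: string[] = [];
  if (!existsSync(join(root, 'dist'))) {
    errors.push('dist/ missing. Run npm run build first.');
  }
  if (!existsSync(join(root, 'wasm', 'racommons-llamacpp.wasm'))) {
    errors.push('wasm/racommons-llamacpp.wasm missing. Run build-web.sh first.');
  }
  return errors;
}
```

Wired into `prepublishOnly`, the caller would print each error and set a non-zero exit code when the array is non-empty.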
