# Foundry Local JS SDK

The Foundry Local JS SDK provides a JavaScript/TypeScript interface for running AI models locally on your machine. Discover, download, load, and run inference — all without cloud dependencies.

## Features

- **Local-first AI** — Run models entirely on your machine with no cloud calls
- **Model catalog** — Browse and discover available models, check what's cached or loaded
- **Automatic model management** — Download, load, unload, and remove models from cache
- **Chat completions** — OpenAI-compatible chat API with both non-streaming and streaming responses
- **Audio transcription** — Transcribe audio files locally with streaming support
- **Multi-variant models** — Models can have multiple variants (e.g., different quantizations) with automatic selection of the best cached variant
- **Embedded web service** — Start a local HTTP service for OpenAI-compatible API access
- **WinML support** — Automatic execution provider download on Windows for NPU/GPU acceleration
- **Configurable inference** — Control temperature, max tokens, top-k, top-p, frequency penalty, and more

## Installation

```bash
npm install @prathikrao/foundry-local-sdk
```

## WinML: Automatic Hardware Acceleration (Windows)

On Windows, install with the `--winml` flag to enable automatic execution provider management. The SDK will automatically discover, download, and register hardware-specific execution providers (e.g., Qualcomm QNN for NPU acceleration) via the Windows App Runtime — no manual driver or execution provider setup required.

```bash
npm install @prathikrao/foundry-local-sdk --winml
```

When WinML is enabled:

- Execution providers such as `QNNExecutionProvider` and `OpenVINOExecutionProvider` are downloaded and registered on the fly, enabling NPU/GPU acceleration without manual configuration
- **No code changes needed** — your application code stays the same whether WinML is enabled or not

> **Note:** The `--winml` flag is only relevant on Windows. On macOS and Linux, the standard installation is used regardless of this flag.

## Quick Start

```typescript
import { FoundryLocalManager } from '@prathikrao/foundry-local-sdk';

// Initialize the SDK
const manager = FoundryLocalManager.create({
  appName: 'MyApp',
  logLevel: 'info'
});

// Get a model from the catalog
const model = await manager.catalog.getModel('phi-3-mini');

// Load the model into memory
await model.load();

// Run a chat completion
const chatClient = model.createChatClient();
const response = await chatClient.completeChat([
  { role: 'user', content: 'Hello, how are you?' }
]);
console.log(response.choices[0].message.content);

// Clean up
await model.unload();
```

## Usage

### Browsing the Model Catalog

The `Catalog` lets you discover which models are available, which are already cached locally, and which are currently loaded in memory.

```typescript
const catalog = manager.catalog;

// List all available models
const models = await catalog.getModels();
models.forEach(model => {
  console.log(`${model.alias} — cached: ${model.isCached}`);
});

// See what's already downloaded
const cached = await catalog.getCachedModels();

// See what's currently loaded in memory
const loaded = await catalog.getLoadedModels();
```

### Loading and Running Models

Each `Model` can have multiple variants (different quantizations or formats). The SDK automatically selects the best available variant, preferring cached versions.

```typescript
const model = await catalog.getModel('phi-3-mini');

// Download if not cached (with optional progress tracking)
if (!model.isCached) {
  await model.download((progress) => {
    console.log(`Download: ${progress}%`);
  });
}

// Load into memory and run inference
await model.load();
const chatClient = model.createChatClient();
```

You can also select a specific variant manually:

```typescript
// List the model's variants and pin one explicitly
const variants = model.variants;
model.selectVariant(variants[0].id);
```
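
For instance, you might pick a variant whose identifier matches a quantization you want. This is only a sketch: the snippet above confirms just `variants`, `id`, and `selectVariant`, and the `'q4'` naming convention is an assumption, not a documented contract:

```typescript
// Hypothetical: select a variant by matching its id against an assumed
// quantization naming convention ('q4'). Only `variants`, `id`, and
// `selectVariant` are confirmed by the snippet above.
const q4 = model.variants.find(v => v.id.includes('q4'));
if (q4) {
  model.selectVariant(q4.id);
}
```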

### Chat Completions

The `ChatClient` follows the OpenAI Chat Completions API structure.

```typescript
const chatClient = model.createChatClient();

// Configure sampling settings
chatClient.settings.temperature = 0.7;
chatClient.settings.maxTokens = 800;
chatClient.settings.topP = 0.9;

// Non-streaming completion
const response = await chatClient.completeChat([
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Explain quantum computing in simple terms.' }
]);
console.log(response.choices[0].message.content);
```

### Streaming Responses

For real-time output, use streaming:

```typescript
await chatClient.completeStreamingChat(
  [{ role: 'user', content: 'Write a short poem about programming.' }],
  (chunk) => {
    const content = chunk.choices?.[0]?.message?.content;
    if (content) {
      process.stdout.write(content);
    }
  }
);
```

### Audio Transcription

Transcribe audio files locally using the `AudioClient`:

```typescript
const audioClient = model.createAudioClient();
audioClient.settings.language = 'en';

// Non-streaming transcription
const result = await audioClient.transcribe('/path/to/audio.wav');
console.log(result);

// Streaming transcription
await audioClient.transcribeStreaming('/path/to/audio.wav', (chunk) => {
  console.log(chunk);
});
```

### Embedded Web Service

Start a local HTTP server that exposes an OpenAI-compatible API:

```typescript
manager.startWebService();
console.log('Service running at:', manager.urls);

// Use with any OpenAI-compatible client library
// ...

manager.stopWebService();
```
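
As a client-side illustration, here is a minimal sketch using the official `openai` npm package. It assumes the embedded service exposes standard `/v1` routes and that `manager.urls` reports a reachable base URL; the port below is a placeholder, and the API key is a dummy value, on the assumption that the local service does not enforce authentication:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:5272/v1', // placeholder; use the URL from manager.urls
  apiKey: 'not-needed-locally'         // dummy value, assuming no auth is enforced
});

const completion = await client.chat.completions.create({
  model: 'phi-3-mini', // the model loaded earlier
  messages: [{ role: 'user', content: 'Hello from the embedded web service!' }]
});
console.log(completion.choices[0].message.content);
```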

### Configuration

The SDK is configured via `FoundryLocalConfig` when creating the manager:

| Option | Description | Default |
|--------|-------------|---------|
| `appName` | **Required.** Application name for logs and telemetry | — |
| `appDataDir` | Directory where application data should be stored | `~/.{appName}` |
| `logLevel` | Logging level: `trace`, `debug`, `info`, `warn`, `error`, `fatal` | `warn` |
| `modelCacheDir` | Directory for downloaded models | `~/.{appName}/cache/models` |
| `logsDir` | Directory for log files | `~/.{appName}/logs` |
| `libraryPath` | Path to native Foundry Local Core libraries | Auto-discovered |
| `serviceEndpoint` | URL of an existing external service to connect to | — |
| `webServiceUrls` | URL(s) for the embedded web service to bind to | — |
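
For illustration, a fuller configuration might look like this. It is a sketch that uses only the options documented in the table above; all paths and values are placeholders:

```typescript
import { FoundryLocalManager } from '@prathikrao/foundry-local-sdk';

const manager = FoundryLocalManager.create({
  appName: 'MyApp',              // required
  appDataDir: '/opt/myapp/data', // overrides the ~/.MyApp default
  logLevel: 'debug',
  modelCacheDir: '/opt/myapp/models',
  logsDir: '/opt/myapp/logs'
  // serviceEndpoint: 'http://localhost:5272', // placeholder URL: connect to an
  //                                           // existing service instead of
  //                                           // starting one in-process
});
```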

## API Reference

Auto-generated class documentation lives in [`docs/classes/`](docs/classes/):

- [FoundryLocalManager](docs/classes/FoundryLocalManager.md) — SDK entry point, web service management
- [Catalog](docs/classes/Catalog.md) — Model discovery and browsing
- [Model](docs/classes/Model.md) — High-level model with variant selection
- [ModelVariant](docs/classes/ModelVariant.md) — Specific model variant: download, load, inference
- [ChatClient](docs/classes/ChatClient.md) — Chat completions (non-streaming and streaming)
- [AudioClient](docs/classes/AudioClient.md) — Audio transcription (non-streaming and streaming)
- [ModelLoadManager](docs/classes/ModelLoadManager.md) — Low-level model loading management

## Running Tests

```bash
npm test
```

See `test/README.md` for details on prerequisites and setup.

## Running Examples

```bash
npm run example
```

This runs the chat completion example in `examples/chat-completion.ts`.