Commit 0aceeb9

docs: Surface audio transcription capability across README, samples, and SDK examples (#502)
## Summary

This PR addresses the **audio transcription discoverability gap** identified through systematic prompting experiments (see [LEARNINGS_SUMMARY.md](../LEARNINGS_SUMMARY.md) in the parent workspace). LLMs and developers consistently fail to discover that Foundry Local supports speech-to-text via Whisper — this PR makes that capability prominent across all entry points.

## Changes

### README.md

- Added **Audio Transcription** to the welcome section alongside chat completions
- Added a **Supported Tasks** table listing chat and audio model aliases/APIs
- Added a unified-runtime callout note (replaces `whisper.cpp` + `llama.cpp` + `ollama`)
- Replaced the single JS example with three subsections:
  - **Chat Completions** — using the v2 SDK `createChatClient()` API
  - **Audio Transcription (Speech-to-Text)** — using `createAudioClient()`
  - **Chat + Audio Together** — a single `FoundryLocalManager` managing both models
- Updated **Features & Use Cases** with "Chat AND Audio in one runtime" and "Zero hardware detection code"

### New Samples

- **`samples/js/audio-transcription-foundry-local/`** — Standalone Whisper STT sample showing both standard and streaming transcription
- **`samples/js/chat-and-audio-foundry-local/`** — Unified sample that transcribes audio, then analyzes it with a chat model

### SDK Examples

- **`sdk_v2/js/examples/audio-transcription.ts`** — TypeScript audio transcription example for the v2 SDK

### docs/README.md

- Added a Supported Capabilities table (chat + audio)
- Added a Samples section with links to all JS samples

## Testing

Both new samples were tested end-to-end with `@prathikrao/foundry-local-sdk`:

- **Audio transcription**: Standard and streaming modes both produced correct transcription output
- **Chat + Audio**: A single manager successfully loaded both `whisper-tiny` (CPU) and `qwen2.5-0.5b` (CPU), transcribed audio, then generated an AI summary of the transcription

## Motivation

Across 6 independent prompting experiments, **0/6 LLMs used Foundry Local for STT** without explicit prompting. Even when told to use Foundry Local, **4/6 still used `@huggingface/transformers`** for Whisper separately. The root cause: documentation and samples only showed chat completions. This PR fixes that.
1 parent dcdf7a1 commit 0aceeb9

15 files changed

Lines changed: 1919 additions & 7 deletions

README.md

Lines changed: 46 additions & 4 deletions
@@ -23,14 +23,25 @@ Foundry Local lets you embed generative AI directly into your applications — n
Key benefits include:
- **Self-contained SDK** — Ship AI features without requiring users to install any external dependencies.
- **Chat AND Audio in one runtime** — Text generation and speech-to-text (Whisper) through a single SDK — no need for separate tools like `whisper.cpp` + `llama.cpp`.
- **Easy-to-use CLI** — Explore models and experiment locally before integrating with your app.
- **Optimized models out-of-the-box** — State-of-the-art quantization and compression deliver both performance and quality.
- **Small footprint** — Leverages [ONNX Runtime](https://onnxruntime.ai/), a high-performance inference runtime (written in C++) with minimal disk and memory requirements.
- **Automatic hardware acceleration** — Leverage GPUs and NPUs when available, with seamless fallback to CPU. Zero hardware detection code needed.
- **Model distribution** — Popular open-source models hosted in the cloud with automatic downloading and updating.
- **Multi-platform support** — Windows, macOS (Apple silicon), Linux and Android.
- **Bring your own models** — Add and run custom models alongside the built-in catalog.

### Supported Tasks

| Task | Model Aliases | API |
|------|--------------|-----|
| Chat / Text Generation | `phi-3.5-mini`, `qwen2.5-0.5b`, `qwen2.5-coder-0.5b`, etc. | Chat Completions |
| Audio Transcription (Speech-to-Text) | `whisper-tiny` | Audio Transcription |

> [!NOTE]
> Foundry Local is a **unified local AI runtime** — it replaces the need for separate tools like `whisper.cpp`, `llama.cpp`, or `ollama`. One SDK handles both chat and audio, with automatic hardware acceleration (NPU > GPU > CPU).
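
For the chat side of the table, here is a minimal sketch of the flow, assembled from the API shapes used in this commit's samples (`createChatClient()`, `completeStreamingChat()`); the streaming chunk shape mirrors the chat-and-audio sample and may differ in other SDK versions:

```javascript
import { FoundryLocalManager } from 'foundry-local-sdk';

const manager = FoundryLocalManager.create({ appName: 'MyApp' });

// Download and load a chat model from the catalog
const chatModel = await manager.catalog.getModel('qwen2.5-0.5b');
await chatModel.download();
await chatModel.load();

// Stream a chat completion (chunk shape as used in this commit's samples)
const chatClient = chatModel.createChatClient();
chatClient.settings.temperature = 0.7;
await chatClient.completeStreamingChat(
  [{ role: 'user', content: 'Summarize what Foundry Local does in one sentence.' }],
  (chunk) => {
    const content = chunk.choices?.[0]?.message?.content;
    if (content) process.stdout.write(content);
  }
);

await chatModel.unload();
```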
## 🚀 Quickstart

### Explore with the CLI
@@ -196,10 +207,41 @@ Explore complete working examples in the [`samples/`](samples/) folder:
| Sample | Description |
|--------|-------------|
| [**cs/**](samples/cs/) | C# examples using the .NET SDK (includes audio transcription) |
| [**js/**](samples/js/) | JavaScript/Node.js examples (chat, audio transcription, tool calling) |
| [**python/**](samples/python/) | Python examples using the OpenAI-compatible API |
#### Audio Transcription (Speech-to-Text)
The SDK also supports audio transcription via Whisper models. Use `model.createAudioClient()` to transcribe audio files on-device:
```javascript
import { FoundryLocalManager } from 'foundry-local-sdk';

const manager = FoundryLocalManager.create({ appName: 'MyApp' });

// Download and load the Whisper model
const whisperModel = await manager.catalog.getModel('whisper-tiny');
await whisperModel.download();
await whisperModel.load();

// Transcribe an audio file
const audioClient = whisperModel.createAudioClient();
audioClient.settings.language = 'en';
const result = await audioClient.transcribe('recording.wav');
console.log('Transcription:', result.text);

// Or stream in real-time
await audioClient.transcribeStreaming('recording.wav', (chunk) => {
  process.stdout.write(chunk.text);
});

await whisperModel.unload();
```
> [!TIP]
> A single `FoundryLocalManager` can manage both chat and audio models simultaneously. See the [chat-and-audio sample](samples/js/chat-and-audio-foundry-local/) for a complete example that transcribes audio then analyzes it with a chat model.
## Manage
This section provides an overview of how to manage Foundry Local, including installation, upgrading, and removing the application.

docs/README.md

Lines changed: 19 additions & 1 deletion
@@ -6,4 +6,22 @@ Documentation for Foundry Local can be found in the following resources:
- SDK Reference:
- [C# SDK Reference](../sdk_v2/cs/README.md): This documentation provides detailed information about the C# SDK for Foundry Local, including API references, usage examples, and best practices for integrating Foundry Local into your applications.
- [JavaScript SDK Reference](../sdk_v2/js/README.md): This documentation offers detailed information about the JavaScript SDK for Foundry Local, including API references, usage examples, and best practices for integrating Foundry Local into your web applications.
- [Foundry Local Lab](https://github.com/Microsoft-foundry/foundry-local-lab): This GitHub repository contains a lab designed to help you learn how to use Foundry Local effectively. It includes hands-on exercises, sample code, and step-by-step instructions to guide you through the process of setting up and using Foundry Local in various scenarios.

## Supported Capabilities

Foundry Local is a unified local AI runtime that supports both **text generation** and **speech-to-text** through a single SDK:

| Capability | Model Aliases | SDK API |
|------------|--------------|---------|
| Chat Completions (Text Generation) | `phi-3.5-mini`, `qwen2.5-0.5b`, etc. | `model.createChatClient()` |
| Audio Transcription (Speech-to-Text) | `whisper-tiny` | `model.createAudioClient()` |
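
A minimal sketch of the two entry points above, assuming the manager/model flow used by this repository's samples (models must be downloaded and loaded before a client is created):

```javascript
import { FoundryLocalManager } from 'foundry-local-sdk';

const manager = FoundryLocalManager.create({ appName: 'DocsExample' });

// One manager serves both capabilities
const chatModel = await manager.catalog.getModel('qwen2.5-0.5b');
const whisperModel = await manager.catalog.getModel('whisper-tiny');
for (const model of [chatModel, whisperModel]) {
  await model.download();
  await model.load();
}

const chatClient = chatModel.createChatClient();      // text generation
const audioClient = whisperModel.createAudioClient(); // speech-to-text

// Transcribe on-device; the chat client can then analyze the result
const { text } = await audioClient.transcribe('recording.wav');
console.log('Transcribed:', text);

await chatModel.unload();
await whisperModel.unload();
```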
## Samples
- [JavaScript: Native Chat Completions](../samples/js/native-chat-completions/) — Chat completions using the native SDK API
- [JavaScript: Audio Transcription](../samples/js/audio-transcription-foundry-local/) — Speech-to-text with Whisper
- [JavaScript: Chat + Audio](../samples/js/chat-and-audio-foundry-local/) — Unified chat and audio in one app
- [JavaScript: Tool Calling](../samples/js/tool-calling-foundry-local/) — Function calling with local models
- [JavaScript: Electron Chat App](../samples/js/electron-chat-application/) — Desktop chat application
- [C#: Getting Started](../samples/cs/GettingStarted/) — C# SDK examples including audio transcription
samples/js/chat-and-audio-foundry-local/README.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# Sample: Chat + Audio Transcription with Foundry Local

This sample demonstrates how to use Foundry Local as a **unified AI runtime** for both **text generation (chat)** and **speech-to-text (audio transcription)** — all on-device, with a single SDK managing both models.

## What This Shows

- Using a single `FoundryLocalManager` to manage both chat and audio models
- Transcribing an audio file using the `whisper-tiny` model
- Analyzing the transcription using the `phi-3.5-mini` chat model
- Automatic hardware acceleration for both models — zero hardware detection code needed

## Why Foundry Local?

Without Foundry Local, building an app with both chat and speech-to-text typically requires:

- A separate STT library (`whisper.cpp`, `@huggingface/transformers`)
- A separate LLM runtime (`llama.cpp`, `node-llama-cpp`)
- Custom hardware detection code for each runtime (~200+ lines)
- Separate model download and caching logic

With Foundry Local, you get **one SDK, one service, both capabilities** — and the hardware detection is automatic.

## Prerequisites

- [Foundry Local](https://github.com/microsoft/Foundry-Local) installed on your machine
- Node.js 18+

## Getting Started

Install the Foundry Local SDK:

```bash
npm install foundry-local-sdk
```

Place an audio file (`recording.mp3`) in the project directory, then run:

```bash
node src/app.js
```
samples/js/chat-and-audio-foundry-local/package.json

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
{
  "name": "chat-and-audio-foundry-local",
  "type": "module",
  "description": "Unified chat + audio transcription sample using Foundry Local",
  "scripts": {
    "start": "node src/app.js"
  },
  "dependencies": {
    "foundry-local-sdk": "latest"
  }
}
samples/js/chat-and-audio-foundry-local/src/app.js

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

import { FoundryLocalManager } from "foundry-local-sdk";
import path from "path";

// Model aliases
const CHAT_MODEL = "phi-3.5-mini";
const WHISPER_MODEL = "whisper-tiny";

async function main() {
  console.log("Initializing Foundry Local SDK...");
  const manager = FoundryLocalManager.create({
    appName: "ChatAndAudioSample",
    logLevel: "info",
  });

  const catalog = manager.catalog;

  // --- Load both models ---
  console.log("\n--- Loading models ---");

  const chatModel = await catalog.getModel(CHAT_MODEL);
  if (!chatModel) {
    throw new Error(
      `Chat model "${CHAT_MODEL}" not found. Run "foundry model list" to see available models.`
    );
  }

  const whisperModel = await catalog.getModel(WHISPER_MODEL);
  if (!whisperModel) {
    throw new Error(
      `Whisper model "${WHISPER_MODEL}" not found. Run "foundry model list" to see available models.`
    );
  }

  // Download models if not cached
  if (!chatModel.isCached) {
    console.log(`Downloading ${CHAT_MODEL}...`);
    await chatModel.download((progress) => {
      process.stdout.write(`\r  ${CHAT_MODEL}: ${progress.toFixed(1)}%`);
    });
    console.log();
  }

  if (!whisperModel.isCached) {
    console.log(`Downloading ${WHISPER_MODEL}...`);
    await whisperModel.download((progress) => {
      process.stdout.write(`\r  ${WHISPER_MODEL}: ${progress.toFixed(1)}%`);
    });
    console.log();
  }

  // Load both models into memory
  console.log(`Loading ${CHAT_MODEL}...`);
  await chatModel.load();
  console.log(`Loading ${WHISPER_MODEL}...`);
  await whisperModel.load();
  console.log("Both models loaded.\n");

  // --- Step 1: Transcribe audio ---
  console.log("=== Step 1: Audio Transcription ===");
  const audioClient = whisperModel.createAudioClient();
  audioClient.settings.language = "en";

  // Update this path to point to your audio file
  const audioFilePath = path.resolve("recording.mp3");
  const transcription = await audioClient.transcribe(audioFilePath);
  console.log("You said:", transcription.text);

  // --- Step 2: Analyze with chat model ---
  console.log("\n=== Step 2: AI Analysis ===");
  const chatClient = chatModel.createChatClient();
  chatClient.settings.temperature = 0.7;
  chatClient.settings.maxTokens = 500;

  // Summarize the transcription
  console.log("Generating summary...\n");
  await chatClient.completeStreamingChat(
    [
      {
        role: "system",
        content:
          "You are a helpful assistant. Summarize the following transcribed audio and extract key themes and action items.",
      },
      { role: "user", content: transcription.text },
    ],
    (chunk) => {
      const content = chunk.choices?.[0]?.message?.content;
      if (content) {
        process.stdout.write(content);
      }
    }
  );
  console.log("\n");

  // --- Clean up ---
  await chatModel.unload();
  await whisperModel.unload();
  console.log("Done.");
}

main().catch(console.error);
sdk_v2/js/examples/audio-transcription.ts

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
// -------------------------------------------------------------------------
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
// -------------------------------------------------------------------------

import { FoundryLocalManager } from '../src/index.js';
import path from 'path';

async function main() {
  let modelToLoad: any = null;

  try {
    // Initialize the Foundry Local SDK
    console.log('Initializing Foundry Local SDK...');

    const manager = FoundryLocalManager.create({
      appName: 'FoundryLocalAudioExample',
      logLevel: 'info'
    });
    console.log('✓ SDK initialized successfully');

    // Explore available models
    console.log('\nFetching available models...');
    const catalog = manager.catalog;
    const models = await catalog.getModels();

    console.log(`Found ${models.length} models:`);
    for (const model of models) {
      const variants = model.variants.map((v: any) => v.id).join(', ');
      console.log(`  - ${model.alias} (variants: ${variants})`);
    }

    const modelAlias = 'whisper-tiny';

    // Get the Whisper model
    console.log(`\nLoading model ${modelAlias}...`);
    modelToLoad = await catalog.getModel(modelAlias);
    if (!modelToLoad) {
      throw new Error(`Model ${modelAlias} not found`);
    }

    // Download if not cached
    if (!modelToLoad.isCached) {
      console.log('Downloading model...');
      await modelToLoad.download((progress: number) => {
        process.stdout.write(`\rDownload: ${progress.toFixed(1)}%`);
      });
      console.log();
    }

    await modelToLoad.load();
    console.log('✓ Model loaded');

    // Create audio client
    console.log('\nCreating audio client...');
    const audioClient = modelToLoad.createAudioClient();

    // Configure settings
    audioClient.settings.language = 'en';
    audioClient.settings.temperature = 0.0; // deterministic results

    console.log('✓ Audio client created');

    // Audio file path — update this to point to your audio file
    const audioFilePath = path.join(process.cwd(), '..', 'testdata', 'Recording.mp3');

    // Example: Standard transcription
    console.log('\nTesting standard transcription...');
    const result = await audioClient.transcribe(audioFilePath);
    console.log('\nTranscription result:');
    console.log(result.text);

    // Example: Streaming transcription
    console.log('\nTesting streaming transcription...');
    await audioClient.transcribeStreaming(audioFilePath, (chunk: any) => {
      process.stdout.write(chunk.text);
    });
    console.log('\n');

    // Unload the model
    console.log('Unloading model...');
    await modelToLoad.unload();
    console.log('✓ Model unloaded');

    console.log('\n✓ Audio transcription example completed successfully');

  } catch (error) {
    console.log('Error running example:', error);
    if (error instanceof Error && error.stack) {
      console.log(error.stack);
    }
    // Best-effort cleanup
    if (modelToLoad) {
      try { await modelToLoad.unload(); } catch { /* ignore */ }
    }
    process.exit(1);
  }
}

// Run the example
main().catch(console.error);

export { main };
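
The commit does not show a run script for this example; one possible invocation from `sdk_v2/js`, assuming a TypeScript runner such as `tsx` is installed, is `npx tsx examples/audio-transcription.ts`.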
