Echo is a blazing-fast, privacy-first, push-to-talk voice typing assistant. It runs locally on your machine, leveraging the power of OpenAI's Whisper to transcribe your speech into text and automatically insert it wherever your text cursor is active.
It also features auto-translation to English: speak in your native language (Russian, Spanish, German, etc.), and Echo will translate it to English.
This makes it an ideal tool for bilingual workflows, coding, and writing documentation! (Note: Auto-translation works best with English-only models like `ggml-base.en.bin`.)
It features a "Hot Mic" architecture that keeps the microphone ready so recording starts without a noticeable delay, plus Voice Activity Detection (VAD) that helps avoid cutting off your first words.
Before running the application, ensure you have:
- A working Microphone.
- Whisper Model: A compatible `.bin` Whisper model file.
Echo uses Whisper GGML models (e.g. ggml-medium.bin). The model you choose determines the speed and accuracy of your dictation.
You can download them from HuggingFace: ggerganov/whisper.cpp.
English vs. Multilingual:
- If you only dictate in English, always download models with the `.en` suffix (e.g., `small.en.bin`). They are faster, more accurate, and hallucinate much less.
- If you dictate in other languages (or mix them), use the standard models (e.g., `small.bin`). Note: Large models do not have `.en` versions as they are inherently multilingual.
| Model Size | Approx. VRAM / RAM | Speed | Accuracy | Hardware Recommendation |
|---|---|---|---|---|
| Tiny | ~500 MB | Blazing Fast | Basic | Potato PCs, quick testing. |
| Base | ~1 GB | Very Fast | Acceptable | Older laptops or CPU-only inference. |
| Small | ~2 GB | Fast | Good | Recommended. Best balance for most modern PCs. |
| Medium | ~3.5 GB | Moderate | Very Good | Dedicated GPUs (e.g., GTX 1060 or better). |
| Large | ~6 GB+ | Slow | Excellent | Modern GPUs (RTX 3060+). Perfect for complex jargon. |
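As a rough aid for reading the table above, here is a minimal Python sketch that picks the largest model tier whose approximate memory footprint fits into the memory you have free. The numbers are copied from the table and are approximate; the function name `largest_model_that_fits` is made up for this example and is not part of Echo.

```python
from typing import Optional

# Approximate memory needs in GB, copied from the table above.
# "Large" is listed as "~6 GB+", so 6.0 is a lower bound, not an exact figure.
MODEL_MEMORY_GB = [
    ("Tiny", 0.5),
    ("Base", 1.0),
    ("Small", 2.0),
    ("Medium", 3.5),
    ("Large", 6.0),
]


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the largest tier whose approximate footprint fits, or None."""
    best = None
    for name, needed_gb in MODEL_MEMORY_GB:
        if needed_gb <= available_gb:
            best = name
    return best


if __name__ == "__main__":
    for gb in (0.25, 1.5, 4.0, 8.0):
        print(gb, "GB free ->", largest_model_that_fits(gb))
```

This only looks at memory. It does not account for speed, so the "Small" recommendation in the table may still be the better everyday choice even when a larger model would fit.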
Performance depends on your hardware and the chosen model. Below are average inference times for my standard session:
| Hardware | Model | Average Inference Time (very short session: 10-20 sec) |
|---|---|---|
| CPU | ggml-small | ~500 ms |
| CPU | ggml-base | ~1 sec |
| CPU | ggml-medium | ~4 sec |
| CUDA (tested on RTX 5070 Ti) | ggml-base | ~400 ms |
| CUDA (tested on RTX 5070 Ti) | ggml-small | ~500 ms |
| CUDA (tested on RTX 5070 Ti) | ggml-medium | ~600 ms |
| CUDA (tested on RTX 5070 Ti) | ggml-large-v3-turbo | ~500 ms |
Option A: Download Pre-built Release (Recommended)
- Go to the Releases page of this repository.
- Download the latest `Echo-win-x64.zip` file.
- Extract the folder to your preferred location.
Option B: Build from Source
- Clone the repository: `git clone https://github.com/GithubPhobos/Echo`
- Navigate to the folder: `cd Echo`
- Build the project: `dotnet publish -c Release -r win-x64 --self-contained true`
- Ensure there is an `Assets` folder in the root directory alongside the `Echo.exe` executable. That folder contains `start-recording.wav` and `stop-recording.wav` for audible push-to-talk feedback.
- Place your downloaded Whisper model file into the `Assets` folder.
- Open `appsettings.json` to customize the application (all settings are documented inline). Key settings include:
  - `WhisperSettings.ModelName`: Must match the exact name of the model you placed in the `Assets` folder.
  - `PushToTalkSettings.Key`: The global hotkey to trigger recording (default is "`").
  - `Serilog.MinimumLevel.Default`: Available log levels are `Debug`, `Information`, `Warning`, `Error`.
  - `WhisperSettings.Prompt`: The initial context provided to the AI. Use this to specify complex domain terminology, define your preferred punctuation style, or provide a baseline vocabulary to help the model transcribe your speech more accurately.
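A mismatch between `WhisperSettings.ModelName` and the file in `Assets` is easy to miss, so here is a small Python sketch of a sanity check. It assumes that `ModelName` holds the model's file name including the `.bin` extension, which this README does not spell out; the function `model_name_problem` is hypothetical and not part of Echo.

```python
from pathlib import Path
from typing import Optional


def model_name_problem(settings: dict, assets_dir: Path) -> Optional[str]:
    """Return a human-readable problem, or None if the model file is found.

    `settings` is the already-parsed content of appsettings.json.
    Assumption: WhisperSettings.ModelName is a file name inside `assets_dir`.
    """
    model_name = settings.get("WhisperSettings", {}).get("ModelName")
    if not model_name:
        return "WhisperSettings.ModelName is missing or empty"
    if not (assets_dir / model_name).is_file():
        # List the .bin files that are present to make typos easy to spot.
        available = sorted(p.name for p in assets_dir.glob("*.bin"))
        return f"{model_name!r} not found in {assets_dir}; .bin files present: {available}"
    return None


if __name__ == "__main__":
    example_settings = {"WhisperSettings": {"ModelName": "ggml-small.en.bin"}}
    problem = model_name_problem(example_settings, Path("Assets"))
    print(problem or "Model file found.")
```

The sketch takes a parsed dictionary instead of reading `appsettings.json` itself. If the file carries its inline documentation as comments, Python's standard `json` module will reject it, so you would need to strip those first or use a comment-tolerant parser.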
For maximum speed, configure the HardwareBackend in appsettings.json based on your system:
NVIDIA (CUDA) - Maximum Speed (Only if you have an NVIDIA graphics card)
- Ensure your NVIDIA graphics drivers are up to date.
- Download the required CUDA redistributable libraries from the NVIDIA Developer Archive. You will need files from `cuda_cudart` and `libcublas`.
- Extract and place the following specific `.dll` files next to `Echo.exe`:
  - `cublas64_13.dll`
  - `cublasLt64_13.dll`
  - `cudart64_13.dll`
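To double-check the CUDA step, the following Python sketch reports which of the three DLLs named above are missing from a folder. The file names come from this README; the helper `missing_cuda_dlls` is made up for illustration, and you would point it at the folder that contains `Echo.exe`.

```python
from pathlib import Path
from typing import List

# DLL names exactly as listed in the CUDA instructions above.
REQUIRED_CUDA_DLLS = ("cublas64_13.dll", "cublasLt64_13.dll", "cudart64_13.dll")


def missing_cuda_dlls(app_dir: Path) -> List[str]:
    """Return the required DLL names that are not present in `app_dir`."""
    return [name for name in REQUIRED_CUDA_DLLS if not (app_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the folder that contains Echo.exe.
    missing = missing_cuda_dlls(Path("."))
    print("All CUDA DLLs present." if not missing else f"Missing: {', '.join(missing)}")
```

This only checks that the files exist. It cannot tell whether they match your driver or GPU.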
AMD / Intel / Basic NVIDIA (Vulkan)
- Works with AMD Adrenalin, Intel Arc Graphics, or standard NVIDIA drivers.
- You don't need to install anything, because the required `vulkan-1.dll` is automatically installed by Windows with your GPU drivers.
- Set `"HardwareBackend": "Vulkan"` in `appsettings.json`.
CPU Only
- Set `"HardwareBackend": "CPU"`. No extra steps required.
- Launch `Echo.exe`.
- A console window will appear and initialize all the required settings.
- Follow the logs. They are designed to be easy to read, so you can quickly tell if something goes wrong.
- Hold down your designated Push-To-Talk key, speak your thoughts, and release the key.
- The transcribed text is automatically saved to your clipboard and, if you have enabled `UseAutoInsert`, typed into the active window at your cursor.
Issue: Couldn't find recording devices in the system, shutting down...
- Cause: Windows privacy settings are blocking access to your microphone, or no microphone is plugged in.
- Fix: Go to Windows Settings -> Privacy & security -> Microphone. Ensure "Let desktop apps access your microphone" is turned ON.
Issue: BadDeviceId calling waveOutOpen or application crashes on startup.
- Cause: You have audio feedback enabled (`"PlaySound": true`), but your system currently has no active audio output devices (speakers/headphones).
- Fix: Connect a speaker/headphone or set `"PlaySound": false` in `appsettings.json`.
Issue: The application types out random words, hallucinations, or just the prompt instead of what you said.
- Cause: Your microphone is too quiet, and the Voice Activity Detection (VAD) cut off your speech, sending a silent audio file to the AI. The AI hallucinated based on the default context prompt.
- Fix: Lower the `"SilenceThreshold"` in `appsettings.json` (e.g., from `0.03` to `0.01` or `0.005`). You can also enable `"OutputMicAmplitudeDebugInfo": true` to see your actual mic levels in the console.
Issue: The AI transcribes keyboard clicks, mechanical sounds, or the app's own "beep" as random words.
- Cause: Your microphone is picking up the physical sound of your Push-To-Talk keystroke or the application's audio feedback. The VAD registers this sharp noise as speech.
- Fix: Lower the application's audio feedback `"Volume"` in `appsettings.json` (e.g., to `0.05`) and slightly increase the `"SilenceThreshold"` (e.g., to `0.02`) so the VAD ignores these background noises. Positioning your microphone further away from the keyboard also helps.