Ech👂 Voice Typing Assistant

Echo is a blazing-fast, privacy-first, push-to-talk voice typing assistant. It runs locally on your machine and uses OpenAI's Whisper to transcribe your speech and insert the text wherever your text cursor is active.

It also features auto-translation to English: speak in your native language (Russian, Spanish, German, etc.), and Echo will translate it to English. This makes it a good fit for bilingual workflows, coding, and writing documentation. (Note: Auto-translation works best with English-only models like ggml-base.en.bin.)

Echo uses a "Hot Mic" architecture so that recording starts without delay, and Voice Activity Detection (VAD) so that your first words are not cut off.

⚙️ Prerequisites

Before running the application, ensure you have:

- A working microphone.
- Whisper model: a compatible .bin Whisper model file.

🧠 How to Choose the Right Model:

Echo uses Whisper GGML models (e.g. ggml-medium.bin). The model you choose determines the speed and accuracy of your dictation.
You can download the models from Hugging Face: ggerganov/whisper.cpp.

English vs. Multilingual:

If you only dictate in English, always download models with the .en suffix (e.g., ggml-small.en.bin). They are faster, more accurate, and hallucinate much less.
If you dictate in other languages (or mix languages), use the standard models (e.g., ggml-small.bin). Note: Large models do not have .en versions, as they are inherently multilingual.

Model Hardware Requirements (My advice is to try multiple models)

| Model Size | Approx. VRAM / RAM | Speed | Accuracy | Hardware Recommendation |
| --- | --- | --- | --- | --- |
| Tiny | ~500 MB | Blazing Fast | Basic | Potato PCs, quick testing. |
| Base | ~1 GB | Very Fast | Acceptable | Older laptops or CPU-only inference. |
| Small | ~2 GB | Fast | Good | Recommended. Best balance for most modern PCs. |
| Medium | ~3.5 GB | Moderate | Very Good | Dedicated GPUs (e.g., GTX 1060 or better). |
| Large | ~6 GB+ | Slow | Excellent | Modern GPUs (RTX 3060+). Perfect for complex jargon. |
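As a rough illustration of how to read the table, the sketch below picks the largest model whose approximate footprint fits a given memory budget. The figures are copied from the table; the function name and the "largest that fits" rule are assumptions made for this example and are not part of Echo.

```python
from typing import Optional

# Approximate memory figures from the table above, in GB,
# ordered from smallest to largest model.
MODEL_MEMORY_GB = [
    ("tiny", 0.5),
    ("base", 1.0),
    ("small", 2.0),
    ("medium", 3.5),
    ("large", 6.0),
]


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the largest model size whose approximate footprint fits.

    Hypothetical helper: returns None when even the tiny model is too big.
    """
    best = None
    for name, needed_gb in MODEL_MEMORY_GB:
        if needed_gb <= available_gb:
            best = name
    return best
```

Memory is only one factor. A model that fits may still be too slow on your hardware, so trying several models remains the best advice.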

📊 Average Benchmarks

Performance depends on your hardware and the chosen model. Below are average inference times for my standard session:

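| Hardware | Model | Average Inference Time (very short session: 10-20 sec) |
| --- | --- | --- |
| CPU | `ggml-small` | ~500 ms |
| CPU | `ggml-base` | ~1 sec |
| CPU | `ggml-medium` | ~4 sec |
| CUDA (tested on RTX 5070 Ti) | `ggml-base` | ~400 ms |
| CUDA (tested on RTX 5070 Ti) | `ggml-small` | ~500 ms |
| CUDA (tested on RTX 5070 Ti) | `ggml-medium` | ~600 ms |
| CUDA (tested on RTX 5070 Ti) | `ggml-large-v3-turbo` | ~500 ms |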

🚀 Getting Started

1. Installation

Option A: Download Pre-built Release (Recommended)

Go to the Releases page of this repository.
Download the latest Echo-win-x64.zip file.
Extract the folder to your preferred location.

Option B: Build from Source

Clone the repository: `git clone https://github.com/GithubPhobos/Echo`
Navigate to the folder: `cd Echo`
Build the project: `dotnet publish -c Release -r win-x64 --self-contained true`

2. Assets Configuration

Ensure there is an Assets folder in the root directory alongside the Echo.exe executable. That folder contains start-recording.wav and stop-recording.wav for audible push-to-talk feedback.
Place your downloaded Whisper model file into the Assets folder.
Open appsettings.json to customize the application (all settings are documented inline). Key settings include:
- WhisperSettings.ModelName: Must match the exact name of the model you placed in the Assets folder.
- PushToTalkSettings.Key: The global hotkey to trigger recording (Default is "`").
- Serilog.MinimumLevel.Default: Available log levels are Debug, Information, Warning, Error.
- WhisperSettings.Prompt: The initial context provided to the AI. Use this to specify complex domain terminology, define your preferred punctuation style, or provide a baseline vocabulary to help the model transcribe your speech more accurately.
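A mismatch between WhisperSettings.ModelName and the file in the Assets folder is an easy mistake to make, so a quick pre-flight check can help. The sketch below is not part of Echo. It assumes that the dotted setting names map to nested JSON objects (the usual .NET configuration convention), and the function name `check_assets` is hypothetical.

```python
from pathlib import Path
from typing import List

# Sound files that the Assets folder is expected to contain.
FEEDBACK_SOUNDS = ("start-recording.wav", "stop-recording.wav")


def check_assets(settings: dict, app_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means everything is in place.

    `settings` is the already-parsed content of appsettings.json and
    `app_dir` is the folder that contains Echo.exe.
    """
    problems = []
    assets = Path(app_dir) / "Assets"

    # Assumed layout: {"WhisperSettings": {"ModelName": "..."}}
    model_name = settings.get("WhisperSettings", {}).get("ModelName")
    if not model_name:
        problems.append("WhisperSettings.ModelName is not set")
    elif not (assets / model_name).is_file():
        problems.append(f"model file not found: {assets / model_name}")

    for sound in FEEDBACK_SOUNDS:
        if not (assets / sound).is_file():
            problems.append(f"sound file not found: {assets / sound}")
    return problems
```

The function takes a parsed dictionary rather than a file path on purpose. If the inline documentation in appsettings.json takes the form of comments, Python's standard `json` module will reject the file, so you may need to strip the comments before parsing.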

3. Hardware Acceleration Setup 🚀

For maximum speed, configure the HardwareBackend in appsettings.json based on your system:

NVIDIA (CUDA) - Maximum Speed (Only if you have an NVIDIA graphics card)

Ensure your NVIDIA graphics drivers are up to date.
Download the required CUDA redistributable libraries from the NVIDIA Developer Archive. You will need files from cuda_cudart and libcublas.
Extract and place the following specific .dll files next to Echo.exe:
- cublas64_13.dll
- cublasLt64_13.dll
- cudart64_13.dll
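
To confirm that the three libraries listed above ended up in the right place, a small check like the following can be run against the folder that contains Echo.exe. This is an illustrative sketch, not a tool shipped with Echo, and the function name `missing_cuda_dlls` is hypothetical.

```python
from pathlib import Path

# The DLL names listed above.
REQUIRED_CUDA_DLLS = ("cublas64_13.dll", "cublasLt64_13.dll", "cudart64_13.dll")


def missing_cuda_dlls(app_dir):
    """Return the required DLL names that are not present in app_dir."""
    folder = Path(app_dir)
    return [name for name in REQUIRED_CUDA_DLLS if not (folder / name).is_file()]
```

An empty result means all three files sit next to the executable. It does not verify that the files match your driver version.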

AMD / Intel / Basic NVIDIA (Vulkan)

Works with AMD Adrenalin, Intel Arc Graphics, or standard NVIDIA drivers.
You don't need to install anything extra: the required vulkan-1.dll is installed together with your GPU drivers.
Set "HardwareBackend": "Vulkan" in appsettings.json.

CPU Only

Set "HardwareBackend": "CPU". No extra steps required.

4. Running the App

Launch Echo.exe.
A console window will appear while the app initializes its settings.
Follow the logs: they are designed to be easy to read, so you can quickly tell if something goes wrong.
Hold down your designated Push-To-Talk key, speak your thoughts, and release the key.
The transcribed text is saved to your clipboard and, if you have enabled "UseAutoInsert", typed into your active window at the cursor position.

🛠️ Troubleshooting

Issue: Couldn't find recording devices in the system, shutting down...

Cause: Windows privacy settings are blocking access to your microphone, or no microphone is plugged in.
Fix: Go to Windows Settings -> Privacy & security -> Microphone. Ensure "Let desktop apps access your microphone" is turned ON.

Issue: BadDeviceId calling waveOutOpen or application crashes on startup.

Cause: You have audio feedback enabled ("PlaySound": true), but your system currently has no active audio output devices (speakers/headphones).
Fix: Connect a speaker/headphone or set "PlaySound": false in appsettings.json.

Issue: The application types out random words, hallucinations, or just the prompt instead of what you said.

Cause: Your microphone is too quiet, and the Voice Activity Detection (VAD) cut off your speech, sending a silent audio file to the AI. The AI hallucinated based on the default context prompt.
Fix: Lower the "SilenceThreshold" in appsettings.json (e.g., from 0.03 to 0.01 or 0.005). You can also enable "OutputMicAmplitudeDebugInfo": true to see your actual mic levels in the console.
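
To see why lowering the threshold helps, consider a simple amplitude gate. Echo's actual VAD logic is not described here, so the sketch below is only an assumption: it treats "SilenceThreshold" as a cutoff on the normalized sample amplitude (0.0 to 1.0), which the example values 0.03 and 0.01 suggest. The function names and sample values are made up for illustration.

```python
def peak_amplitude(samples):
    """Largest absolute sample value; samples are floats in [-1.0, 1.0]."""
    return max((abs(s) for s in samples), default=0.0)


def is_speech(samples, silence_threshold):
    """Treat a chunk as speech only if its peak exceeds the threshold."""
    return peak_amplitude(samples) > silence_threshold


# A quiet microphone: the loudest sample only reaches 0.02.
quiet_voice = [0.0, 0.012, -0.018, 0.02, -0.015]

print(is_speech(quiet_voice, 0.03))  # False: the chunk is discarded as silence
print(is_speech(quiet_voice, 0.01))  # True: the same chunk now counts as speech
```

With the threshold at 0.03 the quiet speech never passes the gate, so the model receives silence. At 0.01 the same audio gets through.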

Issue: The AI transcribes keyboard clicks, mechanical sounds, or the app's own "beep" as random words.

Cause: Your microphone is picking up the physical sound of your Push-To-Talk keystroke or the application's audio feedback. The VAD registers this sharp noise as speech.
Fix: Lower the application's audio feedback "Volume" in appsettings.json (e.g., to 0.05) and slightly increase the "SilenceThreshold" (e.g., to 0.02) so the VAD ignores these background noises. Positioning your microphone further away from the keyboard also helps.
