AI Video Transcriber

AI Video Transcriber is a user-friendly desktop application for transcribing the audio track of video files. It supports two speech recognition engines, OpenAI Whisper and Vosk, and can transcribe speech in multiple languages.

Description

This application provides a graphical user interface for transcribing video files. It extracts the audio track, applies noise reduction, and then transcribes the cleaned audio with either the Whisper or the Vosk speech recognition model. It can also generate subtitles, export results in several output formats, and expose advanced settings for fine-tuning the transcription process.
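The audio extraction step can be sketched as a single FFmpeg call. This is a minimal sketch, not the code in transcript.py: the function names are hypothetical, and the 16 kHz mono 16-bit PCM target is an assumption (Vosk models commonly expect 16 kHz mono audio, and Whisper resamples its input to 16 kHz internally).

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video file
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the target sample rate
        "-c:a", "pcm_s16le",      # encode as 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path: str, out_dir: str) -> Path:
    """Run FFmpeg (hypothetical helper) and return the path of the extracted WAV file."""
    wav_path = Path(out_dir) / (Path(video_path).stem + ".wav")
    cmd = build_extract_command(video_path, str(wav_path))
    # check=True raises CalledProcessError if FFmpeg exits with a non-zero status
    subprocess.run(cmd, check=True, capture_output=True)
    return wav_path
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems with file names that contain spaces.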

Features

- Support for multiple video file formats (mp4, avi, mov, mkv, flv, wmv)
- Audio extraction from video files
- Noise reduction for improved transcription accuracy
- Choice between Whisper and Vosk speech recognition engines
- Multiple Whisper model sizes (tiny, base, small, medium, large, large-v2)
- Support for custom Vosk model directories
- Automatic language detection or manual language selection
- Multiple output formats (Text, JSON, CSV, DOCX)
- Subtitle (.srt) file generation
- Advanced settings for fine-tuning transcription parameters
- Progress tracking and resource usage monitoring
- Logging of the transcription process
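Subtitle generation amounts to turning timed segments into numbered SRT cues. The sketch below assumes segments shaped like the entries of `result["segments"]` returned by openai-whisper's `transcribe()` (dicts with `start`, `end` and `text`); Vosk results would need converting to that shape first. The function names are hypothetical and not taken from transcript.py.

```python
from typing import Dict, List


def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments with 'start', 'end' and 'text' keys as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # each cue: sequence number, time range, then the caption text
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # SRT separates consecutive cues with a blank line
    return "\n".join(cues)
```

For example, `format_srt_timestamp(3661.5)` returns `"01:01:01,500"`.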

Requirements

- Python 3.6+
- FFmpeg
- PyTorch
- Whisper
- Vosk
- Other dependencies listed in requirements.txt
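A quick way to confirm the requirements are in place is to look for the `ffmpeg` executable on the PATH and for the Python modules by their import names. This is a hypothetical helper, not part of the repository; note that the `openai-whisper` package imports as `whisper`.

```python
import importlib.util
import shutil
from typing import Dict

# Import names, not pip package names: openai-whisper imports as "whisper".
PYTHON_MODULES = ("torch", "whisper", "vosk")


def check_requirements() -> Dict[str, bool]:
    """Report which requirements are available without importing the heavy modules."""
    status = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in PYTHON_MODULES:
        # find_spec returns None when the module cannot be located
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:8} {'found' if ok else 'MISSING'}")
```

Using `find_spec` instead of `import` keeps the check fast, since loading PyTorch can take several seconds.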

Installation

Clone this repository:
```
git clone https://github.com/Saddytech/ai-video-transcriber.git
cd ai-video-transcriber
```

Install FFmpeg:
- On Windows: Download a build linked from ffmpeg.org and add its bin folder to your PATH
- On macOS: brew install ffmpeg
- On Debian/Ubuntu Linux: sudo apt-get install ffmpeg
Install Python dependencies:
```
pip install -r requirements.txt
```
Download Whisper models (optional): The application will download models automatically, but you can pre-download them from OpenAI's Whisper repository.
Download Vosk models (optional): If you plan to use Vosk, download a model from the Vosk website, unzip it, and select the unpacked model directory in the application.
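Whisper checkpoints are fetched by openai-whisper's `whisper.load_model(name)` on first use, so pre-downloading a model is simply a matter of calling it once. To see which models are already cached, the sketch below assumes openai-whisper's default download directory (`$XDG_CACHE_HOME/whisper`, falling back to `~/.cache/whisper`) and assumes each checkpoint is stored as `<name>.pt`. That holds for names such as `base` or `large-v2`, but the alias `large` maps to a versioned checkpoint file, so check for the versioned name instead. The helpers are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def whisper_cache_dir() -> Path:
    """Default download directory assumed for openai-whisper's load_model()."""
    default = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", default)) / "whisper"


def cached_models(names: Iterable[str], root: Optional[Path] = None) -> List[str]:
    """Return the model names whose checkpoint file (<name>.pt) already exists on disk."""
    root = Path(root) if root is not None else whisper_cache_dir()
    return [name for name in names if (root / f"{name}.pt").is_file()]
```

For example, `cached_models(["tiny", "base", "small"])` lists which of those three sizes will load without a download.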

Usage

Run the application:
```
python transcript.py
```
Use the GUI to:
- Select video file(s) for transcription
- Choose an output directory
- Select the speech recognition engine (Whisper or Vosk)
- Choose the model size or path
- Set the transcription language (or use automatic detection)
- Adjust advanced settings if needed
- Start the transcription process
Monitor the progress and resource usage in the application window.
Find the transcription results and subtitle files in your chosen output directory.
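The Text, JSON and CSV outputs can be produced from the same list of timed segments using only the standard library. This is a sketch under assumptions: segments are Whisper-style dicts with `start`, `end` and `text`, the column and field layout is hypothetical, and the actual files written by transcript.py may differ. DOCX is omitted because it needs a third-party library such as python-docx.

```python
import csv
import io
import json
from typing import Dict, List


def to_text(segments: List[Dict]) -> str:
    """Plain text: one segment per line."""
    return "\n".join(seg["text"].strip() for seg in segments)


def to_json(segments: List[Dict]) -> str:
    """JSON array of {start, end, text} objects (hypothetical layout)."""
    rows = [{"start": s["start"], "end": s["end"], "text": s["text"].strip()} for s in segments]
    # ensure_ascii=False keeps non-English transcripts readable in the file
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(segments: List[Dict]) -> str:
    """CSV with a header row; the csv module quotes fields containing commas or quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["start", "end", "text"])
    for s in segments:
        writer.writerow([s["start"], s["end"], s["text"].strip()])
    return buffer.getvalue()
```

Letting the `csv` module handle quoting matters for transcripts, since spoken sentences routinely contain commas.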

Advanced Settings

Beam Size: For Whisper, the number of beams kept during beam search decoding. Larger values can improve accuracy at the cost of speed.
Best Of: For Whisper, the number of candidate transcriptions sampled when the temperature is above zero; the best-scoring candidate is kept.
Temperature: For Whisper, the sampling temperature. Lower values make the output more deterministic.
Segment Length: The length of the audio segments processed at a time. Shorter segments use less memory.
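Beam Size and Best Of never apply at the same time. In openai-whisper's `transcribe()`, `beam_size` is dropped when decoding at a temperature above zero, and `best_of` is dropped at temperature zero. The first helper below mirrors that rule; the second splits a recording into fixed-length windows, assuming Segment Length is measured in seconds (the unit is not stated above). Both helpers are hypothetical and not taken from transcript.py.

```python
from typing import Dict, List, Optional, Tuple


def decode_options(temperature: float, beam_size: Optional[int] = None,
                   best_of: Optional[int] = None) -> Dict:
    """Keep only the options Whisper honours at the given temperature."""
    options: Dict = {"temperature": temperature}
    if temperature > 0:
        if best_of is not None:
            options["best_of"] = best_of      # sampling: keep the best of N samples
    elif beam_size is not None:
        options["beam_size"] = beam_size      # T = 0: deterministic beam search
    return options


def segment_bounds(duration: float, segment_length: float) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive windows of at most segment_length seconds."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        bounds.append((start, end))
        start = end
    return bounds
```

For example, `segment_bounds(65, 30)` yields three windows, the last one only 5 seconds long.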

Contributing

Contributions to the AI Video Transcriber are welcome! Please feel free to submit a Pull Request.
