π¬ VidGen is a powerful, local-first AI video editing pipeline that transforms raw video footage into engaging short-form content (TikTok/YouTube Shorts style) with automatic scene detection, AI-generated scripts, voice-over, and synchronized subtitles.
- π― Automatic Scene Detection: Intelligently identifies and scores interesting moments in your footage
- π€ AI Script Generation: Creates engaging narratives using local LLMs (Mistral, etc.) or OpenAI API
- ποΈ Text-to-Speech: High-quality voice-over generation with Coqui TTS or Bark
- π Word-Level Subtitles: Precise subtitle timing using Whisper with word-by-word highlighting
- π¨ Professional Assembly: Automatic video composition with transitions, resizing, and effects
- π» Local-First: Runs entirely on your machine - no cloud required (optional API support)
- π₯οΈ Interactive UI: Beautiful terminal interface built with Textual
- π Web UI: Modern web interface with real-time progress tracking (built with Bun + React)
- β‘ GPU Acceleration: Supports CUDA/Metal for faster processing
Dashboard - Real-Time Job Monitoring:
Dashboard showing input/output statistics, running jobs, and WebSocket connection status
Generate - Easy Video Creation:
Intuitive interface with drag & drop upload, configuration options, and real-time progress tracking
Transform this:
πΉ video1.mp4 (5 min)
πΉ video2.mp4 (8 min)
πΉ video3.mp4 (3 min)
Into this:
β¨ engaging_video_20240115_143022.mp4 (60 seconds)
- Auto-selected best moments
- AI-generated script
- Professional voice-over
- Synced subtitles
- Vertical 9:16 format ready for social media
- Python 3.9 or higher
- FFmpeg installed and in PATH
- (Optional) CUDA-capable GPU for acceleration
-
Clone the repository
git clone https://github.com/yourusername/vidgen.git cd vidgen -
Create virtual environment
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install dependencies
pip install -r requirements.txt
-
Run setup
python cli.py setup
Start the web interface:
cd webui
./start.shThen open http://localhost:8000 in your browser.
Key Features:
- β¨ Drag & drop video uploads - Intuitive file management
- π Real-time progress tracking - WebSocket-powered live updates with progress bars
- π Job management dashboard - Monitor all your video generation tasks
- πΎ Direct downloads - One-click access to generated videos
- π¨ Modern, responsive UI - Beautiful dark theme interface (see screenshots above)
- π Secure & production-ready - Complete security hardening implemented
- π€ Optional authentication - Works with or without user accounts
- π Background processing - Queue system with Laravel jobs
Starting Services:
# All-in-one script (recommended)
./start.sh
# Or start individually in separate terminals:
php artisan serve # Web server (port 8000)
php artisan reverb:start # WebSocket (port 8080)
php artisan queue:work # Background jobsDocumentation:
- π Quick Start Guide - Get started in 5 minutes
- π User Guide - Complete walkthrough
- π Security Documentation - Production deployment & security
- π§ Troubleshooting - Common issues & solutions
- π Complete Documentation - Everything in one place
python app.pyThis launches the full interactive terminal application with:
- File browser for selecting videos
- Configuration forms
- Live progress tracking
- Results display
Generate a video:
python cli.py generate "Amazing Nature Facts" \
-i data/input/video1.mp4 \
-i data/input/video2.mp4 \
-o output/nature.mp4 \
--duration 60 \
--style engagingAnalyze a video:
python cli.py analyze data/input/video.mp4Generate just a script:
python cli.py script "Cooking Tips" --style educationalCheck system info:
python cli.py info- Input: Place raw MP4/MOV files in
data/input/ - Analysis: AI detects scenes and scores interesting moments
- Script: LLM generates an engaging narrative for your topic
- Voice: TTS creates professional voice-over
- Subtitles: Whisper provides word-level timing
- Assembly: Clips are composed with subtitles and audio
- Output: Final video saved to
data/output/
Edit config/config.yaml to customize:
video:
target_duration: 60 # seconds
resolution:
width: 1080
height: 1920 # 9:16 vertical
fps: 30
script:
llm:
provider: "local" # or "openai"
local_model: "mistralai/Mistral-7B-Instruct-v0.2"
style: "engaging" # engaging, educational, funny, dramatic
tts:
engine: "coqui" # or "bark"
subtitles:
word_by_word: true
style:
font_size: 48
position: "center"- Engaging: Hook-driven, conversational, perfect for general content
- Educational: Informative, clear, great for tutorials
- Funny: Lighthearted, humorous tone
- Dramatic: Intense, suspenseful narration
Set your API key:
export OPENAI_API_KEY="your-key-here"Update config:
script:
llm:
provider: "openai"The pipeline automatically detects and uses GPU when available:
- NVIDIA: CUDA (h264_nvenc)
- AMD: AMF (h264_amf)
- Mac: VideoToolbox (h264_videotoolbox)
tts:
coqui:
speaker_wav: "path/to/reference_voice.wav"vidgen/
βββ app.py # Interactive Textual UI
βββ cli.py # Command-line interface
βββ config/
β βββ config.yaml # Main configuration
βββ src/
β βββ pipeline.py # Main orchestrator
β βββ modules/
β β βββ video_analyzer.py # Scene detection
β β βββ script_generator.py # LLM script generation
β β βββ voice_generator.py # TTS
β β βββ subtitle_generator.py # Whisper subtitles
β β βββ video_assembler.py # Final composition
β βββ utils/
β βββ logger.py # Logging utilities
β βββ file_manager.py # File operations
β βββ tui_manager.py # Progress display
βββ data/
β βββ input/ # Place raw videos here
β βββ output/ # Generated videos
β βββ temp/ # Temporary files
βββ models/ # Downloaded AI models
βββ logs/ # Application logs
- Default: Mistral-7B-Instruct-v0.2
- Alternatives: Llama-2, GPT-J, etc.
- Requires: 8GB+ RAM/VRAM
- Models: tiny, base, small, medium, large
- Base model recommended (balance of speed/accuracy)
- Coqui TTS: Fast, good quality
- Bark: More natural, slower
FFmpeg not found:
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.htmlOut of memory:
- Use smaller Whisper model (
tinyorbase) - Switch to OpenAI API for script generation
- Reduce video resolution in config
Slow processing:
- Enable GPU acceleration
- Use smaller AI models
- Process shorter clips
Import errors:
pip install -r requirements.txt --force-reinstallApproximate processing times (60s output video):
| Hardware | Time |
|---|---|
| RTX 3080 + 16GB RAM | ~3-5 min |
| M1 Mac + 16GB RAM | ~5-8 min |
| CPU only (i7) + 16GB RAM | ~15-25 min |
Contributions welcome! Areas for improvement:
- Background music integration
- More transition effects
- Web UI
- Direct social media upload
- Real-time preview
- Batch processing
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
This software is licensed for non-commercial use only. You are free to:
- β Use for personal projects
- β Use for education and research
- β Share and modify the code
- β Create derivative works
Commercial use is NOT permitted without written permission. This includes:
- β Selling videos generated by this software
- β Using in a business or for-profit organization
- β Monetizing content (ads, sponsorships, etc.)
- β Providing paid services using this software
For commercial licensing inquiries, please contact the copyright holders.
See LICENSE file for complete terms.
Built with:
- Whisper - OpenAI
- Coqui TTS
- Transformers - Hugging Face
- MoviePy
- Textual
- PySceneDetect