VidGen - AI Video Generation Pipeline

🎬 VidGen is a powerful, local-first AI video editing pipeline that transforms raw video footage into engaging short-form content (TikTok/YouTube Shorts style) with automatic scene detection, AI-generated scripts, voice-over, and synchronized subtitles.

✨ Features

🎯 Automatic Scene Detection: Intelligently identifies and scores interesting moments in your footage
🤖 AI Script Generation: Creates engaging narratives using local LLMs (Mistral, etc.) or OpenAI API
🎙️ Text-to-Speech: High-quality voice-over generation with Coqui TTS or Bark
📝 Word-Level Subtitles: Precise subtitle timing using Whisper with word-by-word highlighting
🎨 Professional Assembly: Automatic video composition with transitions, resizing, and effects
💻 Local-First: Runs entirely on your machine - no cloud required (optional API support)
🖥️ Interactive UI: Beautiful terminal interface built with Textual
🌐 Web UI: Modern web interface with real-time progress tracking (built with Bun + React)
⚡ GPU Acceleration: Supports CUDA/Metal for faster processing

🎥 Demo

🌐 Web Interface

Dashboard - Real-Time Job Monitoring:

Dashboard showing input/output statistics, running jobs, and WebSocket connection status

Generate - Easy Video Creation:

Intuitive interface with drag & drop upload, configuration options, and real-time progress tracking

What VidGen Does

Transform this:

📹 video1.mp4 (5 min)
📹 video2.mp4 (8 min)
📹 video3.mp4 (3 min)

Into this:

✨ engaging_video_20240115_143022.mp4 (60 seconds)
   - Auto-selected best moments
   - AI-generated script
   - Professional voice-over
   - Synced subtitles
   - Vertical 9:16 format ready for social media

🚀 Quick Start

Prerequisites

Python 3.9 or higher
FFmpeg installed and in PATH
(Optional) CUDA-capable GPU for acceleration

Installation

Clone the repository

git clone https://github.com/yourusername/vidgen.git
cd vidgen

Create virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

```
pip install -r requirements.txt
```

Run setup

```
python cli.py setup
```

Usage

🌐 Web UI (Recommended)

Start the web interface:

cd webui
./start.sh

Then open http://localhost:8000 in your browser.

Key Features:

✨ Drag & drop video uploads - Intuitive file management
🔄 Real-time progress tracking - WebSocket-powered live updates with progress bars
📊 Job management dashboard - Monitor all your video generation tasks
💾 Direct downloads - One-click access to generated videos
🎨 Modern, responsive UI - Beautiful dark theme interface (see screenshots above)
🔐 Security hardening for production deployments - See the Security Documentation listed below
👤 Optional authentication - Works with or without user accounts
🚀 Background processing - Queue system with Laravel jobs

Starting Services:

# All-in-one script (recommended)
./start.sh

# Or start individually in separate terminals:
php artisan serve          # Web server (port 8000)
php artisan reverb:start   # WebSocket (port 8080)
php artisan queue:work     # Background jobs

Documentation:

📖 Quick Start Guide - Get started in 5 minutes
📚 User Guide - Complete walkthrough
🔐 Security Documentation - Production deployment & security
🔧 Troubleshooting - Common issues & solutions
📋 Complete Documentation - Everything in one place

🖥️ Interactive Terminal UI

python app.py

This launches the full interactive terminal application with:

File browser for selecting videos
Configuration forms
Live progress tracking
Results display

🔧 Command Line Interface

Generate a video:

python cli.py generate "Amazing Nature Facts" \
  -i data/input/video1.mp4 \
  -i data/input/video2.mp4 \
  -o output/nature.mp4 \
  --duration 60 \
  --style engaging
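
For batch jobs it can help to assemble the same command from Python. The sketch below only builds the argument list from the flags shown in the example above; `build_generate_command` is a hypothetical helper and not part of VidGen, and the default values are assumptions taken from that example.

```python
import sys
from typing import List, Sequence


def build_generate_command(
    topic: str,
    inputs: Sequence[str],
    output: str,
    duration: int = 60,
    style: str = "engaging",
) -> List[str]:
    """Assemble the argv for `python cli.py generate` (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input video is required")
    cmd = [sys.executable, "cli.py", "generate", topic]
    for path in inputs:
        cmd += ["-i", path]  # one -i flag per input clip, as in the example above
    cmd += ["-o", output, "--duration", str(duration), "--style", style]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command(
        "Amazing Nature Facts",
        ["data/input/video1.mp4", "data/input/video2.mp4"],
        "output/nature.mp4",
    )
    print(" ".join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with topics that contain spaces.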

Analyze a video:

python cli.py analyze data/input/video.mp4

Generate just a script:

python cli.py script "Cooking Tips" --style educational

Check system info:

python cli.py info

📋 Workflow

Input: Place raw MP4/MOV files in data/input/
Analysis: AI detects scenes and scores interesting moments
Script: LLM generates an engaging narrative for your topic
Voice: TTS creates professional voice-over
Subtitles: Whisper provides word-level timing
Assembly: Clips are composed with subtitles and audio
Output: Final video saved to data/output/
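
The analysis step above scores moments and the pipeline then has to fit them into the target duration. One simple way to do that is a greedy pick of the highest-scoring scenes that still fit. This is a minimal sketch, not VidGen's actual selection logic: the `Scene` fields, the score scale, and `select_scenes` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    source: str   # path of the input video
    start: float  # seconds
    end: float    # seconds
    score: float  # higher means more interesting (scoring method is assumed)

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_scenes(scenes: List[Scene], target_duration: float) -> List[Scene]:
    """Greedily keep the highest-scoring scenes that fit in the time budget."""
    chosen: List[Scene] = []
    remaining = target_duration
    for scene in sorted(scenes, key=lambda s: s.score, reverse=True):
        if scene.duration <= remaining:
            chosen.append(scene)
            remaining -= scene.duration
    # Return the picks ordered by source file, then by start time
    return sorted(chosen, key=lambda s: (s.source, s.start))
```

A greedy pick is not guaranteed to fill the budget optimally, but it is easy to reason about and favors the strongest moments first.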

⚙️ Configuration

Edit config/config.yaml to customize:

video:
  target_duration: 60  # seconds
  resolution:
    width: 1080
    height: 1920  # 9:16 vertical
  fps: 30

script:
  llm:
    provider: "local"  # or "openai"
    local_model: "mistralai/Mistral-7B-Instruct-v0.2"
  style: "engaging"  # engaging, educational, funny, dramatic

tts:
  engine: "coqui"  # or "bark"
  
subtitles:
  word_by_word: true
  style:
    font_size: 48
    position: "center"

🎨 Video Styles

Engaging: Hook-driven, conversational, perfect for general content
Educational: Informative, clear, great for tutorials
Funny: Lighthearted, humorous tone
Dramatic: Intense, suspenseful narration

🔧 Advanced Usage

Using OpenAI API

Set your API key:

export OPENAI_API_KEY="your-key-here"

Update config:

script:
  llm:
    provider: "openai"
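
Because the OpenAI provider depends on an environment variable, it is worth failing early when the key is missing. The following sketch assumes the config has been loaded into a plain dictionary with the layout shown above; `resolve_llm_provider` is a hypothetical helper, and defaulting to "local" when the key is absent from the config is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_llm_provider(
    config: Mapping, env: Optional[Mapping[str, str]] = None
) -> str:
    """Return the provider from config, failing early if OpenAI lacks a key."""
    env = os.environ if env is None else env
    # Assumed default: fall back to "local" when no provider is configured
    provider = config.get("script", {}).get("llm", {}).get("provider", "local")
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("provider is 'openai' but OPENAI_API_KEY is not set")
    return provider
```

Passing `env` explicitly keeps the check easy to test without touching the real environment.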

GPU Acceleration

The pipeline automatically detects and uses GPU when available:

NVIDIA: CUDA with the NVENC encoder (h264_nvenc)
AMD: AMF (h264_amf)
Mac: VideoToolbox (h264_videotoolbox)
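
The encoder choice above can be approximated by asking FFmpeg which encoders it was built with. This is a sketch of that idea, not VidGen's detection code: `pick_encoder` and `detect_encoder` are hypothetical names, and falling back to `libx264` on CPU is an assumption.

```python
import shutil
import subprocess
from typing import Iterable

# Preference order mirrors the list above; libx264 is the assumed CPU fallback
HARDWARE_ENCODERS = ("h264_nvenc", "h264_amf", "h264_videotoolbox")


def pick_encoder(
    encoders_output: str, preferred: Iterable[str] = HARDWARE_ENCODERS
) -> str:
    """Pick the first preferred encoder listed in `ffmpeg -encoders` output."""
    available = set()
    for line in encoders_output.splitlines():
        parts = line.split()
        # Video encoder rows start with a flags column beginning with "V"
        if len(parts) >= 2 and parts[0].startswith("V"):
            available.add(parts[1])
    for name in preferred:
        if name in available:
            return name
    return "libx264"


def detect_encoder() -> str:
    """Query the local FFmpeg build; fall back to libx264 if FFmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        return "libx264"
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return pick_encoder(result.stdout)
```

An encoder being compiled into FFmpeg does not prove the matching hardware is present, so a robust check would also try a short test encode before committing to it.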

Custom Voice Cloning

tts:
  coqui:
    speaker_wav: "path/to/reference_voice.wav"

📁 Project Structure

vidgen/
├── app.py                    # Interactive Textual UI
├── cli.py                    # Command-line interface
├── config/
│   └── config.yaml          # Main configuration
├── src/
│   ├── pipeline.py          # Main orchestrator
│   ├── modules/
│   │   ├── video_analyzer.py       # Scene detection
│   │   ├── script_generator.py     # LLM script generation
│   │   ├── voice_generator.py      # TTS
│   │   ├── subtitle_generator.py   # Whisper subtitles
│   │   └── video_assembler.py      # Final composition
│   └── utils/
│       ├── logger.py        # Logging utilities
│       ├── file_manager.py  # File operations
│       └── tui_manager.py   # Progress display
├── data/
│   ├── input/              # Place raw videos here
│   ├── output/             # Generated videos
│   └── temp/               # Temporary files
├── models/                 # Downloaded AI models
└── logs/                   # Application logs

🧠 AI Models

Local LLM (Script Generation)

Default: Mistral-7B-Instruct-v0.2
Alternatives: Llama-2, GPT-J, etc.
Requires: 8GB+ RAM/VRAM

Whisper (Subtitles)

Models: tiny, base, small, medium, large
Base model recommended (balance of speed/accuracy)
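
Word-level subtitles come down to turning per-word timings into subtitle cues. The sketch below assumes the timings have already been extracted as `(text, start, end)` tuples in seconds and writes one SRT cue per word. The tuple layout and function names are illustrative assumptions, and VidGen may use a different subtitle format or highlighting approach.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word]) -> str:
    """Emit one SRT cue per word so each word appears as it is spoken."""
    cues = []
    for index, (text, start, end) in enumerate(words, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects
    return "\n".join(cues)


print(words_to_srt([("Hello", 0.0, 0.4), (" world", 0.4, 0.9)]))
```

Smaller Whisper models tend to give less accurate timings, so the trade-off described above applies to subtitle sync as well as to transcription quality.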

TTS

Coqui TTS: Fast, good quality
Bark: More natural, slower

🐛 Troubleshooting

FFmpeg not found:

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Out of memory:

Use smaller Whisper model (tiny or base)
Switch to OpenAI API for script generation
Reduce video resolution in config

Slow processing:

Enable GPU acceleration
Use smaller AI models
Process shorter clips

Import errors:

pip install -r requirements.txt --force-reinstall

📊 Performance

Approximate processing times (60s output video):

| Hardware | Time |
| --- | --- |
| RTX 3080 + 16GB RAM | ~3-5 min |
| M1 Mac + 16GB RAM | ~5-8 min |
| CPU only (i7) + 16GB RAM | ~15-25 min |

🤝 Contributing

Contributions are welcome.

📄 License

Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

This software is licensed for non-commercial use only. You are free to:

✅ Use for personal projects
✅ Use for education and research
✅ Share and modify the code
✅ Create derivative works

Commercial use is NOT permitted without written permission. This includes:

❌ Selling videos generated by this software
❌ Using in a business or for-profit organization
❌ Monetizing content (ads, sponsorships, etc.)
❌ Providing paid services using this software

For commercial licensing inquiries, please contact the copyright holders.

See LICENSE file for complete terms.

🙏 Acknowledgments

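Built with the open-source projects named throughout this README, including Textual, Whisper, Coqui TTS, Bark, and FFmpeg.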
