Mel-Raeven/vidgen

VidGen - AI Video Generation Pipeline

Python 3.9+ License: CC BY-NC 4.0

🎬 VidGen is a powerful, local-first AI video editing pipeline that transforms raw video footage into engaging short-form content (TikTok/YouTube Shorts style) with automatic scene detection, AI-generated scripts, voice-over, and synchronized subtitles.

✨ Features

  • 🎯 Automatic Scene Detection: Intelligently identifies and scores interesting moments in your footage
  • 🤖 AI Script Generation: Creates engaging narratives using local LLMs (Mistral, etc.) or OpenAI API
  • 🎙️ Text-to-Speech: High-quality voice-over generation with Coqui TTS or Bark
  • 📝 Word-Level Subtitles: Precise subtitle timing using Whisper with word-by-word highlighting
  • 🎨 Professional Assembly: Automatic video composition with transitions, resizing, and effects
  • 💻 Local-First: Runs entirely on your machine - no cloud required (optional API support)
  • 🖥️ Interactive UI: Beautiful terminal interface built with Textual
  • 🌐 Web UI: Modern web interface with real-time progress tracking (built with Bun + React)
  • ⚡ GPU Acceleration: Supports CUDA/Metal for faster processing

🎥 Demo

🌐 Web Interface

Dashboard - Real-Time Job Monitoring:

[Screenshot: VidGen Dashboard] Dashboard showing input/output statistics, running jobs, and WebSocket connection status

Generate - Easy Video Creation:

[Screenshot: VidGen Generate Interface] Intuitive interface with drag & drop upload, configuration options, and real-time progress tracking

What VidGen Does

Transform this:

📹 video1.mp4 (5 min)
📹 video2.mp4 (8 min)
📹 video3.mp4 (3 min)

Into this:

✨ engaging_video_20240115_143022.mp4 (60 seconds)
   - Auto-selected best moments
   - AI-generated script
   - Professional voice-over
   - Synced subtitles
   - Vertical 9:16 format ready for social media

🚀 Quick Start

Prerequisites

  • Python 3.9 or higher
  • FFmpeg installed and in PATH
  • (Optional) CUDA-capable GPU for acceleration
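
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of VidGen itself; `check_prerequisites` and its parameters are hypothetical names chosen for illustration, and only the standard library is used.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

MIN_PYTHON = (3, 9)  # minimum version taken from the prerequisites list


def check_prerequisites(
    version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # shutil.which returns None when the executable is not on PATH
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found in PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment. GPU detection is left out because it is optional.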

Installation

  1. Clone the repository

    git clone https://github.com/Mel-Raeven/vidgen.git
    cd vidgen
  2. Create virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Run setup

    python cli.py setup

Usage

🌐 Web UI (Recommended)

Start the web interface:

cd webui
./start.sh

Then open http://localhost:8000 in your browser.

Key Features:

  • ✨ Drag & drop video uploads - Intuitive file management
  • 🔄 Real-time progress tracking - WebSocket-powered live updates with progress bars
  • 📊 Job management dashboard - Monitor all your video generation tasks
  • 💾 Direct downloads - One-click access to generated videos
  • 🎨 Modern, responsive UI - Beautiful dark theme interface (see screenshots above)
  • 🔐 Secure & production-ready - Complete security hardening implemented
  • 👤 Optional authentication - Works with or without user accounts
  • 🚀 Background processing - Queue system with Laravel jobs

Starting Services:

# All-in-one script (recommended)
./start.sh

# Or start individually in separate terminals:
php artisan serve          # Web server (port 8000)
php artisan reverb:start   # WebSocket (port 8080)
php artisan queue:work     # Background jobs

Documentation:

🖥️ Interactive Terminal UI

python app.py

This launches the full interactive terminal application with:

  • File browser for selecting videos
  • Configuration forms
  • Live progress tracking
  • Results display

🔧 Command Line Interface

Generate a video:

python cli.py generate "Amazing Nature Facts" \
  -i data/input/video1.mp4 \
  -i data/input/video2.mp4 \
  -o output/nature.mp4 \
  --duration 60 \
  --style engaging
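
When driving the CLI from another Python script, it can help to build the argument list programmatically. This is a sketch that assumes only the flags shown in the command above (`-i`, `-o`, `--duration`, `--style`); `build_generate_command` is a hypothetical helper, not something VidGen ships.

```python
import sys
from typing import List, Sequence


def build_generate_command(
    topic: str,
    inputs: Sequence[str],
    output: str,
    duration: int = 60,
    style: str = "engaging",
) -> List[str]:
    """Build the argv list for `python cli.py generate ...`."""
    cmd = [sys.executable, "cli.py", "generate", topic]
    for path in inputs:
        cmd += ["-i", path]  # one -i flag per input video, as in the example
    cmd += ["-o", output, "--duration", str(duration), "--style", style]
    return cmd


command = build_generate_command(
    "Amazing Nature Facts",
    ["data/input/video1.mp4", "data/input/video2.mp4"],
    "output/nature.mp4",
)
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` from the repository root would run the generation. A list is used instead of a shell string so that topics containing spaces or quotes need no escaping.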

Analyze a video:

python cli.py analyze data/input/video.mp4

Generate just a script:

python cli.py script "Cooking Tips" --style educational

Check system info:

python cli.py info

📋 Workflow

  1. Input: Place raw MP4/MOV files in data/input/
  2. Analysis: AI detects scenes and scores interesting moments
  3. Script: LLM generates an engaging narrative for your topic
  4. Voice: TTS creates professional voice-over
  5. Subtitles: Whisper provides word-level timing
  6. Assembly: Clips are composed with subtitles and audio
  7. Output: Final video saved to data/output/
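
To make steps 2 and 6 more concrete, here is one way scored moments could be picked to fit the target duration. It is an illustrative greedy sketch, not the algorithm used in `video_analyzer.py`; the `Scene` class and `select_scenes` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    source: str   # input file the scene comes from
    start: float  # seconds
    end: float    # seconds
    score: float  # higher means more interesting

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_scenes(scenes: List[Scene], target_duration: float) -> List[Scene]:
    """Greedily keep the highest-scoring scenes that still fit the budget."""
    chosen = []
    remaining = target_duration
    for scene in sorted(scenes, key=lambda s: s.score, reverse=True):
        if scene.duration <= remaining:
            chosen.append(scene)
            remaining -= scene.duration
    # Put the picks back in source/chronological order for assembly
    return sorted(chosen, key=lambda s: (s.source, s.start))


scenes = [
    Scene("video1.mp4", 0, 30, 0.9),
    Scene("video1.mp4", 40, 80, 0.8),
    Scene("video2.mp4", 5, 25, 0.7),
]
for scene in select_scenes(scenes, target_duration=60):
    print(scene.source, scene.start, scene.end)
```

A greedy pass is simple but not optimal: it can leave part of the budget unused when a long, high-scoring scene does not fit.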

⚙️ Configuration

Edit config/config.yaml to customize:

video:
  target_duration: 60  # seconds
  resolution:
    width: 1080
    height: 1920  # 9:16 vertical
  fps: 30

script:
  llm:
    provider: "local"  # or "openai"
    local_model: "mistralai/Mistral-7B-Instruct-v0.2"
  style: "engaging"  # engaging, educational, funny, dramatic

tts:
  engine: "coqui"  # or "bark"
  
subtitles:
  word_by_word: true
  style:
    font_size: 48
    position: "center"
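
A typo in one of these values is easy to miss, so a small sanity check can be useful. The sketch below works on a plain dictionary, such as the one a YAML parser (for example PyYAML's `yaml.safe_load`) would return for the file above. The allowed values are taken from the comments in the example config; `validate_config`, `lookup`, and `aspect_ratio` are hypothetical helpers and not part of VidGen.

```python
from math import gcd

# Allowed values, copied from the comments in the example config
ALLOWED = {
    "script.style": {"engaging", "educational", "funny", "dramatic"},
    "script.llm.provider": {"local", "openai"},
    "tts.engine": {"coqui", "bark"},
}


def lookup(cfg: dict, dotted_key: str):
    """Follow a dotted key such as 'script.llm.provider' into nested dicts."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a resolution to its simplest ratio, e.g. 1080x1920 -> '9:16'."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def validate_config(cfg: dict) -> list:
    """Return a list of error strings; an empty list means no problems found."""
    errors = []
    for key, allowed in ALLOWED.items():
        value = lookup(cfg, key)
        if value not in allowed:
            errors.append(f"{key}: {value!r} is not one of {sorted(allowed)}")
    if cfg["video"]["target_duration"] <= 0:
        errors.append("video.target_duration: must be positive")
    return errors


config = {
    "video": {"target_duration": 60, "resolution": {"width": 1080, "height": 1920}, "fps": 30},
    "script": {"llm": {"provider": "local"}, "style": "engaging"},
    "tts": {"engine": "coqui"},
}
print(validate_config(config))
print(aspect_ratio(1080, 1920))
```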

🎨 Video Styles

  • Engaging: Hook-driven, conversational, perfect for general content
  • Educational: Informative, clear, great for tutorials
  • Funny: Lighthearted, humorous tone
  • Dramatic: Intense, suspenseful narration

🔧 Advanced Usage

Using OpenAI API

Set your API key:

export OPENAI_API_KEY="your-key-here"

Update config:

script:
  llm:
    provider: "openai"

GPU Acceleration

The pipeline automatically detects a GPU and, when one is available, uses the matching hardware H.264 encoder:

  • NVIDIA: CUDA (h264_nvenc)
  • AMD: AMF (h264_amf)
  • Mac: VideoToolbox (h264_videotoolbox)
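
To see which of these encoders a local FFmpeg build offers, the output of `ffmpeg -hide_banner -encoders` can be parsed. The sketch below is not VidGen's own detection logic; `pick_encoder` and the `libx264` software fallback are assumptions made for illustration.

```python
import subprocess

# Hardware encoders named in the list above, in order of preference
PREFERRED = ("h264_nvenc", "h264_amf", "h264_videotoolbox")
FALLBACK = "libx264"  # assumed software fallback, not stated in this README


def available_encoders(listing: str) -> set:
    """Collect encoder names from `ffmpeg -encoders` output.

    Each encoder line looks like ' V....D h264_nvenc  description', so the
    name is the second whitespace-separated token.
    """
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def pick_encoder(listing: str) -> str:
    """Return the first preferred encoder present, else the fallback."""
    names = available_encoders(listing)
    for candidate in PREFERRED:
        if candidate in names:
            return candidate
    return FALLBACK


def ffmpeg_encoder_listing() -> str:
    """Run FFmpeg and return its encoder listing (requires ffmpeg in PATH)."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `pick_encoder(ffmpeg_encoder_listing())` gives the encoder name to pass to FFmpeg's `-c:v` option. An encoder appearing in the listing only means it was compiled in; the matching GPU and driver still have to be present for encoding to succeed.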

Custom Voice Cloning

tts:
  coqui:
    speaker_wav: "path/to/reference_voice.wav"

📁 Project Structure

vidgen/
├── app.py                    # Interactive Textual UI
├── cli.py                    # Command-line interface
├── config/
│   └── config.yaml          # Main configuration
├── src/
│   ├── pipeline.py          # Main orchestrator
│   ├── modules/
│   │   ├── video_analyzer.py       # Scene detection
│   │   ├── script_generator.py     # LLM script generation
│   │   ├── voice_generator.py      # TTS
│   │   ├── subtitle_generator.py   # Whisper subtitles
│   │   └── video_assembler.py      # Final composition
│   └── utils/
│       ├── logger.py        # Logging utilities
│       ├── file_manager.py  # File operations
│       └── tui_manager.py   # Progress display
├── data/
│   ├── input/              # Place raw videos here
│   ├── output/             # Generated videos
│   └── temp/               # Temporary files
├── models/                 # Downloaded AI models
└── logs/                   # Application logs

🧠 AI Models

Local LLM (Script Generation)

  • Default: Mistral-7B-Instruct-v0.2
  • Alternatives: Llama-2, GPT-J, etc.
  • Requires: 8GB+ RAM/VRAM

Whisper (Subtitles)

  • Models: tiny, base, small, medium, large
  • Base model recommended (balance of speed/accuracy)
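
Once word-level timings are available, turning them into word-by-word highlighted subtitles is mostly bookkeeping. The sketch below assumes the timings arrive as `(word, start, end)` tuples in seconds and writes SRT cues in which the current word is wrapped in `<b>` tags. It is not VidGen's `subtitle_generator.py`; the function names and the three-word grouping are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Word], chunk_size: int = 3) -> str:
    """Emit one cue per word, showing its chunk with that word in bold."""
    cues = []
    for i in range(0, len(words), chunk_size):
        chunk = words[i:i + chunk_size]
        for j, (_, start, end) in enumerate(chunk):
            text = " ".join(
                f"<b>{w}</b>" if k == j else w
                for k, (w, _, _) in enumerate(chunk)
            )
            cues.append((start, end, text))
    blocks = [
        f"{n}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{t}\n"
        for n, (s, e, t) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


print(words_to_srt([("Hello", 0.0, 0.5), ("big", 0.5, 0.75), ("world", 0.75, 1.5)]))
```

Support for `<b>` tags in SRT varies between players, so burned-in subtitles would need the highlighting applied at render time instead.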

TTS

  • Coqui TTS: Fast, good quality
  • Bark: More natural, slower

🐛 Troubleshooting

FFmpeg not found:

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Out of memory:

  • Use smaller Whisper model (tiny or base)
  • Switch to OpenAI API for script generation
  • Reduce video resolution in config

Slow processing:

  • Enable GPU acceleration
  • Use smaller AI models
  • Process shorter clips

Import errors:

pip install -r requirements.txt --force-reinstall

📊 Performance

Approximate processing times (60s output video):

| Hardware | Time |
|----------|------|
| RTX 3080 + 16GB RAM | ~3-5 min |
| M1 Mac + 16GB RAM | ~5-8 min |
| CPU only (i7) + 16GB RAM | ~15-25 min |

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Background music integration
  • More transition effects
  • Web UI
  • Direct social media upload
  • Real-time preview
  • Batch processing

📄 License

Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

This software is licensed for non-commercial use only. You are free to:

  • ✅ Use for personal projects
  • ✅ Use for education and research
  • ✅ Share and modify the code
  • ✅ Create derivative works

Commercial use is NOT permitted without written permission. This includes:

  • ❌ Selling videos generated by this software
  • ❌ Using in a business or for-profit organization
  • ❌ Monetizing content (ads, sponsorships, etc.)
  • ❌ Providing paid services using this software

For commercial licensing inquiries, please contact the copyright holders.

See LICENSE file for complete terms.

🙏 Acknowledgments

Built with:

About

Vidgen is a video generation tool using AI. Simply upload videos and a topic and Vidgen will produce a social media ready video!