Complete Bash + Python automation system to convert YouTube videos into viral short clips with interactive subtitles.
~/automation/
βββ scripts/ # All automation scripts
β βββ download_video.sh # Download YouTube videos
β βββ transcribe_video.sh # Transcribe videos using whisper.cpp
β βββ romanize_subtitles.py # Convert Urdu/Hindi to Roman script
β βββ suggest_clip.py # AI-powered viral clip suggestions
β βββ create_short.sh # Create short clips using ffmpeg
β βββ create_interactive_subtitles.py # Add interactive subtitles
β βββ process_all.sh # Complete pipeline
βββ download/ # Raw downloaded videos (MP4)
βββ transcribe/ # [Optional] Audio files (WAV)
βββ subtitles/ # Generated subtitles (.srt/.txt) and clip suggestions (.json)
βββ shorts/ # Final short videos with interactive subtitles
βββ requirements.txt # Python dependencies
βββ setup.sh # Installation script
βββ README.md # This file
cd ~/automation
./setup.sh# First activate the virtual environment (if using Arch Linux)
source ~/automation/venv/bin/activate
./scripts/process_all.sh "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"That's it! The system will:
- Download the video
- Transcribe it using Whisper
- Romanize non-English text
- Suggest viral clip segments using AI
- Create short clips
- Add interactive subtitles
./scripts/download_video.sh "https://www.youtube.com/watch?v=VIDEO_ID"./scripts/transcribe_video.sh# Activate virtual environment first (Arch Linux)
source ~/automation/venv/bin/activate
./scripts/run_python.sh romanize_subtitles.py# Activate virtual environment first (Arch Linux)
source ~/automation/venv/bin/activate
./scripts/run_python.sh suggest_clip.py./scripts/create_short.sh# Activate virtual environment first (Arch Linux)
source ~/automation/venv/bin/activate
./scripts/run_python.sh create_interactive_subtitles.pyThe system uses DeepSeek v3 (free) model via OpenRouter for:
- Language detection and romanization
- Viral clip suggestions
API key is configured in:
scripts/romanize_subtitles.pyscripts/suggest_clip.py
The system automatically detects and uses the best available model:
- medium (1.5 GB) - Best accuracy (if available)
- base.en (74 MB) - Good accuracy, English only
- base (74 MB) - Good accuracy, multilingual
Priority order: medium β base.en β base
You already have both ggml-medium.bin and ggml-base.en.bin models, so the system will use the medium model for better transcription accuracy!
Video Title.mp4- Original downloaded video
Video Title.txt- Text transcript (romanized if needed)Video Title.srt- SRT subtitle file with timestampsVideo Title.clip.json- AI-suggested clip with timestamps and reason
Video Title_short.mp4- Short clip without subtitlesVideo Title_with_subtitles.mp4- Final video with interactive subtitles
- Automatically detects Urdu/Hindi scripts
- Romanizes non-English text for better accessibility
- Preserves English content unchanged
- Analyzes transcript content for viral potential
- Suggests optimal 30-90 second segments
- Provides reasoning for each suggestion
- Looks for emotional moments, insights, and quotable content
- Word-level timing (when available)
- Highlighted current words
- Professional styling with stroke borders
- Optimized for mobile viewing
- Comprehensive logging with timestamps
- Graceful failure handling
- Detailed error messages
- Progress tracking
- Arch Linux (or other Linux distributions)
- Python 3.8+
- FFmpeg
- Git
yt-dlp- YouTube video downloaderwhisper.cpp- Fast speech recognitionffmpeg- Video processing- Python packages:
requests,moviepy,langdetect
- Use smaller Whisper models for speed
- Process shorter videos (< 30 minutes)
- Use SSD storage for faster I/O
- Use larger Whisper models (
mediumorlarge) - Ensure good audio quality in source videos
- Review AI suggestions before creating clips
cd ~
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
make
bash ./models/download-ggml-model.sh base.enEnsure videos are downloaded to ~/automation/download/ directory.
pip install --user moviepyCheck your internet connection and API key in the Python scripts.
All scripts provide detailed logging with timestamps. Check console output for specific error messages.
- Longer clips (60-90s) for complete thoughts
- Focus on key insights and actionable advice
- Shorter clips (30-60s) for quick engagement
- Highlight funny or surprising moments
- Find quotable statements and revelations
- Look for emotional or controversial moments
- API key is stored in script files (consider environment variables for production)
- Downloaded content respects YouTube's terms of service
- No personal data is transmitted to AI services
Output videos are optimized for:
- TikTok (vertical format compatible)
- YouTube Shorts
- Instagram Reels
- Twitter/X videos
Feel free to enhance the scripts:
- Add support for more languages
- Improve subtitle styling
- Add more AI models
- Optimize performance
This automation system is for personal and educational use. Respect YouTube's terms of service and content creators' rights.