- ✨ Smart Web Capturing - Download complete websites with all resources (images, CSS, JavaScript, fonts)
 - 🔄 Multiple Engines - Choose between standard requests, Selenium, or Playwright for perfect captures
 - 📚 Bulk Archive - Download multiple websites at once with the batch processor
 - 🔍 Content Search - Find exactly what you need with full text search across your archives
 - 🏷️ Tagging System - Organize websites with custom tags for efficient categorization
 - 📝 Notes & Annotations - Add context with your own notes for each saved website
 - ✏️ Built-in Editor - Modify archived content directly within the application
 - 📦 Import/Export - Share your archives with others or back them up securely
 
- Python 3.7 or higher
 - PyQt6
 - Internet connection for downloading websites
 
```bash
# Install from PyPI
pip install website-archiver

# Launch the application
website-archiver
```

```bash
# Clone the repository
git clone https://github.com/Oliverwebdev/WebArchiver
cd WebArchiver

# Create and activate virtual environment (recommended)
python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate

# Install requirements
pip install -r requirements.txt

# Run the application
python main.py
```

For the best archiving experience, install additional engines:

```bash
# For Playwright support (recommended for complex websites)
pip install playwright
playwright install chromium

# For Selenium support
pip install selenium
```

- Launch Website Archiver
 - Go to the Download tab
 - Enter the URL you want to archive
 - Select your preferred download options
 - Click Download
 - Your archived website will appear in the Home tab
 
- Search: Use the search bar to find websites by title, URL, or content
 - Filter by Tags: Select a tag from the dropdown to filter related websites
 - Edit Website: Click "Edit" to modify the website's content, tags, or properties
 - Add Notes: Record your thoughts or context about why you archived the site
 - Export: Share your archives with others using the export functionality
 
Visit the Settings tab to configure (an illustrative sketch of these options follows the list):
- Storage location for your archives
 - Default download engine
 - Resource options (images, CSS, JS, fonts)
 - Timeout and concurrency settings
 - And much more!
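
How these options are stored isn't documented in this README, but conceptually they reduce to a handful of values handled by `config_manager.py`. The dictionary below is an illustrative sketch only; every key name and default value is an assumption, not the application's real schema.

```python
# Hypothetical defaults -- key names and values are illustrative only,
# not the actual schema used by config_manager.py.
DEFAULT_SETTINGS = {
    "storage_dir": "~/WebsiteArchiver/archives",  # where archives are stored
    "default_engine": "requests",                 # "requests", "selenium", or "playwright"
    "download_images": True,                      # resource options
    "download_css": True,
    "download_js": True,
    "download_fonts": True,
    "timeout_seconds": 30,                        # per-request timeout
    "max_concurrent_downloads": 4,                # concurrency limit
}
```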
 
Website Archiver intelligently captures web content using a multi-step process (a simplified code sketch follows the list):
- Analysis: Evaluates the target website structure
 - Download: Retrieves HTML content using the selected engine
 - Resource Collection: Gathers linked resources (images, styles, scripts)
 - Path Rewriting: Modifies resource paths to work offline
 - Storage: Organizes content in a structured filesystem
 - Indexing: Catalogs the archive in the searchable database
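
For intuition, here is a minimal, self-contained sketch of that pipeline built on the `requests` engine, Beautiful Soup, and SQLite. It is not the project's actual `scraper.py` or `database_manager.py` code; the function name, archive layout, and database schema are illustrative assumptions.

```python
"""Minimal archiving pipeline sketch -- illustrative only, not scraper.py."""
import os
import sqlite3
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def archive_page(url: str, out_dir: str = "archive", db_path: str = "index.db") -> str:
    # Download: retrieve the HTML with the simple requests engine.
    html = requests.get(url, timeout=30).text

    # Analysis / resource collection: find linked images, styles, and scripts.
    soup = BeautifulSoup(html, "html.parser")
    os.makedirs(os.path.join(out_dir, "resources"), exist_ok=True)
    for tag, attr in (("img", "src"), ("link", "href"), ("script", "src")):
        for node in soup.find_all(tag):
            ref = node.get(attr)
            if not ref:
                continue
            resource_url = urljoin(url, ref)
            filename = os.path.basename(urlparse(resource_url).path) or "resource"
            local_path = os.path.join("resources", filename)
            try:
                data = requests.get(resource_url, timeout=30).content
            except requests.RequestException:
                continue  # skip resources that fail to download
            with open(os.path.join(out_dir, local_path), "wb") as fh:
                fh.write(data)
            # Path rewriting: point the tag at the local copy so it works offline.
            node[attr] = local_path

    # Storage: write the rewritten HTML into the archive directory.
    page_path = os.path.join(out_dir, "index.html")
    with open(page_path, "w", encoding="utf-8") as fh:
        fh.write(str(soup))

    # Indexing: record the archive in a small SQLite catalogue.
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS archives (url TEXT, title TEXT, path TEXT)"
        )
        title = str(soup.title.string) if soup.title and soup.title.string else url
        conn.execute("INSERT INTO archives VALUES (?, ?, ?)", (url, title, page_path))

    return page_path


if __name__ == "__main__":
    print(archive_page("https://example.com"))
```

The real application layers the engine selection, concurrency, tagging, notes, and error handling described above on top of this basic flow.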
 
The application architecture includes:
- `config_manager.py`: Manages application configuration
- `database_manager.py`: Handles SQLite database operations
- `scraper.py`: Core web scraping functionality
- `session_manager.py`: Manages application state between sessions
- `ui/`: PyQt6-based user interface components
Want to contribute to Website Archiver? Great! We welcome contributions of all kinds.
- Fork the repository
- Clone your fork: `git clone https://github.com/Oliverwebdev/WebArchiver`
- Create a virtual environment: `python -m venv venv`
- Activate it and install dev dependencies: `pip install -r requirements-dev.txt`
- Make your changes and submit a pull request!
 
This project is licensed under the MIT License - see the LICENSE file for details.
- Beautiful Soup for HTML parsing
 - PyQt for the GUI framework
 - Requests for HTTP functionality
 - Selenium and Playwright for browser automation
 - All the open source contributors who made this project possible
 
If you find Website Archiver useful, please consider:
- Starring the repository on GitHub
 - Reporting issues or suggesting features
 - Contributing code or documentation improvements
 - Sharing the project with others
 
Website Archiver - Because the web is too important to lose.
