- ✨ Smart Web Capturing - Download complete websites with all resources (images, CSS, JavaScript, fonts)
 - 🔄 Multiple Engines - Choose between standard requests, Selenium, or Playwright for perfect captures
 - 📚 Bulk Archive - Download multiple websites at once with the batch processor
 - 🔍 Content Search - Find exactly what you need with full text search across your archives
 - 🏷️ Tagging System - Organize websites with custom tags for efficient categorization
 - 📝 Notes & Annotations - Add context with your own notes for each saved website
 - ✏️ Built-in Editor - Modify archived content directly within the application
 - 📦 Import/Export - Share your archives with others or back them up securely
 
- Python 3.7 or higher
 - PyQt6
 - Internet connection for downloading websites
 
```bash
# Install from PyPI
pip install website-archiver

# Launch the application
website-archiver
```

```bash
# Clone the repository
git clone https://github.com/Oliverwebdev/WebArchiver
cd WebArchiver

# Create and activate virtual environment (recommended)
python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate

# Install requirements
pip install -r requirements.txt

# Run the application
python main.py
```

For the best archiving experience, install additional engines:

```bash
# For Playwright support (recommended for complex websites)
pip install playwright
playwright install chromium

# For Selenium support
pip install selenium
```

- Launch Website Archiver
 - Go to the Download tab
 - Enter the URL you want to archive
 - Select your preferred download options
 - Click Download
 - Your archived website will appear in the Home tab
 
- Search: Use the search bar to find websites by title, URL, or content
 - Filter by Tags: Select a tag from the dropdown to filter related websites
 - Edit Website: Click "Edit" to modify the website's content, tags, or properties
 - Add Notes: Record your thoughts or context about why you archived the site
 - Export: Share your archives with others using the export functionality
 
Visit the Settings tab to configure (an illustrative sketch of these options follows the list):
- Storage location for your archives
 - Default download engine
 - Resource options (images, CSS, JS, fonts)
 - Timeout and concurrency settings
 - And much more!
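
How these options are stored isn't documented in this README, but conceptually they reduce to a handful of values handled by `config_manager.py`. The dictionary below is an illustrative sketch only; every key name and default value is an assumption, not the application's real schema.

```python
# Hypothetical defaults -- key names and values are illustrative only,
# not the actual schema used by config_manager.py.
DEFAULT_SETTINGS = {
    "storage_dir": "~/WebsiteArchiver/archives",  # where archives are stored
    "default_engine": "requests",                 # "requests", "selenium", or "playwright"
    "download_images": True,                      # resource options
    "download_css": True,
    "download_js": True,
    "download_fonts": True,
    "timeout_seconds": 30,                        # per-request timeout
    "max_concurrent_downloads": 4,                # concurrency limit
}
```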
 
Website Archiver intelligently captures web content using a multi-step process (a simplified code sketch follows the list):
- Analysis: Evaluates the target website structure
 - Download: Retrieves HTML content using the selected engine
 - Resource Collection: Gathers linked resources (images, styles, scripts)
 - Path Rewriting: Modifies resource paths to work offline
 - Storage: Organizes content in a structured filesystem
 - Indexing: Catalogs the archive in the searchable database
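
For intuition, here is a minimal, self-contained sketch of that pipeline built on the `requests` engine, Beautiful Soup, and SQLite. It is not the project's actual `scraper.py` or `database_manager.py` code; the function name, archive layout, and database schema are illustrative assumptions.

```python
"""Minimal archiving pipeline sketch -- illustrative only, not scraper.py."""
import os
import sqlite3
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def archive_page(url: str, out_dir: str = "archive", db_path: str = "index.db") -> str:
    # Download: retrieve the HTML with the simple requests engine.
    html = requests.get(url, timeout=30).text

    # Analysis / resource collection: find linked images, styles, and scripts.
    soup = BeautifulSoup(html, "html.parser")
    os.makedirs(os.path.join(out_dir, "resources"), exist_ok=True)
    for tag, attr in (("img", "src"), ("link", "href"), ("script", "src")):
        for node in soup.find_all(tag):
            ref = node.get(attr)
            if not ref:
                continue
            resource_url = urljoin(url, ref)
            filename = os.path.basename(urlparse(resource_url).path) or "resource"
            local_path = os.path.join("resources", filename)
            try:
                data = requests.get(resource_url, timeout=30).content
            except requests.RequestException:
                continue  # skip resources that fail to download
            with open(os.path.join(out_dir, local_path), "wb") as fh:
                fh.write(data)
            # Path rewriting: point the tag at the local copy so it works offline.
            node[attr] = local_path

    # Storage: write the rewritten HTML into the archive directory.
    page_path = os.path.join(out_dir, "index.html")
    with open(page_path, "w", encoding="utf-8") as fh:
        fh.write(str(soup))

    # Indexing: record the archive in a small SQLite catalogue.
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS archives (url TEXT, title TEXT, path TEXT)"
        )
        title = str(soup.title.string) if soup.title and soup.title.string else url
        conn.execute("INSERT INTO archives VALUES (?, ?, ?)", (url, title, page_path))

    return page_path


if __name__ == "__main__":
    print(archive_page("https://example.com"))
```

The real application layers the engine selection, concurrency, tagging, notes, and error handling described above on top of this basic flow.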
 
The application architecture includes:
- `config_manager.py`: Manages application configuration
- `database_manager.py`: Handles SQLite database operations
- `scraper.py`: Core web scraping functionality
- `session_manager.py`: Manages application state between sessions
- `ui/`: PyQt6-based user interface components
Want to contribute to Website Archiver? Great! We welcome contributions of all kinds.
- Fork the repository
- Clone your fork: `git clone https://github.com/Oliverwebdev/WebArchiver`
- Create a virtual environment: `python -m venv venv`
- Activate it and install dev dependencies: `pip install -r requirements-dev.txt`
- Make your changes and submit a pull request!
 
This project is licensed under the MIT License - see the LICENSE file for details.
- Beautiful Soup for HTML parsing
 - PyQt for the GUI framework
 - Requests for HTTP functionality
 - Selenium and Playwright for browser automation
 - All the open source contributors who made this project possible
 
If you find Website Archiver useful, please consider:
- Starring the repository on GitHub
 - Reporting issues or suggesting features
 - Contributing code or documentation improvements
 - Sharing the project with others
 
Website Archiver - Because the web is too important to lose.
