This is a sample Pastebin crawler that crawls the recent pastes from the homepage.
- python3
- a cup of coffee (mandatory!!!)
- It is recommended to create a virtual environment
python -m venv venv
source ./venv/bin/activate
- Installing dependencies
cd src/
pip install -r requirements.txt
- Running the crawler
cd src/
python main.py
docker compose up
Use flake8 for linting and black for formatting
cd src/
pip install -r requirements-dev.txt
black -l 79 <file>.py
flake8 .
- The crawler uses the following yaml file format
# location: ../assets/config.yaml
crawler:
  # URL of the site to crawl
  url: https://pastebin.com
  # where to save the paste content
  pastes_path: ../assets/pastes
  # time to repeat the action
  interval: 120
db:
  file: ../assets/db/pastes_db.json
  table: pastes
logger:
  name: PastebinCrawler
  level: INFO
- To change the path of the config file, set the environment variable:
CONFIG_PATH=../assets/config.yaml
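
The lookup of `CONFIG_PATH` could be sketched roughly as follows. This is a minimal illustration, not the crawler's actual code: the helper name `resolve_config_path` and the fallback to `../assets/config.yaml` when the variable is unset are assumptions based on the location comment in the sample config.

```python
import os
from pathlib import Path

# Assumed default, taken from the "# location:" comment in the sample config.
DEFAULT_CONFIG_PATH = "../assets/config.yaml"


def resolve_config_path(environ=None):
    """Return the config file path, preferring the CONFIG_PATH variable.

    `environ` is a hypothetical parameter that makes the helper easy to
    test; it defaults to the real process environment.
    """
    env = os.environ if environ is None else environ
    # An unset or empty CONFIG_PATH falls back to the assumed default.
    return Path(env.get("CONFIG_PATH") or DEFAULT_CONFIG_PATH)


if __name__ == "__main__":
    print(resolve_config_path())
```

Because the path is relative, it is resolved against the directory the crawler is started from, which is why the commands above run from `src/`.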