📘 ETL Problems – Open Source Learning Project

Welcome to ETL Problems, an open-source project designed for learning, experimenting, and contributing to real-world data engineering workflows.

This repository contains a deliberately broken ETL pipeline that mimics issues data engineers face daily. Contributors identify and fix these issues, then enhance the pipeline, learning best practices in data extraction, transformation, and loading along the way.

🚀 What’s Inside?

The pipeline follows a simple ETL flow:

Extract → Reads data from a CSV file (with encoding fallback).
Transform → Cleans, deduplicates, and prepares the dataset.
Load → Stores processed data into an SQLite database (with idempotency).
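
The three steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the function names, the fallback encodings, the `people` table, and the `id`/`name` columns are all assumptions made for the example.

```python
import csv
import sqlite3


def extract(path, encodings=("utf-8", "latin-1")):
    """Read CSV rows as dicts, trying each encoding in turn (assumed fallback order)."""
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as handle:
                return list(csv.DictReader(handle))
        except UnicodeDecodeError:
            continue  # fall back to the next encoding
    raise ValueError(f"could not decode {path} with any of {encodings}")


def transform(rows):
    """Strip surrounding whitespace and drop exact duplicate rows, keeping the first."""
    seen = set()
    cleaned = []
    for row in rows:
        normalized = {key: (value or "").strip() for key, value in row.items()}
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(normalized)
    return cleaned


def load(rows, db_path):
    """Write rows to SQLite; the primary key plus INSERT OR REPLACE keeps reruns idempotent."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS people (id TEXT PRIMARY KEY, name TEXT)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO people (id, name) VALUES (:id, :name)", rows
            )
    finally:
        conn.close()
```

Running `load(transform(extract("input.csv")), "etl.db")` twice on the same file leaves the table unchanged after the second run, which is the property "idempotency" refers to here. The file names in that call are placeholders.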

⚠️ Find and Fix Issues

The bugs in this pipeline were introduced intentionally and are marked in the code with comments of the form:
# TODO (Find & Fix): ...
Search for these comments and fix the issues they describe.

Examples:

Unused imports
Incorrect default values
Wrong file extension checks
Missing error handling
Print statements instead of logging
Missing idempotency in database load
No duplicate removal in transform
Missing actual logic in extract/transform/load steps
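
As an illustration of what a fix might look like for three of the categories above (wrong file extension checks, missing error handling, and print statements instead of logging), here is a small sketch. The helper name `validate_csv_path` and the logger name `etl` are hypothetical and do not come from the repository.

```python
import logging
from pathlib import Path

logger = logging.getLogger("etl")  # hypothetical logger name


def validate_csv_path(path):
    """Return the path as a Path object if it points to an existing .csv file."""
    candidate = Path(path)
    # Compare the real suffix, case-insensitively, rather than a substring of the name.
    if candidate.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {candidate.name!r}")
    # Fail early with a clear error instead of crashing later in the pipeline.
    if not candidate.is_file():
        raise FileNotFoundError(f"input file not found: {candidate}")
    # Log instead of print so callers control verbosity and destination.
    logger.info("validated input file %s", candidate)
    return candidate
```

Checking `Path.suffix` avoids accepting names such as `data.csv.bak`, which a naive substring check on ".csv" would let through.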

🎯 Ways to Contribute

Fix bugs marked with # TODO (Find & Fix): ...
Improve error handling and logging
Add tests and validation
Enhance documentation
Add new features (scrapers, data quality checks, visualizations)

🛠 Setup Instructions

Clone the repo, install the dependencies, and run the pipeline:

git clone https://github.com/<your-username>/etl-problems.git
cd etl-problems
pip install -r requirements.txt
python main.py

🧪 Testing

Unit tests can be added in the tests/ folder.
Run them with:

pytest tests/
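
A test file might look something like the sketch below. The file name `tests/test_transform.py` and the `deduplicate` function are stand-ins invented for this example; in a real contribution you would import the project's own transform function instead of defining one in the test file.

```python
def deduplicate(rows):
    """Stand-in for the project's transform step: drop repeated rows, keep order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def test_deduplicate_removes_repeated_rows():
    rows = [{"id": 1}, {"id": 1}, {"id": 2}]
    assert deduplicate(rows) == [{"id": 1}, {"id": 2}]


def test_deduplicate_handles_empty_input():
    assert deduplicate([]) == []
```

pytest collects any function whose name starts with `test_` in files named `test_*.py`, so no extra registration is needed.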

💡 Tips for Contributors

Search for # TODO (Find & Fix): ... in the codebase.
Check the Issues for tasks and guidance.
If you find a new bug, open an issue and suggest a fix.
All contributions, big or small, are welcome!

📬 Questions?

Open an issue or start a discussion in the repo. Happy hacking!
