This project demonstrates the use of Python and RDKit for drug discovery-related data analysis.
It covers collecting molecular datasets from ChEMBL, cleaning and curating them, applying filters such as Lipinski’s Rule of Five, calculating molecular descriptors, and plotting the results.
- Project work.ipynb
  Main workflow notebook. Includes:
  - Data collection from ChEMBL
  - Data cleaning and duplicate removal
  - Lipinski’s Rule of Five analysis
  - Descriptor calculation
  - Basic visualizations with RDKit and matplotlib
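
The cleaning, descriptor, and Rule of Five steps in the main workflow can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`lipinski_descriptors`, `curate`), the hand-typed example SMILES, and the `max_violations=1` cutoff are all assumptions made for the example.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski


def lipinski_descriptors(mol):
    """Return the four Rule of Five descriptors and the number of violations."""
    mw = Descriptors.MolWt(mol)        # molecular weight
    logp = Descriptors.MolLogP(mol)    # Crippen LogP estimate
    hbd = Lipinski.NumHDonors(mol)     # hydrogen-bond donors
    hba = Lipinski.NumHAcceptors(mol)  # hydrogen-bond acceptors
    violations = sum([mw > 500, logp > 5, hbd > 5, hba > 10])
    return {"MW": mw, "LogP": logp, "HBD": hbd, "HBA": hba,
            "violations": violations}


def curate(smiles_list, max_violations=1):
    """Drop unparsable and duplicate structures, then add descriptor columns.

    max_violations=1 is an assumed cutoff (at most one violation allowed).
    """
    rows, seen = [], set()
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:                     # RDKit could not parse the SMILES
            continue
        canonical = Chem.MolToSmiles(mol)   # canonical form exposes duplicates
        if canonical in seen:
            continue
        seen.add(canonical)
        row = {"smiles": canonical, **lipinski_descriptors(mol)}
        row["passes_ro5"] = row["violations"] <= max_violations
        rows.append(row)
    return pd.DataFrame(rows)


# Hand-typed illustrative input, not data from the notebooks.
EXAMPLE_SMILES = [
    "CC(=O)OC1=CC=CC=C1C(=O)O",   # aspirin
    "O=C(O)c1ccccc1OC(C)=O",      # aspirin again, written differently
    "C" * 40,                     # C40 linear alkane: heavy and lipophilic
    "not_a_smiles",               # unparsable entry, dropped during curation
]

table = curate(EXAMPLE_SMILES)
print(table)
```

Deduplicating on canonical SMILES rather than on the raw strings catches the same compound written in two different ways, which plain string comparison would miss.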
- aromatase.ipynb
  Focused case study on compounds related to aromatase inhibitors:
  - Data preparation
  - Rule of Five and descriptor screening
  - Plotting molecular property distributions
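
As a rough idea of what plotting molecular property distributions looks like, the sketch below computes molecular weight and LogP for three known aromatase inhibitors and draws one histogram per property. It is not taken from the notebook: the SMILES were typed by hand for illustration, and the names `property_table` and `plot_distributions` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors

# Hand-typed SMILES for illustration only, not the notebook's dataset.
AROMATASE_INHIBITORS = {
    "letrozole": "N#Cc1ccc(cc1)C(c1ccc(cc1)C#N)n1cncn1",
    "anastrozole": "CC(C)(C#N)c1cc(Cn2cncn2)cc(c1)C(C)(C)C#N",
    "aminoglutethimide": "CCC1(CCC(=O)NC1=O)c1ccc(N)cc1",
}


def property_table(smiles_by_name):
    """Build a DataFrame with MW and LogP for every parsable molecule."""
    rows = []
    for name, smiles in smiles_by_name.items():
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # skip anything RDKit cannot parse
            continue
        rows.append({
            "name": name,
            "MW": Descriptors.MolWt(mol),
            "LogP": Descriptors.MolLogP(mol),
        })
    return pd.DataFrame(rows)


def plot_distributions(df, columns=("MW", "LogP"), bins=10):
    """Draw one histogram per property column and return the figure."""
    fig, axes = plt.subplots(1, len(columns), squeeze=False,
                             figsize=(4 * len(columns), 3))
    for ax, column in zip(axes[0], columns):
        ax.hist(df[column], bins=bins)
        ax.set_xlabel(column)
        ax.set_ylabel("Number of compounds")
    fig.tight_layout()
    return fig


props = property_table(AROMATASE_INHIBITORS)
fig = plot_distributions(props)
```

With only three compounds the histograms are nearly empty; the same two functions apply unchanged to a full curated table. Calling `fig.savefig(...)` writes the figure to disk when running outside a notebook.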
Requirements:
- Python 3.8+
- RDKit
- Pandas
- Matplotlib
- ChEMBL Database
Setup:
- Clone this repository:
  git clone https://github.com/mike3119/https-github.com-mike3119-my-project.git
- Navigate into the folder:
  cd https-github.com-mike3119-my-project
- Install dependencies:
  pip install rdkit pandas matplotlib
- Open the notebooks:
  jupyter notebook
📊 Example Outputs
Some of the results you can expect:
- Distribution plots of molecular weight, LogP, and other drug-likeness properties
- Filtering of compounds based on Lipinski’s Rule of Five
- Data tables of curated compounds for further analysis
🎯 Project Goals
- Learn and apply bioinformatics/cheminformatics techniques for drug discovery.
- Explore the power of ChEMBL as a molecular dataset resource.
- Practice molecular analysis with RDKit in Python.
- Build a foundation for advanced drug design projects.
👤 Author
Michael Hemen Akosu
hemenmonterakosu@gmail.com
B.Sc. Chemistry
PGD in Drug Analysis, Pharmaceutical Chemistry (University of Ibadan)
Open to internships, collaborations, and learning opportunities in bioinformatics and cheminformatics.