A data collection and analysis project using ChEMBL and RDKit. It includes dataset cleaning, duplicate removal, Lipinski’s Rule of Five filtering, descriptor analysis, and molecular visualization with Python.

mike3119/https-github.com-mike3119-my-project

Bioinformatics & Cheminformatics Project

This project demonstrates the use of Python and RDKit for drug discovery-related data analysis.
It involves collecting molecular datasets from ChEMBL, cleaning and curating them, applying filters such as Lipinski’s Rule of Five, calculating molecular descriptors, and generating plots.


📂 Project Contents

  • Project work.ipynb
    Main workflow notebook. Includes:

    • Data collection from ChEMBL
    • Data cleaning and duplicate removal
    • Lipinski’s Rule of Five analysis
    • Descriptor calculation
    • Basic visualizations with RDKit and matplotlib
  • aromatase.ipynb
    Focused case study on compounds related to aromatase inhibitors:

    • Data preparation
    • Rule of Five and descriptor screening
    • Plotting molecular property distributions
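
The cleaning and duplicate-removal step listed above could look roughly like the sketch below. It is not taken from the notebooks: the field names (molecule_chembl_id, canonical_smiles, standard_value) follow the ChEMBL activity record layout and are assumed here, the compound IDs are placeholders, and the choice to keep the first record per SMILES string is an assumption too.

```python
from __future__ import annotations

from typing import Iterable

# Fields assumed to be required for a usable record (hypothetical choice).
REQUIRED_FIELDS = ("molecule_chembl_id", "canonical_smiles", "standard_value")


def clean_records(records: Iterable[dict]) -> list[dict]:
    """Drop rows with missing fields, then keep the first row per SMILES."""
    seen: set[str] = set()
    cleaned: list[dict] = []
    for row in records:
        # Skip rows where any required field is absent or empty.
        if any(row.get(key) in (None, "") for key in REQUIRED_FIELDS):
            continue
        smiles = row["canonical_smiles"]
        # Skip structures that were already kept once.
        if smiles in seen:
            continue
        seen.add(smiles)
        cleaned.append(row)
    return cleaned


# Placeholder rows for illustration only; these are not real ChEMBL entries.
raw = [
    {"molecule_chembl_id": "CHEMBL_A", "canonical_smiles": "CCO", "standard_value": "120.0"},
    {"molecule_chembl_id": "CHEMBL_B", "canonical_smiles": "CCO", "standard_value": "95.0"},
    {"molecule_chembl_id": "CHEMBL_C", "canonical_smiles": None, "standard_value": "10.0"},
    {"molecule_chembl_id": "CHEMBL_D", "canonical_smiles": "c1ccccc1", "standard_value": ""},
    {"molecule_chembl_id": "CHEMBL_E", "canonical_smiles": "CC(=O)O", "standard_value": "3.5"},
]

cleaned = clean_records(raw)
print([row["molecule_chembl_id"] for row in cleaned])
```

Comparing raw SMILES strings only catches exact text matches. Two different strings can describe the same molecule, so a stricter version would first canonicalize each structure (for example with RDKit) before comparing.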

🛠️ Tools & Libraries Used

  • Python
  • RDKit
  • pandas
  • matplotlib
  • Jupyter Notebook
  • ChEMBL (data source)


🚀 How to Run

  1. Clone this repository:
    git clone https://github.com/mike3119/https-github.com-mike3119-my-project.git

  2. Navigate into the folder:
    cd https-github.com-mike3119-my-project

  3. Install dependencies:
    pip install rdkit pandas matplotlib

  4. Open the notebooks:
    jupyter notebook
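
Before opening the notebooks, it can help to confirm that the packages from the install step are importable. The short check below is an optional sketch, not part of the repository; the helper name missing_packages is made up for illustration. Note that the jupyter command itself also requires Jupyter to be installed, which the install command above does not list.

```python
import importlib.util

# Package names taken from the install command in the steps above.
REQUIRED = ("rdkit", "pandas", "matplotlib")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```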


📊 Example Outputs

Some of the results you can expect:

  • Distribution plots of molecular weight, LogP, and other drug-likeness properties
  • Filtering of compounds based on Lipinski’s Rule of Five
  • Data tables of curated compounds for further analysis
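
For readers new to the Rule of Five filter mentioned above, here is a minimal sketch of the check itself. It counts how many of the four standard thresholds a compound exceeds: molecular weight above 500 Da, LogP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors. The class and function names are hypothetical, the property values are illustrative rather than measured, and allowing one violation is an assumption (the notebooks may apply a stricter cutoff).

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Descriptor values for one compound (hypothetical container)."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def ro5_violations(p: Properties) -> int:
    """Count how many Rule of Five thresholds the compound exceeds."""
    checks = [
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(p: Properties, max_violations: int = 1) -> bool:
    """Assumed policy: accept compounds with at most `max_violations`."""
    return ro5_violations(p) <= max_violations


# Illustrative values only, not taken from the project data.
small = Properties(mol_weight=180.2, logp=1.3, h_donors=1, h_acceptors=3)
borderline = Properties(mol_weight=510.0, logp=4.0, h_donors=2, h_acceptors=8)
large = Properties(mol_weight=720.0, logp=6.2, h_donors=6, h_acceptors=12)

for name, props in [("small", small), ("borderline", borderline), ("large", large)]:
    print(name, ro5_violations(props), passes_ro5(props))
```

With RDKit, the four inputs would typically come from Descriptors.MolWt, Descriptors.MolLogP, Lipinski.NumHDonors, and Lipinski.NumHAcceptors applied to a molecule parsed with Chem.MolFromSmiles.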


🎯 Project Goals

  • Learn and apply bioinformatics/cheminformatics techniques for drug discovery.
  • Explore the power of ChEMBL as a molecular dataset resource.
  • Practice molecular analysis with RDKit in Python.
  • Build a foundation for advanced drug design projects.


👤 Author

Michael Hemen Akosu

hemenmonterakosu@gmail.com

B.Sc. Chemistry

PGD in Drug Analysis, Pharmaceutical Chemistry (University of Ibadan)

Open to internships, collaborations, and learning opportunities in bioinformatics and cheminformatics.
