STRAND

RNA-protein interactions play crucial roles in cellular processes, from gene regulation to viral replication. While recent advances in structure prediction have revolutionized our ability to model macromolecular complexes, achieving accurate predictions of RNA-protein binding poses remains challenging. In this work, we present STRAND, a diffusion-based model for monomeric RNA-protein complex refinement that builds upon the success of DiffDock-PP in protein-protein docking. Unlike traditional docking, we develop STRAND as a modular extension to existing RNA-Protein complex prediction tools to improve their backbone predictions. We study the effect of different transformations by training models to learn either translation, rotation, torsion, or combinations of these during the diffusion process and initialize the backward process with a complex prediction at test time. Our experiments with AlphaFold 3 and ProRNA3D-single reveal that STRAND can improve the backbones of a large fraction of RNA-protein complex predictions.

Paper

Check out our GenBio@ICML'25 Workshop paper

Citation

If you use this repo in your own work, please consider citing us:

@inproceedings{
al-zeqri2025strand,
title={{STRAND}: Structure Refinement of {RNA}-Protein Complexes via Diffusion},
author={Mohsen Al-zeqri and J{\"o}rg K.H. Franke and Frederic Runge},
booktitle={ICML 2025 Generative AI and Biology (GenBio) Workshop},
year={2025},
url={https://openreview.net/forum?id=NEealHqriE}
}

Installation

Clone the repo:

git clone https://github.com/automl/STRAND.git  

Create and activate conda environment:

conda create -n STRAND python=3.9.18 
conda activate STRAND 
pip install -r requirements.txt 

Replicating Paper Results

📁 Dataset Setup

Download the preprocessed structures used in our experiments:

👉 test_data.zip

  1. Download the test_data.zip file
  2. Place it in the datasets directory
  3. Extract the archive:
unzip datasets/test_data.zip -d datasets/
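
On systems without the `unzip` utility, the extraction step can also be done with Python's standard library. The following is a minimal sketch; the helper name `extract_archive` is hypothetical and not part of the repository, and the paths in the usage note simply mirror the steps above.

```python
import zipfile
from pathlib import Path


def extract_archive(archive: str, dest: str) -> list[str]:
    """Extract a zip archive into `dest` and return the archive member names.

    Hypothetical helper for illustration; not part of the STRAND repository.
    """
    archive_path = Path(archive)
    if not archive_path.is_file():
        raise FileNotFoundError(f"archive not found: {archive_path}")
    # Create the destination directory if it does not exist yet.
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Calling `extract_archive("datasets/test_data.zip", "datasets")` should have the same effect as the `unzip` command above.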

🚀 Running Experiments

Choose your inference mode based on whether you want to use the confidence model:

Manual Selection (Without Confidence Model)

Run structure refinement with manual selection on any of the three benchmark datasets:

# RNA-Pro dataset
sh src/inference_manual.sh rnapro

# Non-X-ray dataset  
sh src/inference_manual.sh nonxray

# X-ray dataset
sh src/inference_manual.sh xray

Automated Selection (With Confidence Model)

Run structure refinement using the confidence model for automated structure selection:

# RNA-Pro dataset
sh src/inference_conf.sh rnapro

# Non-X-ray dataset
sh src/inference_conf.sh nonxray

# X-ray dataset
sh src/inference_conf.sh xray
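
To run one inference mode over all three benchmark datasets in a single call, the commands above can be wrapped in a small driver. This is only a sketch: `build_command` and `run_all` are hypothetical names, and only the script paths and dataset identifiers are taken from this README.

```python
import subprocess

# Dataset identifiers and script paths as listed in this README.
DATASETS = ("rnapro", "nonxray", "xray")
SCRIPTS = {
    "manual": "src/inference_manual.sh",  # without confidence model
    "conf": "src/inference_conf.sh",      # with confidence model
}


def build_command(mode: str, dataset: str) -> list[str]:
    """Return the shell command for one (mode, dataset) pair."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(SCRIPTS)}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}, expected one of {DATASETS}")
    return ["sh", SCRIPTS[mode], dataset]


def run_all(mode: str, dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run=True, only print) the command for every dataset."""
    commands = [build_command(mode, dataset) for dataset in DATASETS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default in this sketch) nothing is executed, which makes it easy to check the commands before starting a long run from the repository root.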

📊 Results

Refined structures and evaluation metrics will be saved in the results/ directory, organized by dataset and method used.

Training

Download Raw Data

Download PDB files containing RNA-protein complexes from before the cutoff date of 30 September 2021 via the website and store them in datasets/pdb_files, or run the commands:

mkdir -p datasets/pdb_files
sh datasets/batch_download.sh -f datasets/list_file.txt -p -o datasets/pdb_files

All data must be stored as dill files. To do so, run:

python src/data/preprocessing/cache_data.py --dir_path datasets/pdb_files --save_path datasets/train/af3_1022P_1022R 

STRAND-tr+rot used data augmentation during training. To augment the data, run:

sh src/data/preprocessing/data_aug.sh

Download Preprocessed Data

To skip the preprocessing step, we provide the training dataset in preprocessed form.

Download the non-augmented data via the link and store it in datasets/train.

👉 af3_1022P_1022R.zip

Download the augmented data via the link and store it in datasets/train.

👉 af3_1022P_1022R_aug.zip

🎯 Default Training Configuration

By default, STRAND trains with translation + rotation (STRAND-tr+rot).

⚙️ Custom Training Configurations

To train with different spatial transformations, modify the boolean arguments in src/train.sh:

# Available options:
--translation True  # Enable translation refinement
--rotation    True  # Enable rotation refinement
--torsion     True  # Enable torsion angle refinement
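
Because these flags take the literal strings True and False, a script receiving them has to convert text to booleans explicitly: in Python, `bool("False")` evaluates to `True`. How the STRAND training script parses its arguments is not shown here, so the following is only a sketch of one common pattern. The `str2bool` helper and the stand-in parser are hypothetical; only the three flag names come from the options above, and the defaults are arbitrary.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert a command-line string such as 'True' or 'false' to a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser for illustration; not the repository's actual parser."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing")
    for flag in ("translation", "rotation", "torsion"):
        # Defaults here are arbitrary, not the repository's defaults.
        parser.add_argument(f"--{flag}", type=str2bool, default=False)
    return parser
```

For example, `build_parser().parse_args(["--translation", "True", "--rotation", "True"])` yields a namespace with `translation` and `rotation` set to `True` and `torsion` left at `False`, which corresponds to the tr+rot combination.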

🚀 Starting Training

Score model:

  1. Set the Data_file and Data_path variables in src/train.sh.
  2. Configure your desired spatial transformations in src/train.sh.
  3. Run the training script:
sh src/train.sh

Note: Training requires preprocessed datasets and sufficient computational resources (GPU recommended).

Generate samples:

After obtaining an optimised score model, use it to generate samples via:

sh src/generate_samples.sh

Confidence model:

Use the generated samples to train the confidence model by running:

sh src/train_confidence.sh
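
The three stages depend on each other in order: the confidence model is trained on samples generated by the score model. A minimal sketch of chaining them so that a failed stage stops the pipeline is given below. The `run_pipeline` function and its `runner` parameter are hypothetical; only the three script paths come from this README, and each script may still need its variables set before it is run.

```python
import subprocess
from typing import Callable, Sequence

# Stage names are descriptive labels; script paths are those given in this README.
STAGES = (
    ("train score model", ["sh", "src/train.sh"]),
    ("generate samples", ["sh", "src/generate_samples.sh"]),
    ("train confidence model", ["sh", "src/train_confidence.sh"]),
)


def run_pipeline(runner: Callable[[Sequence[str]], int] = subprocess.call) -> list[str]:
    """Run the stages in order and return the names of the completed stages.

    `runner` takes a command and returns its exit code; the default,
    subprocess.call, executes the command and waits for it to finish.
    """
    completed = []
    for name, cmd in STAGES:
        code = runner(cmd)
        if code != 0:
            # Stop immediately so later stages never run on missing inputs.
            raise RuntimeError(f"stage '{name}' failed with exit code {code}")
        completed.append(name)
    return completed
```

Passing a custom `runner` (for instance one that only logs the command and returns 0) allows a dry run before committing GPU time.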

Inference

Store the structures to be refined as dill files using src/data/preprocessing/cache_data.py.

Specify the path of the stored dataset to be refined and its corresponding CSV file in the variables Data_path and Data_file, respectively, in the file src/train_confidence.sh.

Set --run_inference_without_confidence_model to be True to run the inference without the confidence model.

Run the inference process via:

sh src/inference.sh

Set --run_inference_without_confidence_model to False to run the inference with the confidence model:

sh src/inference.sh

After running inference, visualization directories containing the generated samples are created. The default path is visualization/STRAND.

To assess the quality of the refined samples, download the ground-truth structures of the refined complexes from the PDB as .pdb files and store them in datasets/gt_dir.

Manual Selection results:

To display manual selection results run:

python src/visualize_inf_manual.py  --gt_path datasets/gt_dir --samples_path visualization/STRAND

Selection via confidence model:

To display the confidence model selection results run:

python src/visualize_inf_conf.py  --gt_path datasets/gt_dir --samples_path visualization/STRAND

Contributions

This repo is copied from the original source code available at https://github.com/zeqri/STRAND for reasons of maintenance.
