RNA-protein interactions play crucial roles in cellular processes, from gene regulation to viral replication. While recent advances in structure prediction have revolutionized our ability to model macromolecular complexes, achieving accurate predictions of RNA-protein binding poses remains challenging. In this work, we present STRAND, a diffusion-based model for monomeric RNA-protein complex refinement that builds upon the success of DiffDock-PP in protein-protein docking. Unlike traditional docking, we develop STRAND as a modular extension to existing RNA-Protein complex prediction tools to improve their backbone predictions. We study the effect of different transformations by training models to learn either translation, rotation, torsion, or combinations of these during the diffusion process and initialize the backward process with a complex prediction at test time. Our experiments with AlphaFold 3 and ProRNA3D-single reveal that STRAND can improve the backbones of a large fraction of RNA-protein complex predictions.
Check out our GenBio@ICML'25 Workshop paper.
If you use this repo in your own work, please consider citing us:
@inproceedings{
al-zeqri2025strand,
title={{STRAND}: Structure Refinement of {RNA}-Protein Complexes via Diffusion},
author={Mohsen Al-zeqri and J{\"o}rg K.H. Franke and Frederic Runge},
booktitle={ICML 2025 Generative AI and Biology (GenBio) Workshop},
year={2025},
url={https://openreview.net/forum?id=NEealHqriE}
}
Clone the repo:
git clone https://github.com/automl/STRAND.git
Create and activate conda environment:
conda create -n STRAND python=3.9.18
conda activate STRAND
pip install -r requirements.txt
Download the preprocessed structures used in our experiments:
- Download the test_data.zip file
- Place it in the datasets directory
- Extract the archive:
unzip datasets/test_data.zip -d datasets/
Choose your inference mode based on whether you want to use the confidence model:
Run structure refinement with manual selection on any of the three benchmark datasets:
# RNA-Pro dataset
sh src/inference_manual.sh rnapro
# Non-X-ray dataset
sh src/inference_manual.sh nonxray
# X-ray dataset
sh src/inference_manual.sh xray
Run structure refinement using the confidence model for automated structure selection:
# RNA-Pro dataset
sh src/inference_conf.sh rnapro
# Non-X-ray dataset
sh src/inference_conf.sh nonxray
# X-ray dataset
sh src/inference_conf.sh xray
Refined structures and evaluation metrics will be saved in the results/ directory, organized by dataset and method used.
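To run several benchmark datasets in one go, the commands above can be assembled programmatically. The following is a minimal sketch, not part of the repo: the helper name inference_command is hypothetical, and it only builds the same sh invocations listed above from a dataset name and a choice of mode.

```python
from typing import List

# Dataset names accepted by the two inference scripts listed above.
DATASETS = ("rnapro", "nonxray", "xray")


def inference_command(dataset: str, use_confidence: bool) -> List[str]:
    """Build the shell command for one benchmark run (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}, expected one of {DATASETS}")
    script = "src/inference_conf.sh" if use_confidence else "src/inference_manual.sh"
    return ["sh", script, dataset]


# Dry run: print the commands instead of executing them.
for name in DATASETS:
    print(" ".join(inference_command(name, use_confidence=True)))
```

To actually execute a command, pass it to subprocess.run(cmd, check=True) from the repository root.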
Download PDB files containing RNA-protein complexes from before the cutoff date of 30 September 2021 via the website and store them in datasets/pdb_files, or run the commands:
mkdir -p datasets/pdb_files
sh datasets/batch_download.sh -f datasets/list_file.txt -p -o datasets/pdb_files
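After a batch download it can be useful to check that every requested entry actually arrived. The sketch below is an illustration under assumptions, not part of the repo: it assumes that datasets/list_file.txt holds PDB IDs separated by commas or whitespace and that downloaded files are named after their ID (for example <id>.pdb or <id>.pdb.gz). The function name missing_pdb_ids is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_pdb_ids(list_file: str, out_dir: str) -> List[str]:
    """Return the IDs from list_file that have no matching file in out_dir."""
    text = Path(list_file).read_text()
    # Assumption: IDs are separated by commas and/or whitespace.
    ids = [tok.lower() for tok in text.replace(",", " ").split()]
    # Assumption: each downloaded file name starts with the ID, then a dot.
    present = set()
    out = Path(out_dir)
    if out.is_dir():
        for path in out.iterdir():
            present.add(path.name.split(".")[0].lower())
    return [pdb_id for pdb_id in ids if pdb_id not in present]
```

Calling missing_pdb_ids("datasets/list_file.txt", "datasets/pdb_files") returns the IDs that still need to be downloaded, in the order they appear in the list file.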
All data must be stored as dill files. To do so, run:
python src/data/preprocessing/cache_data.py --dir_path datasets/pdb_files --save_path datasets/train/af3_1022P_1022R
STRAND-tr+rot used data augmentation during training. To augment the data, run:
sh src/data/preprocessing/data_aug.sh
To skip the preprocessing steps, we provide the training dataset already preprocessed.
Download the non-augmented data via the link and store it in datasets/train.
Download the augmented data via the link and store it in datasets/train.
By default, STRAND trains with translation + rotation (STRAND-tr+rot).
To train with different spatial transformations, modify the boolean arguments in src/train.sh:
# Available options:
--translation True # Enable translation refinement
--rotation True # Enable rotation refinement
--torsion True # Enable torsion angle refinement
Score model:
- Set the Data_file and Data_path variables in src/train.sh.
- Configure your desired spatial transformations in src/train.sh.
- Run the training script:
sh src/train.sh
Note: Training requires preprocessed datasets and sufficient computational resources (GPU recommended).
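The transformation flags in src/train.sh take an explicit True or False value. How the training script parses them is not shown here; the sketch below only illustrates one common way to handle such flags with argparse. The names str2bool and build_parser are hypothetical, and the defaults mirror the stated default of translation + rotation. A dedicated converter matters because argparse's type=bool would turn the string "False" into True (any non-empty string is truthy).

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert a 'True'/'False' style string to a bool (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Parser with the three transformation flags; defaults follow tr+rot."""
    parser = argparse.ArgumentParser(description="Sketch of STRAND-style flags")
    parser.add_argument("--translation", type=str2bool, default=True)
    parser.add_argument("--rotation", type=str2bool, default=True)
    parser.add_argument("--torsion", type=str2bool, default=False)
    return parser


# Example: enable all three transformations.
args = build_parser().parse_args(
    ["--translation", "True", "--rotation", "True", "--torsion", "True"]
)
print(args.translation, args.rotation, args.torsion)
```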
Generate samples:
After obtaining an optimised score model, use it to generate samples via:
sh src/generate_samples.sh
Confidence model:
Use the generated samples to train the confidence model by running:
sh src/train_confidence.sh
Store the structures to be refined as dill files using src/data/preprocessing/cache_data.py.
Specify the path of the stored dataset to be refined and its corresponding CSV file in the variables Data_path and Data_file, respectively, in the file src/train_confidence.sh.
Set --run_inference_without_confidence_model to True to run inference without the confidence model.
Run the inference process via:
sh src/inference.sh
Set --run_inference_without_confidence_model to False to run inference with the confidence model:
sh src/inference.sh
After running inference, visualization directories containing the generated samples are created. The default path is visualization/STRAND.
To assess the quality of the refined samples, download the ground-truth structures of the refined complexes from the PDB as .pdb files and store them in datasets/gt_dir.
Manual Selection results:
To display manual selection results run:
python src/visualize_inf_manual.py --gt_path datasets/gt_dir --samples_path visualization/STRAND
Selection via confidence model:
To display the confidence model selection results run:
python src/visualize_inf_conf.py --gt_path datasets/gt_dir --samples_path visualization/STRAND
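The metrics reported by the two scripts above are not described in this README. For a quick independent sanity check of a single refined structure against its ground truth, a plain backbone RMSD can be computed directly from the PDB files. The sketch below is an assumption-laden illustration, not the repo's evaluation: it uses one representative atom per residue (CA for protein, C4' for RNA), matches atoms by chain ID and residue number, and performs no superposition, so it is only meaningful when both files share a coordinate frame and numbering.

```python
import math
from typing import Dict, Tuple

Key = Tuple[str, int, str]  # (chain ID, residue number, atom name)
Coord = Tuple[float, float, float]

# Assumption: one representative backbone atom per residue or nucleotide.
BACKBONE_ATOMS = {"CA", "C4'"}


def read_backbone(pdb_text: str) -> Dict[Key, Coord]:
    """Collect backbone atom coordinates from fixed-column PDB ATOM records."""
    coords: Dict[Key, Coord] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in BACKBONE_ATOMS:
            continue
        key = (line[21], int(line[22:26]), name)
        coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def rmsd(a: Dict[Key, Coord], b: Dict[Key, Coord]) -> float:
    """RMSD over atoms present in both structures, without superposition."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("no shared backbone atoms")
    total = sum((p - q) ** 2 for k in shared for p, q in zip(a[k], b[k]))
    return math.sqrt(total / len(shared))
```

To use it, read a ground-truth file and a refined sample as text (for example with pathlib.Path(...).read_text()), pass each through read_backbone, and call rmsd on the two results.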
This repo is copied from the original source code available at https://github.com/zeqri/STRAND for reasons of maintenance.