STRAND

RNA-protein interactions play crucial roles in cellular processes, from gene regulation to viral replication. While recent advances in structure prediction have revolutionized our ability to model macromolecular complexes, achieving accurate predictions of RNA-protein binding poses remains challenging. In this work, we present STRAND, a diffusion-based model for monomeric RNA-protein complex refinement that builds upon the success of DiffDock-PP in protein-protein docking. Unlike traditional docking, we develop STRAND as a modular extension to existing RNA-Protein complex prediction tools to improve their backbone predictions. We study the effect of different transformations by training models to learn either translation, rotation, torsion, or combinations of these during the diffusion process and initialize the backward process with a complex prediction at test time. Our experiments with AlphaFold 3 and ProRNA3D-single reveal that STRAND can improve the backbones of a large fraction of RNA-protein complex predictions.

Paper

Check out our GenBio@ICML'25 Workshop paper

Citation

If you use this repo in your own work, please consider citing us:

@inproceedings{
al-zeqri2025strand,
title={{STRAND}: Structure Refinement of {RNA}-Protein Complexes via Diffusion},
author={Mohsen Al-zeqri and J{\"o}rg K.H. Franke and Frederic Runge},
booktitle={ICML 2025 Generative AI and Biology (GenBio) Workshop},
year={2025},
url={https://openreview.net/forum?id=NEealHqriE}
}

Installation

Clone the repo:

git clone https://github.com/automl/STRAND.git

Create and activate conda environment:

conda create -n STRAND python=3.9.18 
conda activate STRAND 
pip install -r requirements.txt

Replicating Paper Results

📁 Dataset Setup

Download the preprocessed structures used in our experiments:

👉 test_data.zip

Download the test_data.zip file
Place it in the datasets directory
Extract the archive:

unzip datasets/test_data.zip -d datasets/

🚀 Running Experiments

Choose your inference mode based on whether you want to use the confidence model:

Manual Selection (Without Confidence Model)

Run structure refinement with manual selection on any of the three benchmark datasets:

# RNA-Pro dataset
sh src/inference_manual.sh rnapro

# Non-X-ray dataset  
sh src/inference_manual.sh nonxray

# X-ray dataset
sh src/inference_manual.sh xray

Automated Selection (With Confidence Model)

Run structure refinement using the confidence model for automated structure selection:

# RNA-Pro dataset
sh src/inference_conf.sh rnapro

# Non-X-ray dataset
sh src/inference_conf.sh nonxray

# X-ray dataset
sh src/inference_conf.sh xray
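To run every benchmark in both modes in one go, the commands above can be driven from a small script. The following is a minimal sketch and not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes the script is launched from the repository root so that the relative script paths resolve.

```python
import subprocess
from itertools import product

# Dataset keys and script paths exactly as listed in the commands above.
DATASETS = ("rnapro", "nonxray", "xray")
SCRIPTS = {
    "manual": "src/inference_manual.sh",  # without confidence model
    "conf": "src/inference_conf.sh",      # with confidence model
}


def build_commands(modes=("manual", "conf"), datasets=DATASETS):
    """Return one argv list per (mode, dataset) combination."""
    return [["sh", SCRIPTS[mode], dataset] for mode, dataset in product(modes, datasets)]


def run_all(dry_run=True, **kwargs):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(**kwargs):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the six commands; `run_all(dry_run=False)` executes them in sequence.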

📊 Results

Refined structures and evaluation metrics will be saved in the results/ directory, organized by dataset and method used.

Training

Download Raw Data

Download PDB files containing RNA-protein complexes from before the cutoff date of 30 September 2021 via the website and store them in datasets/pdb_files, or run the commands:

mkdir -p datasets/pdb_files
sh datasets/batch_download.sh -f datasets/list_file.txt -p -o datasets/pdb_files

All data must be stored as dill files. To do so, run:

python src/data/preprocessing/cache_data.py --dir_path datasets/pdb_files --save_path datasets/train/af3_1022P_1022R
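After caching, it can be useful to confirm that a cached file loads before starting a long training run. The sketch below is an assumption-laden helper, not repository code: the layout of the cached objects is not documented here, so it only reports the top-level type and length. The function name `summarize_cache` is hypothetical, and the `pickle` fallback exists only so the sketch runs without `dill` installed; it may not load every dill file.

```python
import pickle

try:
    import dill as serializer  # the format the caching step is stated to write
except ImportError:
    serializer = pickle  # fallback for this sketch only


def summarize_cache(path):
    """Load one cached file and return (top-level type name, length or None)."""
    with open(path, "rb") as fh:
        obj = serializer.load(fh)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

For example, `summarize_cache(some_path)` on a file produced by the caching step would show whether the top-level object is a list, dict, or something else, which is a cheap check that the file is not truncated.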

STRAND-tr+rot used data augmentation during training. To augment the data, run:

sh src/data/preprocessing/data_aug.sh

Download Preprocessed Data

To skip the preprocessing step, we provide the training dataset in preprocessed form.

Download the non-augmented data via the link and store it in datasets/train.

👉 af3_1022P_1022R.zip

Download the augmented data via the link and store it in datasets/train.

👉 af3_1022P_1022R_aug.zip

🎯 Default Training Configuration

By default, STRAND trains with translation + rotation (STRAND-tr+rot).

⚙️ Custom Training Configurations

To train with different spatial transformations, modify the boolean arguments in src/train.sh:

# Available options:
--translation True  # Enable translation refinement
--rotation    True  # Enable rotation refinement
--torsion     True  # Enable torsion angle refinement
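Flags that take an explicit `True`/`False` value need care on the Python side, because `argparse` with `type=bool` treats any non-empty string (including `"False"`) as true. How the STRAND training entry point parses these flags is not shown here; the following is only a sketch of one common pattern, with the hypothetical helpers `str2bool` and `build_parser`, and with defaults chosen to mirror the documented translation + rotation default.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings into real booleans."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Hypothetical parser exposing the three transformation flags."""
    parser = argparse.ArgumentParser(description="transformation flags (sketch)")
    # Defaults are an assumption mirroring the documented tr+rot default.
    parser.add_argument("--translation", type=str2bool, default=True)
    parser.add_argument("--rotation", type=str2bool, default=True)
    parser.add_argument("--torsion", type=str2bool, default=False)
    return parser


# Example: a torsion-only configuration.
args = build_parser().parse_args(
    ["--translation", "False", "--rotation", "False", "--torsion", "True"]
)
print(args.translation, args.rotation, args.torsion)
```

With a converter like this, `--rotation False` actually disables rotation instead of silently enabling it.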

🚀 Starting Training

Score model:

Set Data_file and Data_path variables in src/train.sh.
Configure your desired spatial transformations in src/train.sh.
Run the training script:

sh src/train.sh

Note: Training requires preprocessed datasets and sufficient computational resources (GPU recommended).

Generate samples:

After training the score model, use it to generate samples via:

sh src/generate_samples.sh

Confidence model:

Use the generated samples to train the confidence model by running:

sh src/train_confidence.sh

Inference

Store the structures to be refined as dill files using src/data/preprocessing/cache_data.py.

Specify the path of the stored dataset to be refined and its corresponding CSV file in the variables Data_path and Data_file, respectively, in the file src/train_confidence.sh.

Set --run_inference_without_confidence_model to True to run inference without the confidence model.

Run the inference process via:

sh src/inference.sh

Set --run_inference_without_confidence_model to False to run inference with the confidence model:

sh src/inference.sh

After inference finishes, visualization directories containing the generated samples are created. The default path is visualization/STRAND.

To assess the quality of the refined samples, download the ground-truth structures of the refined complexes from the PDB as .pdb files and store them in datasets/gt_dir.
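Fetching the ground-truth files can be scripted against the RCSB file download endpoint. This is a minimal sketch rather than a repository utility: the helpers `gt_url` and `download_gt` are hypothetical, and the lowercase `<id>.pdb` output naming is an assumption that may need adjusting to whatever names the evaluation scripts expect. Note that very large structures are distributed by the PDB only in mmCIF format, so a `.pdb` download can fail for them.

```python
from pathlib import Path
from urllib.request import urlretrieve

RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def gt_url(pdb_id):
    """Build the RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_URL.format(pdb_id=pdb_id)


def download_gt(pdb_ids, out_dir="datasets/gt_dir"):
    """Download each structure into out_dir, skipping files already present."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for pdb_id in pdb_ids:
        # Output file naming is an assumption, not a documented convention.
        target = out / f"{pdb_id.strip().lower()}.pdb"
        if not target.exists():
            urlretrieve(gt_url(pdb_id), str(target))
        paths.append(target)
    return paths
```

Calling `download_gt` with the list of PDB IDs that were refined populates datasets/gt_dir; rerunning it only fetches missing files.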

Manual Selection results:

To display manual selection results run:

python src/visualize_inf_manual.py  --gt_path datasets/gt_dir --samples_path visualization/STRAND

Selection via confidence model:

To display the confidence model selection results run:

python src/visualize_inf_conf.py  --gt_path datasets/gt_dir --samples_path visualization/STRAND
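The scripts above report their own metrics, which are not detailed here. For an independent sanity check on a single structure, one common measure is the RMSD after optimal superposition (the Kabsch algorithm). The sketch below is not the repository's evaluation code: it assumes NumPy is installed and that matched backbone atom coordinates of the refined sample and the ground truth have already been extracted as two N x 3 arrays in the same atom order (the extraction step is not shown).

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q after optimally superposing P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_aligned = Pc @ R.T
    return float(np.sqrt(((P_aligned - Qc) ** 2).sum(axis=1).mean()))
```

Comparing `kabsch_rmsd(initial, truth)` with `kabsch_rmsd(refined, truth)` gives a rough indication of whether refinement moved the backbone closer to the experimental structure.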

Contributions

This repo is copied from the original source code available at https://github.com/zeqri/STRAND for reasons of maintenance.
