
Conversation

@filipekstrm (Collaborator)

Trying to make training faster by improving dataloading. This has been developed on an RTX 3090 and an A100, so the observations below come from those systems:

  • Added an argument --num_workers which is passed to the dataloaders. The default is 0 (also the default in DataLoader, and hence what is used currently), but increasing it can make training substantially faster
  • Added an argument --pin_memory which is passed to the dataloaders. The default is True, which is the opposite of the PyTorch default
  • Added an argument --persistent_workers which is passed to the dataloaders if --num_workers > 0. The default is True, which is the opposite of the PyTorch default
  • Creating a new instance of Data when getting the data (i.e., in the get method of WyckoffDataset). This instance contains only the bare minimum of information (e.g., it does not include the matrix x, as that is not used); see the sketch after this list
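
For concreteness, here is a rough sketch of what these changes could look like. This is not the actual WyckoffDiff code: the dataset fields (wyckoff_species, edge_index, y) and the build_dataloader helper are placeholders. It only illustrates forwarding the dataloader options and a get method that builds a stripped-down Data object without the unused x matrix.

```
# Rough sketch only -- field names and dataset internals are placeholders,
# not the actual WyckoffDiff implementation.
from torch_geometric.data import Data, Dataset
from torch_geometric.loader import DataLoader


class WyckoffDataset(Dataset):
    def __init__(self, entries):
        super().__init__()
        self.entries = entries  # assumed: a list of preprocessed structures

    def len(self):
        return len(self.entries)

    def get(self, idx):
        entry = self.entries[idx]
        # Build a fresh Data object containing only the fields the model consumes;
        # the unused matrix x is deliberately left out.
        return Data(
            wyckoff_species=entry["wyckoff_species"],  # placeholder field
            edge_index=entry["edge_index"],            # placeholder field
            y=entry["y"],                              # placeholder field
        )


def build_dataloader(dataset, batch_size, num_workers=0, pin_memory=False,
                     persistent_workers=False):
    kwargs = dict(batch_size=batch_size, shuffle=True,
                  num_workers=num_workers, pin_memory=pin_memory)
    if num_workers > 0:
        # persistent_workers is only valid when worker processes exist
        kwargs["persistent_workers"] = persistent_workers
    return DataLoader(dataset, **kwargs)
```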

I am a little unsure about using True as the default for --pin_memory and --persistent_workers, since False is the default in PyTorch. On the other hand, they do help speed up training in our use case, so I think True is reasonable in our codebase. For --num_workers, the suitable value is system-specific, so I left 0 as the default; I did, however, include it in the training command example in the README, together with a note.

Base automatically changed from docstring_fix to main September 4, 2025 07:46
@oke464 (Collaborator) left a comment


After discussions with @filipekstrm, we think setting the parameters persistent_workers, pin_memory, and num_workers to the PyTorch defaults and adding a description in the README might be the best option. @rartino, what do you think?

"epochs": 1000,
"val_interval": 1,
"num_workers": 0,
"pin_memory": True,

Suggested change
"pin_memory": True,
"pin_memory": False,

"val_interval": 1,
"num_workers": 0,
"pin_memory": True,
"persistent_workers": True,

Suggested change
"persistent_workers": True,
"persistent_workers": False,

Warning: using logger ```none``` will not save any checkpoints (or anything else), but can be used for, e.g., debugging.

This command will use the default values for all other parameters, which are the ones used in the paper.
This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ```num_workers```, and if not, it will default to 0. However, in our experience, increasing it can substantially speed up training.**

Suggested change
This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ```num_workers```, and if not, it will default to 0. However, in our experience, increasing it can substantially speed up training.**
This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ```num_workers```, ```persistent_workers```, or ```pin_memory```. However, in our experience, increasing ```num_workers``` and setting ```persistent_workers=True``` and ```pin_memory=True``` can substantially speed up training.** The optimal ```num_workers``` value depends on your system; we have used the maximum value suggested by the PyTorch warning.

To train a WyckoffDiff model on WBM, a minimal example is
```
python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb]
python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS]

Suggested change
python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS]
python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS] --persistent_workers [True/False] --pin_memory [True/False]
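
For reference, here is a minimal sketch of how these command-line options could be parsed and forwarded to the dataloader, assuming an argparse-based main.py; the str2bool helper is illustrative, not the actual implementation. Note that persistent_workers is only forwarded when num_workers > 0, since PyTorch rejects it otherwise.

```
# Illustrative sketch only; the real main.py argument handling may differ.
import argparse


def str2bool(value):
    # Allow flags of the form --pin_memory True / --pin_memory False.
    return str(value).lower() in ("1", "true", "yes")


parser = argparse.ArgumentParser()
parser.add_argument("--num_workers", type=int, default=0)
parser.add_argument("--pin_memory", type=str2bool, default=False)
parser.add_argument("--persistent_workers", type=str2bool, default=False)
args = parser.parse_args()

loader_kwargs = {"num_workers": args.num_workers, "pin_memory": args.pin_memory}
if args.num_workers > 0:
    # DataLoader raises an error if persistent_workers=True with num_workers=0.
    loader_kwargs["persistent_workers"] = args.persistent_workers
```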
