Dataloader speedup #12
base: main
Conversation
oke464 left a comment:
After discussions with @filipekstrm, we think setting the parameters `persistent_workers`, `pin_memory`, and `num_workers` to the PyTorch defaults and adding a description in the README might be the best option. @rartino, what do you think?
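For reference, the reviewer's proposal amounts to aligning the config with PyTorch's own `torch.utils.data.DataLoader` defaults. A minimal sketch of what that could look like (the dict layout here is illustrative, not the project's actual config code):

```python
# Illustrative sketch: dataloader-related config entries aligned with the
# defaults of torch.utils.data.DataLoader, as the reviewer suggests.
PYTORCH_DATALOADER_DEFAULTS = {
    "num_workers": 0,             # PyTorch default: load batches in the main process
    "pin_memory": False,          # PyTorch default: no pinned host memory
    "persistent_workers": False,  # PyTorch default: workers shut down after each epoch
}

config = {
    "epochs": 1000,
    "val_interval": 1,
    **PYTORCH_DATALOADER_DEFAULTS,
}
print(config)
```

With this, users who want the speedup would opt in explicitly, guided by the README note, rather than getting non-default PyTorch behavior silently.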
```
    "epochs": 1000,
    "val_interval": 1,
    "num_workers": 0,
    "pin_memory": True,
```
Suggested change:

```diff
-    "pin_memory": True,
+    "pin_memory": False,
```
```
    "val_interval": 1,
    "num_workers": 0,
    "pin_memory": True,
    "persistent_workers": True,
```
Suggested change:

```diff
-    "persistent_workers": True,
+    "persistent_workers": False,
```
```diff
  Warning: using logger ```none``` will not save any checkpoints (or anything else), but can be used for, e.g., debugging.

- This command will use the default values for all other parameters, which are the ones used in the paper.
+ This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ```num_workers```, and if not it will default to 0. However, in our experience, increasing it can substantially speed up training**
```
Suggested change:

```diff
- This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ```num_workers```, and if not it will default to 0. However, in our experience, increasing it can substantially speed up training**
+ This command will use the default values for all other parameters, which are the ones used in the paper. **Note: It is not strictly necessary to set ``num_workers``, ``persistent_workers``, or ``pin_memory``. However, in our experience, increasing ``num_workers`` and setting ``persistent_workers=True`` and ``pin_memory=True`` can substantially speed up training.** The optimal ``num_workers`` value depends on your system; we have used the maximum value suggested by the PyTorch warning.
```
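The "maximum suggested value from PyTorch warning" refers to the runtime warning PyTorch emits when `num_workers` exceeds the CPU cores available to the process. A hedged sketch of that heuristic (the exact check PyTorch performs may differ between versions, and `requested` is a hypothetical value):

```python
import os

def capped_num_workers(requested: int) -> int:
    """Cap a requested worker count at the CPUs visible to this process.

    Assumption: this mirrors the heuristic behind PyTorch's num_workers
    warning; PyTorch itself uses the process's CPU affinity where available.
    """
    try:
        available = len(os.sched_getaffinity(0))  # respects affinity masks (Linux)
    except AttributeError:  # sched_getaffinity is unavailable on e.g. macOS/Windows
        available = os.cpu_count() or 1
    return max(0, min(requested, available))

print(capped_num_workers(8))
```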
````diff
  To train a WyckoffDiff model on WBM, a minimal example is
  ```
- python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb]
+ python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS]
````
Suggested change:

```diff
- python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS]
+ python main.py --mode train_d3pm --d3pm_transition [uniform/marginal/zeros_init] --logger [none/model_only/local_only/tensorboard/wandb] --num_workers [NUM_WORKERS] --persistent_workers [True/False] --pin_memory [True/False]
```
Trying to make training faster by improving dataloading. This has been developed on a GTX 3090 and an A100, and the experiences below are from those systems:

- `--num_workers`, which is passed to dataloaders. The default is 0 (which is also the default in `DataLoader`, and hence what is used currently), but increasing it can make training substantially faster.
- `--pin_memory`, which is passed to dataloaders. The default is True, which is the opposite of what is used in PyTorch.
- `--persistent_workers`, which is passed to dataloaders if `--num_workers > 0`. The default is True, which is the opposite of what is used in PyTorch.
- A smaller instance of `Data` is created when getting the data (i.e., in the `get` method of `WyckoffDataset`). This instance only contains the bare minimum information necessary (e.g., it does not include the matrix `x`, as that is not used).

I am a little unsure about using True as the default for `--pin_memory` and `--persistent_workers`, as False is the default in PyTorch. On the other hand, I think they help improve speed for our use case, and hence they can be True in our codebase. For `--num_workers`, I think the suitable value is system-specific, and hence I left 0 as the default. I did, however, include it in the training command example in the README together with a note.
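The "pass `persistent_workers` only if `num_workers > 0`" logic described above can be sketched as a small helper (the name `dataloader_kwargs` and the exact wiring are hypothetical; the PR's actual code may differ):

```python
def dataloader_kwargs(num_workers=0, pin_memory=True, persistent_workers=True):
    """Build keyword arguments to forward to torch.utils.data.DataLoader.

    Defaults mirror this PR's choices (pin_memory and persistent_workers True,
    num_workers 0). Note that persistent_workers=True with num_workers=0 raises
    a ValueError in PyTorch, so it is only enabled when workers actually exist.
    """
    return {
        "num_workers": num_workers,
        "pin_memory": pin_memory,
        "persistent_workers": persistent_workers and num_workers > 0,
    }

print(dataloader_kwargs())               # main-process loading, no persistence
print(dataloader_kwargs(num_workers=4))  # worker processes kept alive across epochs
```

Guarding `persistent_workers` this way keeps the 0-worker default usable out of the box while still letting users opt in to the faster configuration.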