This repository was archived by the owner on Jan 26, 2022. It is now read-only.
  
  
  
  
Description: I trained the network on 3 GPUs and interrupted the run before the final step, so only the checkpoint was saved and not the config. When I then tried to test the model, it failed unexpectedly with the error message
start = subinds[i][0], list index out of range.

Issue: At line 64, I think it is more reasonable to write gpu_inds = range(NUM_GPUS) instead of gpu_inds = range(cfg.NUM_GPUS). Let me explain. After the yaml and config files are imported in subprocess.py, cfg.NUM_GPUS is 8 instead of 3. (train_net_step contains a statement that assigns cfg.NUM_GPUS = torch.cuda.device_count(), which is why training does not crash.) I let CUDA see all of my GPUs, so NUM_GPUS = torch.cuda.device_count() = 3 in my case, and at line 56 the size of subinds is therefore 3. Later, at line 64, if gpu_inds = range(cfg.NUM_GPUS) is used, the size of gpu_inds is 8, so the loop indexes past the end of subinds and crashes at line 68. Therefore gpu_inds = range(NUM_GPUS) is the more reasonable choice at line 64.

Please check whether my solution is correct. Thanks.
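To make the mismatch concrete, here is a minimal, self-contained sketch of the failure as I understand it. It is not the actual code from subprocess.py: split_indices, chunk_starts, and the constants are hypothetical names I made up for illustration, and the total of 30 work items is an arbitrary assumption. The work is split into one chunk per visible GPU, but the loop runs once per GPU in the config default.

```python
def split_indices(total, num_parts):
    """Split range(total) into num_parts contiguous chunks (hypothetical helper)."""
    base, extra = divmod(total, num_parts)
    chunks, start = [], 0
    for part in range(num_parts):
        size = base + (1 if part < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def chunk_starts(total, num_visible_gpus, num_gpus_for_loop):
    """Return the first index of each chunk, looping over num_gpus_for_loop GPUs."""
    # One chunk per visible GPU, analogous to subinds in the description.
    subinds = split_indices(total, num_visible_gpus)
    # One loop iteration per GPU index, analogous to gpu_inds.
    gpu_inds = range(num_gpus_for_loop)
    return [subinds[i][0] for i, _ in enumerate(gpu_inds)]


VISIBLE_GPUS = 3         # assumed value of torch.cuda.device_count() on my machine
CONFIG_DEFAULT_GPUS = 8  # assumed stale value of cfg.NUM_GPUS

# Mismatched sizes: 3 chunks but 8 loop iterations, so index 3 is out of range.
try:
    chunk_starts(30, VISIBLE_GPUS, CONFIG_DEFAULT_GPUS)
    mismatch_error = None
except IndexError as exc:
    mismatch_error = str(exc)

# Matched sizes: the loop length equals the number of chunks.
matched_starts = chunk_starts(30, VISIBLE_GPUS, VISIBLE_GPUS)
```

With both sizes derived from the same GPU count, every loop iteration has a chunk to read from, which is the effect of the proposed change.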