Replies: 2 comments 4 replies
- Prototype - M (In Review)
- MVP - M (In Review)
- Phase 1 - n/a
- Phase 2 - M
This could be unwieldy for large input tensors where a user still wants to specify ranges. IMO the core issue is that the data Torch-TRT does shape inference on is not representative of the end user's data, correct? Why not let the user provide input data? Giving the option to provide a data loader would resolve this issue, and would also make DS + fallback easier. Thoughts?
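One way to read this suggestion is an API where compilation accepts a sample-data provider instead of synthesizing random inputs. The sketch below is purely hypothetical — the `compile_with_samples` name and its signature are assumptions for illustration, not the Torch-TRT API:

```python
from typing import Callable, Iterable, List

def compile_with_samples(module, sample_loader: Callable[[], Iterable[List[int]]]):
    """Hypothetical compile entry point: run shape analysis over user-provided
    sample batches instead of randomly initialized tensors."""
    recorded_lengths = []
    for batch in sample_loader():
        # Shape inference sees real user data, so value-domain constraints
        # (e.g. token_type_ids in {0, 1}) are respected automatically.
        recorded_lengths.append(len(batch))
    return recorded_lengths  # stand-in for a compiled module

def loader():
    # Realistic token_type_ids batches, values restricted to {0, 1}
    yield [0, 1, 1, 0]
    yield [0, 0, 1, 1]

lengths = compile_with_samples(object(), loader)
print(lengths)  # one entry per sample batch
```

The design trade-off: a data loader guarantees representative values without the user having to enumerate per-tensor ranges, but it does require the user to have sample data on hand at compile time.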
Context
When using Torch-TensorRT to compile and run inference with BERT models, some users experienced a CUDA indexing error (Issue #1418, PR #1424). The error appeared only when more than two arguments were passed to the model. The source of the bug was that the third argument to these BERT models is a tensor of torch.Long type, which admits only 0 and 1 values (documentation here). The shape analysis portion of partitioning, however, was initializing random tensor inputs, sometimes with values outside that range:
TensorRT/core/partitioning/shape_analysis.cpp
Line 23 in 5a7f00e
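The failure mode can be illustrated without torch. Below, `embedding_lookup` is a hypothetical stand-in for aten::embedding, not Torch-TRT code; the value 100 as a default upper bound is also just an assumption for the sketch:

```python
import random

def embedding_lookup(table, indices):
    """Return the row of `table` for each index, as an embedding op would."""
    return [table[i] for i in indices]  # fails when an index >= len(table)

table = [[0.0, 0.1], [1.0, 1.1]]  # only indices 0 and 1 are valid

# Random initialization over too wide a range can produce invalid indices:
bad_input = [0, 1, 7, 42]  # e.g. values drawn from a default [0, 100) range
try:
    embedding_lookup(table, bad_input)
except IndexError:
    print("out-of-bounds index, as in the reported error")

# Restricting generation to the valid domain [0, 2) avoids the failure:
good_input = [random.randrange(0, 2) for _ in range(4)]
print(embedding_lookup(table, good_input))
```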
As a result, calls to aten::embedding and other indexing operations would fail, since they would index out of bounds. A temporary fix was made in PR #1424, which narrowed the range of values selected for the tensor, but a more robust fix would allow the user to (optionally) specify the valid range of values for each input tensor.
Discussion
A rough framework for accomplishing this is to allow the user to specify a "low-inclusive" and "high-exclusive" value for each input, ensuring that the forward pass conducted during partitioning does not provide invalid inputs to the module. These (optionally) user-provided values would then replace the existing defaults:
TensorRT/core/partitioning/shape_analysis.cpp
Lines 17 to 18 in b494311
If the user does not specify values, the defaults will be used. The main framework changes required to implement this are:
- A new field on the Input class specifying a two-element tuple with the minimum-inclusive and maximum-exclusive allowed input values for a Tensor
- Changes to the partitioning shape-analysis code to use these values when generating random inputs
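As a sketch of what such an extension to the Input class might look like — the `value_range` field name, the `(0, 5)` default range, and the `random_values` helper are all assumptions for illustration, not the actual Torch-TensorRT API:

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Input:
    """Simplified stand-in for an input spec. `value_range` is a hypothetical
    (low-inclusive, high-exclusive) bound on generated integer values."""
    shape: List[int]
    value_range: Optional[Tuple[int, int]] = None  # None -> library defaults

    def random_values(self, default_range: Tuple[int, int] = (0, 5)) -> List[int]:
        """Generate flat random values as shape analysis might, honoring the
        user-specified range when present."""
        low, high = self.value_range if self.value_range is not None else default_range
        n = 1
        for d in self.shape:
            n *= d
        return [random.randrange(low, high) for _ in range(n)]

# A BERT token_type_ids input only admits 0 and 1, i.e. the range [0, 2):
token_type_ids = Input(shape=[1, 8], value_range=(0, 2))
vals = token_type_ids.random_values()
assert all(0 <= v < 2 for v in vals)
```

With this shape of API, the low/high pair travels with the input specification, so partitioning's shape analysis can consult it instead of a hard-coded default.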