Can SYCL use inline assembly like CUDA does? #13203
In CUDA, developers can inline PTX in a kernel, which lets expert developers write much more efficient code. That matters a lot for extreme optimization.
      
      
Answered by AlexeySachkov
Apr 2, 2024
      
    
    Replies: 1 comment 6 replies
Yes, it works exactly as in the nvcc compiler (see https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html), except that you have to wrap the PTX code in the following macro: In a parallel_for.
@alanzhai219, yes, this functionality is already supported. You can find some examples (written in Intel GPU assembly rather than PTX) here: https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/InlineAsm
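
For the CUDA backend case raised in the question, here is a minimal sketch of inline PTX inside a `parallel_for`. It makes some assumptions. The guard `defined(__SYCL_DEVICE_ONLY__) && defined(__NVPTX__)` is my own choice for keeping the PTX out of the host compilation pass, and it is not necessarily the macro meant in the reply above. The names `add_ptx`, `add_vectors` and `SKETCH_HAVE_SYCL` are hypothetical. The `__has_include` fallback is only there so the snippet also builds with a plain host compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: use SYCL only if its header is present; otherwise fall back to
// a host loop so the sketch still compiles without a SYCL toolchain.
#if defined(__has_include)
#  if __has_include(<sycl/sycl.hpp>)
#    include <sycl/sycl.hpp>
#    define SKETCH_HAVE_SYCL 1
#  endif
#endif

// Hypothetical helper: adds two ints.
// During the NVPTX device pass it uses inline PTX; in every other pass
// (host, or a non-NVIDIA device) it uses plain C++.
inline int add_ptx(int a, int b) {
#if defined(__SYCL_DEVICE_ONLY__) && defined(__NVPTX__)
  int r;
  // Same GCC-style syntax as CUDA inline PTX: "=r" is a 32-bit output
  // register, "r" a 32-bit input register.
  asm volatile("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
  return r;
#else
  return a + b;
#endif
}

// Hypothetical driver: element-wise sum of two equally sized vectors.
inline std::vector<int> add_vectors(const std::vector<int>& x,
                                    const std::vector<int>& y) {
  assert(x.size() == y.size());
  std::vector<int> out(x.size());
  if (x.empty()) return out;
#ifdef SKETCH_HAVE_SYCL
  sycl::queue q;
  {
    sycl::buffer<int> bx(x.data(), sycl::range<1>(x.size()));
    sycl::buffer<int> by(y.data(), sycl::range<1>(y.size()));
    sycl::buffer<int> bo(out.data(), sycl::range<1>(out.size()));
    q.submit([&](sycl::handler& h) {
      sycl::accessor ax(bx, h, sycl::read_only);
      sycl::accessor ay(by, h, sycl::read_only);
      sycl::accessor ao(bo, h, sycl::write_only, sycl::no_init);
      h.parallel_for(sycl::range<1>(x.size()), [=](sycl::id<1> i) {
        ao[i] = add_ptx(ax[i], ay[i]);
      });
    });
  }  // buffers go out of scope here, which copies results back into `out`
#else
  for (std::size_t i = 0; i < x.size(); ++i) out[i] = add_ptx(x[i], y[i]);
#endif
  return out;
}
```

With DPC++ this would be built for NVIDIA GPUs with something like `clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda`. On any other target the guard falls through to the plain C++ branch, so the same source still compiles there.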