My model runs much slower in ONNX Runtime than in PyTorch. During session initialization, I get messages like these:

 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_1
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: ConstantOfShape node name: /ConstantOfShape_2
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_2
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_3
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Resize node name: /image_encoder/image_encoder.1/Resize
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Resize node name: /image_encoder/image_encoder.1/Resize_1
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Resize node name: /image_encoder/image_encoder.1/Resize_2
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: GridSample node name: /GridSample
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_4

I'm wondering if this might be the cause. Does it mean that the operations Equal, Resize, and GridSample are being executed on the CPU? If so, how can I debug this? According to https://github.com/microsoft/onnxruntime/blob/rel-1.20.0/docs/OperatorKernels.md, all of these kernels should be implemented for the CUDA execution provider. My onnxruntime version is 1.20.1.

1 Answer

The issue was operator versioning. I exported with opset 20, so operators such as GridSample were exported at version 20. However, the CUDA execution provider only implements it for version 16+, and that kernel was not picked up for the version-20 node, which is what the "CUDA kernel not found" messages report. Re-exporting with opset 17 fixed the issue.
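For anyone wanting to check this on their own model, here is a minimal sketch of the two steps: tallying the fallback messages by op type, and re-exporting at a lower opset. It assumes the model is exported from PyTorch with torch.onnx.export and that the log lines follow the format quoted in the question. The helper names tally_cuda_fallbacks and export_at_opset are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Pattern assumed from the log lines quoted in the question; other
# onnxruntime versions may word the message differently.
FALLBACK_RE = re.compile(
    r"CUDA kernel not found in registries for Op type: (\S+) node name: (\S+)"
)


def tally_cuda_fallbacks(log_text):
    """Count, per op type, the nodes reported as having no CUDA kernel."""
    return Counter(match.group(1) for match in FALLBACK_RE.finditer(log_text))


def export_at_opset(model, example_inputs, path, opset=17):
    """Re-export a PyTorch model at the given opset (hypothetical helper)."""
    import torch  # imported here so the log helper runs without PyTorch

    # example_inputs is a tensor or a tuple of tensors matching model.forward
    torch.onnx.export(model, example_inputs, path, opset_version=opset)
```

Pasting the session-initialization log from before and after the re-export into tally_cuda_fallbacks gives a quick view of which op types are still reported as missing a CUDA kernel.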
