CANN Execution Provider
Huawei Compute Architecture for Neural Networks (CANN) is a heterogeneous computing architecture for AI scenarios and provides multi-layer programming interfaces to help users quickly build AI applications and services based on the Ascend platform.
Using the CANN Execution Provider for ONNX Runtime can help you accelerate ONNX models on Huawei Ascend hardware.
The CANN Execution Provider (EP) for ONNX Runtime is developed by Huawei.
The table below lists the official CANN package version that each ONNX Runtime inferencing package depends on.
ONNX Runtime | CANN |
v1.12.1 | 6.0.0 |
v1.13.1 | 6.0.0 |
v1.14.0 | 6.0.0 |
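The table can also be encoded as a small lookup, for example to check an environment before deployment. This is only a sketch: `REQUIRED_CANN` and `required_cann_version` are hypothetical names, not part of ONNX Runtime, and the mapping covers only the releases listed above.

```python
from typing import Optional

# Hypothetical helper: mirrors the compatibility table above and nothing more.
REQUIRED_CANN = {
    "1.12.1": "6.0.0",
    "1.13.1": "6.0.0",
    "1.14.0": "6.0.0",
}


def required_cann_version(ort_version: str) -> Optional[str]:
    """Return the CANN version listed for an ONNX Runtime release.

    Accepts versions with or without a leading "v". Returns None when the
    release does not appear in the table.
    """
    return REQUIRED_CANN.get(ort_version.lstrip("v"))
```

With ONNX Runtime installed, `required_cann_version(onnxruntime.__version__)` would look up the running release.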
For build instructions, please see the BUILD page.
Pre-built binaries of ONNX Runtime with CANN EP are published for most language bindings. See Install ORT.
Configuration Options
The CANN Execution Provider supports the following configuration options.
device_id
The device ID.
Default value: 0
npu_mem_limit
The size limit of the device memory arena in bytes. This limit applies only to the execution provider's arena, so total device memory usage may be higher.
arena_extend_strategy
The strategy for extending the device memory arena.
Value | Description |
kNextPowerOfTwo | Subsequent extensions extend by larger amounts (multiplied by powers of two) |
kSameAsRequested | Extend by the requested amount |
Default value: kNextPowerOfTwo
do_copy_in_default_stream
Whether to do copies in the default stream or use separate streams. The recommended setting is true. If false, copies use separate streams, which may improve performance but can introduce race conditions.
Default value: true
enable_cann_graph
Whether to use the graph inference engine to speed up performance. The recommended setting is true. If false, the provider falls back to the single-operator inference engine.
Default value: true
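To make the difference between the two arena extension strategies (kNextPowerOfTwo and kSameAsRequested) concrete, here is a toy model of how the extension sizes evolve. It is an illustration of the descriptions above, not ONNX Runtime's allocator: the function name `arena_extensions` is hypothetical, and it assumes for simplicity that every request triggers a new extension.

```python
def arena_extensions(requests, strategy="kNextPowerOfTwo"):
    """Return the size of each arena extension for a sequence of requests (bytes).

    Toy model only: it assumes every request forces the arena to grow.
    """
    extensions = []
    next_chunk = 1  # candidate size for the next power-of-two extension
    for requested in requests:
        if strategy == "kSameAsRequested":
            # Grow by exactly what was asked for.
            size = requested
        elif strategy == "kNextPowerOfTwo":
            # Double the candidate until it covers the request...
            while next_chunk < requested:
                next_chunk *= 2
            size = next_chunk
            # ...and make the following extension larger still.
            next_chunk *= 2
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        extensions.append(size)
    return extensions
```

Under this model, three 100-byte requests grow the arena by 100 bytes each with kSameAsRequested, but by 128, 256, and 512 bytes with kNextPowerOfTwo. The second strategy trades memory headroom for fewer extensions later.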
Currently, the CANN EP can be used through the C/C++ and Python APIs.
# Python: create an inference session that runs on the CANN EP.
import onnxruntime as ort

model_path = '<path to model>'
options = ort.SessionOptions()

# Each provider entry is a (name, options) tuple.
providers = [
    ("CANNExecutionProvider", {
        "device_id": 0,
        "arena_extend_strategy": "kNextPowerOfTwo",
        "npu_mem_limit": 2 * 1024 * 1024 * 1024,  # 2 GiB
        "do_copy_in_default_stream": True,
        "enable_cann_graph": True,
    }),
]

session = ort.InferenceSession(model_path, sess_options=options, providers=providers)
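A build of ONNX Runtime without CANN support will not offer the CANN EP, so it can be useful to choose the provider list at run time. The sketch below assumes the provider is registered under the name "CANNExecutionProvider"; `select_providers` is a hypothetical helper, not an ONNX Runtime function.

```python
def select_providers(available, cann_options):
    """Build a providers list that prefers the CANN EP and falls back to CPU.

    `available` is the list of provider names reported by the installed
    ONNX Runtime build; `cann_options` is a dict of CANN EP options.
    """
    providers = []
    if "CANNExecutionProvider" in available:
        # Copy the options so later changes by the caller do not leak in.
        providers.append(("CANNExecutionProvider", dict(cann_options)))
    # The CPU EP is kept last as a fallback.
    providers.append("CPUExecutionProvider")
    return providers
```

With ONNX Runtime installed, the result of `select_providers(ort.get_available_providers(), {"device_id": 0})` can be passed as the `providers` argument of `ort.InferenceSession`.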
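// C/C++: configure the CANN EP through the C API (checks of the returned OrtStatus* are omitted).
#include <vector>
#include <onnxruntime_c_api.h>

const static OrtApi *g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
OrtSessionOptions *session_options = nullptr;
g_ort->CreateSessionOptions(&session_options);
OrtCANNProviderOptions *cann_options = nullptr;
g_ort->CreateCANNProviderOptions(&cann_options);
std::vector<const char *> keys{"device_id", "npu_mem_limit", "arena_extend_strategy", "do_copy_in_default_stream", "enable_cann_graph"};
std::vector<const char *> values{"1", "2147483648", "kSameAsRequested", "1", "1"};
g_ort->UpdateCANNProviderOptions(cann_options, keys.data(), values.data(), keys.size());
g_ort->SessionOptionsAppendExecutionProvider_CANN(session_options, cann_options);
// Finally, don't forget to release the provider options and session options
g_ort->ReleaseCANNProviderOptions(cann_options);
g_ort->ReleaseSessionOptions(session_options);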
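Note that the C API sample passes every option as a string, in two parallel arrays matched by position, whereas the Python sample uses native types. The following sketch shows one way to derive the string form from a Python-style dict. `to_c_api_pairs` is a hypothetical helper; mapping `True` to "1" follows the sample above, while mapping `False` to "0" is an assumption.

```python
def to_c_api_pairs(options):
    """Convert an options dict into parallel key and value string lists."""
    keys, values = [], []
    for key, value in options.items():
        keys.append(key)
        # bool must be tested before the generic case, since str(True) is "True".
        if isinstance(value, bool):
            values.append("1" if value else "0")  # "0" for False is an assumption
        else:
            values.append(str(value))
    return keys, values
```

For example, `{"npu_mem_limit": 2 * 1024 * 1024 * 1024}` yields the value string "2147483648" used in the C/C++ sample.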
Supported ops
The following ops are supported by the CANN Execution Provider in single-operator inference mode.
Operator | Note |
ai.onnx:Abs | |
ai.onnx:Add | |
ai.onnx:AveragePool | Only 2D Pool is supported. |
ai.onnx:BatchNormalization | |
ai.onnx:Cast | |
ai.onnx:Ceil | |
ai.onnx:Conv | Only 2D Conv is supported. Weights and bias should be constant. |
ai.onnx:Cos | |
ai.onnx:Div | |
ai.onnx:Dropout | |
ai.onnx:Exp | |
ai.onnx:Erf | |
ai.onnx:Flatten | |
ai.onnx:Floor | |
ai.onnx:Gemm | |
ai.onnx:GlobalAveragePool | |
ai.onnx:GlobalMaxPool | |
ai.onnx:Identity | |
ai.onnx:Log | |
ai.onnx:MatMul | |
ai.onnx:MaxPool | Only 2D Pool is supported. |
ai.onnx:Mul | |
ai.onnx:Neg | |
ai.onnx:Reciprocal | |
ai.onnx:Relu | |
ai.onnx:Reshape | |
ai.onnx:Round | |
ai.onnx:Sin | |
ai.onnx:Sqrt | |
ai.onnx:Sub | |
ai.onnx:Transpose | |
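The table above can be used to screen a model before running it in single-operator mode. The sketch below is a hypothetical helper (`check_ops`, `SUPPORTED_OPS`, and `RESTRICTED_OPS` are illustrative names) that compares op type names only. It assumes all ops are in the ai.onnx domain and ignores opset versions and data types.

```python
# Op types copied from the table above (ai.onnx domain).
SUPPORTED_OPS = frozenset({
    "Abs", "Add", "AveragePool", "BatchNormalization", "Cast", "Ceil",
    "Conv", "Cos", "Div", "Dropout", "Exp", "Erf", "Flatten", "Floor",
    "Gemm", "GlobalAveragePool", "GlobalMaxPool", "Identity", "Log",
    "MatMul", "MaxPool", "Mul", "Neg", "Reciprocal", "Relu", "Reshape",
    "Round", "Sin", "Sqrt", "Sub", "Transpose",
})

# Ops that the table lists with a restriction.
RESTRICTED_OPS = {
    "AveragePool": "Only 2D Pool is supported.",
    "Conv": "Only 2D Conv is supported. Weights and bias should be constant.",
    "MaxPool": "Only 2D Pool is supported.",
}


def check_ops(op_types):
    """Return (unsupported ops, notes for supported ops that carry a restriction)."""
    seen = sorted(set(op_types))
    unsupported = [op for op in seen if op not in SUPPORTED_OPS]
    restricted = {op: RESTRICTED_OPS[op] for op in seen if op in RESTRICTED_OPS}
    return unsupported, restricted
```

If the `onnx` package is installed, the op types of a model can be collected with `[node.op_type for node in onnx.load(model_path).graph.node]` and passed to `check_ops`.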
Additional Resources
Additional operator support and performance tuning will be added soon.