1. Problem
When running inference with onnxruntime-gpu over a long period, GPU memory usage keeps growing until the device runs out of memory. By default the CUDA execution provider's memory arena can keep extending, so the fix is to cap it when the session is created.
2. C++/Python Configuration
At session creation, set the provider's gpu_mem_limit parameter to cap the device memory arena, e.g. to 2 GiB (in bytes), written either way:
- 2147483648
- 2 * 1024 * 1024 * 1024
Python
providers = [
    (
        "TensorrtExecutionProvider",
        {
            "device_id": 0,                            # GPU to run on
            "trt_max_workspace_size": 2147483648,      # 2 GiB TensorRT workspace, in bytes
            "trt_fp16_enable": True,                   # allow FP16 kernels
        },
    ),
    (
        "CUDAExecutionProvider",                       # fallback for nodes TensorRT cannot run
        {
            "device_id": 0,
            "arena_extend_strategy": "kNextPowerOfTwo",
            "gpu_mem_limit": 2 * 1024 * 1024 * 1024,   # 2 GiB arena cap, in bytes
            "cudnn_conv_algo_search": "EXHAUSTIVE",
            "do_copy_in_default_stream": True,
        },
    ),
]
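With this list, the session tries TensorRT first and falls back to CUDA for nodes TensorRT cannot handle. A minimal sketch of wiring it up, assuming a hypothetical model path model.onnx:

import onnxruntime

# providers is the list defined above; "model.onnx" is a placeholder path.
session = onnxruntime.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())  # shows which providers were actually registered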
For example, to create a session that runs inference on CUDA only:
import onnxruntime

self.session = onnxruntime.InferenceSession(
    path_or_bytes=model_file,
    providers=[
        (
            "CUDAExecutionProvider",
            {
                "device_id": 0,
                "arena_extend_strategy": "kNextPowerOfTwo",
                "gpu_mem_limit": 2 * 1024 * 1024 * 1024,  # 2 GiB, in bytes
                "cudnn_conv_algo_search": "EXHAUSTIVE",
                "do_copy_in_default_stream": True,
            },
        )
    ],
)
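To verify the cap holds, run the session in a loop while watching nvidia-smi: with gpu_mem_limit set, usage should plateau near the cap instead of growing without bound. A rough smoke-test sketch, assuming a single float32 input whose dynamic dimensions can be filled with 1 (adjust to your model):

import numpy as np

inp = self.session.get_inputs()[0]
# Replace dynamic (non-int) dimensions with 1; adapt to your real input shape.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.zeros(shape, dtype=np.float32)

for _ in range(1000):
    self.session.run(None, {inp.name: dummy})  # memory should stay near the 2 GiB cap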
C++
// Assumes g_ort was obtained via OrtGetApiBase()->GetApi(ORT_API_VERSION).
OrtSessionOptions* session_options = /* ... */;
OrtCUDAProviderOptions options;
options.device_id = 0;
options.arena_extend_strategy = 0;  // 0 == kNextPowerOfTwo
// Use a 64-bit literal: a plain 2 * 1024 * 1024 * 1024 overflows a 32-bit int.
options.gpu_mem_limit = 2ULL * 1024 * 1024 * 1024;  // 2 GiB, in bytes
options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchExhaustive;
options.do_copy_in_default_stream = 1;
g_ort->SessionOptionsAppendExecutionProvider_CUDA(session_options, &options);
3. References
(1) https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html
(2) https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html