Saturday, February 13, 2016

cuDNN perf

nvvp (the NVIDIA Visual Profiler) in the CUDA 7.5 toolkit cannot profile cuDNN, but nvprof works!
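A minimal Python sketch for driving nvprof from a script, assuming `nvprof` is on PATH. `--print-gpu-summary`, `--csv` and `--log-file` are standard nvprof options. The default log file name `cudnn_prof.csv` and the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def nvprof_command(app: List[str], log_file: str = "cudnn_prof.csv") -> List[str]:
    """Build an nvprof command that writes a per-kernel GPU summary as CSV."""
    return [
        "nvprof",
        "--print-gpu-summary",   # aggregate time per GPU kernel / memcpy
        "--csv",                 # comma-separated output, easy to post-process
        "--log-file", log_file,  # write profiler output to a file, not stderr
        *app,                    # the program to profile, plus its arguments
    ]


def run_profile(app: List[str], log_file: str = "cudnn_prof.csv") -> int:
    """Run the application under nvprof and return nvprof's exit code."""
    cmd = nvprof_command(app, log_file)
    print("running:", " ".join(shlex.quote(part) for part in cmd))
    return subprocess.run(cmd).returncode
```

For example, `run_profile(["./my_cudnn_app"])`, where `./my_cudnn_app` is a hypothetical binary that calls cuDNN.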

Kernels reported by nvprof:
cudnn::detail::precomputed_convolve_sgemm
cudnn::detail::activation_fw_4d_kernel
kern_precompute_indices
add_tensor_kernel
cudnn::detail::softmax_fw_kernel
gemv2T_kernel_val
cudnn::detail::pooling_fw_4d_kernel
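To see which of these kernels dominate, the CSV GPU summary from nvprof (`--csv --print-gpu-summary`) can be bucketed by layer type. This is a sketch: the substring-to-layer mapping is an illustrative guess from the kernel names, not a cuDNN or cuBLAS classification. It also assumes the CSV has `Time` and `Name` columns, possibly mixed with `==PID==` banner lines and a units row, which are both skipped.

```python
import csv
from collections import defaultdict
from typing import Dict, List, Tuple

# Substring -> layer type. Illustrative guess based on the kernel names above,
# not an official cuDNN/cuBLAS classification. First match wins.
CATEGORIES: List[Tuple[str, str]] = [
    ("convolve", "convolution"),
    ("precompute_indices", "convolution"),  # assumed helper of the precomputed-GEMM conv
    ("activation", "activation"),
    ("pooling", "pooling"),
    ("softmax", "softmax"),
    ("add_tensor", "bias add"),             # assumed: adding a bias tensor
    ("gemv", "fully connected (GEMV)"),     # assumed: cuBLAS GEMV used by an FC layer
]


def categorize(kernel_name: str) -> str:
    """Map a (possibly templated) kernel name to a coarse layer type."""
    for needle, label in CATEGORIES:
        if needle in kernel_name:
            return label
    return "other"


def time_per_category(csv_text: str) -> Dict[str, float]:
    """Sum the 'Time' column of an nvprof CSV GPU summary per layer type.

    Assumes 'Time' and 'Name' columns. Banner lines starting with '==' are
    dropped; rows whose Time is not a number (e.g. a units row) are skipped.
    Totals are in whatever unit nvprof used for the Time column.
    """
    lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("==")]
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(lines):
        try:
            t = float(row["Time"])
        except (KeyError, TypeError, ValueError):
            continue  # header mismatch, short row, or units row
        totals[categorize(row.get("Name") or "")] += t
    return dict(totals)
```

Usage: `time_per_category(open("cudnn_prof.csv").read())`, where the file name is whatever was passed to `--log-file`.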