Is [numthreads(1, GROUP_SIZE, GROUP_SIZE)]
as efficient as [numthreads(GROUP_SIZE, GROUP_SIZE, 1)]?
CUDA confused me by disabling its z dimension.