I'm having an odd problem with D3D12 compute shaders. I have a very simple compute shader that does nothing but write the global ID of the current thread out to a buffer:
RWStructuredBuffer<uint> g_pathVisibility : register(u0, space1);

cbuffer cbPushConstants : register(b0)
{
    uint g_count;
};

[numthreads(32, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID)
{
    if (DTid.x < g_count)
    {
        g_pathVisibility[DTid.x] = DTid.x + 1;
    }
}
I'm allocating two buffers, each with space for 128 integers. One is the output buffer for the shader above and the other is a copy-destination buffer for CPU readback.

If I set numthreads() to any power of two (it's 32 above, for example), I get a device reset error on NVIDIA hardware only. If I set numthreads() to any non-power-of-two value, the shader works as expected. The exceptionally odd thing is that all of the compute shaders in the D3D12 samples work fine with numthreads() set to powers of two. It doesn't matter whether I execute the compute shader on a graphics queue or a compute queue - the result is the same either way. I've tested this on a GTX 1080 and a GTX 1070 with identical results; AMD cards work as expected.

Anyone have any idea what the hell could be going on? I tried asking NVIDIA on their boards, but as per usual they never responded. I'm using their latest drivers.

I've attached my sample application if anyone is interested. It's a UWP app, since Visual Studio provides a nice D3D12 app template that I use to play around with simple ideas. The shader in question is TestCompute.hlsl, and the function where the magic happens is Sample3DSceneRenderer::TestCompute(), line 1006 in Sample3DSceneRenderer.cpp. The rough shape of that setup is sketched below.
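For context, here's a minimal sketch (not the code from the attached project) of the setup described above: a 128-uint UAV buffer the shader writes into, a readback buffer as the copy destination, and a Dispatch sized to cover the element count. The root-parameter layout (a root constant for g_count plus a root UAV for u0) and the names of the function and parameters are assumptions for illustration only.

// Minimal sketch of the buffer setup and dispatch being described.
// Assumes a root signature with parameter 0 = root constant (g_count)
// and parameter 1 = root UAV (u0); adjust to match the real app.
#include <d3d12.h>
#include <d3dx12.h>      // CD3DX12_* helpers, as used by the D3D12 samples
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

static const UINT kCount = 128;
static const UINT kBufferSize = kCount * sizeof(UINT);

void CreateBuffersAndDispatch(ID3D12Device* device,
                              ID3D12GraphicsCommandList* cmdList,
                              ID3D12RootSignature* rootSig,
                              ID3D12PipelineState* pso,
                              ComPtr<ID3D12Resource>& uavBuffer,
                              ComPtr<ID3D12Resource>& readbackBuffer)
{
    // Default-heap buffer the shader writes through u0; must allow UAV access.
    CD3DX12_HEAP_PROPERTIES defaultHeap(D3D12_HEAP_TYPE_DEFAULT);
    CD3DX12_RESOURCE_DESC uavDesc = CD3DX12_RESOURCE_DESC::Buffer(
        kBufferSize, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS);
    device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &uavDesc,
        D3D12_RESOURCE_STATE_UNORDERED_ACCESS, nullptr, IID_PPV_ARGS(&uavBuffer));

    // Readback-heap buffer used as the copy destination for CPU readback.
    CD3DX12_HEAP_PROPERTIES readbackHeap(D3D12_HEAP_TYPE_READBACK);
    CD3DX12_RESOURCE_DESC rbDesc = CD3DX12_RESOURCE_DESC::Buffer(kBufferSize);
    device->CreateCommittedResource(&readbackHeap, D3D12_HEAP_FLAG_NONE, &rbDesc,
        D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&readbackBuffer));

    cmdList->SetComputeRootSignature(rootSig);
    cmdList->SetPipelineState(pso);
    cmdList->SetComputeRoot32BitConstant(0, kCount, 0);
    cmdList->SetComputeRootUnorderedAccessView(1, uavBuffer->GetGPUVirtualAddress());

    // One group of 32 threads covers 32 elements, so 128 elements need 4 groups.
    cmdList->Dispatch((kCount + 31) / 32, 1, 1);

    // Transition the UAV buffer to a copy source, then copy it into the readback buffer.
    CD3DX12_RESOURCE_BARRIER barrier = CD3DX12_RESOURCE_BARRIER::Transition(
        uavBuffer.Get(), D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
        D3D12_RESOURCE_STATE_COPY_SOURCE);
    cmdList->ResourceBarrier(1, &barrier);
    cmdList->CopyResource(readbackBuffer.Get(), uavBuffer.Get());
}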
PathTransform_2.zip