(I've searched for a way to write assembler-like instructions for DirectX 12, but found nothing beyond hacking the bytecode, which I don't have the patience to do. So I have to work with HLSL, but I need to make a few things clear first.)
I have thought of a simple example that demonstrates exactly what I need to know:
For the example I have two 2D textures of the same size, both of format DXGI_FORMAT_R8_UINT. Let's say 128x128 "pixels". One is the input to the pixel shader, and the pixel shader writes its result to the other.
I want the shader to take the input byte, XOR it with 44h, and write the result to the render target.
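For concreteness, the shader I have in mind would look roughly like the following. This is only a minimal sketch: the names InputTex and PSMain are made up for illustration, and I'm assuming the input texture is bound as a Texture2D<uint> at register t0 and the bound render target is the other R8_UINT texture.

```hlsl
// Hypothetical names (InputTex, PSMain); the binding slot t0 is an assumption.
Texture2D<uint> InputTex : register(t0);

// Returns a single uint, which is written to the red channel of the R8_UINT target.
uint PSMain(float4 pos : SV_Position) : SV_Target
{
    // SV_Position holds the pixel-center coordinates (x + 0.5, y + 0.5);
    // converting to int truncates them to the texel index. Mip level is 0.
    uint value = InputTex.Load(int3(pos.xy, 0));

    // XOR the input byte with 44h.
    return value ^ 0x44u;
}
```

At the HLSL level this works on one scalar uint per pixel, and my question is about what that turns into in the actual registers.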
My question is which of these two cases will happen:
(assuming for the example that the registers used in the GPU's ALUs are 16 bytes wide)
case 1: the HLSL compiler loads the byte into the first component of a 16-byte-wide vector register (or into the lowest 8 bits of a 4-dword-wide vector register),
and the remaining 15 bytes are simply zeroed.
case 2: the HLSL compiler is smart enough to read the data in chunks of 16 bytes, load them into the 16 bytes of a 16-component vector register, propagate that 44h to all 16 bytes of another register, XOR once, and write a single 16-byte chunk to the render target. (But I can't see this happening, because MSDN says that the unused components are zeroed. It is not clear whether they are just hidden from me or are really zeroed, as in the first case.)
What about variables:
Would this
float fVar = 3.1f;
occupy the same type of register as this
float4 fVector = { 0.2f, 0.3f, 0.4f, 0.1f }; ?
(I'm sorry if I'm posting too often, but it is very hard for me to find anything helpful on Google.)