CUDA - Number of threads in a block
I used x and y to calculate the cells of a matrix on the device. When I use values greater than 32 for lena and lenb, the breakpoint (at int x = threadIdx.x; in the device code) is never hit and the output isn't correct.
In host code:
int lena = 52;
int lenb = 52;
dim3 threadsperblock(lena, lenb);
dim3 numblocks(lena / threadsperblock.x, lenb / threadsperblock.y);
kernel_matrix<<<numblocks, threadsperblock>>>(dev_a, dev_b);
In device code:
int x = threadIdx.x;
int y = threadIdx.y;
...
Your threadsperblock dim3 variable must satisfy the requirements of the compute capability (CC) you are targeting.
CC 1.x devices can handle up to 512 threads per block.
CC 2.0 - 3.5 devices can handle up to 1024 threads per block.
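Rather than relying on a table of limits, the limit can be read from the device at run time. The following is a minimal sketch, assuming device 0 is the one in use; `blockFits` is a hypothetical helper introduced here for illustration, not something from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when a bx-by-by block stays within the
// given per-block thread limit.
bool blockFits(int bx, int by, int maxThreadsPerBlock) {
    return bx > 0 && by > 0 && bx * by <= maxThreadsPerBlock;
}

int main() {
    cudaDeviceProp prop;
    // Device index 0 is an assumption; adjust for multi-GPU systems.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    printf("Compute capability %d.%d, max threads per block: %d\n",
           prop.major, prop.minor, prop.maxThreadsPerBlock);
    printf("32x32 block fits: %s\n",
           blockFits(32, 32, prop.maxThreadsPerBlock) ? "yes" : "no");
    printf("52x52 block fits: %s\n",
           blockFits(52, 52, prop.maxThreadsPerBlock) ? "yes" : "no");
    return 0;
}
```

The product of the block dimensions is what counts, so a 52x52 block asks for 2704 threads even though each dimension on its own looks small.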
Your dim3 variable at (32,32) specifies 1024 (= 32 x 32) threads per block. When you exceed that, as with lena = lenb = 52 (2704 threads), the kernel launch fails.
If you did CUDA error checking on the kernel launch, you would see the error.
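One common way to do that error checking is to call `cudaGetLastError()` right after the launch and check the result of `cudaDeviceSynchronize()` afterwards. The sketch below assumes an empty stand-in kernel (`emptyKernel`) and a hypothetical helper `checkCuda`; neither comes from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a CUDA error and returns false on failure.
bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Stand-in kernel; the real kernel_matrix body is not shown in the question.
__global__ void emptyKernel() {}

int main() {
    dim3 threadsPerBlock(52, 52);  // 2704 threads, above the 1024 limit
    dim3 numBlocks(1, 1);

    emptyKernel<<<numBlocks, threadsPerBlock>>>();

    // Launch-time errors (such as an invalid configuration) show up here.
    bool launched = checkCuda(cudaGetLastError(), "kernel launch");

    // Errors raised while the kernel runs show up after synchronization.
    bool finished = checkCuda(cudaDeviceSynchronize(), "kernel execution");

    return (launched && finished) ? 0 : 1;
}
```

With a block this large, the launch check is the one expected to report the failure, because the kernel never starts running.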
Since the kernel doesn't launch with this type of error, any breakpoints set in the kernel code won't be hit.
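A usual way to stay under the limit is to keep the block small and cover the matrix with several blocks. This is only a sketch under assumptions: the 16x16 block size, the `ceilDiv` helper, the `kernel_matrix_sketch` kernel, and the value written to each cell are all placeholders, since the body of the original `kernel_matrix` is not shown.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: integer division rounded up.
int ceilDiv(int n, int d) { return (n + d - 1) / d; }

// Placeholder kernel: one thread per matrix cell.
__global__ void kernel_matrix_sketch(int *cells, int lenA, int lenB) {
    // Global coordinates combine the block index with the thread index.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // The grid may overshoot the matrix, so guard the edges.
    if (x < lenA && y < lenB) {
        cells[y * lenA + x] = x + y;  // placeholder computation
    }
}

int main() {
    const int lenA = 52, lenB = 52;
    int *dev_cells = nullptr;
    if (cudaMalloc(&dev_cells, lenA * lenB * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    dim3 threadsPerBlock(16, 16);  // 256 threads, within both limits above
    dim3 numBlocks(ceilDiv(lenA, threadsPerBlock.x),
                   ceilDiv(lenB, threadsPerBlock.y));

    kernel_matrix_sketch<<<numBlocks, threadsPerBlock>>>(dev_cells, lenA, lenB);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(dev_cells);
    return err == cudaSuccess ? 0 : 1;
}
```

Because the kernel now uses `blockIdx` as well as `threadIdx`, the matrix size is no longer tied to the block size, and the rounded-up grid avoids dropping the cells that a plain integer division would miss.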