CUDA - Number of threads in a block


I use x and y to calculate the cells of a matrix on the device. When lena and lenb are greater than 32, the breakpoint (at int x = threadIdx.x; in the device code) is never hit and the output isn't correct.

In host code:

int lena = 52;
int lenb = 52;
dim3 threadsperblock(lena, lenb);
dim3 numblocks(lena / threadsperblock.x, lenb / threadsperblock.y);
kernel_matrix<<<numblocks, threadsperblock>>>(dev_a, dev_b);

In device code:

int x = threadIdx.x;
int y = threadIdx.y;
...

Your threadsperblock dim3 variable must satisfy the requirements of the compute capability you are targeting.

CC 1.x devices can handle up to 512 threads per block.

CC 2.0 - 3.5 devices can handle up to 1024 threads per block.
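
Rather than relying on the figures above, the limit can be read from the device at run time. The following is a minimal sketch, assuming the first CUDA device (index 0) is the one in use; the helper name fits_in_block is hypothetical and only illustrates the check.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if an x-by-y block stays within the given
// per-block thread limit.
bool fits_in_block(unsigned x, unsigned y, unsigned limit) {
    return x * y <= limit;
}

int main() {
    int device = 0;  // assumption: the first CUDA device is the target
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("compute capability:    %d.%d\n", prop.major, prop.minor);
    std::printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    std::printf("max block dimensions:  %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1],
                prop.maxThreadsDim[2]);

    // The block shape from the question: 52 x 52 = 2704 threads.
    unsigned limit = static_cast<unsigned>(prop.maxThreadsPerBlock);
    std::printf("52 x 52 block fits: %s\n",
                fits_in_block(52, 52, limit) ? "yes" : "no");
    return 0;
}
```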

A dim3 variable of (32,32) specifies 1024 (=32x32) threads per block. When you exceed that, as (52,52) does with 2704 threads, the kernel launch fails.
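
One common way to stay under the limit is to fix the block size and cover the matrix with several blocks. The sketch below assumes a 16x16 block and a hypothetical kernel fill_cells that stands in for the original kernel_matrix, whose body is not shown in the question. Each thread derives its global cell from blockIdx, blockDim and threadIdx, and a bounds check skips the threads that fall outside the 52x52 matrix.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Round-up integer division, so partial tiles at the edges still get a block.
int ceil_div(int n, int d) { return (n + d - 1) / d; }

// Hypothetical stand-in for the original kernel: one thread per matrix cell.
__global__ void fill_cells(int *out, int lena, int lenb) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // global column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // global row
    if (x < lena && y < lenb) {                     // skip out-of-range threads
        out[y * lena + x] = y * lena + x;
    }
}

int main() {
    const int lena = 52, lenb = 52;
    const size_t bytes = lena * lenb * sizeof(int);
    static int host_out[lena * lenb];

    int *dev_out = nullptr;
    if (cudaMalloc(&dev_out, bytes) != cudaSuccess) return 1;

    dim3 threadsperblock(16, 16);  // 256 threads per block, below the limit
    dim3 numblocks(ceil_div(lena, threadsperblock.x),
                   ceil_div(lenb, threadsperblock.y));  // 4 x 4 blocks
    fill_cells<<<numblocks, threadsperblock>>>(dev_out, lena, lenb);

    cudaError_t err = cudaGetLastError();  // catches a rejected launch
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_out);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("last cell: %d\n", host_out[lena * lenb - 1]);
    return 0;
}
```

Note that the integer division in the question (lena / threadsperblock.x) rounds down, which would drop the last partial tile whenever the matrix size is not a multiple of the block size. The round-up division avoids that.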

If you did CUDA error checking on the kernel launch, you would see this error.
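
A minimal sketch of such a check, using cudaGetLastError right after the launch and cudaDeviceSynchronize to pick up errors raised while the kernel runs. The kernel empty_kernel and the helper launch_ok are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that does nothing; only the launch itself matters here.
__global__ void empty_kernel() {}

// Hypothetical helper: launches the kernel with the given configuration and
// returns true only if both the launch and the execution succeed.
bool launch_ok(dim3 grid, dim3 block) {
    empty_kernel<<<grid, block>>>();

    // Launch-configuration problems (such as too many threads per block)
    // are reported here, immediately after the launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // Errors that occur while the kernel is running surface on synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // 52 x 52 = 2704 threads per block: expected to be rejected at launch.
    bool too_big = launch_ok(dim3(1, 1), dim3(52, 52));
    // 16 x 16 = 256 threads per block: within the limits quoted above.
    bool small = launch_ok(dim3(4, 4), dim3(16, 16));
    std::printf("52x52 block launched: %s\n", too_big ? "yes" : "no");
    std::printf("16x16 block launched: %s\n", small ? "yes" : "no");
    return 0;
}
```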

Since the kernel doesn't launch when this type of error occurs, any breakpoints set in the kernel code won't be hit.

