CUDA: unexpected printf behavior


I don't understand the behavior I observe when using printf in a CUDA kernel. Can someone shed some light on it? If it is normal, why? Is there a way to make sure printf prints the data before it is modified inside the kernel (for debugging)?

Here is the code:

```
~>more *
::::::::::::::
makefile
::::::::::::::
all:
	nvcc -o wtf.cu.o -arch=sm_21 -c wtf.cu
	g++ -o wtf.exe -I/usr/local/cuda/include wtf.cpp wtf.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
::::::::::::::
wtf.cpp
::::::::::::::
#include <iostream> // cout
#include <cstdlib>  // rand, srand
#include <ctime>    // time
#include <cuda_runtime_api.h> // cudaXXX
void printOnGPU ( unsigned int const iDataSize, int * const iopData );

using namespace std;

int main ()
{
  // Allocate and initialize CPU data
  unsigned int dataSize = 4;
  srand ( time ( NULL ) ); // Random seed
  int * pCPUData = ( int * ) malloc ( sizeof ( int ) * dataSize );
  for ( unsigned int i = 0; i < dataSize; i++ ) { pCPUData[i] = rand () % 100; cout << "CPU : " << pCPUData[i] << endl; }

  // Print from GPU
  int * pGPUData = NULL;
  cudaMalloc ( ( void ** ) &pGPUData, dataSize * sizeof ( int ) );
  cudaMemcpy ( pGPUData, pCPUData, dataSize * sizeof ( int ), cudaMemcpyHostToDevice );
  printOnGPU ( dataSize, pGPUData );

  // Clean up
  cudaFree ( pGPUData );
  if ( pCPUData ) { free ( pCPUData ); pCPUData = NULL; }
  return 0;
}
::::::::::::::
wtf.cu
::::::::::::::
#include <stdio.h>

__global__ void wtf ( unsigned int const iDataSize, int * const iopData )
{
  if ( iDataSize == 0 || !iopData ) return;

  // Don't modify: print
  unsigned long long int tIdx = blockIdx.x * blockDim.x + threadIdx.x; // 1D grid
  if ( tIdx == 0 )
  {
    for ( unsigned int i = 0; i < iDataSize; i++ )
      printf ( "GPU : %i \n", iopData[i] );
  }
  __syncthreads();

  // Modify
  // iopData[tIdx] = 666; // WTF ?...
}

void printOnGPU ( unsigned int const iDataSize, int * const iopData )
{
  wtf<<<2,2>>> ( iDataSize, iopData );
  cudaDeviceSynchronize (); // Wait for the kernel; this also flushes device-side printf output
}
```

And, as expected, there is no value above 100 (line 15 in the cpp file: rand () % 100):

```
~>make; ./wtf.exe
nvcc -o wtf.cu.o -arch=sm_21 -c wtf.cu
g++ -o wtf.exe -I/usr/local/cuda/include wtf.cpp wtf.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 38
CPU : 73
CPU : 28
CPU : 82
GPU : 38
GPU : 73
GPU : 28
GPU : 82
```

Now uncomment line 17 in the cu file (iopData[tIdx] = 666), which modifies the values to 666 (that is, above 100). As I have 4 data items (dataSize = 4 in the cpp file), a 2 x 2 grid (2 blocks of 2 threads), and a __syncthreads() before the data modification in the CUDA kernel, I should never printf modified data, right? However, I do (the modified value 666 gets printed):

```
~>make; ./wtf.exe
nvcc -o wtf.cu.o -arch=sm_21 -c wtf.cu
g++ -o wtf.exe -I/usr/local/cuda/include wtf.cpp wtf.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
CPU : 29
CPU : 72
CPU : 66
CPU : 90
GPU : 29
GPU : 72
GPU : 666
GPU : 666
```

I don't understand why these 666 values appear: to me, they should not! If this behavior is normal, why?

fh

It's because you are launching 2 threadblocks, and those threadblocks can execute in any order, either simultaneously or sequentially.

Suppose you have the troublesome line uncommented. Now suppose threadblock 1 runs first and completes before threadblock 0. Then threadblock 0 runs. But threadblock 0 is the one doing the printing, and it prints all 4 values. The values that threadblock 1 already set to 666 get printed out by threadblock 0.

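This could not happen if threadblock 0 ran to completion first. Correspondingly, I would guess you never see the first 2 GPU values listed as 666, only the last 2 (the ones written by threadblock 1). You would also never see this if you launched only 1 block, regardless of the number of threads (at least with the posted kernel code), because then every thread waits at __syncthreads() until thread 0 has finished printing.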

You may be confused, thinking that __syncthreads() is a device-wide sync. It is not. It acts as a barrier for the threads within one threadblock only. There is no synchronization between separate threadblocks.

