CUDA : unexpected printf behavior
I don't understand the behavior I observe when using printf in a CUDA kernel. Can someone shed light on this? If it is normal, why? Is there a way to make sure I printf the data before it is modified inside the kernel (for debugging)?
Here is the code:
~>more *
::::::::::::::
makefile
::::::::::::::
all:
    nvcc -o wtf.cu.o -arch=sm_21 -c wtf.cu
    g++ -o wtf.exe -I/usr/local/cuda/include wtf.cpp wtf.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
::::::::::::::
wtf.cpp
::::::::::::::
#include <iostream> // cout
#include <cstdlib> // rand, srand
#include <cuda_runtime_api.h> // cudaXXX

void printongpu ( unsigned int const idatasize, int * const iopdata );

using namespace std;

int main ()
{
    // allocate and initialize cpu data
    unsigned int datasize = 4;
    srand ( time ( NULL ) ); // random seed
    int * pcpudata = ( int * ) malloc ( sizeof ( int ) * datasize );
    for ( unsigned int i = 0; i < datasize; i++ ) { pcpudata[i] = rand () % 100; cout << "cpu : " << pcpudata[i] << endl; }

    // print on gpu
    int * pgpudata = NULL;
    cudaMalloc ( ( void ** ) &pgpudata, datasize * sizeof ( int ) );
    cudaMemcpy ( pgpudata, pcpudata, datasize * sizeof ( int ), cudaMemcpyHostToDevice );
    printongpu ( datasize, pgpudata );

    // get out
    cudaFree ( pgpudata );
    if ( pcpudata ) { free ( pcpudata ); pcpudata = NULL; }
    return 0;
}
::::::::::::::
wtf.cu
::::::::::::::
#include "stdio.h"

__global__ void wtf ( unsigned int const idatasize, int * const iopdata )
{
    if ( idatasize == 0 || !iopdata ) return;

    // don't modify : just print
    unsigned long long int tidx = blockIdx.x * blockDim.x + threadIdx.x; // 1d grid
    if ( tidx == 0 )
    {
        for ( unsigned int i = 0; i < idatasize; i++ )
            printf ( "gpu : %i \n", iopdata[i] );
    }
    __syncthreads();

    // modify
    // iopdata[tidx] = 666; // wtf ?...
}

void printongpu ( unsigned int const idatasize, int * const iopdata )
{
    wtf<<<2,2>>> ( idatasize, iopdata );
}
And, as expected, no value is above 100 (line 15 in the cpp file: rand () % 100):
~>make; ./wtf.exe
nvcc -o wtf.cu.o -arch=sm_21 -c wtf.cu
g++ -o wtf.exe -I/usr/local/cuda/include wtf.cpp wtf.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
cpu : 38
cpu : 73
cpu : 28
cpu : 82
gpu : 38
gpu : 73
gpu : 28
gpu : 82
Now I uncomment line 17 in the cu file (iopdata[tidx] = 666): I modify all values to 666 (that is, above 100). As I have 4 data (datasize = 4 in the cpp file), a 2 x 2 grid (2 blocks of 2 threads), and a __syncthreads() before the data modification in the CUDA kernel, I should never printf modified data, right? However, I get this (printing modified data with the value 666):
~>make; ./wtf.exe
nvcc -o wtf.cu.o -arch=sm_21 -c wtf.cu
g++ -o wtf.exe -I/usr/local/cuda/include wtf.cpp wtf.cu.o -L/usr/local/cuda/lib64 -lcuda -lcudart
cpu : 29
cpu : 72
cpu : 66
cpu : 90
gpu : 29
gpu : 72
gpu : 666
gpu : 666
I don't understand why these 666 appear: to me, they should not appear?! If this behavior is normal, why?
fh
It's because you are launching 2 threadblocks, and threadblocks can execute in any order, simultaneously or sequentially.
Suppose you have the troublesome line uncommented. Now suppose threadblock 1 runs first and completes before threadblock 0. Then threadblock 0 runs. Threadblock 0 is doing the printing, and it is printing all 4 values. So the values set to 666 by threadblock 1 get printed out by threadblock 0.
This could not happen if threadblock 0 runs first, and correspondingly my guess is that you will never see the first 2 GPU values listed as 666, only the last 2 (which are emanating from threadblock 1). You would never see this if you were launching only 1 block, regardless of the number of threads (at least for the posted kernel code).
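As a rough illustration of the single-block case, here is a minimal sketch that keeps the structure of the posted kernel, enables the write, and launches one block of 4 threads instead of <<<2,2>>>. The names print_then_modify, host and dev are hypothetical, the bounds check on the write is my addition, the input values are simply copied from the first run in the question, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same shape as the posted kernel, with the write enabled.
// Kernel name and parameter names are placeholders, not the asker's.
__global__ void print_then_modify(unsigned int n, int *data)
{
    if (n == 0 || !data) return;            // uniform early exit, so the barrier below is safe

    unsigned int tidx = blockIdx.x * blockDim.x + threadIdx.x;
    if (tidx == 0) {
        for (unsigned int i = 0; i < n; i++)
            printf("gpu : %i\n", data[i]);
    }
    __syncthreads();                         // barrier for the threads of THIS block only

    if (tidx < n) data[tidx] = 666;          // bounds check added for safety (assumption)
}

int main()
{
    const unsigned int n = 4;
    int host[n] = {38, 73, 28, 82};          // sample values taken from the first run above
    int *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    print_then_modify<<<1, n>>>(n, dev);     // one block: every writer sits behind the same barrier
    cudaDeviceSynchronize();                 // wait for the kernel and flush device printf output

    cudaFree(dev);
    return 0;
}
```

With a single block, all threads that write are held at the same barrier as the thread that prints, so the prints should only ever see the original values. This does not scale beyond one block, so treat it as a debugging aid rather than a general fix.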
You may be confused by thinking that __syncthreads() is a device-wide sync. It is not. It acts as a barrier for the threads in a threadblock only. There is no synchronization between separate threadblocks.
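For the debugging question (printing the data before any block modifies it), one common approach is to split the work into two kernels. Kernels launched into the same stream run in launch order, so the second one only starts once the first has finished. The following is a sketch under that assumption, not a drop-in replacement: print_data, modify_data and print_then_modify_on_gpu are hypothetical names, the sample values are copied from the second run in the question, and error checking is left out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read-only kernel: a single thread prints all values.
__global__ void print_data(unsigned int n, const int *data)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (unsigned int i = 0; i < n; i++)
            printf("gpu : %i\n", data[i]);
    }
}

// Write kernel: each thread overwrites its own element.
__global__ void modify_data(unsigned int n, int *data)
{
    unsigned int tidx = blockIdx.x * blockDim.x + threadIdx.x;
    if (tidx < n) data[tidx] = 666;
}

// Hypothetical host wrapper playing the role of printongpu in the question.
void print_then_modify_on_gpu(unsigned int n, int *data)
{
    print_data<<<1, 1>>>(n, data);
    // Same (default) stream: this launch does not begin until print_data has completed,
    // whatever order the blocks of modify_data then run in.
    modify_data<<<2, 2>>>(n, data);
    cudaDeviceSynchronize();                 // wait for both kernels and flush printf output
}

int main()
{
    const unsigned int n = 4;
    int host[n] = {29, 72, 66, 90};          // sample values taken from the second run above
    int *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    print_then_modify_on_gpu(n, dev);

    cudaFree(dev);
    return 0;
}
```

The kernel boundary acts as the device-wide ordering point that __syncthreads() does not provide, at the cost of an extra launch.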