cuda - Lock reading/writing for rows in two dimension array in global memory -
i have 2 dimensional array , there threads update rows in array. may 2 or more threads need update 1 row @ time. need lock threads trying access same row if there thread updating it.
if threads need update 1 element in row simple atomic operation (e.g. atomicadd()
) should it.
if related operations must performed on multiple elements of row, need implement sort of "critical section" control. example:
#define row_count 10 __global__ void the_kernel( ... ) { // block-level shared array indicating locking state of each row __shared__ int locks[ row_count ]; // initialize shared array if ( threadidx.x == 0 ) memset( locks, 0, sizeof( int ) * row_count ); __syncthreads(); // suppose current thread need update row #3 int row_idx = 3; // thread-local variable indicating whether current thread has access target row bool updating = false; { // return value atomiccas 0 when no other thread updating row #3 // otherwise current thread should loop until row lock released other threads updating = (atomiccas( locks + row_idx, 0, -1 ) == 0); if (updating) { // entered critical section, work! // before release lock, should make changes made current thread visible other threads // can not use __syncthreads() inside conditional branch, have use __threadfence() __threadfence(); // releasing lock atomicexch( locks + row_idx, 0 ); } } while ( !updating ); }
that being said, design serial 1 thread updating row loop , wait turn. there performance penalties use when absolutely necessary.
please note structure works when threads can fit single block (e.g. total of 1024 threads) because shared memory array locking not work across blocks. on gpus multiple sms (i.e. multiple blocks can scheduled on different sms) threads different blocks try access rows in race.
Comments
Post a Comment