CUDA - Can many threads set bits in the same word simultaneously?


I need each thread of a warp to decide whether or not to set its respective bit in a 32-bit word. Do multiple bit sets take one memory access in total, or one memory access per bit set?

There is no independent bit-setting capability in CUDA. (There is a bit-field-insert instruction in PTX, but it nevertheless operates on a 32-bit quantity.)

Each thread would set its bit by doing a full 32-bit write. Such a write would need to be an atomic RMW (read-modify-write) operation in order to preserve the other bits. Therefore the accesses would effectively be serialized, at whatever the throughput of atomics is.
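As a minimal sketch of the atomic approach described above: each lane of a single warp ORs its own bit into one shared word with `atomicOr()`. The predicate (odd lanes set their bit), the kernel name `set_bits_atomic`, and the host helper `run_set_bits_atomic` are assumptions made up for illustration, not part of the original question.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Each thread of one warp decides whether to set its own bit in a shared
// 32-bit word. The predicate (odd lanes only) is hypothetical.
__global__ void set_bits_atomic(unsigned int *word)
{
    unsigned int lane = threadIdx.x & 31u;
    if (lane & 1u) {
        // Atomic read-modify-write: the other 31 bits are preserved.
        atomicOr(word, 1u << lane);
    }
}

// Hypothetical host helper: launches one warp and returns the resulting word.
unsigned int run_set_bits_atomic()
{
    unsigned int *d_word = nullptr;
    unsigned int h_word = 0;
    cudaError_t err = cudaMalloc(&d_word, sizeof(unsigned int));
    assert(err == cudaSuccess);
    cudaMemset(d_word, 0, sizeof(unsigned int));
    set_bits_atomic<<<1, 32>>>(d_word);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_word, d_word, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_word);
    return h_word;
}
```

With this predicate, bits 1, 3, ..., 31 end up set, so the helper should return `0xAAAAAAAA`. Every lane that sets a bit issues its own atomic on the same address.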

If memory space is not a concern, breaking the bits out into separate integers would allow you to avoid atomics.
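A rough sketch of the unpacked layout, assuming one `unsigned int` flag per lane: every thread writes only its own element, so no atomic is needed. The names `set_flags_unpacked` and `run_set_flags_unpacked`, and the odd-lane predicate, are hypothetical. The host helper packs the flags back into one word only so the result is easy to inspect.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Each lane owns one full 32-bit integer instead of one bit,
// so a plain store is enough. The predicate (odd lanes) is hypothetical.
__global__ void set_flags_unpacked(unsigned int *flags)
{
    unsigned int lane = threadIdx.x & 31u;
    flags[lane] = (lane & 1u) ? 1u : 0u;
}

// Hypothetical host helper: runs one warp, then packs the 32 flags
// into a single word on the host for inspection.
unsigned int run_set_flags_unpacked()
{
    unsigned int *d_flags = nullptr;
    unsigned int h_flags[32] = {0};
    cudaError_t err = cudaMalloc(&d_flags, 32 * sizeof(unsigned int));
    assert(err == cudaSuccess);
    set_flags_unpacked<<<1, 32>>>(d_flags);
    cudaMemcpy(h_flags, d_flags, 32 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_flags);

    unsigned int packed = 0;
    for (int i = 0; i < 32; ++i) {
        if (h_flags[i]) packed |= (1u << i);
    }
    return packed;
}
```

The trade-off is 32 times the storage per word of flags, in exchange for ordinary stores to distinct addresses.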

A 32-bit packed quantity could be assembled using the __ballot() warp vote function (on CUDA 9 and later, the equivalent intrinsic is __ballot_sync()). An example is given in the answer here.

(In fact, the warp vote function may allow you to avoid memory transactions altogether; everything can be handled in registers, if the only result you need is the 32-bit packed quantity.)
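The warp vote approach could look roughly like the sketch below. It assumes CUDA 9 or later, where the intrinsic is spelled `__ballot_sync()` and takes a participation mask, and it assumes a full warp of 32 active threads. The kernel name `pack_bits_ballot`, the host helper, and the odd-lane predicate are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Every lane contributes one predicate; __ballot_sync returns a 32-bit word
// whose bit N is set if lane N's predicate is non-zero. The packed value
// lives in a register in every lane.
__global__ void pack_bits_ballot(unsigned int *out)
{
    unsigned int lane = threadIdx.x & 31u;
    int predicate = (lane & 1u) != 0;              // hypothetical decision
    unsigned int packed = __ballot_sync(0xFFFFFFFFu, predicate);
    // A single store, done here only so the host can read the result back.
    if (lane == 0) {
        out[0] = packed;
    }
}

// Hypothetical host helper: launches exactly one full warp.
unsigned int run_pack_bits_ballot()
{
    unsigned int *d_out = nullptr;
    unsigned int h_out = 0;
    cudaError_t err = cudaMalloc(&d_out, sizeof(unsigned int));
    assert(err == cudaSuccess);
    pack_bits_ballot<<<1, 32>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

If the packed word is consumed inside the kernel, the store by lane 0 can be dropped and no memory transaction is needed for the packing itself.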

