cuda - Can many threads set bits in the same word simultaneously?
I need each thread of a warp to decide whether or not to set its respective bit in a 32-bit word. Do multiple bit sets take one memory access in total, or one memory access for each bit set?
There is no independent bit-setting capability in CUDA. (There is a bit-field-insert instruction in PTX, but it nevertheless operates on a 32-bit quantity.)
Each thread would set its bit by doing a full 32-bit write. Such a write would need to be an atomic RMW (read-modify-write) operation in order to preserve the other bits. Therefore the accesses would effectively be serialized, at whatever the throughput of atomics is.
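To make the atomic approach concrete, here is a minimal sketch using `atomicOr`. The kernel name, the host helper `pack_with_atomics`, and the "input value is odd" predicate are all hypothetical and chosen only for illustration; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical predicate: a thread sets its bit when its input value is odd.
__global__ void set_bits_atomic(const int *in, unsigned int *word)
{
    unsigned int lane = threadIdx.x % 32;  // bit position owned by this thread

    if (in[threadIdx.x] & 1) {
        // Atomic read-modify-write: ORs in one bit and preserves the others.
        // Updates from different lanes to the same word are serialized.
        atomicOr(word, 1u << lane);
    }
}

// Hypothetical host helper: runs one warp over 32 inputs and returns the word.
unsigned int pack_with_atomics(const int *host_in)
{
    int *d_in = nullptr;
    unsigned int *d_word = nullptr;
    unsigned int result = 0;

    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_word, sizeof(unsigned int));
    cudaMemcpy(d_in, host_in, 32 * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_word, 0, sizeof(unsigned int));  // start with all bits clear

    set_bits_atomic<<<1, 32>>>(d_in, d_word);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_word, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_word);
    return result;
}
```

Calling `pack_with_atomics` with the inputs 0 through 31 should set every odd-numbered bit of the returned word.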
If memory space is not a concern, breaking the bits out into separate integers would allow you to avoid atomics.
A 32-bit packed quantity could then be assembled using the __ballot() warp vote function. An example is given in the answer here.
(In fact, the warp vote function may allow you to avoid memory transactions altogether; everything can be handled in registers, if the only result you need is the 32-bit packed quantity.)
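As a rough sketch of the warp vote approach: on CUDA 9 and later, `__ballot()` is deprecated in favor of `__ballot_sync()`, which takes an explicit mask of participating lanes, so that is what is used below. The kernel name, the host helper `pack_with_ballot`, and the "input value is odd" predicate are hypothetical, and the code assumes a launch in which all 32 lanes of each warp are active.

```cuda
#include <cuda_runtime.h>

// Hypothetical predicate: a lane votes "true" when its input value is odd.
__global__ void set_bits_ballot(const int *in, unsigned int *word)
{
    // Every lane contributes one predicate bit; each lane receives the same
    // 32-bit packed result in a register. No atomics are involved.
    unsigned int packed = __ballot_sync(0xFFFFFFFFu, in[threadIdx.x] & 1);

    // The store below exists only so the host can inspect the result.
    // A real kernel could keep using `packed` directly in registers.
    if (threadIdx.x % 32 == 0) {
        word[threadIdx.x / 32] = packed;
    }
}

// Hypothetical host helper: runs one warp over 32 inputs and returns the word.
unsigned int pack_with_ballot(const int *host_in)
{
    int *d_in = nullptr;
    unsigned int *d_word = nullptr;
    unsigned int result = 0;

    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_word, sizeof(unsigned int));
    cudaMemcpy(d_in, host_in, 32 * sizeof(int), cudaMemcpyHostToDevice);

    set_bits_ballot<<<1, 32>>>(d_in, d_word);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_word, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_word);
    return result;
}
```

Bit i of the result corresponds to lane i of the warp, so this produces the same packed word as the atomic version with a single store per warp instead of one atomic per set bit.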