multithreading - Multiple streams in one GPU device


I have a multi-threaded program that is supposed to run on 6 GPU devices. I want to open 6 streams on each device and reuse them during the lifetime of the program (36 in total).

I'm using cudaStreamCreate(), cublasCreate() and cublasSetStream() to create each stream and handle. I use a GPU memory monitor to see the memory usage of each handle. However, when I look at the GPU memory usage on each device, it grows only on the first stream creation, and doesn't change for the rest of the streams I create.

As far as I know, there isn't a limitation on the number of streams I want to use. But I can't figure out why the memory usage of the handles and streams doesn't show up in the GPU memory usage.
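
For reference, the setup described above could look roughly like the following sketch. It is not the original program: the `StreamSlot` struct, the `kStreamsPerDevice` constant and the error-checking macros are hypothetical names introduced for illustration, and the code loops over however many devices `cudaGetDeviceCount()` reports rather than assuming exactly 6.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helpers: abort on the first CUDA or cuBLAS error.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                    \
              cudaGetErrorString(err_), __FILE__, __LINE__);          \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

#define CUBLAS_CHECK(call)                                            \
  do {                                                                \
    cublasStatus_t st_ = (call);                                      \
    if (st_ != CUBLAS_STATUS_SUCCESS) {                               \
      fprintf(stderr, "cuBLAS error %d at %s:%d\n",                   \
              (int)st_, __FILE__, __LINE__);                          \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Hypothetical container: one stream plus the cuBLAS handle bound to it.
struct StreamSlot {
  cudaStream_t stream;
  cublasHandle_t handle;
};

constexpr int kStreamsPerDevice = 6;  // assumption taken from the question

int main() {
  int deviceCount = 0;
  CUDA_CHECK(cudaGetDeviceCount(&deviceCount));

  // slots[dev][i] is the i-th stream/handle pair on device dev.
  std::vector<std::vector<StreamSlot>> slots(deviceCount);

  for (int dev = 0; dev < deviceCount; ++dev) {
    CUDA_CHECK(cudaSetDevice(dev));  // streams belong to the current device
    slots[dev].resize(kStreamsPerDevice);
    for (StreamSlot& s : slots[dev]) {
      CUDA_CHECK(cudaStreamCreate(&s.stream));
      CUBLAS_CHECK(cublasCreate(&s.handle));
      CUBLAS_CHECK(cublasSetStream(s.handle, s.stream));
    }
  }

  // ... reuse the slots for the lifetime of the program ...

  // Tear down in reverse order of creation, on the owning device.
  for (int dev = 0; dev < deviceCount; ++dev) {
    CUDA_CHECK(cudaSetDevice(dev));
    for (StreamSlot& s : slots[dev]) {
      CUBLAS_CHECK(cublasDestroy(s.handle));
      CUDA_CHECK(cudaStreamDestroy(s.stream));
    }
  }
  return 0;
}
```

This would be built with something like `nvcc` and linked against cuBLAS (`-lcublas`); in a multi-threaded program each worker thread would also need to call `cudaSetDevice()` before using the slots of a given device.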

All the streams you create reside within a single context on a given device, so there is no context-related overhead from creating additional streams after the first one. Streams themselves are lightweight and are (mostly) a host-side scheduler abstraction. As you have observed, they don't in themselves consume much (if any) device memory.
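
One way to check this from inside the program, instead of with an external monitor, is to query free device memory around each stream creation. The following is a minimal sketch under a few assumptions: it uses device 0 only, the `freeBytes()` helper and `CUDA_CHECK` macro are hypothetical names, and `cudaFree(0)` is used as a common idiom to force context creation before the baseline is taken. The numbers it prints will depend on the driver and GPU, so none are shown here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on the first CUDA error.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                    \
              cudaGetErrorString(err_), __FILE__, __LINE__);          \
      exit(EXIT_FAILURE);                                             \
    }                                                                 \
  } while (0)

// Hypothetical helper: free device memory, in bytes, on the current device.
static size_t freeBytes() {
  size_t freeMem = 0, totalMem = 0;
  CUDA_CHECK(cudaMemGetInfo(&freeMem, &totalMem));
  return freeMem;
}

int main() {
  const int kNumStreams = 6;  // assumption taken from the question

  CUDA_CHECK(cudaSetDevice(0));
  CUDA_CHECK(cudaFree(0));  // force context creation before measuring

  // Baseline taken after the context exists, so the context's own
  // footprint is excluded from the per-stream deltas below.
  const size_t baseline = freeBytes();

  cudaStream_t streams[kNumStreams];
  for (int i = 0; i < kNumStreams; ++i) {
    CUDA_CHECK(cudaStreamCreate(&streams[i]));
    const long long delta =
        (long long)baseline - (long long)freeBytes();
    printf("after stream %d: %lld bytes used since baseline\n", i, delta);
  }

  for (int i = 0; i < kNumStreams; ++i) {
    CUDA_CHECK(cudaStreamDestroy(streams[i]));
  }
  return 0;
}
```

The same pattern could be wrapped around `cublasCreate()` calls to see what each handle costs, since the measurement only relies on `cudaMemGetInfo()`.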

