Multithreading: multiple streams on one GPU device
I have a multi-threaded program that is supposed to run on 6 GPU devices. I want to open 6 streams on each device and reuse them during the lifetime of the program (36 in total).
I'm using cudaStreamCreate(), cublasCreate(), and cublasSetStream() to create each stream and handle. I also use a GPU memory monitor to see the memory usage of each handle. However, when I look at the GPU memory usage on each device, it grows only on the first stream creation and doesn't change for the rest of the streams I create.
As far as I know, there isn't a limitation on the number of streams I can use. But I can't figure out why the memory usage of the handles and streams doesn't show up in the GPU memory usage.
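For reference, the setup described in the question might look roughly like the sketch below. It is only an illustration: the `StreamSlot` struct and the `createSlots` / `destroySlots` helpers are hypothetical names, not from the original program, and the error handling is kept minimal.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <vector>

// Hypothetical container: one stream plus the cuBLAS handle bound to it.
struct StreamSlot {
    cudaStream_t   stream = nullptr;
    cublasHandle_t handle = nullptr;
};

// Creates `streamsPerDevice` streams on `device`, each with its own cuBLAS
// handle attached via cublasSetStream(). Returns false on the first failure;
// slots created before the failure remain in `out` for the caller to destroy.
bool createSlots(int device, int streamsPerDevice, std::vector<StreamSlot>& out) {
    if (cudaSetDevice(device) != cudaSuccess) return false;
    for (int i = 0; i < streamsPerDevice; ++i) {
        StreamSlot slot;
        if (cudaStreamCreate(&slot.stream) != cudaSuccess) return false;
        if (cublasCreate(&slot.handle) != CUBLAS_STATUS_SUCCESS) {
            cudaStreamDestroy(slot.stream);
            return false;
        }
        if (cublasSetStream(slot.handle, slot.stream) != CUBLAS_STATUS_SUCCESS) {
            cublasDestroy(slot.handle);
            cudaStreamDestroy(slot.stream);
            return false;
        }
        out.push_back(slot);
    }
    return true;
}

// Releases every handle and stream in `slots` on `device`.
void destroySlots(int device, std::vector<StreamSlot>& slots) {
    cudaSetDevice(device);
    for (StreamSlot& s : slots) {
        cublasDestroy(s.handle);
        cudaStreamDestroy(s.stream);
    }
    slots.clear();
}
```

Calling `createSlots(d, 6, slots)` once per device `d` (for example, from the thread that owns that device) would give the 36 stream/handle pairs in total, which can then be reused until `destroySlots` is called at shutdown.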
All the streams you create reside within a single context on a given device, so there is no context-related overhead from creating additional streams after the first one. Streams themselves are lightweight and are (mostly) a host-side scheduler abstraction. As you have observed, they don't consume much (if any) device memory.
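One way to check this yourself, without relying on an external monitor, is to sample free device memory with cudaMemGetInfo() after each stream is created. The following is a minimal sketch under a few assumptions: the helper names `freeMemoryPerStream` and `printSamples` are made up for illustration, and the first sample is taken after the context already exists, so context creation cost is deliberately excluded.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: returns the free device memory (in bytes) sampled once
// after the context exists and once after each stream creation. On success the
// result has numStreams + 1 entries; on any CUDA error it is empty.
std::vector<size_t> freeMemoryPerStream(int device, int numStreams) {
    std::vector<size_t> samples;
    std::vector<cudaStream_t> streams;
    size_t freeBytes = 0, totalBytes = 0;

    if (cudaSetDevice(device) != cudaSuccess) return samples;
    cudaFree(nullptr);  // no-op free; commonly used to force context creation
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) return samples;
    samples.push_back(freeBytes);  // baseline, taken before any stream exists

    bool ok = true;
    for (int i = 0; i < numStreams && ok; ++i) {
        cudaStream_t s;
        ok = (cudaStreamCreate(&s) == cudaSuccess);
        if (ok) {
            streams.push_back(s);
            ok = (cudaMemGetInfo(&freeBytes, &totalBytes) == cudaSuccess);
        }
        if (ok) samples.push_back(freeBytes);
    }

    // Clean up the streams regardless of whether sampling succeeded.
    for (cudaStream_t s : streams) cudaStreamDestroy(s);
    if (!ok) samples.clear();
    return samples;
}

// Prints the baseline and each per-stream sample.
void printSamples(const std::vector<size_t>& samples) {
    for (size_t i = 0; i < samples.size(); ++i) {
        if (i == 0)
            std::printf("baseline: %zu bytes free\n", samples[i]);
        else
            std::printf("after stream %zu: %zu bytes free\n", i, samples[i]);
    }
}
```

For example, `printSamples(freeMemoryPerStream(0, 6))` prints seven readings for device 0. Keep in mind that cudaMemGetInfo() reports memory for the whole device, so other processes using the same GPU can shift the numbers between samples.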