multithreading - Multiple streams in one GPU device


I have a multi-threaded program that is supposed to run on 6 GPU devices. I want to open 6 streams on each device and reuse them during the lifetime of the program (36 in total).

I'm using cudaStreamCreate(), cublasCreate() and cublasSetStream() to create each stream and handle. I also use a GPU memory monitor to see the memory usage of each handle. However, when I look at the GPU memory usage on each device, it grows only on the first stream creation and doesn't change for the rest of the streams I create.
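For reference, the setup described above might look roughly like the sketch below. It is an illustration under assumptions, not the asker's actual code: the `StreamSlot` struct, the `create_slots` and `destroy_slots` helpers, and the error-checking macros are hypothetical names, and everything runs on a single host thread for brevity, whereas the program in question is multi-threaded.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper macros: abort on any CUDA or cuBLAS error.
#define CHECK_CUDA(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",                 \
                    cudaGetErrorString(err_), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

#define CHECK_CUBLAS(call)                                              \
    do {                                                                \
        cublasStatus_t st_ = (call);                                    \
        if (st_ != CUBLAS_STATUS_SUCCESS) {                             \
            fprintf(stderr, "cuBLAS error: %d (%s:%d)\n",               \
                    (int)st_, __FILE__, __LINE__);                      \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical pairing of one stream with one cuBLAS handle.
struct StreamSlot {
    cudaStream_t stream;
    cublasHandle_t handle;
};

// Create `count` stream/handle pairs on the given device.
std::vector<StreamSlot> create_slots(int device, int count) {
    CHECK_CUDA(cudaSetDevice(device));
    std::vector<StreamSlot> slots(count);
    for (auto& s : slots) {
        CHECK_CUDA(cudaStreamCreate(&s.stream));
        CHECK_CUBLAS(cublasCreate(&s.handle));
        // Route all cuBLAS work issued through this handle to the stream.
        CHECK_CUBLAS(cublasSetStream(s.handle, s.stream));
    }
    return slots;
}

// Release the handles and streams created by create_slots.
void destroy_slots(int device, std::vector<StreamSlot>& slots) {
    CHECK_CUDA(cudaSetDevice(device));
    for (auto& s : slots) {
        CHECK_CUBLAS(cublasDestroy(s.handle));
        CHECK_CUDA(cudaStreamDestroy(s.stream));
    }
    slots.clear();
}

int main() {
    int device_count = 0;
    CHECK_CUDA(cudaGetDeviceCount(&device_count));

    const int kStreamsPerDevice = 6;  // value taken from the question
    std::vector<std::vector<StreamSlot>> all(device_count);
    for (int d = 0; d < device_count; ++d) {
        all[d] = create_slots(d, kStreamsPerDevice);
    }

    // ... the slots would be reused here for the lifetime of the program ...

    for (int d = 0; d < device_count; ++d) {
        destroy_slots(d, all[d]);
    }
    printf("Used %d stream(s) on each of %d device(s)\n",
           kStreamsPerDevice, device_count);
    return 0;
}
```

This would be built with something like `nvcc example.cu -lcublas`. In a multi-threaded version, each thread would need to call cudaSetDevice() itself before touching the streams of a given device, since the current device is a per-thread setting.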

As far as I know, there isn't a limitation on the number of streams I can use. But I can't figure out why the memory usage of the handles and streams doesn't show up in the GPU memory usage.

All the streams you create reside within a single context on a given device, so there is no context-related overhead from creating additional streams after the first one. The streams themselves are lightweight and are (mostly) a host-side scheduler abstraction. As you have observed, they don't in themselves consume much (if any) device memory.
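One way to check this without an external memory monitor is to query free device memory from inside the program. The sketch below is a rough illustration under assumptions: the `free_bytes` helper and the `CHECK_CUDA` macro are hypothetical names, device 0 is chosen arbitrarily, and the reported numbers can be affected by other processes using the same GPU and by how the driver allocates memory internally.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",                 \
                    cudaGetErrorString(err_), __FILE__, __LINE__);      \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Free device memory, in bytes, on the current device.
static size_t free_bytes() {
    size_t free_b = 0, total_b = 0;
    CHECK_CUDA(cudaMemGetInfo(&free_b, &total_b));
    return free_b;
}

int main() {
    CHECK_CUDA(cudaSetDevice(0));
    // Force context creation up front so its cost is not attributed
    // to the first stream.
    CHECK_CUDA(cudaFree(nullptr));

    const int kStreams = 6;
    cudaStream_t streams[kStreams];

    size_t before = free_bytes();
    printf("free before streams: %zu bytes\n", before);

    for (int i = 0; i < kStreams; ++i) {
        CHECK_CUDA(cudaStreamCreate(&streams[i]));
        size_t now = free_bytes();
        // Signed arithmetic: free memory can also move for unrelated reasons.
        printf("after stream %d: %zu bytes free (change: %lld)\n",
               i + 1, now, (long long)before - (long long)now);
    }

    for (int i = 0; i < kStreams; ++i) {
        CHECK_CUDA(cudaStreamDestroy(streams[i]));
    }
    return 0;
}
```

The same measurement could be wrapped around cublasCreate() to see what each handle costs separately from its stream.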

