How to parallelize this embarrassingly parallel loop with Python
I have an embarrassingly parallel loop:
    import os

    # definitions
    def exhaustiveexplorationswithsimilarityall(inputfolder, outputfolder, similaritymeasure):
        phasesspeedupdictfolder = parsephasesspeedupdictfolder(inputfolder)
        avgspeedupprogramdict = computeavgspeedupprogram(phasesspeedupdictfolder)
        parameters = {
            programsphasesspeedupdicts: phasesspeedupdictfolder,
            programsavgspeedupdict: avgspeedupprogramdict
        }
        similarityhandler = SimilarityHandler(similaritymeasure, parameters)

        # sequential running
        for filename in os.listdir(inputfolder):
            print(filename)
            exhaustiveexplorationswithsimilarity(inputfolder + filename, outputfolder + filename, similarityhandler)
and I tried to make it parallel using joblib's Parallel:
    import multiprocessing
    from joblib import Parallel, delayed

    # parallel version
    num_cores = multiprocessing.cpu_count()
    parallel = Parallel(n_jobs=num_cores)
    for filename in os.listdir(inputfolder):
        print(filename)
        parallel(delayed(exhaustiveexplorationswithsimilarity(inputfolder + filename, outputfolder + filename, similarityhandler)))
or this other version:
    arg_generator = ((inputfolder + filename, outputfolder + filename, similarityhandler) for filename in os.listdir(inputfolder))
    parallel(delayed(exhaustiveexplorationswithsimilarity)(arg_generator))
but when I run it, it complains:
        parallel(delayed(exhaustiveexplorationswithsimilarity(inputfolder + filename, outputfolder + filename, similarityhandler)))
      File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 516, in __call__
        for function, args, kwargs in iterable:
    TypeError: 'function' object is not iterable
What am I missing here? Any help is appreciated.
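You are still calling exhaustiveexplorationswithsimilarity (serially) inside the loop and passing its result to delayed. The Parallel object then receives a single function where it expects an iterable of delayed calls, which is why it raises the TypeError.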
From the docs at https://pythonhosted.org/joblib/parallel.html#common-usage, it looks like you need something like:
    parallel = Parallel(n_jobs=num_cores)
    parallel(delayed(exhaustiveexplorationswithsimilarity)(inputfolder + filename, outputfolder + filename, similarityhandler) for filename in os.listdir(inputfolder))
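To see why the position of the parentheses matters, here is a rough sketch using simplified stand-ins rather than joblib itself. The names `run_all` and `explore` are made up for illustration, and this `delayed` only mimics the idea of recording a call as a (function, args, kwargs) tuple. It runs everything sequentially, so it shows the calling convention only, not real parallelism.

```python
def delayed(function):
    """Simplified stand-in for joblib.delayed: record a call without running it."""
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture


def run_all(tasks):
    """Hypothetical stand-in for a Parallel object: runs each recorded call in turn."""
    return [function(*args, **kwargs) for function, args, kwargs in tasks]


def explore(input_path, output_path):
    """Hypothetical placeholder for the real per-file work."""
    return "%s -> %s" % (input_path, output_path)


filenames = ["a.txt", "b.txt"]

# Correct: delayed(explore) wraps the function, and the call that follows
# only records the arguments. A single generator holds every task.
results = run_all(
    delayed(explore)("in/" + name, "out/" + name) for name in filenames
)

# Incorrect: explore(...) runs immediately, delayed() receives its return
# value, and the runner is handed one function instead of an iterable.
try:
    run_all(delayed(explore("in/a.txt", "out/a.txt")))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The same reasoning suggests the whole loop should move inside one call to the Parallel object, instead of calling it once per file.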