Serialize iterator object to be passed between processes in Python
I have a Python script that calculates the eigenvalues of a list of matrices and inserts these eigenvalues into a collection in the same order as the original matrices, spawning multiple processes to do the work.
Here is the code:
    import time
    import collections
    import numpy as np
    from scipy import linalg as la
    from joblib import Parallel, delayed

    def computeEigenV(unit_of_work):
        current_index = unit_of_work[0]
        current_matrix = unit_of_work[1]
        e_vals, e_vecs = la.eig(current_matrix)
        # lowEV is derived from e_vals in code not shown in this snippet
        finished_unit = (current_index, lowEV[::-1])
        return finished_unit

    def run(work_list):
        pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
        results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
        return results

    if __name__ == '__main__':
        # create original array of matrices
        original_matrix_list = []
        work_list = []

        # basic set so we can run a test
        for i in range(0, 100):
            # generate matrix & unit of work
            matrix = np.random.random_integers(0, 100, (500, 500))
            # insert into respective resources
            original_matrix_list.append(matrix)

        for i, matrix in enumerate(original_matrix_list):
            unit_of_work = [i, matrix]
            work_list.append(unit_of_work)

        work_result = run(work_list)
So work_result should hold the eigenvalues of each matrix after the processes finish. The iterator I am using is unit_of_work, a list containing the index of the matrix (from original_matrix_list) and the matrix itself.
The weird thing is: if I run the code with python matrix.py, it works perfectly. But when I use AUTO (a program for calculations on differential equations) to run the script, typing auto matrix.py gives me the following error:
    Traceback (most recent call last):
      File "matrix.py", line 50, in <module>
        work_result = run(work_list)
      File "matrix.py", line 27, in run
        results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
      File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 805, in __call__
        while self.dispatch_one_batch(iterator):
      File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
        tasks = BatchedCalls(itertools.islice(iterator, batch_size))
      File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 69, in __init__
        self.items = list(iterator_slice)
      File "matrix.py", line 27, in <genexpr>
        results = pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)
      File "/Library/Python/2.7/site-packages/joblib/parallel.py", line 162, in delayed
        pickle.dumps(function)
    TypeError: expected string or Unicode object, NoneType found
Note: to run it under auto I had to change if __name__ == '__main__': to if __name__ == '__builtin__':
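(A quick sanity check, not from the original post: printing __name__ at the top of matrix.py shows which namespace each launcher uses, __main__ under python and apparently __builtin__ under auto.)

    # diagnostic only: prints the namespace the script is executed in
    print(__name__)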
I looked up this error, and it seems the iterator unit_of_work is not being serialized correctly when it is passed around between the different processes. I have tried using serialized_unit_of_work = pickle.dumps(unit_of_work), passing that around instead, and calling pickle.loads when I need to use the iterator, but I still get the same error.
Can someone please point me in the right direction as to how I can fix this? I hesitate to use pickle.dump(obj, file[, protocol]) because eventually this will be run to calculate the eigenvalues of thousands of matrices, and I don't want to create that many files just to store the serialized iterators if I can avoid it.
Thanks!! :)
You can't pickle an iterator in Python 2.7 (but you can from 3.4 onward).
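A minimal way to see that difference (an illustrative snippet, not from the original answer):

    import pickle

    # Python 2.7: raises TypeError: can't pickle listiterator objects.
    # Python 3.4+: succeeds, since built-in iterators can be pickled there.
    it = iter([1, 2, 3])
    data = pickle.dumps(it)
    print(list(pickle.loads(data)))   # [1, 2, 3] on Python 3.4+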
Also, pickling works differently inside __main__ than it does outside __main__, and it seems that auto is doing something odd with __main__. One thing you can observe: when pickling fails on a particular object, if instead of running the script with the object defined in it directly, you run a main script that imports the portion of the script containing the "difficult-to-serialize" object, then pickling will succeed. That is because the object is pickled by reference, at a namespace level above where the "difficult" object lives… it's never directly pickled.
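You can see the "by reference" behaviour directly: for a function that lives in an importable module, the pickle stream records essentially just the module and attribute name, not the function's code (illustrative snippet, not from the original answer):

    import pickle
    import pickletools
    from math import sqrt

    # The disassembled pickle contains a GLOBAL opcode referring to 'math sqrt';
    # nothing about the implementation of sqrt is serialized. Hiding a
    # hard-to-pickle object behind an import exploits exactly this.
    pickletools.dis(pickle.dumps(sqrt))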
So you can probably get away with pickling what you want by adding a layer of reference: a file import or a class. But if you want to pickle an iterator itself, you are out of luck unless you move to at least Python 3.4.
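For example, one way to add that reference layer here (a sketch only; the two-file split and the module name eigen_work are illustrative, not from the original post) is to move the worker function into its own importable module, so joblib pickles a reference to it by name rather than the function object living in whatever namespace auto sets up:

    # eigen_work.py -- importable module holding the worker function
    from scipy import linalg as la

    def computeEigenV(unit_of_work):
        # unpack (index, matrix), compute the eigenvalues, return them with the index
        current_index, current_matrix = unit_of_work
        e_vals, e_vecs = la.eig(current_matrix)
        return (current_index, e_vals)

and then, in the script handed to auto:

    # matrix.py -- only refers to computeEigenV by name, via the import
    import numpy as np
    from joblib import Parallel, delayed
    from eigen_work import computeEigenV

    def run(work_list):
        pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
        return pool(delayed(computeEigenV)(unit_of_work) for unit_of_work in work_list)

    work_list = [[i, np.random.random_integers(0, 100, (100, 100))] for i in range(10)]
    work_result = run(work_list)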