Matrix multiplication in PySpark using RDDs
I have two matrices:
# 3x3 matrix
x = [[10, 7, 3], [3, 2, 6], [5, 8, 7]]

# 3x4 matrix
y = [[3, 7, 11, 2], [2, 7, 4, 10], [8, 7, 6, 11]]
I want to multiply these two matrices in Spark using RDDs. Can someone help me with this? The multiplication should not use any built-in function.
I am able to multiply the two with loops in plain Python, as follows:
output = [[0] * len(y[0]) for _ in range(len(x))]  # 3x4 matrix of zeros
for i in range(len(x)):            # iterate over rows of x
    for j in range(len(y[0])):     # iterate over columns of y
        for k in range(len(y)):    # iterate over rows of y
            output[i][j] += x[i][k] * y[k][j]
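For reference, the loop version gives the following result on the matrices above, which is handy for checking the Spark output later:

print(output)
# [[68, 140, 156, 123], [61, 77, 77, 92], [87, 140, 129, 167]]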
I am new to Spark and I am using PySpark.
It is not hard; you just have to write the matrices in a different notation.
x = [[10, 7, 3], [3, 2, 6], [5, 8, 7]]
can be written as (row, column, value) triples:
x = [(0, 0, 10), (0, 1, 7), (0, 2, 3), ...]
Note that sc.parallelize expects a collection, so the triples go inside a list:

rdd_x = sc.parallelize([(0, 0, 10), (0, 1, 7), (0, 2, 3), ...])
rdd_y = sc.parallelize([(0, 0, 3), (0, 1, 7), (0, 2, 11), ...])
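Typing all the triples by hand gets tedious; a small helper (hypothetical name to_coords, plain Python, not part of any Spark API) can derive them from the nested-list form:

def to_coords(m):
    # Turn a nested-list matrix into (row, col, value) triples
    return [(i, j, m[i][j]) for i in range(len(m)) for j in range(len(m[i]))]

rdd_x = sc.parallelize(to_coords(x))
rdd_y = sc.parallelize(to_coords(y))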
Now you can implement the multiplication using either a join or a cartesian product. For example, with cartesian (a join-based variant is sketched further down):
result = (rdd_x.cartesian(rdd_y)
    # keep pairs where x's column index equals y's row index (the shared k)
    .filter(lambda p: p[0][1] == p[1][0])
    # key each product x_ik * y_kj by its output cell (i, j)
    .map(lambda p: ((p[0][0], p[1][1]), p[0][2] * p[1][2]))
    # sum the products over k
    .reduceByKey(lambda a, b: a + b))
result.collect()  # [((i, j), value), ...]
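And a sketch of the join-based variant (my own sketch, using the same rdd_x and rdd_y), which avoids building the full cartesian product by joining on the shared index k:

# key x entries by column index and y entries by row index
x_by_k = rdd_x.map(lambda t: (t[1], (t[0], t[2])))  # (k, (i, x_ik))
y_by_k = rdd_y.map(lambda t: (t[0], (t[1], t[2])))  # (k, (j, y_kj))

result = (x_by_k.join(y_by_k)  # (k, ((i, x_ik), (j, y_kj)))
    # key each product by its output cell (i, j)
    .map(lambda kv: ((kv[1][0][0], kv[1][1][0]), kv[1][0][1] * kv[1][1][1]))
    # sum the products over k
    .reduceByKey(lambda a, b: a + b))
result.collect()  # [((i, j), value), ...]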