pyspark - Matrix multiplication in PySpark using RDD


I have 2 matrices:

    # 3x3 matrix
    x = [[10,7,3],[3,2,6],[5,8,7]]
    # 3x4 matrix
    y = [[3,7,11,2],[2,7,4,10],[8,7,6,11]]

I want to multiply these 2 in Spark using RDDs. Can anyone help me with this? The multiplication should not use any built-in function.

I am able to multiply the 2 using a loop in Python as follows:

    # output is a 3x4 matrix, initially filled with zeros
    # iterate through rows of x
    for i in range(len(x)):
        # iterate through columns of y
        for j in range(len(y[0])):
            # iterate through rows of y
            for k in range(len(y)):
                output[i][j] += x[i][k] * y[k][j]

I am new to Spark and am using PySpark.

It is not hard; you just have to write the matrix using a different notation.

    x = [[10,7,3],[3,2,6],[5,8,7]]

can be written as a sequence of (row, column, value) entries:

    x = (0,0,10),(0,1,7),(0,2,3)...

    # parallelize expects a single collection, so the entries go in a list
    rdd_x = sc.parallelize([(0,0,10),(0,1,7),(0,2,3), ...])
    rdd_y = sc.parallelize([(0,0,3),(0,1,7),(0,2,11), ...])
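The entry lists do not have to be typed by hand. A minimal sketch of the conversion in plain Python, assuming each matrix is a nested list as in the question (`to_entries` is a hypothetical helper name, not part of Spark):

```python
def to_entries(matrix):
    """Flatten a nested-list matrix into (row, col, value) triples."""
    return [(i, j, value)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)]


x = [[10, 7, 3], [3, 2, 6], [5, 8, 7]]
y = [[3, 7, 11, 2], [2, 7, 4, 10], [8, 7, 6, 11]]

x_entries = to_entries(x)  # starts with (0, 0, 10), (0, 1, 7), (0, 2, 3)
y_entries = to_entries(y)  # starts with (0, 0, 3), (0, 1, 7), (0, 2, 11)
```

The resulting lists can then be passed to `sc.parallelize(x_entries)` and `sc.parallelize(y_entries)`.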

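Now you can multiply the two using either join or cartesian. E.g.,

    # each element of the cartesian product is ((i, k, a), (k2, j, b))
    rdd_x.cartesian(rdd_y)\
        .filter(lambda x: x[0][1] == x[1][0])\
        .map(lambda x: ((x[0][0], x[1][1]), x[0][2] * x[1][2]))\
        .reduceByKey(lambda x, y: x + y)\
        .collect()

The filter keeps only the pairs where the column of the x entry equals the row of the y entry, the map keys each product by its (row, column) position in the result, and reduceByKey sums the products for each position.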
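The join route mentioned above might look like the following. This is only a sketch, assuming both RDDs hold (row, col, value) triples; `multiply_with_join` is a hypothetical name, and the function relies only on the standard RDD methods `map`, `join` and `reduceByKey`:

```python
def multiply_with_join(rdd_x, rdd_y):
    """Multiply two matrices given as RDDs of (row, col, value) entries."""
    # Key each x entry by its column index k: (k, (i, a))
    left = rdd_x.map(lambda e: (e[1], (e[0], e[2])))
    # Key each y entry by its row index k: (k, (j, b))
    right = rdd_y.map(lambda e: (e[0], (e[1], e[2])))
    # join pairs up entries sharing k: (k, ((i, a), (j, b)))
    return (left.join(right)
                .map(lambda kv: ((kv[1][0][0], kv[1][1][0]),
                                 kv[1][0][1] * kv[1][1][1]))
                # sum the partial products for each (i, j) position
                .reduceByKey(lambda a, b: a + b))
```

Calling `multiply_with_join(rdd_x, rdd_y).collect()` returns `((row, col), value)` pairs in no guaranteed order. Compared with cartesian, the join only pairs entries that share the inner index, so it avoids generating every combination and filtering most of them out afterwards.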
