apache spark - How to write a transformation function to transform an RDD with reference to a GraphFrame object?


I have a GraphFrame object g and an RDD object candidates_rdd:

g = GraphFrame(v, e)

candidates_rdd.collect()
# [Row(source=u'a', target=u'b'),
#  Row(source=u'a', target=u'c'),
#  Row(source=u'e', target=u'a')]
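For context, v and e are the usual GraphFrames vertex and edge DataFrames. A minimal sketch of such a setup, with hypothetical graph contents rather than the actual data from the question:

from graphframes import GraphFrame
from pyspark.sql import Row

# Hypothetical vertices and edges; the question does not show the real graph.
v = spark.createDataFrame([("a",), ("b",), ("c",), ("d",), ("e",)], ["id"])
e = spark.createDataFrame(
    [("a", "c"), ("c", "b"), ("a", "d"), ("d", "e"), ("e", "b")],
    ["src", "dst"])
g = GraphFrame(v, e)

candidates_rdd = sc.parallelize([
    Row(source=u"a", target=u"b"),
    Row(source=u"a", target=u"c"),
    Row(source=u"e", target=u"a"),
])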

I want to compute the path from "source" to "target" for each row in candidates_rdd and generate a result RDD of key-value pairs ((source, target), path_list) using GraphFrame's breadth-first search, where path_list is the list of paths from source to target.

Example output:

(('a','b'), ['a-c-b','a-d-e-b']), (('f','c'), []), (('a','d'), ['a-b-e-d'])

I wrote the function below:

def bfs_(row):
    arg1 = "id = '" + row.source + "'"
    arg2 = "id = '" + row.target + "'"
    return ((row.source, row.target), g.bfs(arg1, arg2).rdd)

results = candidates_rdd.map(bfs_)

I got this error:

Py4JError: An error occurred while calling o274.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist

I have tried making the graph global and broadcasting it; neither works.

Could anyone help me with this?

Thanks so much!!

TL;DR It is not possible.

Spark doesn't support nested operations like this. The Py4JError is raised when Spark tries to pickle g for shipping to the executors: a GraphFrame is backed by JVM objects through Py4J, and those references cannot be serialized, which is also why making it global or broadcasting it doesn't help. The outer loop has to be not-distributed:

>>> [g.bfs("id = '" + row.source + "'", "id = '" + row.target + "'")
...  for row in candidates_rdd.collect()]
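To get the ((source, target), path_list) pairs from your example, you can assemble them on the driver along these lines. This is a minimal sketch, not a drop-in solution: bfs_paths is a helper name introduced here, and it assumes the standard GraphFrames BFS output schema, where the result columns are from, e0, v1, e1, ..., to and each vertex struct carries an id field.

def bfs_paths(g, source, target):
    # Runs one BFS job on the driver for a single (source, target) pair.
    paths = g.bfs("id = '" + source + "'", "id = '" + target + "'")
    # Keep only the vertex columns (from, v1, ..., to; edge columns
    # start with 'e') and join the vertex ids into strings like 'a-c-b'.
    # An unreachable target yields an empty DataFrame, hence an empty list.
    vertex_cols = [c for c in paths.columns if not c.startswith("e")]
    return ["-".join(row[c]["id"] for c in vertex_cols)
            for row in paths.collect()]

pairs = [((row.source, row.target), bfs_paths(g, row.source, row.target))
         for row in candidates_rdd.collect()]

If you then need an RDD again, sc.parallelize(pairs) gets you one, but keep in mind that every bfs call launches a separate distributed job, so this only scales to a modest number of candidate pairs.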
