apache spark - How to write a transformation function to transform an RDD with reference to a GraphFrame object?
I have a GraphFrame object g and an RDD object candidates_rdd:
g = GraphFrame(v, e)
candidates_rdd.collect()
# [Row(source=u'a', target=u'b'),
#  Row(source=u'a', target=u'c'),
#  Row(source=u'e', target=u'a')]
I want to compute the paths from "source" to "target" for each row in candidates_rdd, and generate a result RDD of key-value pairs ((source, target), path_list) using GraphFrame's breadth-first search, where path_list is a list of paths from source to target.

Example output:

(('a','b'), ['a-c-b', 'a-d-e-b']), (('f','c'), []), (('a','d'), ['a-b-e-d'])
I wrote the function below:
def bfs_(row):
    arg1 = "id = '" + row.source + "'"
    arg2 = "id = '" + row.target + "'"
    return ((row.source, row.target), g.bfs(arg1, arg2).rdd)

results = candidates_rdd.map(bfs_)
I got this error:
Py4JError: An error occurred while calling o274.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
I have tried making the graph global and broadcasting it; neither works.

Could anyone help me with this?

Thanks so much!!
TL;DR It is not possible.
Spark doesn't support nested operations like this one: g wraps DataFrames and a reference to the SparkContext, which cannot be serialized and shipped to the executors, which is what the __getnewargs__ error is telling you. The outer loop has to be not distributed:
>>> [g.bfs("id = '{0}'".format(source), "id = '{0}'".format(target))
...  for source, target in candidates_rdd.collect()]
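For completeness, here is a minimal sketch of that driver-side loop, producing the ((source, target), path_list) pairs from the question. It assumes the vertex DataFrame has an id column and that sc is the active SparkContext; the column layout of the g.bfs() result (from, e0, v1, e1, ..., to) follows the GraphFrames documentation, but the id-joining logic below is illustrative, not part of the library's API:

# Runs on the driver: collect the candidate pairs first, then call
# g.bfs once per pair. Nothing here executes inside a transformation.
results = []
for row in candidates_rdd.collect():
    paths_df = g.bfs("id = '{0}'".format(row.source),
                     "id = '{0}'".format(row.target))
    # The bfs result has vertex columns 'from', 'v1', ..., 'to' and
    # edge columns 'e0', 'e1', ...; keep only the vertex columns and
    # join their ids into strings like 'a-c-b'.
    vertex_cols = [c for c in paths_df.columns if not c.startswith("e")]
    paths = ["-".join(r[c]["id"] for c in vertex_cols)
             for r in paths_df.collect()]
    results.append(((row.source, row.target), paths))  # [] when no path

# Optionally turn the driver-side list back into an RDD.
results_rdd = sc.parallelize(results)

This trades parallelism over the candidate pairs for correctness: each g.bfs call is itself a distributed job, so the per-pair loop on the driver is usually acceptable when the candidate list is small.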