apache spark - How to write a transformation function to transform an RDD with reference to a GraphFrame object?
I have a GraphFrame object g and an RDD object candidates_rdd:
g = GraphFrame(v, e)
candidates_rdd.collect()
# [Row(source=u'a', target=u'b'),
#  Row(source=u'a', target=u'c'),
#  Row(source=u'e', target=u'a')]
I want to compute the paths from "source" to "target" for each row in candidates_rdd, and generate a result RDD of key-value pairs ((source, target), path_list) using GraphFrame's breadth-first search, where path_list is a list of paths from source to target.

Example output:

(('a','b'), ['a-c-b', 'a-d-e-b']), (('f','c'), []), (('a','d'), ['a-b-e-d'])
I wrote the function below:
def bfs_(row):
    arg1 = "id = '" + row.source + "'"
    arg2 = "id = '" + row.target + "'"
    return ((row.source, row.target), g.bfs(arg1, arg2).rdd)

results = candidates_rdd.map(bfs_)
I got this error:
Py4JError: An error occurred while calling o274.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
I have tried making the graph global and broadcasting it; neither works.

Could anyone help me with this?

Thanks so much!!
TL;DR It is not possible.
Spark doesn't support nested operations like this one: g wraps DataFrames and a reference to the SparkContext, which cannot be serialized and shipped to the executors, which is what the __getnewargs__ error is telling you. The outer loop has to be not distributed:
>>> [g.bfs("id = '{0}'".format(source), "id = '{0}'".format(target))
...  for source, target in candidates_rdd.collect()]
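For completeness, here is a minimal sketch of that driver-side loop, producing the ((source, target), path_list) pairs from the question. It assumes the vertex DataFrame has an id column and that sc is the active SparkContext; the column layout of the g.bfs() result (from, e0, v1, e1, ..., to) follows the GraphFrames documentation, but the id-joining logic below is illustrative, not part of the library's API:

# Runs on the driver: collect the candidate pairs first, then call
# g.bfs once per pair. Nothing here executes inside a transformation.
results = []
for row in candidates_rdd.collect():
    paths_df = g.bfs("id = '{0}'".format(row.source),
                     "id = '{0}'".format(row.target))
    # The bfs result has vertex columns 'from', 'v1', ..., 'to' and
    # edge columns 'e0', 'e1', ...; keep only the vertex columns and
    # join their ids into strings like 'a-c-b'.
    vertex_cols = [c for c in paths_df.columns if not c.startswith("e")]
    paths = ["-".join(r[c]["id"] for c in vertex_cols)
             for r in paths_df.collect()]
    results.append(((row.source, row.target), paths))  # [] when no path

# Optionally turn the driver-side list back into an RDD.
results_rdd = sc.parallelize(results)

This trades parallelism over the candidate pairs for correctness: each g.bfs call is itself a distributed job, so the per-pair loop on the driver is usually acceptable when the candidate list is small.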