PageRank program:
Contents of the input file tst001.txt (each line is a source page followed by a page it links to):
page1 page3
page2 page1
page4 page1
page3 page1
page4 page2
page3 page4

# Split a page's rank evenly among the pages it links to.
def computeContribs(neighbors, rank):
    for neighbor in neighbors:
        yield (neighbor, rank / len(neighbors))

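# Parse each line into a (page, neighbor) pair, drop duplicate edges and
# group the neighbors by page. links is reused in every iteration, so persist it.
links = sc.textFile("tst001.txt").map(lambda line: line.split()).map(lambda pages: (pages[0], pages[1]))\
    .distinct().groupByKey().persist()
# Every page that has outgoing links starts with a rank of 1.0.
ranks = links.map(lambda (page, neighbors): (page, 1.0))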
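
To see what links and ranks hold at this point without starting Spark, the same steps can be imitated in plain Python 3. This is only a sketch: `lines` and `build_links` are names made up for illustration, the six input lines are inlined rather than read from tst001.txt, and the neighbors are sorted for a stable result, which Spark's groupByKey does not promise.

```python
from collections import defaultdict

# The six lines of the input file, inlined (assumption: one "source target" pair per line).
lines = [
    "page1 page3",
    "page2 page1",
    "page4 page1",
    "page3 page1",
    "page4 page2",
    "page3 page4",
]


def build_links(lines):
    """Plain-Python stand-in for map(split) -> distinct -> groupByKey."""
    # A set drops duplicate edges, which is what distinct() does on the RDD.
    pairs = {tuple(line.split()[:2]) for line in lines}
    links = defaultdict(list)
    # Sorting only makes the result deterministic for this sketch.
    for source, target in sorted(pairs):
        links[source].append(target)
    return dict(links)


links = build_links(lines)
# Mirrors ranks = links.map(...): only pages that appear as a source get an initial rank.
ranks = {page: 1.0 for page in links}

print(links)
print(ranks)
```

Each key is a page with outgoing links and each value is the list of pages it links to. A page that appeared only as a link target would have no entry in either dictionary; in this data set all four pages appear as sources.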
In [4]: for x in range(1):
   ...:     print "links count:", links.count()
   ...:     print "ranks count:", ranks.count()

In [11]: for x in range(3):
   ....:     contribs = links.join(ranks).flatMap(lambda (page, (neighbors, rank)): computeContribs(neighbors, rank))
   ....:     ranks = contribs.reduceByKey(lambda v1, v2: v1 + v2).map(lambda (page, contrib): (page, contrib * 0.85 + 0.15))
   ....: for rank in ranks.collect(): print rank
(u'page2', 0.394375)
(u'page3', 1.2619062499999998)
(u'page4', 0.8820624999999999)
(u'page1', 1.4616562499999997)

This article is reposted from the cnblogs (博客园) blog 健哥的数据花园. Original link: http://www.cnblogs.com/gaojian/p/7614711.html. To repost it, please contact the original author.
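
The three iterations of the session can be cross-checked without Spark. The following is a minimal plain-Python sketch, not the Spark program itself: it is written for Python 3 (the session uses Python 2 syntax), the adjacency lists are hard-coded from the six input lines, and `compute_contribs`, `pagerank`, `DAMPING` and `BASE` are illustrative names. It applies the same update as the loop, new rank = sum of contributions * 0.85 + 0.15.

```python
# Adjacency lists implied by the six input lines (hard-coded for this sketch).
links = {
    "page1": ["page3"],
    "page2": ["page1"],
    "page3": ["page1", "page4"],
    "page4": ["page1", "page2"],
}

DAMPING = 0.85  # factor applied to the summed contributions, as in the session
BASE = 0.15     # constant added to every rank, as in the session


def compute_contribs(neighbors, rank):
    """Split a page's rank evenly among the pages it links to."""
    for neighbor in neighbors:
        yield neighbor, rank / len(neighbors)


def pagerank(links, iterations=3):
    # Every page with outgoing links starts at 1.0.
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        contribs = {}
        for page, neighbors in links.items():
            # links.join(ranks) is an inner join: pages without a current rank are skipped.
            if page not in ranks:
                continue
            # Stand-in for flatMap(computeContribs) followed by reduceByKey(v1 + v2).
            for neighbor, share in compute_contribs(neighbors, ranks[page]):
                contribs[neighbor] = contribs.get(neighbor, 0.0) + share
        ranks = {page: total * DAMPING + BASE for page, total in contribs.items()}
    return ranks


final_ranks = pagerank(links)
for page, rank in sorted(final_ranks.items()):
    print(page, rank)
```

With three iterations the values agree with the ranks printed in the session, up to floating-point rounding in the last digits. Note that only pages receiving at least one contribution keep a rank from one iteration to the next, which is a property of this simplified update rather than of PageRank in general.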