<Spark><Programming><Key/Value Pairs><RDD>

Date: 2017-05-08 21:51:30

Working with Key/Value Pairs

Motivation

  • Pair RDDs are a useful building block in many programs, as they expose operations that let you act on each key in parallel or regroup data across the network.
  • For example, pair RDDs have a reduceByKey() method that aggregates data separately for each key, and a join() method that merges two RDDs by grouping elements with the same key.
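
As a rough illustration of the join() behaviour mentioned above, here is a minimal sketch. It assumes spark-core is on the classpath and runs against a local master; the object name, the app name, and the sample data (store names, addresses, ratings) are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinExample {
  // Hypothetical sample data, keyed by store name.
  val addresses: Seq[(String, String)] =
    Seq(("store-a", "1 Main St"), ("store-b", "2 Oak Ave"))
  val ratings: Seq[(String, Double)] =
    Seq(("store-a", 4.5), ("store-c", 3.0))

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("JoinExample")
    val sc = new SparkContext(conf)
    try {
      val addressRdd = sc.parallelize(addresses)
      val ratingRdd = sc.parallelize(ratings)
      // Inner join: only keys present in both RDDs are kept, and each
      // result has the shape (key, (leftValue, rightValue)).
      val joined = addressRdd.join(ratingRdd)
      joined.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With this data only "store-a" appears in both inputs, so it is the only key the inner join keeps.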

Creating Pair RDDs

  • Many of the formats we load data from directly return pair RDDs for their key/value data.
  • A regular RDD can be turned into a pair RDD with the map() function, for example by keying each line by its first word:
val pairs = lines.map(x => (x.split(" ")(0), x))
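
The one-liner above can be expanded into a complete program along the following lines. This is only a sketch: it assumes spark-core is on the classpath, and the object name, the helper keyByFirstWord, the local master setting, and the sample lines are hypothetical stand-ins (in practice `lines` would usually come from something like sc.textFile).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CreatePairRdd {
  // Key a line by its first space-separated word, keeping the whole line as the value.
  def keyByFirstWord(line: String): (String, String) =
    (line.split(" ")(0), line)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("CreatePairRdd")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical in-memory input standing in for a text file.
      val lines = sc.parallelize(Seq("spark is fast", "hadoop is older", "spark has rdds"))
      // map() turns RDD[String] into RDD[(String, String)], i.e. a pair RDD.
      val pairs = lines.map(keyByFirstWord)
      pairs.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```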

Transformations on Pair RDDs

  • We can pass functions to Spark here as well, but because pair RDDs contain tuples, the functions we pass must operate on tuples. In fact, pair RDDs are simply RDDs of Tuple2 objects.
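
To make the point about tuple-based functions concrete, here is a small sketch, assuming spark-core is on the classpath. The object name, the predicate hasShortValue, the length threshold of 20, and the sample pairs are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairTransformations {
  // A predicate over a whole (key, value) tuple: keep pairs whose value is short.
  def hasShortValue(pair: (String, String)): Boolean = pair._2.length < 20

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("PairTransformations")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample pairs.
      val pairs = sc.parallelize(Seq(
        ("spark", "spark is fast"),
        ("hadoop", "hadoop mapreduce writes intermediate data to disk")
      ))
      // filter() receives each element as a Tuple2, not as two separate arguments.
      val shortOnes = pairs.filter(hasShortValue)
      // Pattern matching is a common way to take the tuple apart inside map().
      val lengths = pairs.map { case (key, value) => (key, value.length) }
      shortOnes.collect().foreach(println)
      lengths.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```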

Aggregations

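  • reduceByKey() is very similar to reduce(): both take a function and use it to combine values. They differ in two ways:
    1. reduceByKey() runs a reduce operation for each key in the dataset, in parallel.
    2. reduceByKey() is a transformation that returns a new RDD, rather than an action that returns a value to the driver program. This is because the dataset may contain a large number of keys.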
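
The contrast between the two operations can be sketched as follows, assuming spark-core is on the classpath. The object name, the helper add, and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyExample {
  // The combining function shared by both operations.
  def add(a: Int, b: Int): Int = a + b

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("ReduceByKeyExample")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical (item, quantity) records.
      val sales = sc.parallelize(Seq(("apple", 3), ("pear", 2), ("apple", 4), ("pear", 5)))

      // Transformation: lazily describes a new RDD[(String, Int)] with one
      // combined value per key. Nothing is computed until an action runs.
      val totalsByKey = sales.reduceByKey(add)

      // Action: reduce() computes a single Int and returns it to the driver.
      val grandTotal = sales.values.reduce(add)

      totalsByKey.collect().foreach(println)
      println(grandTotal)
    } finally {
      sc.stop()
    }
  }
}
```

Because totalsByKey is itself an RDD, further transformations can be chained on it before any data is brought back to the driver.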

Original source: http://www.cnblogs.com/wttttt/p/6827870.html
