
Spark: Usage of the reduceByKey Function



The reduceByKey function API:

def reduceByKey(partitioner: Partitioner, func: JFunction2[V, V, V]): JavaPairRDD[K, V]

def reduceByKey(func: JFunction2[V, V, V], numPartitions: Int): JavaPairRDD[K, V]

This function uses the supplied reduce function to merge all of the values V associated with each key K.

The parameters are as follows:
- func: the reduce function, defined by the user as needed;
- partitioner: the partitioner to use;
- numPartitions: the number of partitions; the default partitioner is HashPartitioner.

Return value: as the signatures show, the result is again a pair RDD of (K, V), with one entry per distinct key.
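The signatures above are from the Java API (JavaPairRDD); in Scala the same overloads take a plain (V, V) => V function. Below is a minimal sketch of both overloads, assuming a SparkContext named sc as in the shell session that follows; the data and variable names are illustrative only.

import org.apache.spark.HashPartitioner

// An illustrative pair RDD of (key, count).
val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)))

// Overload taking the reduce function plus an explicit partition count.
val withNumPartitions = pairs.reduceByKey(_ + _, 4)

// Overload taking an explicit Partitioner; a HashPartitioner is what
// the other overloads use by default.
val withPartitioner = pairs.reduceByKey(new HashPartitioner(4), _ + _)

withPartitioner.collect().foreach(println)   // (a,2) and (b,1)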

Usage example:

linux:/$ spark-shell
...
17/10/28 20:33:54 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
17/10/28 20:33:55 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
17/10/28 20:33:57 WARN SessionState: load mapred-default.xml, HIVE_CONF_DIR env not found!
17/10/28 20:33:58 WARN SessionState: load mapred-default.xml, HIVE_CONF_DIR env not found!
SQL context available as sqlContext.

scala> val x = sc.parallelize(List(
     |     ("a", "b", 1),
     |     ("a", "b", 1),
     |     ("c", "b", 1),
     |     ("a", "d", 1))
     | )
x: org.apache.spark.rdd.RDD[(String, String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:21

scala> val byKey = x.map({case (id,uri,count) => (id,uri)->count})
byKey: org.apache.spark.rdd.RDD[((String, String), Int)] = MapPartitionsRDD[1] at map at <console>:23

scala> val reducedByKey = byKey.reduceByKey(_ + _)
reducedByKey: org.apache.spark.rdd.RDD[((String, String), Int)] = ShuffledRDD[2] at reduceByKey at <console>:25


scala> reducedByKey.collect.foreach(println)
((c,b),1)
((a,d),1)
((a,b),2)
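Note that reduceByKey merges the values for each key on the map side before the shuffle, so the reduce function must be associative and commutative. The shorthand _ + _ used above is just an anonymous (V, V) => V function; continuing the same session, it could be written out explicitly:

// Equivalent to _ + _ : an explicit two-argument reduce function.
val reducedExplicit = byKey.reduceByKey((a: Int, b: Int) => a + b)
reducedExplicit.collect.foreach(println)
// Output is the same as above: ((c,b),1), ((a,d),1), ((a,b),2)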

 


Original article: http://www.cnblogs.com/yy3b2007com/p/7748074.html
