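RDD is an abstract class that defines methods such as map() and reduce(), but in practice a class derived from RDD generally only needs to implement two methods: def getPartitions: Array[Partition] and def compute(thePart: Partition, context: TaskContext): Ne ...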
Category: Other good articles | Posted: 2016-08-02 23:43:14 | Views: 137
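To make the two-method contract concrete, here is a toy, single-machine model of it. The idea is that operations like map() and reduce() are written once in the abstract base class, using only getPartitions and compute, so a subclass gets them for free. This is a sketch under stated assumptions: MiniRDD, MiniPartition, RangePartition and RangeMiniRDD are hypothetical names, not Spark classes; partitions are evaluated sequentially in one JVM rather than as distributed tasks; and the TaskContext parameter, dependencies and SparkContext of the real RDD are deliberately left out.

```scala
// Toy model of the RDD contract; every name here is hypothetical, not Spark's API.
trait MiniPartition { def index: Int }

abstract class MiniRDD[T] {
  // The two methods a concrete subclass must supply
  def getPartitions: Array[MiniPartition]
  def compute(split: MiniPartition): Iterator[T]

  // map is written once, in terms of the two abstract methods:
  // the child keeps the parent's partitions and transforms each partition's iterator
  def map[U](f: T => U): MiniRDD[U] = {
    val parent = this
    new MiniRDD[U] {
      def getPartitions: Array[MiniPartition] = parent.getPartitions
      def compute(split: MiniPartition): Iterator[U] = parent.compute(split).map(f)
    }
  }

  // reduce: reduce each non-empty partition locally, then combine the partial results
  // (throws UnsupportedOperationException if every partition is empty)
  def reduce(op: (T, T) => T): T =
    getPartitions.iterator
      .map(p => compute(p))
      .filter(_.hasNext)
      .map(_.reduce(op))
      .reduce(op)

  // Gathers all elements, partition by partition, in partition order
  def collect(): List[T] = getPartitions.toList.flatMap(p => compute(p).toList)
}

// A concrete subclass: the integers [0, n) split into numSlices contiguous ranges
final case class RangePartition(index: Int, start: Int, end: Int) extends MiniPartition

class RangeMiniRDD(n: Int, numSlices: Int) extends MiniRDD[Int] {
  def getPartitions: Array[MiniPartition] =
    Array.tabulate[MiniPartition](numSlices) { i =>
      RangePartition(i, (i.toLong * n / numSlices).toInt, ((i + 1).toLong * n / numSlices).toInt)
    }

  def compute(split: MiniPartition): Iterator[Int] = {
    val p = split.asInstanceOf[RangePartition]
    Iterator.range(p.start, p.end)
  }
}
```

For example, new RangeMiniRDD(10, 3) has three partitions covering 0 to 9, and new RangeMiniRDD(10, 3).map(_ * 2).reduce(_ + _) evaluates to 90. Note that RangeMiniRDD never mentions map or reduce; it only says how the data is split and how one split is produced.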
Spark internals revealed-10-RDD source code analysis
/**
* A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable,
* partitioned collection of elements that can be operated on in parallel. This class contains the
* basic operations available on a...
Category: Other good articles | Posted: 2015-01-21 16:37:59 | Views: 196
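The Scaladoc above describes an RDD as an immutable, partitioned collection whose elements can be operated on in parallel. A minimal sketch of those three properties, assuming a single JVM: the partitions of a small in-memory dataset are transformed concurrently with Scala Futures (standing in for Spark's distributed tasks), and the transformation returns a new dataset instead of modifying the original. PartitionedData, parMap and fromSeq are hypothetical names used only for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for an immutable, partitioned dataset (not a Spark class)
final case class PartitionedData[T](partitions: Vector[Vector[T]]) {
  // Runs f over every partition concurrently, one Future per partition,
  // and returns a NEW dataset with the same partitioning; `this` is never modified
  def parMap[U](f: T => U): PartitionedData[U] = {
    val futures = partitions.map(part => Future(part.map(f)))
    PartitionedData(Await.result(Future.sequence(futures), 10.seconds))
  }

  def numPartitions: Int = partitions.size

  // Concatenates the partitions in order
  def toVector: Vector[T] = partitions.flatten
}

object PartitionedData {
  // Splits a sequence into n contiguous slices (assumes n > 0)
  def fromSeq[T](xs: Seq[T], n: Int): PartitionedData[T] = {
    val v = xs.toVector
    PartitionedData(Vector.tabulate(n)(i => v.slice(i * v.size / n, (i + 1) * v.size / n)))
  }
}
```

Because each partition is processed independently, no locking is needed, and because parMap builds a fresh value, the input can safely be reused afterwards. Spark relies on the same two properties, but schedules the per-partition work across a cluster rather than on a local thread pool.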