码迷,mamicode.com
首页 > 其他好文 > 详细

Hadoop Mapreduce之WordCount实现

时间:2017-06-11 15:40:38      阅读:224      评论:0      收藏:0      [点我收藏+]

标签:apach   cep   long   auto   ble   ora   tin   override   on()   

1.新建一个WCMapper继承Mapper
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
            //接收数据V1
            String line = value.toString();
            //切分数据
            String[] wordsStrings = line.split(" ");
            //循环
            for (String w: wordsStrings) {
                  //出现一次,记一个一,输出
                  context.write(new Text(w), new LongWritable(1));
            }
      }
}
 
2.新建一个WCReducer继承Reducer
public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text key, Iterable<LongWritable> v2s, Context context)
                  throws IOException, InterruptedException {
            // TODO Auto-generated method stub
            //接收数据
            //Text k3 = k2;
            //定义一个计算器
            long counter = 0;
            //循环v2s
            for (LongWritable i : v2s)
            {
                  counter += i.get();
            }
            //输出
            context.write(key, new LongWritable(counter));
      }
}
3.WordCount类实现Main方法
/*
 * 1.分析具体的业力逻辑,确定输入输出数据样式
 * 2.自定义一个类,这个类要继承import org.apache.hadoop.mapreduce.Mapper;
 * 重写map方法,实现具体业务逻辑,将新的kv输出
 * 3.自定义一个类,这个类要继承import org.apache.hadoop.mapreduce.Reducer;
 * 重写reduce,实现具体业务逻辑
 * 4.将自定义的mapper和reducer通过job对象组装起来
 */
public class WordCount {
      public static void main(String[] args) throws Exception {
            // 构建Job对象
            Job job = Job.getInstance(new Configuration());
            
            // 注意:main方法所在的类
            job.setJarByClass(WordCount.class);
            
            // 设置Mapper相关属性
            job.setMapperClass(WCMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
            FileInputFormat.setInputPaths(job, new Path("/words.txt"));
            
            // 设置Reducer相关属性
            job.setReducerClass(WCReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileOutputFormat.setOutputPath(job, new Path("/wcount619"));
            // 提交任务
            job.waitForCompletion(true);
      }     
}
 
 
4.打包为wc.jar,并上传到linux,并在Hadoop下运行
     hadoop jar /root/wc.jar

Hadoop Mapreduce之WordCount实现

标签:apach   cep   long   auto   ble   ora   tin   override   on()   

原文地址:http://www.cnblogs.com/dulixiaoqiao/p/6985237.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!