
Spark Learning, Day 02: Reading a File in Scala and Counting Word Frequencies

Date: 2019-06-09 00:22:34


1. Install the JDK and Scala locally
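
One way to confirm that both installations are visible is to print their versions from Scala itself. This is only a minimal sketch: the object name `EnvCheck` is hypothetical, and the versions printed depend on the local machine.

```scala
object EnvCheck {
  // Version of the Scala library on the classpath.
  val scalaVersion: String = scala.util.Properties.versionNumberString

  // Version of the JVM running this code, read from the standard system property.
  val javaVersion: String = System.getProperty("java.version")

  def main(args: Array[String]): Unit =
    println(s"Scala $scalaVersion on Java $javaVersion")
}
```

In the REPL, the same values can be inspected by pasting the object and evaluating `EnvCheck.scalaVersion` and `EnvCheck.javaVersion`.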


 

2. Read a local file:

 

scala> import scala.io.Source
import scala.io.Source

scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
lines: List[String] = List("With the development of civilization, it is the childrens duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and dont have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.)
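
The call above never closes the underlying file handle, which is harmless in a short REPL session but can leak resources in a longer-running program. A minimal sketch of a helper that closes the source after reading is shown here. The names `ReadLines` and `readLines` are hypothetical, and the UTF-8 default is an assumption, since the original call relies on the platform's default encoding.

```scala
import scala.io.Source

object ReadLines {
  // Reads every line of the file at `path` into a List, then closes the file.
  // The UTF-8 default is an assumption; pass another encoding if the file needs it.
  def readLines(path: String, encoding: String = "UTF-8"): List[String] = {
    val source = Source.fromFile(path, encoding)
    // toList forces the lazy line iterator before the source is closed.
    try source.getLines().toList
    finally source.close()
  }
}
```

Usage would look like `val lines = ReadLines.readLines("F:/ziyuan_badou/file.txt")`, which yields the same `List[String]` as before.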

3. Compute top-N word frequencies

scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse
res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (childrens,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (dont,1), (importance,1), (kid...
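
The result above sorts the full list without truncating it to N entries. It also counts "the" and "The" separately, treats "study" and "study." as different words, and reports the empty string three times (once per blank line). A possible variation that normalizes tokens and keeps only the top N is sketched below. The names `WordCount`, `tokenize`, and `topN` are hypothetical, and the normalization rules (lowercasing, keeping only ASCII letters, digits, and apostrophes) are assumptions rather than part of the original post.

```scala
object WordCount {
  // Splits lines into lowercase words, dropping punctuation and empty tokens.
  def tokenize(lines: Seq[String]): Seq[String] =
    lines
      .flatMap(_.split("\\s+"))
      .map(_.toLowerCase.replaceAll("[^a-z0-9']", ""))
      .filter(_.nonEmpty)

  // Returns the n most frequent words, highest count first.
  // Ties are broken alphabetically so the result is deterministic.
  def topN(lines: Seq[String], n: Int): List[(String, Int)] =
    tokenize(lines)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toList
      .sortBy { case (word, count) => (-count, word) }
      .take(n)
}
```

With the `lines` value from step 2, `WordCount.topN(lines, 10)` would return the ten most frequent normalized words. The counts will differ from `res0` because case variants and punctuated forms are merged.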

 

 


Original article: https://www.cnblogs.com/students/p/10992149.html
