1. English word frequency count
Download the lyrics of an English song, or an English article.
news = '''
Passion is sweet
Love makes weak
You said you cherised freedom so
You refused to let it go
Follow your faith
Love and hate
never failed to seize the day
Don't give yourself away
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreeeming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the tought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
Trapped in a crowd
Music's loud
I said I loved my freedom too
Now im not so sure i do
All eyes on you
Wings so true
Better quit while your ahead
Now im not so sure i am
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreaming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the thought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
My soul, my heart
If your near or if your far
My life, my love
You can have it all
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreaming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the thought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
If you're the rock i'll crush against'''
Replace all separators such as , . ? ! ' : with spaces
sep = ''':.,?!'''
for i in sep:
    news = news.replace(i, ' ')
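The loop above calls replace() once per separator. As a minimal sketch of the same idea done in a single pass, str.maketrans() and str.translate() can map every separator to a space. The sample sentence and the names table, sample, and cleaned are illustrative assumptions and do not come from the original notes.

```python
# Separator characters to blank out (the same characters as in sep above).
sep = ":.,?!"

# Build a translation table that maps each separator to a single space.
table = str.maketrans(sep, " " * len(sep))

# Hypothetical sample text, used only for illustration.
sample = "My soul, my heart: are you there?"

# translate() applies all the replacements in one pass over the string.
cleaned = sample.translate(table)
print(cleaned)
```

For a handful of separators the difference hardly matters; the table approach mainly pays off when the separator set grows.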
Convert all uppercase letters to lowercase
news = news.lower()
Generate the word list
news_list = news.split()
print(news_list)
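Replacing separators and then splitting on whitespace is one way to tokenize. As an alternative sketch, a regular expression can pull the words out directly while keeping apostrophes inside contractions such as can't. The pattern and the names WORD_RE, tokenize, and line are assumptions made for this illustration, not part of the original code.

```python
import re

# A word is a run of letters, optionally followed by apostrophe + letters
# (so "can't" and "i'll" stay whole, while stray punctuation is dropped).
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def tokenize(text):
    """Lowercase the text and return the list of words found in it."""
    return WORD_RE.findall(text.lower())

# Hypothetical sample line, used only for illustration.
line = "Oh I just can't get enough!"
words = tokenize(line)
print(words)
```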
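Exclude function words: pronouns, articles, conjunctions
exclude = {'the', 'to', 'is', 'and'}
Generate the word frequency counts
# Method 1: count each distinct word with list.count()
news_dict = {}
news_set = set(news_list) - exclude
for w in news_set:
    news_dict[w] = news_list.count(w)
for w in news_dict:
    print(w, news_dict[w])
# Method 2: accumulate counts with dict.get(), then drop the excluded words
news_dict = {}
for w in news_list:
    news_dict[w] = news_dict.get(w, 0) + 1
for w in exclude:
    news_dict.pop(w, None)  # pop() with a default avoids a KeyError if the word is absent
for w in news_dict:
    print(w, news_dict[w])
Sort
dictList = list(news_dict.items())
dictList.sort(key=lambda x: x[1], reverse=True)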
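Counting, excluding, and sorting can also be expressed with collections.Counter from the standard library. The following is a minimal sketch under the assumption that a short hypothetical word list stands in for news_list; the names words, counts, and ranked are illustrative only.

```python
from collections import Counter

# Hypothetical word list standing in for news_list.
words = "if you're the rock i'll crush against the rock".split()

# Function words to leave out of the statistics.
exclude = {'the', 'to', 'is', 'and'}

# Count every word that is not in the exclusion set.
counts = Counter(w for w in words if w not in exclude)

# most_common() returns (word, count) pairs sorted by descending count.
ranked = counts.most_common()
print(ranked)
```

Filtering while counting means excluded words never enter the dictionary, so no deletion step is needed afterwards.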
Print the 20 most frequent words
for item in dictList[:20]:  # slicing also works when there are fewer than 20 words
    print(item)
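Words with equal counts can come out in an arbitrary-looking order. As a sketch of one way to make the ranking deterministic, the helper below sorts by descending count and then alphabetically, and returns at most n entries. The function name top_n and the small freq dictionary are hypothetical and exist only for this illustration.

```python
def top_n(freq, n=20):
    """Return up to n (word, count) pairs: highest count first, ties in alphabetical order."""
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Hypothetical frequency table, used only for illustration.
freq = {"rock": 3, "crush": 3, "night": 2, "soul": 1}

# Print the top entries as aligned columns.
for word, count in top_n(freq, 3):
    print(f"{word:<10}{count:>4}")
```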
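Save the text to be analyzed as a UTF-8 encoded file, and load the content for the frequency analysis by reading that file
with open("test.txt", "r", encoding='utf-8') as file:  # the file is closed automatically
    news = file.read()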
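Putting the steps together, the sketch below reads a UTF-8 file, replaces the separators, lowercases, splits, and counts, all inside one function. The function name word_frequencies, its parameters, and the throwaway demo file are assumptions made for this example; the demo writes its own temporary file so that it does not depend on an existing test.txt.

```python
import os
import tempfile
from collections import Counter

def word_frequencies(path, exclude=frozenset(), sep=":.,?!"):
    """Read a UTF-8 text file and return a Counter of its lowercase words."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    # Replace each separator with a space, as in the steps above.
    for ch in sep:
        text = text.replace(ch, " ")
    words = text.lower().split()
    # Skip excluded function words while counting.
    return Counter(w for w in words if w not in exclude)

# Demo with a temporary file (hypothetical content, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "test.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("Passion is sweet.\nLove makes weak, love and hate!")
    freq = word_frequencies(demo_path, exclude={"the", "to", "is", "and"})

# Show up to 20 of the most frequent words.
print(freq.most_common(20))
```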