
Scraping the 100 Python exercises

Date: 2017-07-11 00:55:00

Tags: content   requests   ret   ext   .text   bs4   stat   list   exe

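import requests
from bs4 import BeautifulSoup


def getHTMLText(url):
    """Fetch url and return its HTML as text, or '' if the request fails."""
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        return ''


def fillUnivList(ulist, html):
    """Append the page's <meta name="description"> content to ulist."""
    soup = BeautifulSoup(html, 'html.parser')
    meta = soup.find_all('meta', attrs={'name': 'description'})
    # Skip pages that failed to download or carry no description.
    if meta and meta[0].get('content'):
        ulist.append(meta[0]['content'])


def main():
    start_url = 'http://www.runoob.com/python/python-exercise-example'
    uinfo = []
    # The exercise pages are assumed to be numbered 1 to 100.
    for i in range(1, 101):
        url = start_url + str(i) + '.html'
        html = getHTMLText(url)
        fillUnivList(uinfo, html)

    # Open the output file once and write one description per line.
    with open('100.txt', 'a', encoding='utf-8') as f:
        for item in uinfo:
            f.write(item + '\n')

    print(uinfo)


if __name__ == '__main__':
    main()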
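
The description lookup can be checked without touching the network. The following is only a sketch: it swaps BeautifulSoup for the standard library's html.parser so it runs with no third-party packages, and the names DescriptionParser, extract_description and SAMPLE_HTML are hypothetical. The sample markup is made up and is not copied from runoob.com.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects the content attribute of <meta name="description"> tags."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == "description" and attr_map.get("content"):
            self.descriptions.append(attr_map["content"])


def extract_description(html):
    """Return the first meta description found in html, or None if absent."""
    parser = DescriptionParser()
    parser.feed(html)
    return parser.descriptions[0] if parser.descriptions else None


# Made-up sample page (an assumption, not real runoob.com markup).
SAMPLE_HTML = """
<html><head>
<meta charset="utf-8">
<meta name="description" content="Exercise 1: count three-digit numbers.">
</head><body></body></html>
"""

print(extract_description(SAMPLE_HTML))
print(extract_description("<html><head></head></html>"))
```

Returning None for a page without a description, rather than indexing into an empty list, lets the caller skip a missing page explicitly and avoids relying on a blanket except.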


Original source: http://www.cnblogs.com/wskzwsj/p/7148303.html
