A Simple Web Crawler

Date: 2019-12-09 19:44:10

This script scrapes movie titles from the listing pages of a video-sharing website.

# Author: Winter Liu
import time
import urllib.request
import re

start_time = time.time()
# The first request goes to the listing's landing page
html_start = 'https://yanghuanyu.com/dy'
result = []
for i in range(2, 31):
    response = urllib.request.urlopen(html_start)
    buff = response.read()
    html = buff.decode('utf-8')
    # Optionally dump the page to disk for inspection:
    # with open('hpage.txt', 'w', encoding='UTF-8') as f:
    #     f.write(html)
    print(html_start)
    # Match entries of the form [..][4-digit year][..][..]
    data = re.findall(r'\[.+\]\[\d\d\d\d\]\[.+\]\[.+\]', html)
    # Drop duplicates found within the same page
    data = list(set(data))
    print(data)
    result.extend(data)
    # URL to fetch on the next iteration
    html_start = "https://yanghuanyu.com/dy/page/" + str(i)
print(len(result))
print(result)

end_time = time.time()
# Total elapsed time in seconds
print(end_time - start_time)
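
The pattern above uses greedy `.+`, so when several bracketed entries sit on the same line of HTML it can swallow them into a single match. Below is a minimal sketch of a tighter variant, assuming no field ever contains a square bracket of its own. `extract_titles` and `page_urls` are hypothetical helper names that do not appear in the original script, and the sample HTML is made up for illustration; only the `/page/<n>` URL scheme comes from the script itself.

```python
import re

# Each field is "[...]" with no brackets inside; the second field is a 4-digit year.
# Assumption: real titles on the site follow this shape.
TITLE_RE = re.compile(r'\[[^\[\]]+\]\[\d{4}\]\[[^\[\]]+\]\[[^\[\]]+\]')


def extract_titles(html):
    """Return the unique bracketed entries in html, in order of first appearance."""
    seen = set()
    titles = []
    for match in TITLE_RE.findall(html):
        if match not in seen:
            seen.add(match)
            titles.append(match)
    return titles


def page_urls(base, last_page):
    """Return the base URL followed by base/page/2 .. base/page/last_page."""
    return [base] + [f"{base}/page/{n}" for n in range(2, last_page + 1)]


# Made-up sample: three entries on one line, the third repeating the first.
sample = (
    '<a>[Movie A][2019][HD][CN]</a>'
    '<a>[Movie B][2018][HD][EN]</a>'
    '<a>[Movie A][2019][HD][CN]</a>'
)
print(extract_titles(sample))
print(page_urls('https://yanghuanyu.com/dy', 3))
```

Keeping the parsing in a function that takes a string makes it possible to check the pattern against saved pages without hitting the site, and building the URL list up front makes it easy to see exactly which pages will be requested.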

Original article: https://www.cnblogs.com/nmucomputer/p/12012736.html