Basic workflow of a web scraper

Date: 2018-05-06 12:16:28


1. Load the page into a BeautifulSoup object

from bs4 import BeautifulSoup

# Open the saved HTML file and parse it with the lxml parser
with open('D:/xxxxx/the_blah.html', 'r') as web_data:
    soup = BeautifulSoup(web_data, 'lxml')
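The loading step can be tried without the original file. The sketch below is only an illustration: `SAMPLE_HTML`, `load_soup`, and the temporary file are hypothetical stand-ins, and it uses Python's built-in `html.parser` instead of `lxml` so that no extra package is needed.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical stand-in for the saved page; the real file's contents are not shown.
SAMPLE_HTML = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Hello</p></body></html>"
)


def load_soup(path):
    """Open a local HTML file and parse it into a BeautifulSoup tree."""
    # An explicit encoding avoids relying on the platform default.
    with open(path, "r", encoding="utf-8") as web_data:
        # "html.parser" ships with Python; "lxml" requires the lxml package.
        return BeautifulSoup(web_data, "html.parser")


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.html"
    sample_path.write_text(SAMPLE_HTML, encoding="utf-8")
    soup = load_soup(sample_path)

page_title = soup.title.get_text()
print(page_title)
```

The file is parsed inside the `with` block, so the tree is fully built before the file is closed and `soup` stays usable afterwards.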

2. Select the page elements

# Each select() call takes a CSS selector and returns a list of matching tags
images = soup.select('body > div.main-content > ul > li > img')
titles = soup.select('body > div.main-content > ul > li > h3 > a')
infos = soup.select('body > div.main-content > ul > li > p')
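To see what these selectors return, here is a small sketch run against an inline document. The markup in `SAMPLE_HTML` is an assumption modeled on the selectors above, not the original page, and `html.parser` is used in place of `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to match the selectors used in this step.
SAMPLE_HTML = """
<html><body>
<div class="main-content">
  <ul>
    <li><img src="images/a.jpg"/><h3><a href="#">First</a></h3><p>Info A</p></li>
    <li><img src="images/b.jpg"/><h3><a href="#">Second</a></h3><p>Info B</p></li>
  </ul>
</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns every match, in document order.
images = soup.select("body > div.main-content > ul > li > img")
titles = soup.select("body > div.main-content > ul > li > h3 > a")

# A selector that matches nothing yields an empty result, not an error.
missing = soup.select("body > div.sidebar")

# select_one() returns only the first match, or None if there is none.
first_title = soup.select_one("body > div.main-content > ul > li > h3 > a")

print(len(images), [t.get_text() for t in titles])
```

The `>` combinator matches direct children only, so a selector copied from a browser's developer tools stops matching as soon as the page nests an element one level deeper.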

3. Extract the specific information from each element

for image, title, info in zip(images, titles, infos):
    data = {
        'title': title.get_text(),  # text content of the tag
        'image': image.get('src'),  # value of the tag's src attribute
        'info': info.get_text()
    }
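One thing to watch with `zip`: it stops at the shortest list and pairs items purely by position, so if one `li` lacks an image, every later title is matched with the wrong image. A possible variant, which is not the approach used above, is to loop over each `li` and look up its parts inside it. The sketch below assumes hypothetical markup (`SAMPLE_HTML`) and a hypothetical helper (`text_or_none`), and collects the results in a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second item deliberately has no <img>.
SAMPLE_HTML = """
<html><body>
<div class="main-content">
  <ul>
    <li><img src="images/a.jpg"/><h3><a href="#">First</a></h3><p>Info A</p></li>
    <li><h3><a href="#">Second</a></h3><p>Info B</p></li>
    <li><img src="images/c.jpg"/><h3><a href="#">Third</a></h3><p>Info C</p></li>
  </ul>
</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else None


records = []
for item in soup.select("body > div.main-content > ul > li"):
    # Searching within one <li> keeps each title, image and info together.
    image = item.select_one("img")
    records.append({
        "title": text_or_none(item.select_one("h3 > a")),
        "image": image.get("src") if image is not None else None,
        "info": text_or_none(item.select_one("p")),
    })

for record in records:
    print(record)
```

With the three-list `zip` approach on the same markup, only two records would be produced and "Second" would be paired with `images/c.jpg`.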


Original article: https://www.cnblogs.com/onlyhold/p/8997594.html
