Scraping Baidu Tieba Images with Python 2

Date: 2016-06-13 19:30:50

Tags: python

The Baidu Tieba thread I scrape here is http://tieba.baidu.com/p/2460150866?pn=1. The source code follows; it is written for Python 2.

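import re
import urllib

# Fetch the HTML source of a page
def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

# Download every image matched in the page source
def getImg(html, page_num):
    # Matches a .jpg URL in src="..." that is directly followed by pic_ext
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    x = 0
    for imgurl in imglist:
        print(imgurl)
        # Saved as <page>-<index>.jpg; the target directory must already exist
        urllib.urlretrieve(imgurl, r'C:\Users\Water\PycharmProjects\test\image\%s-%s.jpg' % (page_num, x))
        x = x + 1

# Loop over all pages of the thread (pn=1 through pn=73)
i = 1
while i < 74:
    html = getHtml("http://tieba.baidu.com/p/2460150866?pn=" + str(i))
    getImg(html, i)
    print(i)  # progress: the page that was just processed
    i += 1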
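
The extraction step can be tried without touching the network. The sketch below is not part of the original script: it reuses the same regular expression, but the helper names (extract_image_urls, image_path, download_page), the sample markup, and the UTF-8 decoding are assumptions made for illustration. It builds file paths with os.path.join instead of a hard-coded Windows path, and the import fallback lets it run on both Python 2 and Python 3.

```python
import os
import re

try:
    from urllib.request import urlopen, urlretrieve  # Python 3
except ImportError:
    from urllib import urlopen, urlretrieve  # Python 2

# Same pattern as the script: a .jpg URL in src="..." directly followed by pic_ext
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')


def extract_image_urls(html):
    """Return the .jpg URLs matched by IMG_PATTERN, in page order."""
    return IMG_PATTERN.findall(html)


def image_path(out_dir, page, index):
    """Build <out_dir>/<page>-<index>.jpg with the platform's path separator."""
    return os.path.join(out_dir, "%s-%s.jpg" % (page, index))


def download_page(page_url, page, out_dir):
    """Download every matched image on one page and return the saved paths."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    html = urlopen(page_url).read()
    if not isinstance(html, str):
        # Python 3 returns bytes; UTF-8 is an assumption about the page encoding
        html = html.decode("utf-8", "ignore")
    saved = []
    for index, img_url in enumerate(extract_image_urls(html)):
        target = image_path(out_dir, page, index)
        urlretrieve(img_url, target)
        saved.append(target)
    return saved


# Hypothetical markup for illustration; no network access is needed for this part
sample = ('<img class="BDE_Image" src="http://example.com/a.jpg" pic_ext="jpeg">'
          '<img class="BDE_Image" src="http://example.com/b.jpg" pic_ext="jpeg">')
urls = extract_image_urls(sample)
print(urls)
```

Calling download_page("http://tieba.baidu.com/p/2460150866?pn=1", 1, "image") would then save the first page's images as image/1-0.jpg, image/1-1.jpg, and so on, provided the page markup still matches the pattern.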

Below is the result of the scrape (image not preserved).

This article comes from the "小小水滴" blog; please retain this source link: http://wangzan18.blog.51cto.com/8021085/1788735
