Advanced usage of the requests module

Date: 2020-01-11 18:25:33

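1. Proxies

  A proxy server accepts a request and forwards it to the target server on the client's behalf.

1. Anonymity level

  1. Elite (high anonymity): the target server does not know you are using a proxy and does not know your real IP
  2. Anonymous: the target server knows you are using a proxy but does not know your real IP
  3. Transparent: the target server knows you are using a proxy and knows your real IP

2. Types

http
https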
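
In requests, the proxy type shows up as the key of the `proxies` mapping: the "http" entry is used for http:// target URLs and the "https" entry for https:// target URLs. The following is a minimal sketch of building such a mapping from an "ip:port" string; the helper name `build_proxies` and the address (taken from a documentation-only IP range) are assumptions for illustration, not part of the original article.

```python
def build_proxies(address, scheme="http"):
    """Return a requests-style proxies mapping for one proxy given as 'ip:port'.

    `scheme` is the protocol spoken to the proxy itself; the dictionary keys
    are the schemes of the target URLs that should go through that proxy.
    """
    proxy_url = f"{scheme}://{address}"
    return {"http": proxy_url, "https": proxy_url}


# Placeholder address from the 203.0.113.0/24 documentation range.
proxies = build_proxies("203.0.113.10:8080")
print(proxies)

# Typical use (not executed here because it needs a live proxy):
# requests.get("https://www.sogou.com", proxies=proxies, timeout=5)
```

Routing both keys to the same proxy keeps plain and TLS traffic on one exit IP; drop one key if only one kind of target should be proxied.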

3. Free proxy sites

- http://www.goubanjia.com/
- Kuaidaili (快代理)
- Xici Daili (西祠代理)
- http://http.zhiliandaili.cn/

Building a proxy pool:

# Build a proxy pool
import requests
from lxml import etree

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36",
    "Connection": "close",
}

# Fetch proxy IPs from the Daili Jingling (代理精灵) API
ip_url = "http://t.11jsq.com/index.php/api/entry?method=proxyServer.generate_api_url&packid=1&fa=0&fetch_key=&groupid=0&qty=4&time=1&pro=&city=&port=1&format=html&ss=5&css=&dt=1&specialTxt=3&specialJson=&usertype=2"
page_text = requests.get(ip_url, headers=headers).text
tree = etree.HTML(page_text)
ip_list = tree.xpath("//body//text()")
print(ip_list)

# Scrape proxy IPs from Xici Daili
url = "https://www.xicidaili.com/nn/%d"
proxy_list_http = []
proxy_list_https = []
for page in range(1, 2):
    new_url = url % page
    page_text = requests.get(url=new_url, headers=headers).text
    tree = etree.HTML(page_text)
    # [1:] skips the header row of the table
    tree_list = tree.xpath('//*[@id="ip_list"]//tr')[1:]
    for tr in tree_list:
        ip = tr.xpath("./td[2]/text()")[0]
        port = tr.xpath("./td[3]/text()")[0]
        ip_type = tr.xpath("./td[6]/text()")[0]
        # One entry per proxy, e.g. {"HTTP": "ip:port"}
        dic = {ip_type: ip + ":" + port}
        if ip_type == "HTTP":
            proxy_list_http.append(dic)
        else:
            proxy_list_https.append(dic)
print(len(proxy_list_http), len(proxy_list_https))

# Check which proxies are usable
url = "https://www.sogou.com"
for dic in proxy_list_http:
    address = dic["HTTP"]
    try:
        response = requests.get(url=url, headers=headers, proxies={"https": address}, timeout=5)
    except requests.RequestException:
        # A dead or slow proxy raises instead of returning a response
        continue
    # status_code is an int, so compare against 200, not "200"
    if response.status_code == 200:
        print("Found a usable IP:", address)
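
Once usable proxies have been collected, a pool is typically consumed by picking a random entry for each request and dropping entries that stop working. The sketch below illustrates that idea with the standard library only; the class name `ProxyPool`, its methods, and the sample addresses (documentation-only IP range) are hypothetical and do not come from the original article.

```python
import random


class ProxyPool:
    """A tiny in-memory pool of 'ip:port' proxy addresses."""

    def __init__(self, addresses):
        # dict.fromkeys drops duplicates while keeping the original order.
        self._addresses = list(dict.fromkeys(addresses))

    def __len__(self):
        return len(self._addresses)

    def get(self):
        """Return one randomly chosen address, so requests are spread out."""
        if not self._addresses:
            raise LookupError("proxy pool is empty")
        return random.choice(self._addresses)

    def discard(self, address):
        """Remove an address that failed; unknown addresses are ignored."""
        if address in self._addresses:
            self._addresses.remove(address)


pool = ProxyPool(["203.0.113.10:8080", "203.0.113.11:3128", "203.0.113.10:8080"])
pool.discard("203.0.113.11:3128")  # pretend this proxy timed out
chosen = pool.get()
print(len(pool), chosen)
```

In a scraper, `pool.get()` would supply the address for the `proxies` argument, and `pool.discard(address)` would be called from the `except` branch when a request through that proxy fails.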

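2. Handling cookies

  Manual handling: put the cookie into headers

  Automatic handling: use a session object. A session object sends requests the same way the requests module does; the difference is that any cookie produced while sending requests through the session is stored in the session object automatically and sent with its later requests.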
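
The difference between the two approaches can be observed without sending anything over the network by preparing requests and inspecting their headers. This is only a sketch: the cookie name is borrowed from the Xueqiu example, the value `demo-token` is a placeholder, and setting the cookie on the session by hand stands in for what a real `Set-Cookie` response would do.

```python
import requests

# Manual handling: the Cookie header is written by hand for this one request.
manual = requests.Request(
    "GET",
    "https://xueqiu.com/",
    headers={"Cookie": "xq_a_token=demo-token"},  # placeholder value
).prepare()

# Automatic handling: the session owns a cookie jar and attaches matching
# cookies to every request it prepares.
session = requests.Session()
# Stand-in for a cookie that a server response would normally set.
session.cookies.set("xq_a_token", "demo-token", domain="xueqiu.com")
automatic = session.prepare_request(requests.Request("GET", "https://xueqiu.com/"))

print(manual.headers["Cookie"])
print(automatic.headers["Cookie"])
```

With the manual approach the cookie string has to be refreshed by hand when it expires, whereas the session picks up new cookies from each response it receives.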

# Scrape news data from Xueqiu: https://xueqiu.com/
# Manual handling: copy the Cookie from the browser into headers
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36",
    "Cookie": "aliyungf_tc=AQAAAAl2aA+kKgkAtxdwe3JmsY226Y+n; acw_tc=2760822915681668126047128e605abf3a5518432dc7f074b2c9cb26d0aa94; xq_a_token=75661393f1556aa7f900df4dc91059df49b83145; xq_r_token=29fe5e93ec0b24974bdd382ffb61d026d8350d7d; u=121568166816578; device_id=24700f9f1986800ab4fcc880530dd0ed",
}
url = "https://xueqiu.com/v4/statuses/public_timeline_by_category.json?since_id=-1&max_id=20349203&count=15&category=-1"
page_text = requests.get(url=url, headers=headers).json()
print(page_text)

# Automatic handling: let a session object store the cookie
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
}

# Create the session object
session = requests.Session()
# Requesting the home page first lets the session capture the cookie it sets
session.get("https://xueqiu.com", headers=headers)

url = "https://xueqiu.com/v4/statuses/public_timeline_by_category.json?since_id=-1&max_id=20349203&count=15&category=-1"
page_text = session.get(url=url, headers=headers).json()
print(page_text)

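3. Simulated login

1. Captcha recognition

Related sites

  - Chaojiying (超级鹰): http://www.chaojiying.com/about.html

    • Register (as a User Center account)
    • Log in:
      • Create a software entry
      • Download the sample code

  - Dama2 (打码兔)

  - Yundama (云打码)

Recognizing the captcha on the Gushiwen site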

# Chaojiying sample code
import requests
from hashlib import md5
from lxml import etree

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
}

class Chaojiying_Client(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        # The password is sent as the hex MD5 digest of its UTF-8 bytes
        password = password.encode("utf8")
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            "user": self.username,
            "pass2": self.password,
            "softid": self.soft_id,
        }
        self.headers = {
            "Connection": "Keep-Alive",
            "User-Agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)",
        }

    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: captcha type, see http://www.chaojiying.com/price.html
        """
        params = {
            "codetype": codetype,
        }
        params.update(self.base_params)
        files = {"userfile": ("ccc.jpg", im)}
        r = requests.post("http://upload.chaojiying.net/Upload/Processing.php", data=params, files=files, headers=self.headers)
        return r.json()

    def ReportError(self, im_id):
        """
        im_id: image ID of the captcha that was recognized incorrectly
        """
        params = {
            "id": im_id,
        }
        params.update(self.base_params)
        r = requests.post("http://upload.chaojiying.net/Upload/ReportError.php", data=params, headers=self.headers)
        return r.json()

# 1. Recognize the captcha on the Gushiwen site
def tranformImgData(imgPath, t_type):
    # Chaojiying username, Chaojiying password, software ID (all passed as strings)
    chaojiying = Chaojiying_Client("15879478962", "15879478962", "901492")
    with open(imgPath, "rb") as fp:
        im = fp.read()
    return chaojiying.PostPic(im, t_type)["pic_str"]

url = "https://so.gushiwen.org/user/login.aspx?from=http://so.gushiwen.org/user/collect.aspx"
page_text = requests.get(url, headers=headers).text
tree = etree.HTML(page_text)
# The src attribute is a relative path, so prepend the site root
img_src = "https://so.gushiwen.org" + tree.xpath('//*[@id="imgCode"]/@src')[0]
img_data = requests.get(img_src, headers=headers).content
with open("./code.jpg", "wb") as fp:
    fp.write(img_data)
print(tranformImgData("./code.jpg", 1004))
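
The client above never sends the account password in clear text: it encodes the string as UTF-8 and submits the hexadecimal MD5 digest as `pass2`. The sketch below isolates that one step. The helper name `hash_password` and the sample password are placeholders chosen for illustration.

```python
from hashlib import md5


def hash_password(password):
    """Return the hex MD5 digest of a password string, as the client does."""
    # md5() works on bytes, so the string has to be encoded first.
    return md5(password.encode("utf8")).hexdigest()


digest = hash_password("password")  # placeholder password
print(len(digest), digest)
```

Because `encode` is a string method, the username and password must be passed to `Chaojiying_Client` as strings rather than integers.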

Simulated login on Gushiwen

# Simulated login
# Chaojiying_Client, tranformImgData, and headers are the ones defined earlier.
import requests
from lxml import etree

s = requests.Session()
url = "https://so.gushiwen.org/user/login.aspx?from=http://so.gushiwen.org/user/collect.aspx"
page_text = s.get(url, headers=headers).text
tree = etree.HTML(page_text)
img_src = "https://so.gushiwen.org" + tree.xpath('//*[@id="imgCode"]/@src')[0]
# Download the captcha through the same session so it matches the login cookie
img_data = s.get(img_src, headers=headers).content
with open("./code.jpg", "wb") as fp:
    fp.write(img_data)

# Fetch the request parameters that change on every page load.
# Dynamic request parameters are usually hidden in the source of the front-end page.
__VIEWSTATE = tree.xpath('//*[@id="__VIEWSTATE"]/@value')[0]
__VIEWSTATEGENERATOR = tree.xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value')[0]

code_text = tranformImgData("./code.jpg", 1004)
login_url = "https://so.gushiwen.org/user/login.aspx?from=http%3a%2f%2fso.gushiwen.org%2fuser%2fcollect.aspx"
data = {
    "__VIEWSTATE": __VIEWSTATE,
    "__VIEWSTATEGENERATOR": __VIEWSTATEGENERATOR,
    "from": "http://so.gushiwen.org/user/collect.aspx",
    "email": "15879478962",
    "pwd": "15879478962",
    "code": code_text,
    "denglu": "登录",
}
page_text = s.post(url=login_url, headers=headers, data=data).text
with open("login.html", "w", encoding="utf-8") as fp:
    fp.write(page_text)
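
Dynamic parameters such as `__VIEWSTATE` are usually carried by hidden `<input>` elements, so they can be collected in one pass instead of one XPath query per field. The sketch below does this with the standard-library `html.parser` rather than lxml, purely so that it runs without third-party packages. The class and function names are hypothetical, and the sample HTML is made up to resemble a login form, not copied from the real page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(page_text):
    """Return a dict of all hidden form fields found in the page source."""
    parser = HiddenInputParser()
    parser.feed(page_text)
    parser.close()
    return parser.fields


# Made-up page fragment for illustration only.
sample_page = """
<form method="post" action="./login.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="C93BE1AE" />
  <input type="text" name="email" />
</form>
"""
hidden_fields = extract_hidden_fields(sample_page)
print(hidden_fields)
```

The resulting dictionary can be merged into the login `data` with `data.update(hidden_fields)`, which keeps the script working if the site adds another hidden field.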

Original article: https://www.cnblogs.com/zangyue/p/12180488.html
