scrapy-middlewares

Date: 2019-04-27 19:49:24


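1. Scrapy middleware: Downloader Middlewares

A downloader middleware processes requests and responses through two corresponding methods, plus a third for exceptions:

process_request(self, request, spider)

  Called for each request that passes through the downloader middleware.

process_response(self, request, response, spider)

  Called when the downloader has finished the HTTP request and is passing the response to the engine.

process_exception(self, request, exception, spider)

  Handles exceptions, for example when a proxy IP is unavailable.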
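
The three hooks can be sketched as a bare skeleton. This is only an illustration: the class name SketchDownloaderMiddleware is hypothetical, and the return values shown are the simplest "pass it along" choices rather than anything the original post prescribes.

```python
class SketchDownloaderMiddleware:
    """Hypothetical skeleton showing the three downloader middleware hooks."""

    def process_request(self, request, spider):
        # Returning None tells Scrapy to keep passing this request through
        # the remaining middlewares and then on to the downloader.
        return None

    def process_response(self, request, response, spider):
        # Must return a Response (pass it on) or a Request (reschedule it).
        return response

    def process_exception(self, request, exception, spider):
        # Returning None lets the other middlewares handle the exception.
        return None
```

A middleware only needs to define the hooks it actually uses; the examples in this post each define just one.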

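A downloader middleware can pick a random User-Agent for each request that passes through it.

The middleware must be enabled in settings through DOWNLOADER_MIDDLEWARES.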
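
A minimal sketch of that setting, assuming the RandomUAMiddleware and CheckUserAgent classes from this post live in a middlewares.py module of a project named "myproject" (the project name and the priority numbers are assumptions, not taken from the original):

```python
# settings.py (sketch; "myproject" is a hypothetical project name)
DOWNLOADER_MIDDLEWARES = {
    # Keys are import paths; values are integers that set the order.
    # Lower numbers sit closer to the engine, so their process_request
    # runs earlier and their process_response runs later.
    "myproject.middlewares.RandomUAMiddleware": 543,
    "myproject.middlewares.CheckUserAgent": 544,
}
```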

import random


class RandomUAMiddleware:
    def process_request(self, request, spider):
        # Pick a random User-Agent from the USER_AGENT_LIST setting
        ua = random.choice(spider.settings.get("USER_AGENT_LIST"))
        request.headers["User-Agent"] = ua


class CheckUserAgent:
    def process_response(self, request, response, spider):
        # Print the User-Agent that was sent with the request
        print(request.headers["User-Agent"])
        return response


class ProxyMiddleware:
    def process_request(self, request, spider):
        # Add a proxy (a random proxy could be chosen here instead)
        request.meta["proxy"] = "http://124.115.126.76:808"  # scheme (http) + IP + port

settings.py
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
]

 

 

2. Developing a proxy middleware
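
The ProxyMiddleware earlier in this post hardcodes a single proxy, although its comment mentions choosing one at random, and process_exception is described as the place to react when a proxy IP is unavailable. One possible way to combine the two is sketched below. It is not the original author's code: the PROXY_LIST setting, the "proxy_retries" meta key, the MAX_PROXY_RETRIES constant, and the class name are all hypothetical.

```python
import random

MAX_PROXY_RETRIES = 3  # hypothetical cap so a request cannot retry forever


class RandomProxyMiddleware:
    """Sketch: pick a random proxy per request, switch proxy on failure."""

    def process_request(self, request, spider):
        # PROXY_LIST is a hypothetical setting, e.g. ["http://host:port", ...]
        proxies = spider.settings.get("PROXY_LIST")
        if proxies:
            request.meta["proxy"] = random.choice(proxies)

    def process_exception(self, request, exception, spider):
        proxies = spider.settings.get("PROXY_LIST") or []
        failed = request.meta.get("proxy")
        retries = request.meta.get("proxy_retries", 0)
        # Candidates are all configured proxies except the one that just failed.
        candidates = [p for p in proxies if p != failed]
        if not candidates or retries >= MAX_PROXY_RETRIES:
            return None  # give up; let other middlewares handle the error
        request.meta["proxy"] = random.choice(candidates)
        request.meta["proxy_retries"] = retries + 1
        # Bypass the duplicate filter so the retried request is not dropped.
        request.dont_filter = True
        return request  # a returned Request is rescheduled for download
```

To try it, PROXY_LIST would be added to settings.py next to USER_AGENT_LIST and the class registered in DOWNLOADER_MIDDLEWARES. A real project would probably also want to inspect the exception type before retrying.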

Original article: https://www.cnblogs.com/tangpg/p/10779776.html
