
python-scrapy: learning about middleware

Date: 2021-01-14 11:23:09


middlewares.py


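from scrapy import signals


class MiddlewareDownloaderMiddleware:

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this method to create the middleware instance.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # `spider` is the running spider instance (for example, spider.name).
        # Called for every outgoing request, whether or not it later succeeds.
        # `request` is the request object about to be sent; its headers can
        # be read or modified here, for example:
        # request.headers['Cookie'] = 'xxx'
        print('i am process_request ')
        return None

    def process_response(self, request, response, spider):
        # Called for every response.
        # `response` is the response object.
        print('i am process_response ')
        return response

    def process_exception(self, request, exception, spider):
        # Called for requests that raised an exception.
        # Fix up the failed request here, then return it to send it again.
        print('i am process_exception ')
        # Example fix: route the request through a proxy.
        # request.meta['proxy'] = 'https://ip:port'
        # Note: a returned request is rescheduled, and the scheduler's
        # duplicate filter drops it unless request.dont_filter is True.
        return request

    def spider_opened(self, spider):
        # Handler connected in from_crawler; it must exist on the class.
        spider.logger.info('Spider opened: %s', spider.name)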
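
The comments in process_exception describe fixing a failed request and sending it again. A minimal sketch of that idea follows, under stated assumptions: the class name ProxyRetryMiddleware, the proxy pool passed to the constructor, the max_retries cap, and the 'proxy_retries' meta key are all hypothetical and not from the original article. Only request.meta['proxy'] and request.dont_filter are standard Scrapy request attributes. How the pool reaches the constructor (for example through from_crawler and a custom setting) is left out.

```python
import random


class ProxyRetryMiddleware:
    """Hypothetical downloader middleware: resend failed requests via a proxy."""

    def __init__(self, proxies, max_retries=3):
        self.proxies = list(proxies)    # assumed pool of proxy URLs
        self.max_retries = max_retries  # assumed cap to avoid endless retries

    def process_exception(self, request, exception, spider):
        # 'proxy_retries' is a made-up meta key used only to count attempts.
        retries = request.meta.get('proxy_retries', 0)
        if retries >= self.max_retries or not self.proxies:
            # Give up: returning None lets other middlewares or the
            # request's errback handle the exception.
            return None
        request.meta['proxy_retries'] = retries + 1
        # 'proxy' is the meta key that Scrapy's proxy support reads.
        request.meta['proxy'] = random.choice(self.proxies)
        # Without this, the scheduler's duplicate filter would drop the
        # rescheduled request because its fingerprint was already seen.
        request.dont_filter = True
        return request
```

Capping the retries matters because a middleware that always returns the request from process_exception can resend a permanently failing request indefinitely.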

settings.py: enable the middleware

DOWNLOADER_MIDDLEWARES = {
    'middleware.middlewares.MiddlewareDownloaderMiddleware': 543,
}
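
The number 543 is the middleware's order. Scrapy calls process_request in increasing order and process_response in decreasing order, and setting a middleware's value to None disables it. The fragment below is a sketch of how that might look with more than one entry; the second class path (ProxyRetryMiddleware) is a hypothetical name used for illustration and is not part of the original project.

```python
# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    # From the article: the custom middleware at order 543.
    'middleware.middlewares.MiddlewareDownloaderMiddleware': 543,
    # Hypothetical second middleware: its process_request runs after the
    # one above, and its process_response runs before it.
    'middleware.middlewares.ProxyRetryMiddleware': 544,
    # A built-in middleware can be switched off by mapping it to None.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```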


Original article: https://www.cnblogs.com/shiyi525/p/14274418.html
