Flask Development Series: Building an IP Proxy Pool with Flask + redis

Date: 2019-06-11 13:17:50


 

To be refined June 11 to 15...

Simple implementation

import requests
import re
import time
import redis
from bloom_filter import BloomFilter
import ast

pool = redis.ConnectionPool(host='localhost', password='xxx', port=6379, decode_responses=True)
r = redis.Redis(connection_pool=pool)
# Bloom filter used to skip proxies that have already been stored
bloombloom = BloomFilter(max_elements=10000, error_rate=0.1)
bloombloom.add(str({'http': '117.91.232.53:9999'}))


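def get_ip(i):
    """Scrape page i + 1 of the free proxy list and return 'ip:port' strings."""
    ip_list = []
    url = 'https://www.kuaidaili.com/free/inha/'
    url = url + str(i + 1)
    html = requests.get(url=url).text
    # Capture an IP cell followed by a port cell; re.S lets .*? span line breaks
    regip = r'<td.*?>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})</td>.*?<td.*?>(\d{1,5})</td>'
    matcher = re.compile(regip, re.S)
    ipstr = re.findall(matcher, html)
    time.sleep(1)  # pause between page requests
    for j in ipstr:
        ip_list.append(j[0] + ':' + j[1])
    print('Collected %d proxy IPs' % len(ip_list))
    print(ip_list)
    return ip_list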



def valVer(proxys):
    """Test each 'ip:port' proxy against Baidu and return the working ones."""
    global badNum, goodNum, good_list
    good = []
    for proxy in proxys:
        try:
            proxy_host = proxy
            # proxy_host is 'ip:port', so this normally selects 'http'
            protocol = 'https' if 'https' in proxy_host else 'http'
            proxies = {protocol: proxy_host}
            print('Testing IP:', proxies)
            response = requests.get('http://www.baidu.com', proxies=proxies, timeout=2)
            if response.status_code != 200:
                badNum += 1
                print(proxy_host, 'bad proxy')
            else:
                goodNum += 1
                good.append(proxies)
                good_list.append(proxies)
                print(proxy_host, 'success proxy')
        except Exception as e:
            # Timeouts and connection errors count as a bad proxy
            print(e)
            badNum += 1
            continue
    print('success proxy num : ', goodNum)
    print('bad proxy num : ', badNum)
    print("This run:", good)
    print("All so far:", good_list)
    return good


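def time_valVer(proxys):
    """Re-test proxies already stored in Redis and drop those that return a non-200 status."""
    good = []
    for proxy_str in proxys:
        try:
            print('Periodic re-test of IP:', proxy_str)
            # Entries are stored as str(dict); parse the string back into a dict
            proxy = ast.literal_eval(proxy_str)
            response = requests.get('http://www.baidu.com', proxies=proxy, timeout=2)
            if response.status_code != 200:
                # redis-py >= 3.0 takes lrem(name, count, value); the value must be
                # the stored string, not the parsed dict
                r.lrem("ip_list", 1, proxy_str)
                print(proxy, 'bad proxy')
            else:
                good.append(proxy)
                good_list.append(proxy)
                print(proxy, 'success proxy')
        except Exception as e:
            # Proxies that raise (e.g. time out) are only logged here, not removed
            print(e)
            continue
    return good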

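def stone(good):
    """Push proxies not yet seen by the Bloom filter onto the Redis list."""
    for IP in good:
        if str(IP) in bloombloom:
            print("%s cannot be stored: duplicate IP" % (IP,))
            continue
        else:
            print("Storing IP:", IP)
            bloombloom.add(str(IP))
            r.rpush("ip_list", str(IP))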

if __name__ == '__main__':

    badNum = 0
    goodNum = 0
    good_list = []
    for i in range(0, 10):
        # With range(0, 10) this branch never runs; widen the range to enable re-testing
        if i % 10 == 0 and i != 0:
            # Re-test every proxy currently stored in Redis
            proxy_list = r.lrange("ip_list", 0, -1)
            time_valVer(proxy_list)
        else:
            ip_list = get_ip(i)
            good = valVer(ip_list)
            stone(good)
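
The extraction step in get_ip can be tried offline, without hitting the proxy site. The sketch below reuses the same regular expression on a small hand-written HTML fragment. The fragment and the helper name extract_proxies are assumptions made for illustration; the real page markup may differ.

```python
import re

# Same pattern as in get_ip: an IP cell followed (lazily) by a port cell.
PROXY_RE = re.compile(
    r'<td.*?>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})</td>.*?<td.*?>(\d{1,5})</td>',
    re.S,  # let .*? cross line breaks between the two cells
)


def extract_proxies(html):
    """Return 'ip:port' strings found in an HTML table fragment."""
    return [ip + ':' + port for ip, port in PROXY_RE.findall(html)]


# Made-up fragment for illustration only; not copied from the real site.
SAMPLE_HTML = """
<tr>
  <td data-title="IP">117.91.232.53</td>
  <td data-title="PORT">9999</td>
</tr>
<tr>
  <td data-title="IP">10.0.0.1</td>
  <td data-title="PORT">8080</td>
</tr>
"""

print(extract_proxies(SAMPLE_HTML))
```

Note that the pattern only checks the shape of the address (one to three digits per octet), so values such as 999.1.1.1 would also match; the later validation request is what filters out unusable entries.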

 

 

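from flask import Flask
import redis   # redis-py client for working with Redis from Python; the cache can also be manipulated directly on the Redis server


r = redis.Redis(host='localhost', port=6379, password='xxx', decode_responses=True)
app = Flask(__name__)


@app.route('/ip/<int:index>')
def reponse(index):
    print(index)
    proxy = r.lindex("ip_list", index)
    print(proxy)
    if proxy is None:
        # lindex returns None when the index is out of range; a Flask view must not return None
        return 'no proxy at index %d' % index, 404
    return proxy


if __name__ == '__main__':
    app.run(debug=True)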
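
The endpoint returns whatever string was pushed onto ip_list, which is the str() form of a dict such as {'http': '117.91.232.53:9999'}. A client therefore has to parse that text back into a dict before passing it to requests as proxies=. A minimal sketch of the round trip follows; the helper names to_stored and from_stored are hypothetical and do not appear in the original scripts.

```python
import ast


def to_stored(proxy_host):
    """Build the proxies dict as valVer does and serialise it as stone does."""
    proxies = {'http': proxy_host}
    return str(proxies)


def from_stored(stored):
    """Parse a stored entry back into a dict usable as requests' proxies=."""
    # literal_eval only accepts Python literals, so it is safer than eval here
    proxies = ast.literal_eval(stored)
    if not isinstance(proxies, dict):
        raise ValueError('unexpected entry: %r' % (stored,))
    return proxies


stored = to_stored('117.91.232.53:9999')
print(stored)
print(from_stored(stored))
```

Storing JSON instead of str(dict) would be a possible alternative that non-Python clients could also parse, but that would change the format the original scripts write to Redis.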

 

Fetching an IP:

[Image from the original post not preserved]

 


Original article: https://www.cnblogs.com/-wenli/p/11002902.html
