
Scraping Nationwide Historical Weather Data with a Crawler

Posted: 2018-10-10 23:59:12


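This is a very simple crawler that scrapes http://www.tianqihoubao.com. You can change the cities and the months to scrape yourself; here it collects the data for January through July 2018.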
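Changing the cities or months comes down to changing the URLs that get requested. As a rough sketch, the monthly URL pattern can be generated instead of typed out by hand. The helper name `build_month_urls` is hypothetical and not part of the original script, and the slug `kaiyuan` is assumed to be the pinyin form the site uses for 开原.

```python
def build_month_urls(city_slug, year, first_month, last_month):
    """Return one monthly history URL per month in [first_month, last_month].

    city_slug is assumed to be the city's pinyin name without the trailing
    "市", which is what the script feeds into the URL.
    """
    base = "http://www.tianqihoubao.com/lishi/{slug}/month/{year}{month:02d}.html"
    return [
        base.format(slug=city_slug, year=year, month=month)
        for month in range(first_month, last_month + 1)
    ]


if __name__ == "__main__":
    # Print the seven URLs for January through July 2018
    for url in build_month_urls("kaiyuan", 2018, 1, 7):
        print(url)
```

With a helper like this, scraping a different period only means changing the year and month arguments.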

from bs4 import BeautifulSoup
import requests
import pymysql
import warnings
from pypinyin import lazy_pinyin

warnings.filterwarnings("ignore")

# Connect to the local MySQL database; adjust the credentials to your own setup
conn = pymysql.connect(host="localhost", user="root", password="root", database="test2", port=3306, charset="utf8")
cursor = conn.cursor()
def get_temperature(url, city):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}           # set the request header
    response = requests.get(url, headers=headers).content    # send the GET request
    soup = BeautifulSoup(response, "lxml")       # parse the page with BeautifulSoup

    conmid2 = soup.find_all("div", class_="wdetail")
    # conmid2 = conmid.find_all('div', class_='wdetail')

    for info in conmid2:
        tr_list = info.find_all("tr")[1:]       # slice off the first tr, which is the header row
        for index, tr in enumerate(tr_list):     # enumerate returns each element's position and content
            td_list = tr.find_all("td")
            # if index == 0:

            # Take each cell's text and remove line breaks with replace();
            # for cells of the form "a / b", keep only the part before the slash
            date = td_list[0].text.strip().replace("\n", "")
            weather = td_list[1].text.strip().replace("\n", "").split("/")[0].strip()
            temperature = td_list[2].text.strip().replace("\n", "").split("/")[0].strip()
            wind = td_list[3].text.strip().replace("\n", "").split("/")[0].strip()

            # else:
            #     city_name = td_list[0].text.replace('\n', '')
            #     weather = td_list[4].text.replace('\n', '')
            #     wind = td_list[5].text.replace('\n', '')
            #     max = td_list[3].text.replace('\n', '')
            #     min = td_list[6].text.replace('\n', '')

            print(city, date, weather, wind, temperature)
            # Parameterized insert; the rows are committed by the caller
            cursor.execute("insert into weather(city, date, weather, wind, temp) values(%s, %s, %s, %s, %s)",
                           (city, date, weather, wind, temperature))
if __name__ == "__main__":

    # citys1= ["成都市","广元市","绵阳市","德阳市","南充市","广安市","遂宁市","内江市","乐山市","自贡市","泸州市","宜宾市","攀枝花市","巴中市","达州市","资阳市","眉山市","雅安市","崇州市","邛崃市","都江堰市","彭州市","江油市","什邡市","广汉市","绵竹市","阆中市","华蓥市","峨眉山市","万源市","简阳市","西昌市","康定市","马尔康市","隆昌市"]
    # citys1= ["郑州市","开封市","洛阳市","平顶山市","安阳市","鹤壁市","新乡市","焦作市","濮阳市","许昌市","漯河市","三门峡市","南阳市","商丘市","周口市","驻马店市","信阳市","荥阳市","新郑市","登封市","新密市","偃师市","孟州市","沁阳市","卫辉市","辉县市","林州市","禹州市","长葛市","舞钢市","义马市","灵宝市","项城市","巩义市","邓州市","永城市","汝州市","济源市"]
    # citys1= ["呼和浩特市","包头市","乌海市","赤峰市","通辽市","鄂尔多斯市","呼伦贝尔市","巴彦淖尔市","乌兰察布市","霍林郭勒市","满洲里市","牙克石市","扎兰屯市","额尔古纳市","根河市","丰镇市","乌兰浩特市","阿尔山市","二连浩特市","锡林浩特市"]
    # citys1= ["沈阳市","大连市","鞍山市","抚顺市","本溪市","丹东市","锦州市","营口市","阜新市","辽阳市","盘锦市","铁岭市","朝阳市","葫芦岛市","新民市","瓦房店市","庄河市","海城市","东港市","凤城市","凌海市","北镇市","盖州市","大石桥市","灯塔市","调兵山市","开原市","北票市","凌源市","兴城市"]
    # citys1= ["葫芦岛市","新民市","瓦房店市","庄河市","海城市","东港市","凤城市","凌海市","北镇市","盖州市","大石桥市","灯塔市","调兵山市","开原市","北票市","凌源市","兴城市"]
    citys1= ["开原市","北票市","凌源市","兴城市"]


    for city in citys1:
        # Drop the trailing "市" and convert the rest of the name to pinyin
        city1 = "".join(lazy_pinyin(city[:-1]))
        print(city1)
        urls = ["http://www.tianqihoubao.com/lishi/" + city1 + "/month/201801.html",
                "http://www.tianqihoubao.com/lishi/" + city1 + "/month/201802.html",
                "http://www.tianqihoubao.com/lishi/" + city1 + "/month/201803.html",
                "http://www.tianqihoubao.com/lishi/" + city1 + "/month/201804.html",
                "http://www.tianqihoubao.com/lishi/" + city1 + "/month/201805.html",
                "http://www.tianqihoubao.com/lishi/" + city1 + "/month/201806.html",
                "http://www.tianqihoubao.com/lishi/" + city1 + "/month/201807.html"]
        for url in urls:
            get_temperature(url, city)
        conn.commit()
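
The script keeps only the part of each cell that comes before the slash. If both halves are wanted, the cleanup step could be written as a small helper along these lines. This is only a sketch: `split_cell` is a hypothetical name, the sample strings are made up for illustration, and what the two halves of a cell mean depends on the site's table layout, which the original does not spell out.

```python
def split_cell(text):
    """Split a table cell of the form "a / b" into two stripped parts.

    Line breaks are removed first, as in the script. If the cell contains
    no slash, the second part is an empty string.
    """
    cleaned = text.replace("\n", "").strip()
    first, _, second = cleaned.partition("/")
    return first.strip(), second.strip()


if __name__ == "__main__":
    # Illustrative inputs only, not copied from the site
    print(split_cell("\n  晴\n  /多云\n"))
    print(split_cell("5℃ / -3℃"))
```

The first element of the returned tuple matches what the script stores today, so the second element could go into an extra column without changing the existing ones.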



 

 

 


Original article: https://www.cnblogs.com/xixilili/p/9769535.html
