
Stock Data Crawler

Date: 2020-05-31 15:56:18


Tiger Community (老虎社区)

https://www.laohu8.com/stock/

Baidu Stocks (百度股票) no longer works.

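import collections
import re

import requests
from bs4 import BeautifulSoup


def getHtmlText(url):
    """Fetch a page and return its text, or "" if the request fails."""
    try:
        kv = {'user-agent': 'Mozilla/5.0'}
        r = requests.get(url, headers=kv, timeout=30)  # timeout added so a stalled request cannot hang the crawl
        r.raise_for_status()
        # r.apparent_encoding reports GB2312 here, but decoding with it
        # fails to yield the data, so the encoding is set explicitly.
        r.encoding = 'utf-8'  # this step is required
        return r.text
    except requests.RequestException:
        return ""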


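def getstocklist(stock_codes, stock_url):
    """Collect six-digit stock codes from every link on the list page."""
    html = getHtmlText(stock_url)
    soup = BeautifulSoup(html, 'html.parser')
    for link in soup.find_all('a'):
        try:
            href = link.attrs['href']
            stock_codes.append(re.findall(r"\d{6}", href)[0])  # find the stock code
        except (KeyError, IndexError):
            # The link has no href, or the href contains no six-digit code.
            continue
    print(len(stock_codes))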

def getstockinfo(stock_codes, stock_url, path):
    """Fetch each stock's detail page and append its fields to the file at path."""
    total = len(stock_codes)
    for cnt, stock in enumerate(stock_codes, start=1):
        html = getHtmlText(stock_url + stock)
        try:
            if html:
                infodict = collections.OrderedDict()  # keeps insertion order when written to the file
                soup = BeautifulSoup(html, 'html.parser')
                stock_name = soup.find_all('h1', attrs={'class': 'name'})[0]
                infodict['股票名称'] = stock_name.text.split()[0]  # key means "stock name"

                stockinfo = soup.find('div', attrs={'class': 'detail-data'})
                key_list = stockinfo.find_all('dt')
                value_list = stockinfo.find_all('dd')
                for key, value in zip(key_list, value_list):
                    infodict[key.text] = value.text

                with open(path, 'a', encoding='utf-8') as f:  # 'a': new content is appended after the existing content
                    f.write(str(infodict) + '\n')
        except (AttributeError, IndexError):
            pass  # the page does not have the expected layout; skip this stock
        # Every stock counts toward progress, including empty or skipped pages.
        # '\r' moves the cursor back to the start of the current line.
        print('\rProgress: {:.2f}%'.format(cnt * 100 / total), end='')
      
def main():
    stock_list_url = 'http://quote.eastmoney.com/stock_list.html'
    stock_info_url = 'https://www.laohu8.com/stock/'
    output_file = 'laohu_stock.txt'
    stock_codes = []
    getstocklist(stock_codes, stock_list_url)
    getstockinfo(stock_codes, stock_info_url, output_file)


if __name__ == '__main__':
    main()
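
The code-extraction step in getstocklist can be tried on its own, without any network access. The following is a minimal sketch: the href values are made up for illustration (the real ones come from the eastmoney list page), and `extract_stock_code` is a hypothetical helper that is not part of the original script.

```python
import re


def extract_stock_code(href):
    """Return the first run of six digits in href, or None if there is none."""
    match = re.search(r"\d{6}", href)
    return match.group(0) if match else None


# Hypothetical hrefs, for illustration only.
sample_hrefs = [
    "http://quote.example.com/sh600000.html",
    "http://quote.example.com/about.html",
    "http://quote.example.com/sz000001.html",
]

# Keep only the links that actually contain a six-digit code.
codes = [c for c in map(extract_stock_code, sample_hrefs) if c is not None]
print(codes)  # ['600000', '000001']
```

Note that the pattern matches any six consecutive digits, so a link containing a longer number would also yield a "code"; the original script accepts that trade-off.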

Screenshots: getstockinfo()

 

Screenshot: part of laohu_stock.txt
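
Each line of laohu_stock.txt is the `str()` of an OrderedDict, which is easy to read by eye but awkward to load back into a program. As a possible alternative (not what the original script does), the sketch below writes one JSON object per line using the standard `json` module. The helper names `append_record` and `read_records`, the demo file name, and the sample records are all hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record to the file as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese keys and values readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Load every non-empty JSON line back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo with made-up records, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "laohu_stock_demo.jsonl")
append_record(demo_path, {"股票名称": "ExampleCo", "key": "value"})
append_record(demo_path, {"股票名称": "SampleInc", "key": "other"})

records = read_records(demo_path)
print(len(records), records[0]["股票名称"])  # 2 ExampleCo
```

Plain dicts preserve insertion order in Python 3.7 and later, so the field order of each record survives the round trip.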


 


Original article: https://www.cnblogs.com/tingtin/p/13018966.html
