Toutiao (今日头条)

Date: 2018-08-16 23:43:18

import requests
import re
import json
import os
from urllib import request

# Browser-like User-Agent sent with every article request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.5733.400 QQBrowser/10.2.2050.400'
}

# Create the output folder once, before anything is downloaded
if not os.path.exists('jiepaiss'):
    os.mkdir('jiepaiss')

# Fetch three pages of search results (offset 0, 20, 40)
for i in range(0, 60, 20):
    url = 'https://www.toutiao.com/search_content/?offset={}&format=json&keyword=%E8%A1%97%E6%8B%8D&autoload=true&count=20&cur_tab=1&from=search_tab'.format(i)
    response = requests.get(url)

    # response.json() returns the decoded object (a dict) directly
    html_json_dict = response.json()
    # print(html_json_dict)

    # Get the list stored under the 'data' key
    data_list = html_json_dict['data']

    # For each item in the list, use article_url when it is present
    for data_item in data_list:
        if 'article_url' in data_item:
            article_url = data_item['article_url']
            response = requests.get(article_url, headers=headers)
            # From here on it is the same code as before
            html_str = response.text
            pattern = r'gallery: JSON\.parse\((.*)\),'
            match_res = re.search(pattern, html_str)

            if match_res:
                # The captured text is a JSON string literal whose content
                # is itself JSON, so it has to be decoded twice
                json_origin = match_res.group(1)
                res_str = json.loads(json_origin)
                # print(type(res_str))
                res_dict = json.loads(res_str)
                # print(type(res_dict))

                sub_images_list = res_dict['sub_images']
                for image in sub_images_list:
                    image_url = image['url']
                    filename = 'jiepaiss/' + image_url.split('/')[-1] + '.jpg'
                    print(filename)
                    # Download the image
                    request.urlretrieve(image_url, filename)
            else:
                print('No gallery data matched on this page, skipping it')
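
The double json.loads step in the script is easy to misread, so here is a minimal sketch that isolates it and runs without network access. The function name extract_image_urls and the sample page fragment are made up for illustration; only the regular expression and the 'sub_images' / 'url' keys come from the script above, and the real page layout may differ.

```python
import json
import re

# Same pattern as the script: captures the argument of JSON.parse(...)
GALLERY_PATTERN = re.compile(r'gallery: JSON\.parse\((.*)\),')


def extract_image_urls(html_str):
    """Return the gallery image URLs in html_str, or [] if nothing matches.

    Hypothetical helper, not part of the original script.
    """
    match_res = GALLERY_PATTERN.search(html_str)
    if not match_res:
        return []
    # First decode: the quoted JavaScript string literal becomes a Python str
    res_str = json.loads(match_res.group(1))
    # Second decode: that str is itself JSON and becomes a dict
    res_dict = json.loads(res_str)
    return [image['url'] for image in res_dict.get('sub_images', [])]


# Made-up page fragment for illustration only (not real Toutiao output)
inner = json.dumps({'sub_images': [{'url': 'http://example.com/a/1'},
                                   {'url': 'http://example.com/a/2'}]})
sample_html = ('var page = {\n'
               '    gallery: JSON.parse(' + json.dumps(inner) + '),\n'
               '    other: 1\n'
               '};')

print(extract_image_urls(sample_html))
```

Returning an empty list when the pattern does not match keeps the caller's loop simple: pages without a gallery contribute no downloads.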

Original article: https://www.cnblogs.com/huangming17/p/9490674.html
