码迷,mamicode.com

爬1 429

Posted: 2019-04-29 21:12:48


Scrape, scrape, scrape --

Two pieces of software:
Anaconda -- ships with Jupyter, which we use as our editing environment
Fiddler4 -- a proxy tool for capturing HTTP traffic
Case 1: fetch an entire page (the Sogou homepage)
import requests
url = 'https://www.sogou.com/'  # 1. specify the URL
res = requests.get(url=url)     # 2. send the request and get a response object
# print(res.text)
page_text = res.text        # 3. the text attribute returns the response body as a string
with open('./sg.html', 'w', encoding='utf-8') as f:    # 4. persist the data
    f.write(page_text)
Case 2: a Sogou search results page
# Search results: the site does UA detection and may reject the request;
# the fix is to add a User-Agent (browser identifier) to the request headers
import requests
url = 'https://www.sogou.com/web'
wd = input('What do you want to search for? ')
param = {
    'query': wd
}
res = requests.get(url=url, params=param)
# print(res.encoding)  # ISO-8859-1 -- inspect the response encoding
res.encoding = 'utf-8'  # override the encoding
page_text = res.text
name = wd + '.html'
with open(name, 'w', encoding='utf-8') as f:
    f.write(page_text)
    print(name, 'done scraping!')
Case 2, updated: add a User-Agent key-value pair to the request headers
import requests
url = 'https://www.sogou.com/web'
wd = input('What do you want to search for? ')
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
param = {
    'query': wd
}
res = requests.get(url=url, params=param, headers=headers)  # params + headers; gets past the UA-detection anti-scraping check
# print(res.encoding)  # ISO-8859-1 -- inspect the response encoding
res.encoding = 'utf-8'  # override the encoding
page_text = res.text
name = wd + '.html'
with open(name, 'w', encoding='utf-8') as f:
    f.write(page_text)
    print(name, 'done scraping!')
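Since we never see the final URL that `params` produces, one way to inspect it is to build the request without sending it, using requests' own prepare step (a sketch; no network traffic is generated, and the query word is hard-coded here for illustration):

```python
import requests

# Build the same kind of GET request as above, but only *prepare* it --
# nothing is sent over the network, so we can inspect the final URL.
req = requests.Request(
    'GET',
    'https://www.sogou.com/web',
    params={'query': 'python'},
    headers={'User-Agent': 'Mozilla/5.0'},
)
prepared = req.prepare()
print(prepared.url)  # the query string is percent-encoded and appended
```

Non-ASCII search words (e.g. Chinese) are percent-encoded the same way, which is why passing `params` beats concatenating strings by hand.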
Case 3
# Fetch result data from Baidu Translate
# Pages may contain dynamically loaded data
import requests
url = 'https://fanyi.baidu.com/sug'
wd = input('enter a word: ')
data = {'kw': wd}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
res = requests.post(url=url, data=data, headers=headers)  # POST request
obj_json = res.json()  # deserialize the JSON response
for i in obj_json['data']:
    print(i['k'], ' ', i['v'])
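The loop above assumes the sug endpoint returns JSON shaped like {"data": [{"k": ..., "v": ...}, ...]}. A minimal offline sketch of parsing that shape -- the sample payload below is made up for illustration, not a real response:

```python
import json

# Made-up sample payload mimicking the shape the loop above expects:
# {"data": [{"k": ..., "v": ...}, ...]}
sample = '{"data": [{"k": "dog", "v": "n. 狗"}, {"k": "dogma", "v": "n. 教条"}]}'
obj_json = json.loads(sample)
for i in obj_json['data']:
    print(i['k'], ' ', i['v'])
```

res.json() does this deserialization for us; json.loads is what you would reach for when testing the parsing logic without hitting the endpoint.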
Case 4
# Douban movie detail data
# Some pages load data dynamically: scrolling down keeps fetching more entries
import requests
url = 'https://movie.douban.com/j/chart/top_list'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
param = {
    "type": "5",
    "interval_id": "100:90",
    "action": "",
    "start": "0",
    "limit": "50",
}
obj_json = requests.get(url=url, params=param, headers=headers).json()  # GET request with params
# print(obj_json)
print(len(obj_json))
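To fetch more than the first 50 entries, the `start` offset would be advanced in steps of `limit`, one request per page. A sketch of building the successive parameter dicts (the `page_params` helper is not part of the original; no requests are sent):

```python
def page_params(pages, limit=50):
    """Build one param dict per page, advancing the start offset by `limit`."""
    params = []
    for n in range(pages):
        params.append({
            "type": "5",
            "interval_id": "100:90",
            "action": "",
            "start": str(n * limit),  # 0, 50, 100, ...
            "limit": str(limit),
        })
    return params

for p in page_params(3):
    print(p["start"], p["limit"])
```

Each dict would then be passed as `params=` to requests.get, exactly as in the single-page case above.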
Case 5
# Cosmetics company data from the drug administration (药监局) site
import requests
post_url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsList'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'}
all_data = []
IDs = []
for page in range(1, 3):
    data = {
        "on": "true",
        "page": str(page),
        "pageSize": "15",
        "productName": "",
        "conditionType": "1",
        "applyname": "",
        "applysn": "",
    }
    # the listing page's ajax request returns JSON
    json_obj = requests.post(url=post_url, data=data, headers=headers).json()
    for dic in json_obj["list"]:
        IDs.append(dic['ID'])
print(len(IDs))
for id in IDs:
    detail_post_url = 'http://125.35.6.84:81/xk/itownet/portalAction.do?method=getXkzsById'
    data = {'id': id}
    detail_dic = requests.post(url=detail_post_url, data=data, headers=headers).json()
    all_data.append(detail_dic)
print(all_data[0])
print(len(all_data))

 

Original post: https://www.cnblogs.com/zhangchen-sx/p/10792461.html
