华子
Contents: Introduction to Requests; Installing the requests module; Common methods; Common attributes; Introductory example; Request methods; GET requests; GET requests with parameters; Recommended style; POST requests; Adding headers; content: getting binary data; The bytes type; Getting JSON data (method 1, method 2); The response; Checking status codes; Advanced operations: session persistence (with cookies, with a Session object), proxy settings, timeout settings, exception handling
Introduction to Requests
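Purpose: send network requests and obtain the response data.
Official documentation: https://requests.readthedocs.io/zh_CN/latest/index.html
Requests is an HTTP library written in Python on top of urllib3 and released under the Apache 2 license. It is much more convenient than the standard library's urllib, saves a great deal of work, and fully covers the needs of HTTP testing.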
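Installing the requests module
Open a command prompt (run cmd) and enter:
Windows: pip install requests
Linux: sudo pip install requests
Common methods
The most commonly used methods are get() and post(), which send a GET request and a POST request respectively. Each returns a Response object that carries the page source, the status code, the URL and other details of the response.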
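requests.get(url) and requests.post(url, data=...) are shortcuts for requests.request("GET", url) and requests.request("POST", url, data=...). The sketch below builds one request per HTTP method with requests.Request(...).prepare() but does not send any of them, so it runs without network access. The URL http://httpbin.org/anything is only a placeholder.

```python
import requests

URL = "http://httpbin.org/anything"  # placeholder; nothing is sent in this sketch

prepared = []
for method in ("get", "post", "put", "delete", "head", "options"):
    # Request(...).prepare() builds the request line and body without sending anything
    # (session defaults such as the User-Agent header are not added at this stage)
    p = requests.Request(method, URL).prepare()
    prepared.append((p.method, p.url))

for method, url in prepared:
    print(method, url)  # e.g. GET http://httpbin.org/anything
```

The method name is normalised to upper case. To actually send a request, `requests.request("GET", URL)` behaves the same as `requests.get(URL)`.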
Common attributes
Introductory example
import requests

# target: https://www.baidu.com/
response = requests.get("https://www.baidu.com/")
print(response)  # the Response object, which holds the page source, status code and URL
print(response.text)  # the response body
print(type(response.text))  # the body is a str
print(response.status_code)  # the response status code
print(response.url)  # the URL of the response
<Response [200]>
<!DOCTYPE html>
<!--STATUS OK--><html> <head>...<title>ç™¾åº¦ä¸€ä¸‹ï¼Œä½ å°±çŸ¥é“</title></head> <body> ... </body> </html>
<class 'str'>
200
https://www.baidu.com/
The Chinese title in this output is garbled. It should read 百度一下，你就知道. The page is UTF-8, but response.text decodes the body using response.encoding. requests takes that value from the HTTP Content-Type header, and when a text/* type names no charset it assumes ISO-8859-1. That is most likely what happened here. Setting response.encoding = "utf-8" (or response.encoding = response.apparent_encoding) before reading response.text gives the correct characters.
Request methods
import requests
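url = "http://httpbin.org/put"
# note: httpbin's /put endpoint only accepts PUT, so most of the other calls come back as <Response [405]> (Method Not Allowed)
print(requests.get(url))
print(requests.post(url))
print(requests.put(url))
print(requests.delete(url))
print(requests.head(url))
print(requests.options(url))
GET requests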
import requests
url = "http://httpbin.org/get"  # target site
resp = requests.get(url)
print(resp.status_code)
print(resp.text)
print(type(resp.text))
200
{"args": {}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Host": "httpbin.org", "User-Agent": "python-requests/2.31.0", "X-Amzn-Trace-Id": "Root=1-6550ee3e-1138be3d1596f4b820f87a82"}, "origin": "111.18.40.246", "url": "http://httpbin.org/get"}
<class 'str'>
GET requests with parameters
import requests
url = "http://httpbin.org/get?age=21&name=huazi"  # target site, with the parameters written into the URL
resp = requests.get(url)
print(resp.status_code)
print(resp.text)
print(type(resp.text))
200
{"args": {"age": "21", "name": "huazi"}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Host": "httpbin.org", "User-Agent": "python-requests/2.31.0", "X-Amzn-Trace-Id": "Root=1-6550eff0-11976ad80c73c287054a519e"}, "origin": "111.18.40.239", "url": "http://httpbin.org/get?age=21&name=huazi"}
<class 'str'>
Recommended style
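Put the parameters in a dict of their own and pass it with params=. requests URL-encodes them and appends them to the URL.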
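You can check what params= does to the URL without sending anything by building a prepared request. In this sketch the extra key kw with a Chinese value is an illustrative assumption. It shows that non-ASCII values are percent-encoded as UTF-8. The second call reproduces the duplicated keys that appear when the URL already carries a query string.

```python
import requests

BASE = "http://httpbin.org/get"  # placeholder; nothing is sent
params = {"name": "huazi", "age": 10, "kw": "华子"}  # "kw" is a made-up extra parameter

# prepare() performs the same URL encoding as requests.get(BASE, params=params)
clean = requests.Request("GET", BASE, params=params).prepare().url
print(clean)  # http://httpbin.org/get?name=huazi&age=10&kw=%E5%8D%8E%E5%AD%90

# an existing query string is kept, and params is appended after it (not merged)
dup = requests.Request("GET", BASE + "?age=21&name=huazi",
                       params={"name": "huazi", "age": 10}).prepare().url
print(dup)  # http://httpbin.org/get?age=21&name=huazi&name=huazi&age=10
```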
import requests
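param = {"name": "huazi", "age": 10}
# the URL below still carries ?age=21&name=huazi, so requests appends params after it and
# every key shows up twice in the output; normally the URL would be just http://httpbin.org/get
url = "http://httpbin.org/get?age=21&name=huazi"  # target site
resp = requests.get(url, params=param)  # params carries the GET parameters
print(resp.status_code)
print(resp.text)
print(type(resp.text))
200
{"args": {"age": ["21", "10"], "name": ["huazi", "huazi"]}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Host": "httpbin.org", "User-Agent": "python-requests/2.31.0", "X-Amzn-Trace-Id": "Root=1-6550f2a3-7e41a0ad12af5b99601cefda"}, "origin": "111.18.40.234", "url": "http://httpbin.org/get?age=21&name=huazi&name=huazi&age=10"}
<class 'str'>
POST requests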
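You can check offline how data= and json= turn the same dict into different request bodies, using a prepared request that is never sent. The URL is only a placeholder. data= produces a form body (the 17-byte age=10&name=huazi that httpbin reports as Content-Length), while json= serialises the dict as JSON.

```python
import json

import requests

URL = "http://httpbin.org/post"  # placeholder; nothing is sent
payload = {"age": 10, "name": "huazi"}

form = requests.Request("POST", URL, data=payload).prepare()
print(form.headers["Content-Type"])     # application/x-www-form-urlencoded
print(form.body)                        # age=10&name=huazi

as_json = requests.Request("POST", URL, json=payload).prepare()
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body))         # {'age': 10, 'name': 'huazi'}
```

Use data= for HTML-form style endpoints and json= for APIs that expect a JSON body. httpbin shows the first under "form" and the second under "json".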
import requests
url = "http://httpbin.org/post"
d = {"age": 10, "name": "huazi"}
resp = requests.post(url, data=d)  # data carries the POST parameters (sent as a form)
print(resp.status_code)
print(resp.url)
print(resp.text)
200
http://httpbin.org/post
{"args": {}, "data": "", "files": {}, "form": {"age": "10", "name": "huazi"}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Content-Length": "17", "Content-Type": "application/x-www-form-urlencoded", "Host": "httpbin.org", "User-Agent": "python-requests/2.31.0", "X-Amzn-Trace-Id": "Root=1-6550f5eb-73f133fb497a4aca38ae755c"}, "json": null, "origin": "111.18.40.243", "url": "http://httpbin.org/post"}
Adding headers
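The User-Agent header identifies the client's browser. If it is missing or looks unusual, the server may conclude that the visitor is a crawler rather than an ordinary browser user. By default requests sends User-Agent: python-requests/<version>, as the httpbin output above shows.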
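Here is a sketch that compares the default User-Agent with a browser-style one, without sending anything. It uses a Session because the default headers are added by the session; requests.get() also creates a Session internally. The URL is a placeholder.

```python
import requests

URL = "http://httpbin.org/get"  # placeholder; nothing is sent
browser_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")

session = requests.Session()
default_req = session.prepare_request(requests.Request("GET", URL))
custom_req = session.prepare_request(
    requests.Request("GET", URL, headers={"User-Agent": browser_ua})  # per-request headers override the defaults
)

print(default_req.headers["User-Agent"])  # python-requests/<installed version>
print(custom_req.headers["User-Agent"])   # the browser string above
```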
A typical desktop Chrome User-Agent looks like this:
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
import requests

# put the parameters name and age in the dict params
params = {"name": "tony", "age": 20}
url = "http://httpbin.org/get"
# HTTP headers: cookie, User-Agent and Referer
# note: the standard request header for cookies is "Cookie"; "Cookies" below is sent as an ordinary custom header
headers = {
    "User-agent": "Mozilla/5.0 (Linux; Android 8.1.0; SM-P585Y) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",
    "referer": "https://www.abidu.com",
    "Cookies": "1234565678",
}
# send the request
res = requests.get(url=url, params=params, headers=headers)  # headers carries the disguise
# print the text of the returned response
print(res.text)
{"args": {"age": "20", "name": "tony"}, "headers": {"Accept": "*/*", "Accept-Encoding": "gzip, deflate", "Cookies": "1234565678", "Host": "httpbin.org", "Referer": "https://www.abidu.com", "User-Agent": "Mozilla/5.0 (Linux; Android 8.1.0; SM-P585Y) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36", "X-Amzn-Trace-Id": "Root=1-6550fcb0-7316ea826ef4c4664b0c1dff"}, "origin": "111.18.40.215", "url": "http://httpbin.org/get?name=tony&age=20"}
content: getting binary data
import requests
# target: the Baidu logo image https://www.baidu.com/img/baidu_jgylogo3.gif
url = "https://www.baidu.com/img/baidu_jgylogo3.gif"
resp = requests.get(url)
print(resp.text)
As you can see, the result is a pile of garbled characters:
(ɨt{,w|
BZaK7|MPh
%n8FN:F|V1~wyr 9khlOj!s\m\AZPQ~yXRż WEz85
.Da,L
vٱ#Uamf*L03]x\y2)JhiHtHKDK ;
This is where response.content comes in. It returns the body as raw binary data (bytes) instead of decoding it as text.
import requests
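# target: the Baidu logo image https://www.baidu.com/img/baidu_jgylogo3.gif
url = "https://www.baidu.com/img/baidu_jgylogo3.gif"
resp = requests.get(url)
print(resp.content)  # content: the raw body as bytes
# the file is a GIF, so save it as baidu.gif in the current directory ("wb" = write binary)
with open("./baidu.gif", "wb") as f:
    f.write(resp.content)
The bytes type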
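The bytes type is a sequence of bytes. In Python, a string literal with a b prefix (for example b"abc") is a bytes object. Why the bytes type matters:
1. When data is turned into binary in Python, it is not shown as a run of 0s and 1s but as a bytes object.
2. A computer can only store binary data, so characters, images, videos, music and so on must be encoded into binary in the proper way before they can be saved to disk.
3. Remember: in Python, a str must be encoded into bytes before it can be stored on disk. Opening a file in text mode (for example open(path, "w", encoding="utf-8")) does this encoding for you, while binary mode ("wb") only accepts bytes. That is why response.content is written with "wb" above.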
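The points above can be tried directly. This sketch encodes a str to bytes and decodes it back. It then decodes the same bytes with the wrong codec (cp1252), which reproduces the kind of mojibake seen in the Baidu page title earlier. Finally it shows that a file opened in "wb" mode accepts bytes but rejects a str. The temporary file path is only for the demo.

```python
import os
import tempfile

text = "百度"
data = text.encode("utf-8")     # str -> bytes
print(data)                     # b'\xe7\x99\xbe\xe5\xba\xa6'
print(type(data))               # <class 'bytes'>
print(data.decode("utf-8"))     # bytes -> str: 百度
print(data.decode("cp1252"))    # wrong codec -> mojibake: ç™¾åº¦

path = os.path.join(tempfile.mkdtemp(), "demo.bin")  # throwaway file for the demo
try:
    with open(path, "wb") as f:
        f.write(text)           # binary mode refuses a str ...
except TypeError as exc:
    error = str(exc)            # a bytes-like object is required, not 'str'
with open(path, "wb") as f:
    f.write(data)               # ... but accepts bytes
with open(path, "rb") as f:
    print(f.read() == data)     # True
```

For a Response, resp.content is these raw bytes, and resp.text is the same bytes decoded with resp.encoding.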
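Getting JSON data
Method 1
Use json.loads() from the built-in json module to deserialize the JSON string into a Python object (a JSON object becomes a dict, a JSON array becomes a list).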
import requests
import json

url = "http://httpbin.org/get"
resp = requests.get(url)
a = resp.text  # the body is a JSON string
# parse it with the built-in json module
print(a)
dict_data = json.loads(a)  # str -> dict
print(dict_data)
print(type(dict_data))  # <class 'dict'>
res = dict_data["url"]
print(res)
host = dict_data["headers"]["Host"]
print(host)
Method 2
Call the response object's json() method, which parses the JSON body and returns it as a dict.
import requests
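url = "http://httpbin.org/get"
resp = requests.get(url)
dict_data = resp.json()  # parse the JSON body into a dict
print(dict_data)
print(type(dict_data))
Note: why do both methods turn the JSON data into a dict? Because values in a dict are easy to extract, e.g. dict_data["headers"]["Host"].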
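Here is an offline sketch of the extraction step. It uses a trimmed copy of the httpbin body as sample data, not a live response. It shows nested access, dict.get() with a default for keys that may be absent, and what happens when the body is not JSON at all. With requests 2.27 or later, a failing resp.json() raises requests.exceptions.JSONDecodeError, which is the exception to catch around that call.

```python
import json

# trimmed copy of the httpbin /get body (sample data, not a live response)
body = ('{"args": {}, "headers": {"Host": "httpbin.org", '
        '"User-Agent": "python-requests/2.31.0"}, "url": "http://httpbin.org/get"}')

data = json.loads(body)                  # roughly what resp.json() does with a JSON body
print(data["headers"]["Host"])           # httpbin.org
print(data.get("origin", "<missing>"))   # .get() returns a default instead of raising KeyError

try:
    json.loads("<html>not json</html>")  # e.g. an HTML error page instead of JSON
except json.JSONDecodeError as exc:
    error_msg = exc.msg
    print("not JSON:", error_msg)        # not JSON: Expecting value
```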
The response
import requests

url = "https://www.jianshu.com"
h = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"
}
resp = requests.get(url, headers=h)
print(resp.status_code)  # status code
print(resp.headers)  # response headers
print(resp.url)  # final URL of the response
print(resp.history)  # redirect history; [] means no redirect happened
200
{'Date': 'Sun, 12 Nov 2023 17:21:03 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'ETag': 'W/"41ecb3f916a6731629ac139b5e2cc204"', 'Cache-Control': 'max-age=0, private, must-revalidate', 'Set-Cookie': 'locale=zh-CN; path=/', 'X-Request-Id': '4b3cc972-e9c3-4326-859d-d13ad5a7b556', 'X-Runtime': '0.003260', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip'}
https://www.jianshu.com/
[]
Checking status codes
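In code, a status check usually compares response.status_code against the named constants in requests.codes instead of bare numbers. The describe() helper below is a hypothetical example, and it runs without network access.

```python
import requests

def describe(status_code: int) -> str:
    """Hypothetical helper: map a status code to a short label."""
    if status_code == requests.codes.ok:                                   # 200
        return "success"
    if status_code in (requests.codes.moved_permanently, requests.codes.found):  # 301, 302
        return "redirect"
    if status_code == requests.codes.not_found:                           # 404
        return "page not found"
    if 500 <= status_code < 600:
        return "server error"
    return "other"

print(requests.codes.ok, requests.codes.not_found, requests.codes.teapot)  # 200 404 418
print(describe(200), describe(302), describe(404), describe(503))
```

With a live response you would call describe(resp.status_code). requests follows redirects by default (except for HEAD), so a 301/302 usually shows up in resp.history rather than in resp.status_code. resp.ok is True for any status below 400.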
200: the request succeeded
301, 302: the request was redirected
404: page not found
500, 502, 503: server-side errors
requests keeps the full table of status codes and their names in requests/status_codes.py. Each name can be used as an attribute of requests.codes (for example requests.codes.ok == 200):
# Informational.
100: ("continue",),
101: ("switching_protocols",),
102: ("processing",),
103: ("checkpoint",),
122: ("uri_too_long", "request_uri_too_long"),
200: ("ok", "okay", "all_ok", "all_okay", "all_good", "\\o/", "✓"),
201: ("created",),
202: ("accepted",),
203: ("non_authoritative_info", "non_authoritative_information"),
204: ("no_content",),
205: ("reset_content", "reset"),
206: ("partial_content", "partial"),
207: ("multi_status", "multiple_status", "multi_stati", "multiple_stati"),
208: ("already_reported",),
226: ("im_used",),
# Redirection.
300: ("multiple_choices",),
301: ("moved_permanently", "moved", "\\o-"),
302: ("found",),
303: ("see_other", "other"),
304: ("not_modified",),
305: ("use_proxy",),
306: ("switch_proxy",),
307: ("temporary_redirect", "temporary_moved", "temporary"),
308: ("permanent_redirect", "resume_incomplete", "resume"),  # These 2 to be removed in 3.0
# Client Error.
400: ("bad_request", "bad"),
401: ("unauthorized",),
402: ("payment_required", "payment"),
403: ("forbidden",),
404: ("not_found", "-o-"),
405: ("method_not_allowed", "not_allowed"),
406: ("not_acceptable",),
407: ("proxy_authentication_required", "proxy_auth", "proxy_authentication"),
408: ("request_timeout", "timeout"),
409: ("conflict",),
410: ("gone",),
411: ("length_required",),
412: ("precondition_failed", "precondition"),
413: ("request_entity_too_large",),
414: ("request_uri_too_large",),
415: ("unsupported_media_type", "unsupported_media", "media_type"),
416: ("requested_range_not_satisfiable", "requested_range", "range_not_satisfiable"),
417: ("expectation_failed",),
418: ("im_a_teapot", "teapot", "i_am_a_teapot"),
421: ("misdirected_request",),
422: ("unprocessable_entity", "unprocessable"),
423: ("locked",),
424: ("failed_dependency", "dependency"),
425: ("unordered_collection", "unordered"),
426: ("upgrade_required", "upgrade"),
428: ("precondition_required", "precondition"),
429: ("too_many_requests", "too_many"),
431: ("header_fields_too_large", "fields_too_large"),
444: ("no_response", "none"),
449: ("retry_with", "retry"),
450: ("blocked_by_windows_parental_controls", "parental_controls"),
451: ("unavailable_for_legal_reasons", "legal_reasons"),
499: ("client_closed_request",),
# Server Error.
500: ("internal_server_error", "server_error", "/o\\", "✗"),
501: ("not_implemented",),
502: ("bad_gateway",),
503: ("service_unavailable", "unavailable"),
504: ("gateway_timeout",),
505: ("http_version_not_supported", "http_version"),
506: ("variant_also_negotiates",),
507: ("insufficient_storage",),
509: ("bandwidth_limit_exceeded", "bandwidth"),
510: ("not_extended",),
511: ("network_authentication_required", "network_auth", "network_authentication"),
Advanced operations
Session persistence
Keeping a session with cookies
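The simplest way to keep a logged-in session is to send the cookie with every request. You can pass it as the cookies= argument (a dict), and requests turns it into a Cookie request header. The sketch prepares the request without sending it. The cookie name sessionid and its value are made up for illustration.

```python
import requests

URL = "http://httpbin.org/cookies"  # httpbin echoes the cookies it receives
cookies = {"sessionid": "abc123"}   # hypothetical cookie, e.g. copied from a logged-in browser

prepared = requests.Request("GET", URL, cookies=cookies).prepare()
print(prepared.headers["Cookie"])   # sessionid=abc123

# live version (needs network): httpbin echoes the cookie back
# print(requests.get(URL, cookies=cookies, timeout=5).json())
```

Sending the raw header yourself, headers={"Cookie": "sessionid=abc123"}, has the same effect for a single request.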
Keeping a session with a Session object
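A requests.Session stores the cookies that responses set and reuses its headers (and connections) for every later request, so you don't have to copy cookies by hand. The sketch sets a hypothetical cookie directly on the session, which normally a login response would do, and prepares two requests without sending them. Both automatically carry the same cookie and User-Agent.

```python
import requests

URL = "http://httpbin.org/cookies"  # placeholder; nothing is sent below

with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0"})  # sent with every request of this session
    s.cookies.set("sessionid", "abc123")             # hypothetical; normally set by a Set-Cookie response
    first = s.prepare_request(requests.Request("GET", URL))
    second = s.prepare_request(requests.Request("GET", URL + "?page=2"))

for p in (first, second):
    print(p.url, p.headers["Cookie"], p.headers["User-Agent"])

# live version (needs network): the server sets the cookie, and the session sends it back later
# with requests.Session() as s:
#     s.get("http://httpbin.org/cookies/set/sessionid/abc123", timeout=5)
#     print(s.get("http://httpbin.org/cookies", timeout=5).json())
```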
Proxy settings
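Proxies are passed as a dict whose keys are the schemes of the target URLs ("http", "https") and whose values are proxy URLs. The proxy address http://127.0.0.1:8888 and the helper names below are hypothetical. Only the dict is built here, since actually fetching through a proxy requires one to be running.

```python
import requests

def make_proxies(proxy_url: str) -> dict:
    """Hypothetical helper: route both http:// and https:// targets through one proxy."""
    return {"http": proxy_url, "https": proxy_url}

def get_via_proxy(url: str, proxy_url: str, timeout: float = 5.0) -> requests.Response:
    """Hypothetical helper: send a GET request through the given proxy."""
    return requests.get(url, proxies=make_proxies(proxy_url), timeout=timeout)

proxies = make_proxies("http://127.0.0.1:8888")
print(proxies)

# live use (needs a running proxy): httpbin's /ip shows the address the request came from
# print(get_via_proxy("http://httpbin.org/ip", "http://127.0.0.1:8888").json())
```

A proxy that needs a login can be written as http://user:password@host:port. SOCKS proxies (socks5://...) require the optional extra, installed with pip install requests[socks].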
Timeout settings
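requests has no default timeout, so without timeout= a request to an unresponsive server can wait indefinitely. A single number sets both the connect and the read timeout, and a (connect, read) tuple sets them separately. So that the example runs offline, the sketch starts a throwaway local server that takes about 1 second to answer, standing in for a slow site. trust_env = False only keeps proxy environment variables from redirecting the local request.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Throwaway local server that waits 1 second before answering."""

    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client already gave up waiting

    def log_message(self, *args):  # keep the console quiet
        pass


def fetch(session, url, timeout):
    try:
        return session.get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:  # covers ConnectTimeout and ReadTimeout
        return "timed out"


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

with requests.Session() as session:
    session.trust_env = False                 # ignore HTTP(S)_PROXY so the request hits the local server
    too_short = fetch(session, url, 0.2)      # server needs about 1 s, so this times out
    enough = fetch(session, url, (3.05, 5))   # (connect timeout, read timeout)

server.shutdown()
server.server_close()
print(too_short, enough)  # timed out 200
```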
Exception handling
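All exceptions raised by requests derive from requests.exceptions.RequestException, so you can catch specific cases first and the base class last. The safe_get() helper below is hypothetical. It also calls raise_for_status(), which turns 4xx/5xx responses into HTTPError. The demo calls need no network: a URL without http:// is rejected before anything is sent, and the 404 case uses a hand-built Response object.

```python
import requests

def safe_get(url, **kwargs):
    """Hypothetical helper: return (response, None) on success or (None, short error text)."""
    kwargs.setdefault("timeout", 5)  # never wait forever
    try:
        resp = requests.get(url, **kwargs)
        resp.raise_for_status()      # 4xx/5xx -> requests.exceptions.HTTPError
        return resp, None
    except requests.exceptions.Timeout:
        return None, "timed out"
    except requests.exceptions.ConnectionError:
        return None, "connection failed"
    except requests.exceptions.HTTPError as exc:
        return None, f"bad status {exc.response.status_code}"
    except requests.exceptions.RequestException as exc:
        # base class of everything above, e.g. MissingSchema for a URL without http(s)://
        return None, f"request error: {type(exc).__name__}"

resp, error = safe_get("www.baidu.com")  # no scheme: rejected before any network access
print(resp, error)                       # None request error: MissingSchema

fake = requests.Response()               # hand-built response, only to show raise_for_status()
fake.status_code = 404
fake.url = "http://httpbin.org/status/404"
try:
    fake.raise_for_status()
except requests.exceptions.HTTPError as exc:
    status_error = str(exc)
print(status_error)                      # starts with "404 Client Error"
```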