Python scraping essentials: how to use the requests library (comprehensive web-scraping examples)

Date: 2022-07-02 08:14:28

The requests library

Install it with pip: pip install requests

Basic requests
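requests offers one helper function per HTTP method, and each is a thin wrapper around requests.request(method, url, **kwargs). As a minimal sketch (the URL is simply the one used in this article), building a requests.Request and calling prepare() shows what each helper would send, without any network access:

```python
import requests

URL = "https://www.baidu.com/"

# requests.get(url) is shorthand for requests.request("get", url); the same
# holds for post, put, delete, head and options.
helper_names = ["get", "post", "put", "delete", "head", "options"]

# prepare() normalises the request (method upper-cased, URL checked) but
# sends nothing, so this runs offline.
prepared = [requests.Request(name, URL).prepare() for name in helper_names]

for p in prepared:
    print(p.method, p.url)
```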

import requests

req = requests.get('https://www.baidu.com/')
req = requests.post('https://www.baidu.com/')
req = requests.put('https://www.baidu.com/')
req = requests.delete('https://www.baidu.com/')
req = requests.head('https://www.baidu.com/')
req = requests.options('https://www.baidu.com/')

1. GET requests
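In a GET request, the params dictionary is URL-encoded and appended to the URL as a query string. This offline sketch prepares (but never sends) such a request to show the resulting URL; the tag key is hypothetical and only illustrates that a list value turns into repeated keys:

```python
import requests

url = "https://www.baidu.com/s"
params = {
    "wd": "豆瓣",        # non-ASCII text is UTF-8 encoded, then percent-escaped
    "tag": ["a", "b"],   # hypothetical key: a list becomes tag=a&tag=b
}

# Prepare the request without sending it, to inspect the final URL.
prepared = requests.Request("GET", url, params=params).prepare()
print(prepared.url)
# https://www.baidu.com/s?wd=%E8%B1%86%E7%93%A3&tag=a&tag=b
```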

The parameters are passed as a dictionary through the params argument:

import requests
from fake_useragent import UserAgent  # library that supplies User-Agent strings

headers = {'User-Agent': UserAgent().random}  # pick a random User-Agent header
url = 'https://www.baidu.com/s'  # search URL
params = {
    'wd': '豆瓣'  # query-string parameter, sent as ?wd=...
}
requests.get(url, headers=headers, params=params)

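On its own, this call only displays the Response object, e.g. <Response [200]>, which shows the status code. To get the page content, read its text attribute: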

# GET request
headers = {'User-Agent': UserAgent().random}
url = 'https://www.baidu.com/s'
params = {
    'wd': '豆瓣'
}
response = requests.get(url, headers=headers, params=params)
response.text

2. POST requests

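The form fields are again a dictionary. Passed as data, they are sent form-encoded; to send a JSON body instead, pass the dictionary as json: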

import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().random}
url = 'https://www.baidu.cn/index/login/login'  # login URL (account and password)
params = {
    'user': '1351351335',  # account
    'password': '123456'   # password
}
response = requests.post(url, headers=headers, data=params)
response.text

This example needs a login page, so I just used an arbitrary URL and did not actually log in; the response is therefore not a logged-in page. To see a real login work, try it against an actual login page.
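requests.post encodes the body differently depending on whether the dictionary is passed as data or as json. The difference can be checked offline by preparing both requests; this sketch reuses the placeholder account from the example above, and the login URL is never contacted:

```python
import json

import requests

url = "https://www.baidu.cn/index/login/login"  # never contacted: prepare() only
form = {"user": "1351351335", "password": "123456"}

# data=dict -> form-encoded body, the way an HTML <form> submits it.
as_form = requests.Request("POST", url, data=form).prepare()
# json=dict -> JSON-serialised body.
as_json = requests.Request("POST", url, json=form).prepare()

print(as_form.headers["Content-Type"], as_form.body)
# application/x-www-form-urlencoded user=1351351335&password=123456
print(as_json.headers["Content-Type"], json.loads(as_json.body))
# application/json {'user': '1351351335', 'password': '123456'}
```

Which one to use depends on what the server expects; a site that serves an ordinary login form usually wants data.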

3. IP proxies

To avoid getting an IP address banned while scraping, crawlers often send their requests through proxies; requests supports this through the proxies parameter.

# IP proxy
import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().random}
url = 'http://httpbin.org/get'  # returns information about the current IP
proxies = {
    # format: http://username:password@IP:port
    # (PASSWORD and PROXY_IP are placeholders for your own values)
    'http': 'http://yonghuming:PASSWORD@PROXY_IP:8088'
    # 'http': 'https://182.145.31.211:4224'  # or just IP:port
}
requests.get(url, headers=headers, proxies=proxies)

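Proxy IPs can be found on Kuaidaili (快代理) or bought. The address http://httpbin.org/get returns information about the request it receives, including the IP address it came from (the origin field), so it shows whether the proxy is in effect.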

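The keys of the proxies dictionary are matched against the scheme of the target URL, so a dictionary with only an 'http' key does not proxy https:// requests (requests may still fall back to environment variables such as HTTPS_PROXY). requests.utils.select_proxy is the helper requests uses to pick the entry; this sketch calls it directly to show the lookup. The proxy address is a hypothetical placeholder:

```python
from requests.utils import select_proxy

# Hypothetical proxy endpoint (placeholder credentials and address).
PROXY = "http://user:password@10.0.0.1:8088"

both = {"http": PROXY, "https": PROXY}  # proxies plain and TLS traffic
only_http = {"http": PROXY}             # like the example above

for url in ("http://httpbin.org/get", "https://httpbin.org/get"):
    # Prints the proxy chosen from each dictionary, or None if no key matches.
    print(url, select_proxy(url, both), select_proxy(url, only_http))
```

In a real request the same dictionary is passed as requests.get(url, proxies=both).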

4. Setting a request timeout

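The timeout parameter sets a limit in seconds. If the connection cannot be established, or the server sends nothing for longer than that, requests raises an exception (requests.exceptions.Timeout) instead of waiting indefinitely.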

# set the timeout (in seconds)
import requests

requests.get('http://baidu.com/', timeout=0.1)
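To watch the timeout fire without depending on a remote site, this sketch starts a hypothetical local server that waits 0.5 s before answering. A 0.1 s timeout makes requests raise requests.exceptions.Timeout (here its ReadTimeout subclass), while a (connect, read) tuple sets the two limits separately:

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that answers after 0.5 seconds."""

    def do_GET(self):
        time.sleep(0.5)
        body = b"finally"
        try:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # the client may already have given up and closed the socket

    def log_message(self, *args):
        pass  # keep the console quiet


def fetch(url, timeout):
    """Return the response text, or None if requests gave up waiting."""
    try:
        return requests.get(url, timeout=timeout).text
    except requests.exceptions.Timeout:
        return None


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

too_short = fetch(url, timeout=0.1)       # no data within 0.1 s: read timeout
long_enough = fetch(url, timeout=(1, 2))  # connect limit 1 s, read limit 2 s
server.shutdown()

print(too_short, long_enough)  # None finally
```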

5. Certificate problems (SSLError:HTTP)

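SSL verification. When a site's certificate cannot be verified, requests raises an SSLError. Passing verify=False skips certificate verification for that request; urllib3 then emits an InsecureRequestWarning, which the code below switches off.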

import requests
from fake_useragent import UserAgent  # request-header library

url = 'https://www.12306.cn/index/'  # page that requires certificate verification
headers = {'User-Agent': UserAgent().random}  # pick a random User-Agent header
requests.packages.urllib3.disable_warnings()  # turn off the security warning
response = requests.get(url, verify=False, headers=headers)
response.encoding = 'utf-8'  # decode as UTF-8 so the Chinese text displays correctly
response.text
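requests.packages.urllib3.disable_warnings() turns the warning off for the whole program. A narrower alternative, sketched here with the standard warnings module, silences urllib3's InsecureRequestWarning only around the unverified call; get_unverified and insecure_warnings_silenced are hypothetical helper names:

```python
import contextlib
import warnings

import requests
import urllib3


@contextlib.contextmanager
def insecure_warnings_silenced():
    """Ignore urllib3's InsecureRequestWarning inside the with-block only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", urllib3.exceptions.InsecureRequestWarning)
        yield


def get_unverified(url, **kwargs):
    """GET url with certificate verification turned off (verify=False)."""
    with insecure_warnings_silenced():
        return requests.get(url, verify=False, **kwargs)
```

For example, get_unverified('https://www.12306.cn/index/', headers=headers) behaves like the request above, while warnings elsewhere in the program stay enabled.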

6. Keeping cookies automatically with a Session

import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().chrome}
login_url = 'https://www.baidu.cn/index/login/login'  # page that requires logging in
params = {
    'user': 'yonghuming',  # username
    'password': '123456'   # password
}
session = requests.Session()  # keeps cookies between requests
# use session directly in place of requests
response = session.post(login_url, headers=headers, data=params)
info_url = 'https://www.baidu.cn/index/user.html'  # page reached after logging in
resp = session.get(info_url, headers=headers)
resp.text

Because this URL is not a page that really requires an account and password, the result is not a logged-in page.

I also tried it on a Zhihuishu (智慧樹) page:

# cookie
import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().chrome}
login_url = 'https://passport.zhihuishu.com/login?service=https://onlineservice.zhihuishu.com/login/gologin'  # page that requires logging in
params = {
    'user': '12121212',   # username
    'password': '123456'  # password
}
session = requests.Session()  # keeps cookies between requests
# use session directly in place of requests
response = session.post(login_url, headers=headers, data=params)
info_url = 'https://onlne5.zhhuishu.com/onlinWeb.html#/stdetInex'  # page reached after logging in
resp = session.get(info_url, headers=headers)
resp.encoding = 'utf-8'
resp.text
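What the Session does can be checked against a hypothetical local site: POST /login answers with a Set-Cookie header, and GET /user only greets requests that send that cookie back. In this sketch the Session stores the cookie and re-sends it, while two independent requests calls do not:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class LoginHandler(BaseHTTPRequestHandler):
    """Hypothetical site: POST /login sets a cookie, GET /user checks it."""

    def _reply(self, text, extra_headers=()):
        data = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        for name, value in extra_headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        # Drain the form body, then hand out a session cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._reply("logged in", [("Set-Cookie", "sid=abc123; Path=/")])

    def do_GET(self):
        logged_in = "sid=abc123" in self.headers.get("Cookie", "")
        self._reply("welcome" if logged_in else "please log in")

    def log_message(self, *args):
        pass  # keep the console quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), LoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"
form = {"user": "yonghuming", "password": "123456"}

with requests.Session() as session:
    session.post(f"{base}/login", data=form)
    saved = session.cookies.get_dict()  # cookies the Session is holding
    with_session = session.get(f"{base}/user").text

requests.post(f"{base}/login", data=form)  # cookie is received, then dropped
without_session = requests.get(f"{base}/user").text
server.shutdown()

print(saved, with_session, without_session)
```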

7. Getting information from the response

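Code | Meaning
resp.json() | the response body parsed from JSON into a Python object (e.g. a dict)
resp.text | the response body as a string (str)
resp.content | the response body as bytes
resp.headers | the response headers
resp.url | the URL that was accessed (after any redirects)
resp.encoding | the encoding used to decode resp.text
resp.request.headers | the headers of the request that was sent
resp.cookies | the cookies set by the response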
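The attributes in the table can be tried without a remote site. This sketch starts a hypothetical local endpoint that returns a small JSON document plus a cookie, then reads each attribute from the response (status_code, which holds the numeric status, is included as well):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class JsonHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint returning JSON and setting a cookie."""

    def do_GET(self):
        body = json.dumps({"name": "豆瓣", "ok": True}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Set-Cookie", "sid=abc123; Path=/")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), JsonHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
resp = requests.get(f"http://127.0.0.1:{server.server_port}/info",
                    headers={"User-Agent": "demo"})
server.shutdown()  # the body has already been read, so this is safe

print(resp.status_code)                     # numeric status code
print(resp.json())                          # body parsed into a dict
print(type(resp.text), type(resp.content))  # str and bytes
print(resp.headers["Content-Type"])         # one response header
print(resp.url)                             # URL that was requested
print(resp.encoding)                        # charset taken from Content-Type
print(resp.request.headers["User-Agent"])   # header we sent
print(resp.cookies.get("sid"))              # cookie set by the server
```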
