文章詳情頁

Python urllib request模塊發送請求實現過程解析

瀏覽：2日期：2022-07-02 16:08:07

1.Request()的參數

import urllib.request

request=urllib.request.Request(’https://python.org’)response=urllib.request.urlopen(request)print(response.read().decode(’utf-8’))

通過構造這個數據結構，一方面可以我們可以將請求獨立成一個對象，另一方面可以更加豐富和靈活地配置參數。

它的構造方法如下：

class.urllib.request.Request(url,data=None,headers={},origin_rep_host=None,unverifiable=False,method=None)

參數：

1.url必傳參數

2.data，必須傳bytes類型。如果是字典，先使用urllib.parse里的urlencode()

3.headers，是一個字典，請求頭，直接構造或者用add_header()方法添加

4.origin_rep_host，請求方的名稱或者ip地址

5.unverifiable，默認為false，表示這個請求是否無法驗證。如果沒有抓取的權限，此時值就是true。

6.method，用來指示請求使用的方法。

嘗試傳入多個參數構建請求：

from urllib import request,parseurl=’http://httpbin.org/post’headers={ ’Url-Agent’:’Mozilla/4.0(compatible;MSIE 5.5;Windows NT)’, ’Host’:’httpbin.org’}#也可以使用add_header()方法添加headers：#req=request.Request(url=url,data=data,method=’POST’)#req.add_header(’User-Agent’,’Mozilla/4.0(compatible;MSIE 5.5;Windows NT)’)dict={ ’name’:’Germey’}data=bytes(parse.urlencode(dict),encoding=’utf-8’)#用urlencode()將dict轉換成bytes類型，傳遞給datareq=request.Request(url=url,data=data,headers=headers,method=’POST’)response=request.urlopen(req)print(response.read().decode(’utf-8’))

運行結果：

Python urllib request模塊發送請求實現過程解析

2.Handler與Opener

Handler：

它是各種處理器，幾乎可以做到HTTP請求中的所有事情。

urllib.request模塊里的BaseHandler類，它是所有其他Headler的父類，它提供了最基本的方法。

Opener：

例如urlopen()就是一個Opener，它是urllib為我們提供的。

它們的關系是：使用Handler來構建Opener。

3.用法

驗證：

創建一個需要驗證的網站，我這里使用的是IIS

Python urllib request模塊發送請求實現過程解析

遇到的問題：

IIS怎樣安裝與配置-百度經驗 (baidu.com)

IIS網站如何設置基本身份驗證-百度經驗 (baidu.com)

window10家庭版解決IIS中萬維網服務的安全性中無Windows身份驗證 - enjoryWeb - 博客園 (cnblogs.com)

代碼：

from urllib.request import HTTPPasswordMgrWithDefaultRealm,HTTPBasicAuthHandler,build_openerfrom urllib.error import URLErrorusername=’username’#填上自己的用戶名和密碼password=’password’url=’http://localhost:5000/’p=HTTPPasswordMgrWithDefaultRealm()p.add_password(None,url,username,password)#添加用戶名和密碼，建立了一個處理驗證的Handlerauth_handler=HTTPBasicAuthHandler(p)#基本認證opener=build_opener(auth_handler)#利用Handler構建一個Openertry: result=opener.open(url)#打開鏈接 html=result.read().decode(’utf-8’) print(html)#結果打印html源碼內容except URLError as e: print(e.reason)

代理：

添加代理，在本地搭建一個代理，運行在9743端口上。

代碼：

from urllib.request import ProxyHandler,build_openerfrom urllib.error import URLErrorproxy_handler=ProxyHandler({ ’http’:’http://127.0.0.1:9743’, ’https’:’https://127.0.0.1:9743’})#構建一個Handleropener=build_opener(proxy_handler)#構建一個Openertry: response=opener.open(’https://www.baidu.com’) print(response.read().decode(’utf-8’))except URLError as e: print(e.reason)

Cookies：

將網站的Cookies獲取下來：

代碼：

import http.cookiejar,urllib.requestcookie=http.cookiejar.CookieJar()#聲明一個CookieJar對象handler=urllib.request.HTTPCookieProcessor(cookie)#構建一個Handleropener=urllib.request.build_opener(handler)#構建一個Openerresponse=opener.open(’http://www.baidu.com’)for item in cookie: print(item.name+'='+item.value)

運行結果：

Python urllib request模塊發送請求實現過程解析

將Cookie輸出成文件格式：

代碼：

import http.cookiejar,urllib.request

filename=’cookies.txt’

cookie=http.cookiejar.MozillaCookieJar(filename)#MozillaCookieJar()生成文件時用到，用來處理Cookie和文件相關的事件#如果要保存LWP格式的Cookies文件，可以改為：#cookie=http.cookiejar.LWPCookieJar(filename)

handler=urllib.request.HTTPCookieProcessor(cookie)opener=urllib.request.build_opener(handler)response=opener.open(’http://www.baidu.com’)cookie.save(ignore_discard=True,ignore_expires=True)

運行結果：

# Netscape HTTP Cookie File# http://curl.haxx.se/rfc/cookie_spec.html# This is a generated file! Do not edit..baidu.com TRUE / FALSE 1638359640 BAIDUID 9BB1BA4FDD840EBD956A3D2EFB6BF883:FG=1.baidu.com TRUE / FALSE 3754307287 BIDUPSID 9BB1BA4FDD840EBD25D00EE8183D1125.baidu.com TRUE / FALSE H_PS_PSSID 1445_33119_33059_31660_33099_33101_26350_33199.baidu.com TRUE / FALSE 3754307287 PSTM 1606823639www.baidu.com FALSE / FALSE BDSVRTM 7www.baidu.com FALSE / FALSE BD_HOME 1

LWP格式：

#LWP-Cookies-2.0Set-Cookie3: BAIDUID='DDF5CB401A1543ED614CE42962D48099:FG=1'; path='/'; domain='.baidu.com'; path_spec; domain_dot; expires='2021-12-01 12:04:18Z'; comment=bd; version=0Set-Cookie3: BIDUPSID=DDF5CB401A1543ED00860C3997C3282C; path='/'; domain='.baidu.com'; path_spec; domain_dot; expires='2088-12-19 15:18:25Z'; version=0Set-Cookie3: H_PS_PSSID=1430_33058_31254_33098_33101_33199; path='/'; domain='.baidu.com'; path_spec; domain_dot; discard; version=0Set-Cookie3: PSTM=1606824257; path='/'; domain='.baidu.com'; path_spec; domain_dot; expires='2088-12-19 15:18:25Z'; version=0Set-Cookie3: BDSVRTM=0; path='/'; domain='www.baidu.com'; path_spec; discard; version=0Set-Cookie3: BD_HOME=1; path='/'; domain='www.baidu.com'; path_spec; discard; version=0

以LWP格式的文件為示例，展示讀取和利用的方法：

代碼：

import http.cookiejar,urllib.request

cookie=http.cookiejar.LWPCookieJar()#如果文件保存為Mozilla型瀏覽器格式，可以改為：#cookie=http.cookiejar.MozillaCookieJar()

cookie.load(’cookies.txt’,ignore_discard=True,ignore_expires=True)#調用load()方法來讀取本地的Cookies文件，獲取Cookies的內容

handler=urllib.request.HTTPCookieProcessor(cookie)opener=urllib.request.build_opener(handler)response=opener.open(’http://www.baidu.com’)print(response.read().decode(’utf-8’))

運行結果：輸出網頁源代碼。

以上就是本文的全部內容，希望對大家的學習有所幫助，也希望大家多多支持好吧啦網。

Python 編程

上一條：Python urlopen()參數代碼示例解析下一條：Python APScheduler執行使用方法詳解

相關文章：

1. Django ORM實現按天獲取數據去重求和例子2. 解決ajax的delete、put方法接收不到參數的問題方法3. asp知識整理筆記4（問答模式）4. IntelliJ IDEA 統一設置編碼為utf-8編碼的實現5. 詳解idea中web.xml默認版本問題解決6. chat.asp聊天程序的編寫方法7. IntelliJ IDEA 2020最新激活碼(親測有效，可激活至 2089 年)8. Jsp中request的3個基礎實踐9. jsp EL表達式詳解10. 怎樣才能用js生成xmldom對象，并且在firefox中也實現xml數據島？

排行榜

					
					IntelliJ IDEA設置自動提示功能快捷鍵的方法
idea修改背景顏色樣式的方法
IntelliJ IDEA配置Tomcat服務器的方法
IntelliJ IDEA 統一設置編碼為utf-8編碼的實現
asp知識整理筆記4（問答模式）
怎樣才能用js生成xmldom對象，并且在firefox中也實現xml數據島？
chat.asp聊天程序的編寫方法
Jsp中request的3個基礎實踐
解決ajax的delete、put方法接收不到參數的問題方法
jsp EL表達式詳解
IntelliJ IDEA 2020最新激活碼(親測有效，可激活至 2089 年)