Getting the status code of an HTTP request (200, 404, etc.) in Python
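Problem description
How can I get the status code of an HTTP request (200, 404, etc.) in Python without downloading the page's full source? Fetching the whole page wastes too many resources.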
Input: segmentfault.com  Output: 200
Input: segmentfault.com/nonexistant  Output: 404
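The inputs above have no scheme, while the answers expect either a separate host and path (Answer 1) or a full URL (Answer 2). A minimal sketch of splitting such an input into a (host, path) pair, assuming plain http:// whenever no scheme is given; `split_host_path` is a hypothetical helper name, not part of either answer:

```python
from urllib.parse import urlsplit


def split_host_path(target):
    """Split an input such as 'segmentfault.com/nonexistant' into (host, path).

    Assumption: a target without a scheme is treated as plain HTTP.
    """
    if "://" not in target:
        target = "http://" + target
    parts = urlsplit(target)
    path = parts.path or "/"  # an empty path means the site root
    if parts.query:
        path += "?" + parts.query  # keep the query string with the path
    return parts.netloc, path


print(split_host_path("segmentfault.com"))              # ('segmentfault.com', '/')
print(split_host_path("segmentfault.com/nonexistant"))  # ('segmentfault.com', '/nonexistant')
```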
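Answers
Answer 1: Reference article: "A List of Practical Python Scripts" (Python實用腳本清單)
HTTP has more methods than GET (which returns the headers plus the body). There is also the HEAD method, which requests only the headers.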
import http.client  # this module was called httplib in Python 2

def get_status_code(host, path='/'):
    '''Retrieve the status code of a website by requesting HEAD data
    from the host, so that only the response headers are fetched.
    If the host cannot be reached or something else goes wrong,
    return None instead.'''
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request('HEAD', path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):  # network and protocol errors
        return None
    finally:
        conn.close()

print(get_status_code('segmentfault.com'))  # prints 200
print(get_status_code('segmentfault.com', '/nonexistant'))  # prints 404
Answer 2:
A GET request fetches the whole response, headers plus body. Try the HEAD method instead, which fetches only the headers.
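import requests

# HEAD requests only the resource's headers
# (requests.head does not follow redirects by default: allow_redirects=False)
resp = requests.head('http://segmentfault.com')
print(resp.status_code)  # status code
resp = requests.head('http://segmentfault.com/nonexistant')
print(resp.status_code)  # status code
# Output:
# 200
# 404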
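Both answers need network access to reproduce. A minimal offline sketch using only Python 3's standard library: it starts a hypothetical local test server (`DemoHandler`, which answers 200 for `/` and 404 for any other path) and queries it with HEAD requests, once through `http.client` (the Answer 1 approach) and once through `urllib.request` (swapped in here as a stdlib stand-in for `requests.head`, so no third-party package is needed). All helper names are illustrative.

```python
import http.client
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: '/' exists, every other path is a 404."""

    def do_HEAD(self):
        self.send_response(200 if self.path == "/" else 404)
        self.send_header("Content-Type", "text/html")
        self.end_headers()  # HEAD response: status line and headers, no body

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def status_via_http_client(host, port, path="/"):
    # Same idea as Answer 1: send HEAD and read only the status
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def status_via_urllib(url):
    # Bypass any environment proxy so the local server is reached directly
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # urllib raises on 4xx/5xx
        return err.code


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

results = {
    "client /": status_via_http_client(host, port, "/"),
    "client /nonexistant": status_via_http_client(host, port, "/nonexistant"),
    "urllib /": status_via_urllib(f"http://{host}:{port}/"),
    "urllib /nonexistant": status_via_urllib(f"http://{host}:{port}/nonexistant"),
}
server.shutdown()
server.server_close()
print(results)
# {'client /': 200, 'client /nonexistant': 404, 'urllib /': 200, 'urllib /nonexistant': 404}
```

One design difference shows up here: `urllib.request` reports 4xx and 5xx responses by raising `HTTPError`, so the sketch reads the code from the exception, whereas `http.client` and `requests.head` simply return the status.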
