Python crawler, 12306 city JSON: Python crawler modules

Reposted

hackernew 2023-10-23 21:43:32


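[Figure: knowledge map of the topics worth mastering for crawler development]

If you do not use the Scrapy crawler framework, you can still build a custom crawl from individual modules, for example urllib, urllib2 and httplib from the Python 2 standard library (urllib.request and http.client in Python 3). These modules are somewhat dated and verbose to use. Requests, released under the Apache2 License, is built on urllib3, which in turn sits on the standard library's HTTP machinery, and wraps it in a high-level API that makes sending network requests from Python far more pleasant.

1. Basic functions of the requests module:

The Requests source (requests/api.py) exposes get, post, put, patch, delete, head and options as top-level functions. The source for each is shown below: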

[Screenshots of the source for get, options, head, post, patch and delete]
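The signatures of these helpers can also be inspected directly from Python. The sketch below assumes the requests package is installed; `HELPERS` and `helper_signatures` are illustrative names, not part of the library. Each helper is a thin wrapper that forwards its arguments to `requests.request(method, url, **kwargs)`.

```python
import inspect

import requests

# Top-level helper functions defined in requests/api.py.
HELPERS = ("get", "options", "head", "post", "put", "patch", "delete")


def helper_signatures():
    """Map each helper name to its call signature, rendered as a string."""
    return {
        name: str(inspect.signature(getattr(requests, name)))
        for name in HELPERS
    }


for name, signature in helper_signatures().items():
    print(f"requests.{name}{signature}")
```

Only `get` names `params` explicitly and only `post` names `json`, but every helper accepts the remaining keyword arguments of `requests.request` through `**kwargs`.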

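A crawler works in one of two ways: it either fetches publicly available information directly, or it fetches information that is only available after logging in.

Case 1: Send the simplest GET request to fetch a page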

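A minimal sketch of such a request, assuming requests is installed. `fetch_page` is an illustrative helper and the 10-second timeout is an arbitrary choice; neither comes from the original post.

```python
import requests


def fetch_page(url, timeout=10):
    """Send a plain GET request and return the response body as text."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of returning an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_page('https://www.sogou.com')` would return the page's HTML as a string, provided the site is reachable from your network.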

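Case 2: Build the URL from query parameters, then send a GET request with requests.request.

Building the URL:

To search for a keyword the way you would type it into a search box, append it to the URL as "query=keyword". This is equivalent to typing the keyword straight into the Baidu search box.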


For example, the URL below is assembled as https://www.sogou.com/web?query=楼市政策&q=b.

The following example also sends its GET request through requests.request.

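The URL assembly can be checked without touching the network. This sketch, assuming requests is installed, prepares the request instead of sending it; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

import requests

# Build the request without sending it, so the final URL can be inspected offline.
prepared = requests.Request(
    method="GET",
    url="https://www.sogou.com/web",
    params={"query": "楼市政策", "q": "b"},
).prepare()

# Non-ASCII values are percent-encoded as UTF-8 in the query string.
print(prepared.url)

# Decoding the query string recovers the original parameters.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

To actually send the prepared request, pass it to `requests.Session().send(prepared)`, or simply call `requests.request('GET', url, params=...)` with the same arguments.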

Case 3: Sending a POST request

Right-click the page, choose Inspect, and open the Network tab to see the POST request.


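Case 3-1: Sending a request directly with post

import requests

# Form fields for the login endpoint; the phone number is elided in the source.
form_data = {
    'phone': '861。。。',
    'password': '123456',
    'oneMonth': '1',
}
r = requests.post(
    url='http://dig.chouti.com/login',
    data=form_data,  # sent as a form-encoded request body
)
print(r.text)

Result:

The response carries the outcome of the login check: either the username or password is wrong, or the login succeeded. If your IP has been blocked, a different message is shown.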

Case 3-2: Sending a POST request with the data parameter

import requests

response = requests.request(
    method='POST',
    url='http://www.sogou.com/web',
    # params is appended to the URL as a percent-encoded query string
    params={'query': '房价', 'q': 'b'},
    # Sending data= sets the header Content-Type: application/x-www-form-urlencoded
    data={'user': 'hh', 'pwd': 'sdh'},  # body becomes "user=hh&pwd=sdh"
)
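To see where each argument ends up, the same request can be prepared without being sent. This is a sketch assuming requests is installed; `prepared` is an illustrative name.

```python
import requests

# Prepare (but do not send) the POST from Case 3-2.
prepared = requests.Request(
    method="POST",
    url="http://www.sogou.com/web",
    params={"query": "房价", "q": "b"},
    data={"user": "hh", "pwd": "sdh"},
).prepare()

print(prepared.url)                      # params end up in the query string
print(prepared.headers["Content-Type"])  # set automatically because data= was used
print(prepared.body)                     # form-encoded body built from data=
```

The query string and the request body are independent: `params` never appears in the body, and `data` never appears in the URL.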

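Case 3-3: Using the json parameter

import requests

response = requests.request(
    method='POST',
    url='http://www.sogou.com/web',
    # params is appended to the URL as a percent-encoded query string
    params={'query': '房价', 'q': 'b'},
    # Sending json= sets the header Content-Type: application/json.
    # Pass the dict itself: requests serializes it, so calling json.dumps
    # first would encode the payload twice.
    json={'user': 'hh', 'pwd': 'sdh'},
)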
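The `json` parameter expects a Python object, not a pre-serialized string. The sketch below, assuming requests is installed, prepares both variants offline to show the difference; `good` and `double` are illustrative names.

```python
import json

import requests

payload = {"user": "hh", "pwd": "sdh"}
url = "http://www.sogou.com/web"

# Passing the dict lets requests serialize it exactly once.
good = requests.Request("POST", url, json=payload).prepare()

# Passing an already-serialized string makes requests serialize it again,
# so the server receives a JSON string literal instead of a JSON object.
double = requests.Request("POST", url, json=json.dumps(payload)).prepare()

print(good.headers["Content-Type"])   # application/json
print(type(json.loads(good.body)))    # <class 'dict'>
print(type(json.loads(double.body)))  # <class 'str'>
```

If you already hold a JSON string, send it with `data=` and set the `Content-Type: application/json` header yourself.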

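Case 4: Setting request headers and adding a User-Agent

Case 4-1: A 500 Server Error appears. One possible cause is that the request headers are missing a User-Agent.

import requests

response = requests.get(url='https://www.zhihu.com')
print(response.text)  # print the response body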

Result:

[Screenshot of the 500 Server Error response]
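One likely reason a site rejects the bare request: when no headers are supplied, requests identifies itself with a library-specific User-Agent rather than a browser string. Whether zhihu.com filters on this header specifically is an assumption here. The sketch below, assuming requests is installed, prints that default value without sending anything.

```python
import requests

# The User-Agent that requests sends when none is supplied.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A new Session starts out with the same default header.
session = requests.Session()
print(session.headers["User-Agent"])
```

The printed value starts with `python-requests/` followed by the installed version, which makes automated traffic easy for a server to recognize.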

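Case 4-2: Fixing the 500 Server Error by adding header information. Right-click the page, choose Inspect, open the Network tab, find the Referer and User-Agent values of the request, and copy them into headers.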


import requests

response = requests.get(
    url='https://www.zhihu.com/',
    # Header values copied from the browser's Network tab
    headers={
        'Referer': 'https://www.zhihu.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36',
    },
)
print(response.text)  # print the response body

The page content is now printed.

Case 4-3: Setting the User-Agent dynamically

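One possible way to do this is to pick a User-Agent at random for each request. This is a sketch under the assumption that a small hand-maintained pool is sufficient; `USER_AGENTS`, `random_headers` and `fetch_with_random_agent` are hypothetical names, and the pool entries other than the first are illustrative browser strings, not values from the original post.

```python
import random

import requests

# Hypothetical pool of browser User-Agent strings; extend or replace as needed.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15",
]


def random_headers(referer=None):
    """Return request headers with a User-Agent picked at random from the pool."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    if referer:
        headers["Referer"] = referer
    return headers


def fetch_with_random_agent(url, timeout=10):
    """GET `url`, choosing a User-Agent from the pool on each call."""
    return requests.get(url, headers=random_headers(referer=url), timeout=timeout)
```

Calling `fetch_with_random_agent('https://www.zhihu.com/')` would send the request with one of the pooled identities. Rotating the header only changes how the client describes itself, so a site can still rate-limit by IP address.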