Program requirement: crawl the annual average Baidu Index value for each target city.
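The program uses packet capture to obtain the annual average Baidu Index for each search term. Observing the API calls made by the Baidu Index web page shows that the request it sends to the backend has the following form: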
http://index.baidu.com/api/SearchApi/index?area={area}&word={words}&startDate={startDate}&endDate={endDate}
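Here, area is the region the searches originate from, words holds the search keywords (at most 5 per request), and startDate and endDate are the start and end dates.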
These variables are built as follows:
words = [[{"name": key, "wordType": 1}] for key in keys]
words = str(words).replace(" ", "").replace("'", "\"")
startDate = f"{year}-01-01"
endDate = f"{year}-12-31"
Here, keys is the list of search terms.
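The construction above can be sketched as two small helpers: one that splits a longer keyword list into batches of at most 5, and one that assembles the URL for a single batch. This is only an illustration under stated assumptions: the names `API`, `chunk_keys` and `build_url` are hypothetical, and `json.dumps` is swapped in for the `str(...).replace(...)` approach so that spaces inside a keyword survive.

```python
import json

# Endpoint as observed in the text above.
API = "http://index.baidu.com/api/SearchApi/index"
MAX_WORDS = 5  # the text states a request holds at most 5 keywords


def chunk_keys(keys, size=MAX_WORDS):
    """Split the keyword list into batches of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def build_url(area, keys, year):
    """Assemble the request URL for one batch of keywords and one year."""
    words = [[{"name": key, "wordType": 1}] for key in keys]
    # Compact JSON with double quotes and no separator spaces; spaces that
    # are part of a keyword are left untouched.
    words = json.dumps(words, ensure_ascii=False, separators=(",", ":"))
    return (
        f"{API}?area={area}&word={words}"
        f"&startDate={year}-01-01&endDate={year}-12-31"
    )
```

For a long list of destinations, one would presumably call `build_url` once per batch returned by `chunk_keys(keys)` and send one request per batch. The value to pass as `area` has to be read from the captured requests; it is not specified here.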
Build the request headers and fetch the data with the get method of the requests library:
import requests

headers = {
"Connection": "keep-alive",
"Accept": "application/json, text/plain, */*",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
"Sec-Fetch-Site": "same-origin",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Dest": "empty",
"Cipher-Text": "1698156005330_1698238860769_ZPrC2QTaXriysBT+5sgXcnbTX3/lW65av4zgu9uR1usPy82bArEg4m9deebXm7/O5g6QWhRxEd9/r/hqHad2WnVFVVWybHPFg3YZUUCKMTIYFeSUIn23C6HdTT1SI8mxsG5mhO4X9nnD6NGI8hF8L5/G+a5cxq+b21PADOpt/XB5eu/pWxNdwfa12krVNuYI1E8uHQ7TFIYjCzLX9MoJzPU6prjkgJtbi3v0X7WGKDJw9hwnd5Op4muW0vWKMuo7pbxUNfEW8wPRmSQjIgW0z5p7GjNpsg98rc3FtHpuhG5JFU0kZ6tHgU8+j6ekZW7+JljdyHUMwEoBOh131bGl+oIHR8vw8Ijtg8UXr0xZqcZbMEagEBzWiiKkEAfibCui59hltAgW5LG8IOtBDqp8RJkbK+IL5GcFkNaXaZfNMpI=",
"Referer": "https://index.baidu.com/v2/main/index.html",
"Accept-Language": "zh-CN,zh;q=0.9",
'Cookie': cookie}
res = requests.get(url, headers=headers)
res_json = res.json()
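The request step can be made a little more defensive. The sketch below adds a timeout, an HTTP status check, and a check that the payload contains a `data` object before it is parsed. The helper name `fetch_json` is hypothetical, and the assumption that a failed request (for example with a stale Cookie or Cipher-Text) comes back without a dictionary under `data` is mine, not something the captured traffic above confirms.

```python
def fetch_json(session, url, headers, timeout=10):
    """Send one GET request and return the parsed JSON payload.

    `session` is expected to behave like requests.Session (or the requests
    module itself): it needs a get(url, headers=..., timeout=...) method.
    """
    resp = session.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx / 5xx responses
    payload = resp.json()
    # Assumption: a usable response carries a dict under "data".
    if not isinstance(payload.get("data"), dict):
        raise RuntimeError(f"unexpected response: {payload}")
    return payload
```

With the variables from the text, a call would look like `res_json = fetch_json(requests.Session(), url, headers)`.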
The returned annual averages and their corresponding keywords are collected into the following two lists:
# One entry per keyword that the API returned
ratios = res_json['data']['generalRatio']
avg_list = [item['all']['avg'] for item in ratios]
destination_list = [item['word'][0]['name'] for item in ratios]
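Since the two lists are index-aligned, it can be convenient to pair them into a single mapping from keyword to annual average. The following is a minimal sketch; the function name `averages_by_keyword` is hypothetical, and the `sample` payload is invented purely to mirror the fields accessed above (`data`, `generalRatio`, `word`, `all`, `avg`), so its numbers are not real Baidu Index values.

```python
def averages_by_keyword(res_json):
    """Map each returned keyword to its annual average."""
    result = {}
    for item in res_json["data"]["generalRatio"]:
        name = item["word"][0]["name"]
        result[name] = item["all"]["avg"]
    return result


# Made-up payload with the same shape as the fields used in the text.
sample = {
    "data": {
        "generalRatio": [
            {"word": [{"name": "北京", "wordType": 1}], "all": {"avg": 100}},
            {"word": [{"name": "上海", "wordType": 1}], "all": {"avg": 80}},
        ]
    }
}

print(averages_by_keyword(sample))
```

A dictionary avoids keeping two parallel lists in sync when results from several batches or years are merged later.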