Simple Bilibili Danmaku Scraper

Purpose: fetch a video's danmaku (bullet comments) and save them to a .txt file.

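Usage: find the aid of the Bilibili video and assign it to av in the `if __name__ == '__main__':` block, i.e. av = '<a string of digits, the aid>'.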

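Finding the aid: below the video there is a share button; hover over it to see the iframe embed code, which contains the aid. Alternatively, open the browser's developer tools (Inspect), switch to the Network tab, refresh the page, and look in the Name column for a URL that contains aid.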
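
If you have copied the embed code, the aid can also be pulled out programmatically. The following is a minimal sketch, not part of the original script: `aid_from_embed` is a hypothetical helper name, and it assumes the embed code carries a `src` attribute whose query string has an `aid` parameter, as in the sample iframe quoted later in this file.

```python
import re
from urllib.parse import urlparse, parse_qs


def aid_from_embed(embed_code):
    """Return the aid found in an iframe embed snippet (hypothetical helper)."""
    # Assumption: the snippet contains src="...player.html?aid=...&..."
    match = re.search(r'src="([^"]+)"', embed_code)
    if match is None:
        raise ValueError('no src attribute found in embed code')
    # Split the URL and turn its query string into a dict of lists.
    query = parse_qs(urlparse(match.group(1)).query)
    if 'aid' not in query:
        raise ValueError('embed URL has no aid parameter')
    return query['aid'][0]


embed = ('<iframe src="//player.bilibili.com/player.html'
         '?aid=545259775&bvid=BV1Sq4y1E7HQ&cid=331019617&page=1" '
         'scrolling="no"></iframe>')
print(aid_from_embed(embed))  # prints 545259775
```

The returned string can be assigned to av as described under Usage.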

"""
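Step 1: get the cid.
    Query the pagelist API: strip the "av" prefix from the aid, build the URL,
    read the response text, parse it from JSON into a dict, and return the cid.
Step 2: fetch the danmaku for that cid.
    Build the URL, decode the response as UTF-8, compile a regular expression,
    collect every match with findall, and return the resulting list.
Step 3: save the danmaku to a file.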
"""
import requests
import json
import re
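def getcid(av):
    """Return the cid of the first page of the video with the given aid."""
    # Accept both 'av123' and '123' by trimming 'a'/'v' characters from the ends.
    av = av.strip('av')
    url = f'https://api.bilibili.com/x/player/pagelist?aid={av}&jsonp=jsonp'
    res = requests.get(url)
    res = res.text
    print(res)  # debug output: the raw JSON response
    res_dict = json.loads(res)
    # Take the cid of the first entry in 'data'.
    cid = res_dict['data'][0]['cid']
    return cid


def getdanmu(cid):
    """Return the danmaku texts for the given cid as a list of strings."""
    url = f'https://api.bilibili.com/x/v1/dm/list.so?oid={cid}'
    res = requests.get(url)
    de = res.content.decode('utf-8')
    # Each danmaku is the text between a <d ...> tag and its closing </d>.
    ch = re.compile('<d.*?>(.*?)</d>')
    danmu = ch.findall(de)
    return danmu


def savedanmu(danmu, filename):
    """Write the danmaku to filename, separated by spaces."""
    with open(filename, mode='w', encoding='utf-8') as f:
        for w in danmu:
            f.write(w)
            f.write(' ')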
#  <iframe src="//player.bilibili.com/player.html?aid=545259775&bvid=BV1Sq4y1E7HQ&cid=331019617&page=1"
# scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
if __name__ == '__main__':
    av = '890452353'  # aid of the target video
    cid = getcid(av)
    danmu = getdanmu(cid)
    savedanmu(danmu, f'{av}.txt')
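
The regular-expression step in getdanmu can be tried offline, without calling the API. The sketch below is an illustration under assumptions: `SAMPLE_XML` is made-up data shaped like the `<d ...>text</d>` elements that the script's pattern targets (the `p` attribute values are invented placeholders), and `extract_danmaku` is a hypothetical name. It also unescapes XML entities such as `&amp;`, which the original script leaves as-is.

```python
import html
import re

# Made-up sample; only the <d ...>text</d> shape matters here.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?><i>'
    '<d p="1.5,1,25,16777215,0,0,0,0">hello</d>'
    '<d p="3.0,1,25,16777215,0,0,0,0">a &amp; b</d>'
    '</i>'
)

# Match the text inside each <d ...> element, non-greedily.
DANMAKU_PATTERN = re.compile(r'<d\b[^>]*>(.*?)</d>', re.DOTALL)


def extract_danmaku(xml_text):
    """Return the unescaped text of every <d> element in xml_text."""
    return [html.unescape(text) for text in DANMAKU_PATTERN.findall(xml_text)]


print(extract_danmaku(SAMPLE_XML))  # prints ['hello', 'a & b']
```

Writing one danmaku per line instead of separating them with spaces would keep comments that themselves contain spaces distinguishable in the output file.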

****