I didn't expect so many pretty dancing girls on there: [Python Crawler] Collecting Weibo Video Data

Author: 松鼠爱吃饼干

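Preface

Discover what's new anytime, anywhere! Weibo lets you enjoy every wonderful moment in the world and learn the story behind each one. Share what you want to say and let the whole world hear you! Today we will use Python to collect good-looking videos from Weibo.

That's right, today's target is Weibo data collection, and what we are crawling is those videos of pretty girls.

If you have questions about this article, you can join the group [free resources and Q&A discussion group: 910981974]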

Key points

requests
pprint

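Development environment

Version: Python 3.8
Editor: PyCharm 2021.2

How a crawler works

Purpose: fetch data from the internet in bulk (text, images, audio, video)
Essence: a repeated cycle of requests and responses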
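The request/response cycle described above can be sketched without touching the network. This is only an illustration, not code from the article: `crawl` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP call such as `requests.get`.

```python
from typing import Callable, Dict, List


def crawl(urls: List[str], fetch: Callable[[str], bytes]) -> Dict[str, int]:
    """Send one request per URL and record the size of each response."""
    sizes = {}
    for url in urls:
        body = fetch(url)       # a request goes out, a response comes back
        sizes[url] = len(body)  # "extract" step: here we only measure the body
    return sizes


def fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for a real HTTP request; returns canned bytes."""
    return f"response for {url}".encode("utf-8")


result = crawl(["https://example.com/a", "https://example.com/b"], fake_fetch)
print(result)
```

Passing the fetch function in as a parameter keeps the loop itself easy to try out before pointing it at a real site.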

Implementation

1. Import the required modules

import requests
import pprint
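As a quick illustration of why `pprint` is imported alongside `requests`: it prints nested dictionaries, such as parsed JSON, in a wrapped and more readable layout. The `sample` dictionary below is a hypothetical, trimmed-down stand-in for a response, not real data from Weibo.

```python
import pprint

# Hypothetical sample that only mimics the nesting of a JSON response.
sample = {
    "data": {
        "Component_Channel_Editor": {
            "list": [{"oid": "demo-oid-1", "title": "demo video"}],
            "next_cursor": 9,
        }
    }
}

# pformat returns the same text that pprint.pprint would print.
formatted = pprint.pformat(sample, width=40)
print(formatted)
```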

2. Find the target URL

Open the developer tools, select Fetch/XHR, click the request that carries the data, and find the target URL

https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor
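To see what the URL found in the developer tools is made of, it can be split with the standard library. This is a small sketch for inspection only; the variable names are illustrative, and the assumption that the fourth path segment of `page` is the channel id is read off this one URL.

```python
from urllib.parse import parse_qs, urlsplit

url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'

parts = urlsplit(url)
query = parse_qs(parts.query)

# The API endpoint without its query string.
endpoint = f'{parts.scheme}://{parts.netloc}{parts.path}'
# The "page" parameter names the web page the data belongs to.
page = query['page'][0]
# Assumption: the channel id is the segment after "/tv/channel/".
channel_id = page.split('/')[3]

print(endpoint)
print(page)
print(channel_id)
```

The same channel id shows up again in the `referer` header and in the `cid` field of the POST data in the next step.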

3. Send the network request

headers = {
    'cookie': '',  # fill in your own cookie
    'referer': 'https://weibo.com/tv/channel/4379160563414111/editor',
    'user-agent': '',  # fill in your browser's user-agent
}
data = {
    'data': '{"Component_Channel_Editor":{"cid":"4379160563414111","count":9}}'
}
url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'
json_data = requests.post(url=url, headers=headers, data=data).json()
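The `data` field above is a JSON string written by hand. One way to avoid quoting mistakes is to build it with `json.dumps`. The helper name `build_channel_payload` is hypothetical; the sketch only reproduces the string used above.

```python
import json


def build_channel_payload(cid: str, count: int = 9) -> dict:
    """Build the form data for the channel list request."""
    inner = {"Component_Channel_Editor": {"cid": cid, "count": count}}
    # Compact separators reproduce the hand-written string exactly.
    return {"data": json.dumps(inner, separators=(",", ":"))}


payload = build_channel_payload("4379160563414111")
print(payload["data"])
```

The result can be passed as `data=payload` to `requests.post` in place of the literal dictionary.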

4. Get the data

# url_1 and data_1 are built from each video's oid, as shown in the full code
json_data_2 = requests.post(url=url_1, headers=headers, data=data_1).json()
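The second request needs a URL and form data derived from each video's `oid`. A minimal sketch of that step follows, using the same URL pattern and payload shape as the full code. The helper name `build_playinfo_request` and the sample list are hypothetical.

```python
import json
from typing import Dict, Tuple


def build_playinfo_request(oid: str) -> Tuple[str, Dict[str, str]]:
    """Return the (url, form data) pair for one video's playback info."""
    url_1 = 'https://weibo.com/tv/api/component?page=/tv/show/' + oid
    inner = {"Component_Play_Playinfo": {"oid": oid}}
    data_1 = {'data': json.dumps(inner, separators=(",", ":"))}
    return url_1, data_1


# Hypothetical list shaped like json_data['data']['Component_Channel_Editor']['list'].
sample_list = [{"oid": "demo-oid-1", "title": "demo video"}]

for item in sample_list:
    url_1, data_1 = build_playinfo_request(item["oid"])
    print(item["title"], url_1, data_1)
```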

5. Filter the data

dict_urls = json_data_2['data']['Component_Play_Playinfo']['urls']
# take the first address in the urls dict and add the missing scheme
video_url = "https:" + dict_urls[list(dict_urls.keys())[0]]
print(title + "\t" + video_url)
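The lookup above raises an error if `urls` is missing or empty. A slightly more defensive version might look like the following sketch. The function name `first_video_url` and the sample response are hypothetical; in particular, the keys inside `urls` are placeholders, not the real quality labels.

```python
from typing import Optional


def first_video_url(json_data: dict) -> Optional[str]:
    """Return the first playable address, or None if there is none."""
    play_info = json_data.get('data', {}).get('Component_Play_Playinfo', {})
    urls = play_info.get('urls') or {}
    if not urls:
        return None
    first_key = next(iter(urls))  # same choice as list(urls.keys())[0]
    address = urls[first_key]
    # Protocol-relative addresses ("//host/path") need a scheme in front.
    return address if address.startswith('http') else 'https:' + address


# Hypothetical response fragment for illustration only.
sample = {
    "data": {
        "Component_Play_Playinfo": {
            "urls": {
                "quality_a": "//example.com/a.mp4",
                "quality_b": "//example.com/b.mp4",
            }
        }
    }
}
print(first_video_url(sample))
```

Returning `None` lets the calling loop skip a video instead of stopping the whole run.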

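6. Save the data

import os
os.makedirs('video', exist_ok=True)  # create the output folder if it does not exist
video_data = requests.get(video_url).content
with open(f'video/{title}.mp4', mode='wb') as f:
    f.write(video_data)
print(title, "downloaded successfully................")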
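Video titles can contain characters that are not allowed in file names, such as `/` or `?`, which would make `open` fail. One possible way to handle this is sketched below. The helper names `safe_video_path` and `save_video` are hypothetical, and the set of replaced characters is an assumption based on common Windows restrictions.

```python
import re
from pathlib import Path


def safe_video_path(title: str, folder: str = 'video') -> Path:
    """Turn a video title into a usable .mp4 path inside the given folder."""
    # Replace characters that are commonly invalid in file names.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]', '_', title).strip() or 'untitled'
    return Path(folder) / f'{cleaned}.mp4'


def save_video(video_data: bytes, title: str, folder: str = 'video') -> Path:
    """Write the downloaded bytes to disk and return the path used."""
    path = safe_video_path(title, folder)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(video_data)
    return path


print(safe_video_path('dance: part 1/2?'))
```

In the loop, `save_video(video_data, title)` would replace the `with open(...)` block.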

Full code

import os
import pprint

import requests

headers = {
    'cookie': 'add your own',
    'referer': 'https://weibo.com/tv/channel/4379160563414111/editor',
    'user-agent': '',  # fill in your browser's user-agent
}
data = {
    'data': '{"Component_Channel_Editor":{"cid":"4379160563414111","count":9}}'
}
url = 'https://www.weibo.com/tv/api/component?page=/tv/channel/4379160563414111/editor'
# request the channel's video list
json_data = requests.post(url=url, headers=headers, data=data).json()
pprint.pprint(json_data)

ccs_list = json_data['data']['Component_Channel_Editor']['list']
# cursor for the next page (not used in this example)
next_cursor = json_data['data']['Component_Channel_Editor']['next_cursor']
os.makedirs('video', exist_ok=True)  # create the output folder if it does not exist
for ccs in ccs_list:
    oid = ccs['oid']
    title = ccs['title']
    data_1 = {
        'data': '{"Component_Play_Playinfo":{"oid":"' + oid + '"}}'
    }
    url_1 = 'https://weibo.com/tv/api/component?page=/tv/show/' + oid
    # request the playback info for this video
    json_data_2 = requests.post(url=url_1, headers=headers, data=data_1).json()
    dict_urls = json_data_2['data']['Component_Play_Playinfo']['urls']
    # take the first address in the urls dict and add the missing scheme
    video_url = "https:" + dict_urls[list(dict_urls.keys())[0]]
    print(title + "\t" + video_url)

    # download the video and write it to disk
    video_data = requests.get(video_url).content
    with open(f'video/{title}.mp4', mode='wb') as f:
        f.write(video_data)
    print(title, "downloaded successfully................")

Source: 博客园 (cnblogs)

Original link: https://www.cnblogs.com/qshhl/p/15637804.html
