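Web Scraping in Practice | Download 井川里予's HD, Watermark-Free Douyin Dance Videos: Ten Lines of Code Show You How!
By 酷头
Source: 印象python (ID: python_logic)
~~~~~Data-analysis book giveaway at the end of the article~~~~~
Let's take a look.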
First, open the Douyin web version and find a video you like. Here is the link:
https://www.douyin.com/video/7004173561606753539
Page analysis
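Press F12 to open the browser's developer tools. Here is a small trick you may not have noticed: when we scrape images, comments, novels and the like, we usually look for the data source under the XHR tab, but when the target is an audio or video file, we look under the Media tab instead.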
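Under the Media tab we find the data source. Copy its URL and open it in the browser, and you will see that it is the video's playback address. Feel free to test it yourself.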
Next, we need to find where this playback address comes from, so we copy part of it and search for it.
Sending the request
We send the request the way a browser would, adding headers so that the site's anti-scraping checks do not stop us from getting the data.
```python
import re

import requests
from icecream import ic  # ic() pretty-prints values while debugging

url = 'https://www.douyin.com/video/7004173561606753539'
# 1. Send the request
headers = {
    'Referer': 'https://www.douyin.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4651.0 Safari/537.36',
    # Copy the cookie from your own logged-in browser session (DevTools > Network > request headers)
    'cookie': '<your Douyin cookie here>',
}
```
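If you hard-code a long cookie string, it is easy to leak it and easy to forget to update it. The sketch below keeps the cookie outside the script. It builds the same `Referer`/`User-Agent` headers and reads the cookie from an environment variable. The variable name `DOUYIN_COOKIE` and the helper `build_headers` are hypothetical names chosen for this example.

```python
import os

# Hypothetical environment variable holding your own browser cookie string.
COOKIE_ENV_VAR = "DOUYIN_COOKIE"

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4651.0 Safari/537.36"
)


def build_headers(cookie=None):
    """Return request headers like the ones above (hypothetical helper).

    If `cookie` is None, it is read from the DOUYIN_COOKIE environment
    variable; no Cookie header is sent when neither source provides one.
    """
    if cookie is None:
        cookie = os.environ.get(COOKIE_ENV_VAR, "")
    headers = {
        "Referer": "https://www.douyin.com/",
        "User-Agent": USER_AGENT,
    }
    if cookie:
        headers["Cookie"] = cookie
    return headers
```

With the variable set, `requests.get(url, headers=build_headers())` is used exactly as before.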
Getting the data
Once the request is sent, we look at the response data:
```python
# 2. Get the data
resp = requests.get(url, headers=headers)
ic(resp.text)
```
Decoding the data
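The data has been fetched successfully, so we can move on to extracting it.
We take the part of the URL we copied from the Media tab and search for it in the fetched data.
Our goal is simple: the title and the video link.
We did find the video link in the fetched data. Comparing the two, though, shows that the link in the response is URL-encoded (percent-encoded), so we cannot use it to download the video directly.
So we first decode it with an online tool and test it.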
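The decoding step can also be reproduced in Python with the standard library's `urllib.parse.unquote`. The sketch below applies it to two short fragments copied from the debug output later in this article; both were shortened by hand. One decoding pass is enough. `%257C` becomes `%7C`, which is the correctly encoded `|` inside the final URL's query string.

```python
from urllib.parse import unquote

# Fragments copied from this article's debug output (shortened by hand).
prefix = "%22%3A%22%2F%2Fv3-web.douyinvod.com%2F33673180fc481002021f006712e77be1%2F"
param = "%26cd%3D0%257C0%257C0"

# %22 -> '"', %3A -> ':', %2F -> '/', %26 -> '&', %3D -> '=', %25 -> '%'
decoded_prefix = unquote(prefix)
decoded_param = unquote(param)

print(decoded_prefix)  # ":"//v3-web.douyinvod.com/33673180fc481002021f006712e77be1/
print(decoded_param)   # &cd=0%7C0%7C0
```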
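Next, we extract this link and then decode it. This is where regular expressions come in.
Extracting the title is easy: the all-purpose non-greedy (.*?) captures the text between the tags.
The link is encoded, but we can work out how to extract it by comparing it with the link from the Media tab.
In the encoded data, each video link follows the key `src` and ends just before `vr%3D%22` (`%3D` decodes to `=` and `%22` to `"`). So we can build the following pattern: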
'src(.*?)vr%3D%22'
```python
# 3. Parse the data
title = re.findall('<title data-react-helmet="true"> (.*?)</title>', resp.text)[0]
href = re.findall('src(.*?)vr%3D%22', resp.text)
```
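Why does `findall` return several matches, and why is the first one a mess? The pattern can start at any `src`, including the `src` attribute of an ordinary `<script src=...>` tag. From there, the non-greedy `(.*?)` runs to the first `vr%3D%22` after it. The sketch below reproduces this on a small made-up page. The `example.com` URLs are placeholders, not real Douyin data.

```python
import re

# A tiny synthetic page: an ordinary <script src=...> tag, then URL-encoded
# JSON with two playAddr entries (made-up paths).
page = (
    '<script src="https://example.com/a.js"></script>'
    '<script id="RENDER_DATA" type="application/json">'
    "%22playAddr%22%3A%5B"
    "%7B%22src%22%3A%22%2F%2Fv26-web.example.com%2Fvideo1%2F%3Fbr%3D3071%26vr%3D%22%7D%2C"
    "%7B%22src%22%3A%22%2F%2Fv3-web.example.com%2Fvideo1%2F%3Fbr%3D3071%26vr%3D%22%7D"
    "%5D</script>"
)

# findall returns the captured group; each match stops at the FIRST
# 'vr%3D%22' after its 'src', and scanning resumes after that point.
matches = re.findall(r"src(.*?)vr%3D%22", page)
for i, m in enumerate(matches):
    print(i, m[:50])
```

The first match swallows all the HTML between the `<script>` tag and the first video link. The real output below shows the same thing.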
That gives us quite a few hrefs. Which one do we need?
Let's simply loop over the list and print each match with ic():
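The loop itself is just `for h in href: ic(h)`. Every match is very long, so the sketch below offers a friendlier variant: it decodes each match and prints only a short preview. The helper name `preview_candidates` is hypothetical. The `":"//` check is based only on how the clean matches in this article begin.

```python
from urllib.parse import unquote


def preview_candidates(hrefs, width=60):
    """Return one short preview line per regex match.

    Each match is percent-decoded; matches starting with '":"//' look like
    real video links (judging from this article's output), the rest do not.
    """
    lines = []
    for i, h in enumerate(hrefs):
        text = unquote(h)
        is_link = text.startswith('":"//')
        lines.append(f"[{i}] link={is_link} {text[:width]}")
    return lines


# Two fragments taken from the article's output; with real data, pass `href`.
samples = [
    '="https://lf1-cdn-tos.bytescm.com/obj/static/log-sdk/collect/collect.js"></script>',
    "%22%3A%22%2F%2Fv3-web.douyinvod.com%2F33673180fc481002021f006712e77be1%2F",
]
for line in preview_candidates(samples):
    print(line)
```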
ic| h: ('="https://lf1-cdn-tos.bytescm.com/obj/static/log-sdk/collect/collect.js"></script><meta '
'data-react-helmet="true" charset="UTF-8"/><meta data-react-helmet="true" '
'name="viewport" content="width=device-width,initial-scale=1"/><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/xgplayerLive.7572286b.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/lib.ui.9df85a56.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/lib.util.43069e5f.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/react.vendor2.443c54c8.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/lottie.c89672be.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/bytejs.b9eaa346.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/vendors2.6038bfa3.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/vendors.58c74f55.js"></script><script '
'defer="defer" '
'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/index.16edc129.js"></script><link '
'href="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/vendors.dbe96361.css" '
'rel="stylesheet"><link '
'href="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/index.86bc1c1a.css" '
'rel="stylesheet"><script id="RENDER_DATA" '
'type="application/json">%7B%22_location%22%3A%22%2Fvideo%2F7004173561606753539%22%2C%22C_0%22%3A%7B%22abTestData%22%3A%7B%22navTabRecommendType%22%3A1%2C%22navTabFollowType%22%3A1%2C%22navTabHotType%22%3A1%7D%2C%22odin%22%3A%7B%22user_id%22%3A%22651331000075319%22%2C%22user_type%22%3A12%2C%22user_is_auth%22%3A1%2C%22user_unique_id%22%3A%227019856247298475561%22%7D%2C%22user%22%3A%7B%22isLogin%22%3Atrue%2C%22info%22%3A%7B%22uid%22%3A%22651331000075319%22%2C%22secUid%22%3A%22MS4wLjABAAAA6ZQ5xEpEBo8tspMAC7ehXEcHs7JybDRoyOQcEKaKMXI%22%2C%22shortId%22%3A%224105451342%22%2C%22nickname%22%3A%22%E4%B8%8D%E5%8D%96%E8%90%8C%E7%9A%84%E9%82%93%E8%82%AF%22%2C%22desc%22%3A%22%22%2C%22gender%22%3A2%2C%22avatarUrl%22%3A%22%2F%2Fp3.douyinpic.com%2Faweme%2F100x100%2Ftos-cn-i-0813%2Fd2cba90a048c46e7a3e55302e940b1dc.jpeg%22%2C%22avatar300Url%22%3A%22%2F%2Fp3.douyinpic.com%2Fimg%2Ftos-cn-i-0813%2Fd2cba90a048c46e7a3e55302e940b1dc~c5_300x300.webp%22%2C%22followStatus%22%3A0%2C%22followerStatus%22%3A0%2C%22awemeCount%22%3A12%2C%22followingCount%22%3A29%2C%22followerCount%22%3A14%2C%22mplatformFollowersCount%22%3A14%2C%22favoritingCount%22%3A173%2C%22totalFavorited%22%3A22%2C%22uniqueId%22%3A%22dk_818%22%2C%22customVerify%22%3A%22%22%2C%22enterpriseVerifyReason%22%3A%22%22%2C%22secret%22%3A0%2C%22userCanceled%22%3Afalse%2C%22roomData%22%3A%7B%7D%2C%22shareQrcodeUrl%22%3A%22https%3A%2F%2Fp3.douyinpic.com%2Fobj%2F31735000ab94954cbd483%22%2C%22roomId%22%3A0%2C%22favoritePermission%22%3A1%7D%2C%22statusCode%22%3A0%2C%22isSpider%22%3Afalse%7D%2C%22isSpider%22%3Afalse%7D%2C%22C_19%22%3A%7B%22awemeId%22%3A%227004173561606753539%22%2C%22logPb%22%3A%22%7B%5C%22impr_id%5C%22%3A%5C%22021634474240681fdbddc0100fff0030ad295880000002b3be82d%5C%22%7D%22%2C%22aweme%22%3A%7B%22statusCode%22%3A0%2C%22detail%22%3A%7B%22awemeId%22%3A%227004173561606753539%22%2C%22awemeType%22%3A0%2C%22groupId%22%3A%226982868964862971143%22%2C%22authorInfo%22%3A%7B%22uid%22%3A%222568934929994387%22%2C%22secUid%22%3A%22MS4wLjABA
AAAS4y5ucL_DoxSbSJBPC0doCNy94lXd6CnDTl7l8UTpWkwQ_ZK9GxsxqFxJRrl2doF%22%2C%22nickname%22%3A%22%E6%A5%A0%E6%A5%A0.%22%2C%22remarkName%22%3A%22%22%2C%22avatarUri%22%3A%22%2F%2Fp3-pc.douyinpic.com%2Fimg%2Ftos-cn-i-0813%2F33c618100012493cb30d0559fff786dc~c5_100x100.jpeg%3Ffrom%3D116350172%22%2C%22followerCount%22%3A453%2C%22totalFavorited%22%3A7050%2C%22followStatus%22%3A0%2C%22followerStatus%22%3A0%2C%22enterpriseVerifyReason%22%3A%22%22%2C%22customVerify%22%3A%22%22%7D%2C%22desc%22%3A%22%E8%8A%B1%E4%BA%86%E7%82%B9%E6%97%B6%E9%97%B4%E5%89%AA%E9%9B%86%23%E4%BA%95%E5%B7%9D%E9%87%8C%E4%BA%88%20%E8%B7%B3%E8%88%9E%E8%A7%86%E9%A2%91%E5%90%88%E9%9B%86%EF%BC%8C%E5%96%9C%E6%AC%A2%E5%8F%AF%E4%BB%A5%E6%94%AF%E6%8C%81%E4%B8%80%E4%B8%8B%E6%88%91%E4%B8%AB%E8%B0%A2%E8%B0%A2%E5%AE%B6%E9%93%B6%E4%BB%AC%E2%9D%A4%EF%B8%8F%23%E7%83%AD%E9%97%A8%23%E6%8E%A8%E5%B9%BF%E5%B0%8F%E5%8A%A9%E6%89%8B%20%20%40DOU%2B%E5%B0%8F%E5%8A%A9%E6%89%8B%22%2C%22authorUserId%22%3A%222568934929994387%22%2C%22createTime%22%3A1630786249%2C%22textExtra%22%3A%5B%7B%22start%22%3A7%2C%22end%22%3A12%2C%22type%22%3A1%2C%22hashtagId%22%3A%221662018461135883%22%2C%22hashtagName%22%3A%22%E4%BA%95%E5%B7%9D%E9%87%8C%E4%BA%88%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%22%22%2C%22isCommerce%22%3Afalse%7D%2C%7B%22start%22%3A37%2C%22end%22%3A40%2C%22type%22%3A1%2C%22hashtagId%22%3A%221588489879306259%22%2C%22hashtagName%22%3A%22%E7%83%AD%E9%97%A8%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%22%22%2C%22isCommerce%22%3Afalse%7D%2C%7B%22start%22%3A40%2C%22end%22%3A46%2C%22type%22%3A1%2C%22hashtagId%22%3A%221627250169964547%22%2C%22hashtagName%22%3A%22%E6%8E%A8%E5%B9%BF%E5%B0%8F%E5%8A%A9%E6%89%8B%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%22%22%2C%22isCommerce%22%3Afalse%7D%2C%7B%22start%22%3A49%2C%22end%22%3A58%2C%22type%22%3A0%2C%22hashtagId%22%3A%22%22%2C%22hashtagName%22%3A%22%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%2270258503077%22%2C%22isCommerce%22%3Afalse%7D%5D%2C%22userDigged%22%3Afalse%2C%22video%22%3A%7B%22wid
th%22%3A1080%2C%22height%22%3A1920%2C%22ratio%22%3A%221080p%22%2C%22duration%22%3A51851%2C%22playAddr%22%3A%5B%7B%22src%22%3A%22%2F%2Fv26-web.douyinvod.com%2F4655cc98bb0358b53cd639b679207958%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F8c10edb56f1546cc97f1ea484b390be8%2F%3Fa%3D6383%26br%3D3071%26bt%3D3071%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D4%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaDdpPDY6OmVpNzlmM2VoZmdqNDVrcjRvMG1gLS1kLTBzczJfXjUvM2A1Xy1jXzAuXi46Yw%253D%253D%26vl%3D%26')
ic| h: '%22%3A%22%2F%2Fv3-web.douyinvod.com%2F33673180fc481002021f006712e77be1%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F8c10edb56f1546cc97f1ea484b390be8%2F%3Fa%3D6383%26br%3D3071%26bt%3D3071%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D4%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaDdpPDY6OmVpNzlmM2VoZmdqNDVrcjRvMG1gLS1kLTBzczJfXjUvM2A1Xy1jXzAuXi46Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv26-web.douyinvod.com%2F8d29f309041395f0f17eeb2875bbfcb0%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F7f7d16ce9e8a4c06b25dc58132327a79%2F%3Fa%3D6383%26br%3D2054%26bt%3D2054%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D3%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApNzRkMzo8PGU3N2Y2PGg2OGdqNDVrcjRvMG1gLS1kLTBzc2I1YDBfMDBeXjE2MWBgNTQ6Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv3-web.douyinvod.com%2Facabc63318f8e681e83a2ff71a3c8206%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F7f7d16ce9e8a4c06b25dc58132327a79%2F%3Fa%3D6383%26br%3D2054%26bt%3D2054%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D3%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApNzRkMzo8PGU3N2Y2PGg2OGdqNDVrcjRvMG1gLS1kLTBzc2I1YDBfMDBeXjE2MWBgNTQ6Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv26-web.douyinvod.com%2F8fa32b926628584a42e222b32d81c239%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2Fccf5fff2e52f430896430aa9bd3e5c43%2F%3Fa%3D6383%26br%3D1901%26bt%3D1901%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D6%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaGhoZDtmZDs8Nzo5ZGc6OmdqNDVrcjRvMG1gLS1kLTBzcy4zMl5gLTJgLi82MS40NDI6Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv3-web.douyinvod.com%2F499af49c2cee102d8a427272bee6e903%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2Fccf5fff2e52f430896430aa9bd3e5c43%2F%3Fa%3D6383%26br%3D1901%26bt%3D1901%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D6%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaGhoZDtmZDs8Nzo5ZGc6OmdqNDVrcjRvMG1gLS1kLTBzcy4zMl5gLTJgLi82MS40NDI6Yw%253D%253D%26vl%3D%26'
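The first one is clearly out: it is a jumble of HTML markup, not a link.
The rest are all candidates, so let's decode the first of them and test it with an online URL decoder:
http://tool.chinaz.com/tools/urlencode.aspx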
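After decoding, we find that the leading `":"` has to be replaced with `https:` to get the video's real request URL.
Of these links, the first is the highest-definition one. The links come in pairs: a `v26-web` and a `v3-web` host with the same video path. The first pair has the largest `br` value, which appears to be the bitrate (3071, versus 2054 and 1901). Match [0] is polluted with HTML, so the first clean link is href[1].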
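Hard-coding `[1]` works for this page but is fragile. The sketch below makes a more defensive choice. It assumes, as the output above suggests, that clean matches decode to `":"//...` and that the `br` query parameter reflects the bitrate. It decodes every match, keeps the clean ones, and picks the one with the largest `br`. Unlike the article's `.replace('":"', 'https:')`, it rewrites only the leading `":"`. The function names are hypothetical.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def to_video_url(candidate):
    """Decode one regex match; return an https URL, or None if it is not a clean link."""
    decoded = unquote(candidate)
    if not decoded.startswith('":"//'):
        return None  # e.g. the first match, which is full of HTML
    return "https:" + decoded[len('":"'):]  # replace only the leading '":"'


def bitrate(video_url):
    """Read the `br` query parameter as an int (assumed to be the bitrate); 0 if absent."""
    values = parse_qs(urlsplit(video_url).query).get("br", ["0"])
    try:
        return int(values[0])
    except ValueError:
        return 0


def best_video_url(candidates):
    """Among the clean matches, return the URL with the largest `br` (first one wins ties)."""
    urls = [u for u in map(to_video_url, candidates) if u]
    return max(urls, key=bitrate) if urls else None
```

On this article's data, `best_video_url(href)` should land on the same link as `href[1]`. The first clean match has `br=3071`, and `max` keeps the first of equal values.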
Now let's do the decoding in code:
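```python
from urllib.parse import unquote

# 3. Parse the data
title = re.findall('<title data-react-helmet="true"> (.*?)</title>', resp.text)[0]
href = re.findall('src(.*?)vr%3D%22', resp.text)[1]  # [0] is polluted with HTML; [1] is the clean highest-bitrate link
video_url = unquote(href).replace('":"', 'https:', 1)  # decode, then turn the leading '":"' into 'https:'
```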
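The debug output also shows where these links live. They sit inside `<script id="RENDER_DATA" type="application/json">`, whose body is URL-encoded JSON with a `playAddr` list of `{"src": ...}` entries. The sketch below is an alternative to the `src(.*?)vr%3D%22` regex: it decodes that JSON and collects every `playAddr` source. This is a different technique from the article's, and it assumes the page keeps this structure. The function name is hypothetical.

```python
import json
import re
from urllib.parse import unquote

RENDER_DATA_RE = re.compile(
    r'<script id="RENDER_DATA" type="application/json">(.*?)</script>', re.S
)


def play_addrs_from_render_data(html):
    """Collect https URLs from every "playAddr" list in the RENDER_DATA JSON.

    Assumes the page structure seen in this article's debug output; returns
    an empty list if the script tag is missing.
    """
    m = RENDER_DATA_RE.search(html)
    if not m:
        return []
    # The script body is percent-encoded JSON; strict=False tolerates raw
    # control characters that decoding might leave inside strings.
    data = json.loads(unquote(m.group(1)), strict=False)
    urls = []

    def walk(node):
        # Search recursively, since wrapper keys such as "C_19" may change.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "playAddr" and isinstance(value, list):
                    for item in value:
                        src = item.get("src", "") if isinstance(item, dict) else ""
                        if isinstance(src, str) and src.startswith("//"):
                            urls.append("https:" + src)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(data)
    return urls
```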
Saving the data
With the analysis done, we have the URL and the title we want, so downloading is easy:
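```python
import os

# 4. Save the data
video_content = requests.get(url=video_url).content
os.makedirs('抖音高清视频', exist_ok=True)  # make sure the output folder exists
with open(os.path.join('抖音高清视频', title + '.mp4'), mode='wb') as f:
    f.write(video_content)
print(title + '.mp4 downloaded!!')
```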
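A title may contain characters that Windows forbids in file names, such as `/`, `?` or `:`, and then the save step fails. The sketch below cleans the title before writing. The helper names `safe_filename` and `save_video` are hypothetical, and the 120-character cap is an arbitrary choice.

```python
import os
import re

# Characters Windows does not allow in file names: backslash / : * ? " < > |
_ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n]+')


def safe_filename(title, max_len=120):
    """Replace illegal characters with '_' and trim; fall back to 'untitled'."""
    name = _ILLEGAL.sub("_", title).strip().rstrip(".")
    return name[:max_len] or "untitled"


def save_video(content, title, folder="抖音高清视频"):
    """Write the video bytes to <folder>/<safe title>.mp4 and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the output folder if needed
    path = os.path.join(folder, safe_filename(title) + ".mp4")
    with open(path, "wb") as f:
        f.write(content)
    return path
```

In the article's code, this would be `save_video(video_content, title)`. For very large videos, requests can stream the body instead of loading it all through `.content`. Use `requests.get(video_url, stream=True)` together with `resp.iter_content(chunk_size=...)`.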
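Finally, we wrap everything in a function, so downloading a new video only requires passing in its video ID:
```python
def download_douyin(video_id):
    ...  # steps 1-4 above, with url = f'https://www.douyin.com/video/{video_id}'
```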
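The sketch below is one way to fill in that function; it is self-contained but not the author's exact code. It swaps `requests` for the standard library's `urllib.request`, so it runs without extra packages. It reuses the article's two regexes, takes the first clean match instead of hard-coding index `[1]`, and streams the video to disk. Two things are assumptions: that Douyin still serves pages in this shape, and whether a cookie is required. `parse_video_page` and `_open` are hypothetical helper names.

```python
import os
import re
import shutil
import urllib.request
from urllib.parse import unquote

HEADERS = {
    "Referer": "https://www.douyin.com/",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/96.0.4651.0 Safari/537.36"
    ),
    # "Cookie": "<your Douyin cookie here>",  # add your own if the page needs it
}


def parse_video_page(html):
    """Return (title, video_url) using the article's two regexes."""
    titles = re.findall(r'<title data-react-helmet="true"> (.*?)</title>', html)
    if not titles:
        raise ValueError("title not found")
    for h in re.findall(r"src(.*?)vr%3D%22", html):
        decoded = unquote(h)
        # Skip HTML-polluted matches; clean ones start with '":"//'.
        if decoded.startswith('":"//'):
            return titles[0], "https:" + decoded[len('":"'):]
    raise ValueError("no video link found")


def _open(url):
    """Open a URL with the browser-like headers above."""
    req = urllib.request.Request(url, headers=HEADERS)
    return urllib.request.urlopen(req, timeout=30)


def download_douyin(video_id, folder="抖音高清视频"):
    """Download one Douyin video by ID and return the saved file path."""
    with _open(f"https://www.douyin.com/video/{video_id}") as resp:
        html = resp.read().decode("utf-8", errors="replace")
    title, video_url = parse_video_page(html)
    safe_title = re.sub(r'[\\/:*?"<>|]+', "_", title).strip() or video_id
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_title + ".mp4")
    # Stream the video to disk instead of holding it all in memory.
    with _open(video_url) as resp, open(path, "wb") as f:
        shutil.copyfileobj(resp, f)
    print(f"{safe_title}.mp4 downloaded!!")
    return path
```

Usage: `download_douyin("7004173561606753539")`.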
Let's see the results!
花了点时间剪集#井川里予 跳舞视频合集,喜欢可以支持一下我丫谢谢家银们 - 抖音.mp4 downloaded!!
万丈高楼平地起,辉煌只能靠自己。#社会慢摇 - 抖音.mp4
喜欢哪个#穿搭#身材 - 抖音.mp4
HD and watermark-free, right?
That's all for today.
Next time: how I use Selenium to batch-scrape Douyin videos of popular female creators~~~
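Book giveaway
About the book
Comprehensive: covers all the techniques needed for data analysis and big-data processing, including the basic theory, core concepts, and implementation workflow. It runs from preparing the programming language, data collection and cleaning, and data analysis and visualization through to distributed storage and distributed computing for large datasets.
In-depth: one book that thoroughly explains one programming language and 14 data-analysis and big-data processing tools, along with big-data analysis techniques and project development methods.
Rich: includes 45 "Beginner Q&A" items, "Practice" exercises in 17 chapters, 3 comprehensive hands-on projects, and 50 selected Python interview questions.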
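Giveaway rules
How: 2 books in total, shipped free, both given away through comments!
What to comment: leave a comment below this article on the topic "Share some of your understanding of, or views on, data analysis."
How winners are chosen: 8 thoughtful comments will be selected, and their authors join a group draw for the books!
Draw time: October 22, 2021, 20:00. Winners who do not contact me within 12 hours of the draw forfeit the prize; late claims will not be accepted.
Draw rules:
1. Before the deadline, you must like this article and tap "Wow" (在看). You must provide screenshots when claiming the prize, otherwise the claim is invalid.
2. Participants must add 老邓 as a friend before the deadline, otherwise the prize is void!
3. Limit one book per person!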