Learn some real skills: scraping Zhihu, Bilibili, Douban, Douyin, WeChat official accounts, Weibo and other platforms
This is the 463rd original article by 苏生不惑. Join my Knowledge Planet (知识星球).
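A few days ago, a member of my Knowledge Planet asked how to download a Zhihu user's answers.
If you are interested, you are welcome to join my Knowledge Planet. It is updated almost every day, mostly with interesting websites and software I come across on the internet in China and abroad, plus notes from work and life. It covers a wide range of topics and is something of an internet treasure trove, hence the name "Internet Expert". Every post is tagged, so you can pick a tag to browse the matching content: https://t.zsxq.com/13bqoLXHJ
A long time ago I wrote an article about scraping data with Web Scraper. Today I am tidying it up and sharing it again: you can scrape data freely without writing any code.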
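I will use the Zhihu account 渤海小吏 as the example: https://www.zhihu.com/people/dai-zong-66 . First install the Web Scraper browser extension (for the download link, reply "scraper" in the official account chat). After installing it, open the browser's developer tools (console), click Import sitemap, and paste in the sitemap below to scrape the user's answers.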
{"_id":"zhihu_answer","startUrl":["https://www.zhihu.com/people/dai-zong-66/answers?page=[1-5]"],"selectors":[{"id":"row","parentSelectors":["_root"],"type":"SelectorElement","selector":"div.List-item","multiple":true},{"id":"知乎问题标题","parentSelectors":["row"],"type":"SelectorText","selector":"div[itemprop='zhihu:question'] a","multiple":false,"regex":""},{"id":"知乎问题链接","parentSelectors":["row"],"type":"SelectorElementAttribute","selector":"[itemprop='zhihu:question'] a","multiple":false,"extractAttribute":"href"}]}
To scrape the same user's column articles, import this sitemap instead:
{"_id":"zhihu_zhuanlan","startUrl":["https://www.zhihu.com/people/dai-zong-66/posts/posts?page=[1-30]"],"selectors":[{"id":"row","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.List-item","multiple":true,"delay":0},{"id":"知乎标题","type":"SelectorText","parentSelectors":["row"],"selector":"h2.ContentItem-title","multiple":false,"regex":"","delay":0},{"id":"知乎链接","type":"SelectorElementAttribute","parentSelectors":["row"],"selector":"h2.ContentItem-title span a ","multiple":false,"extractAttribute":"href","delay":0}]}
The exported Excel data contains the Zhihu article titles, links, comment counts and upvote counts.
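Both Zhihu sitemaps above put a page range such as `page=[1-5]` in `startUrl`, which Web Scraper expands into one start URL per page. The Python sketch below mimics that expansion so you can preview which pages will be visited before running a scrape. The function name `expand_start_url` and the regular expression are my own, and the handling of steps (`[0-100:10]`) and zero padding (`[001-100]`) follows my reading of the Web Scraper documentation rather than the extension's actual code.

```python
import re

# Matches a Web Scraper style range such as [1-5], [001-100] or [0-100:10].
RANGE_RE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_start_url(url: str) -> list[str]:
    """Expand the first [start-end(:step)] range in url into concrete URLs."""
    match = RANGE_RE.search(url)
    if match is None:
        return [url]  # no range: the URL is a single start page
    start_text, end_text, step_text = match.groups()
    step = int(step_text) if step_text else 1
    # Assumption: a leading zero on the start value means "pad to this width".
    padded = len(start_text) > 1 and start_text.startswith("0")
    width = len(start_text) if padded else 0
    urls = []
    for number in range(int(start_text), int(end_text) + 1, step):
        page = str(number).zfill(width)
        urls.append(url[:match.start()] + page + url[match.end():])
    return urls


answer_pages = expand_start_url(
    "https://www.zhihu.com/people/dai-zong-66/answers?page=[1-5]"
)
for page_url in answer_pages:
    print(page_url)
```

If the user has more than five pages of answers, raise the upper bound of the range in `startUrl` to match the last page number shown on the profile.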
Zhihu topics can be scraped too. Import the following code:
{"_id":"zhihu_topic","startUrl":["https://www.zhihu.com/topic/19559424/top-answers"],"selectors":[{"id":"row","parentSelectors":["_root"],"type":"SelectorElementScroll","selector":"div.List-item:nth-of-type(-n+10)","multiple":true,"delay":2000,"elementLimit":500},{"id":"知乎标题","parentSelectors":["row"],"type":"SelectorText","selector":"h2 a","multiple":false,"regex":""},{"id":"知乎链接","parentSelectors":["row"],"type":"SelectorLink","selector":"[itemprop='zhihu:question'] a[data-za-detail-view-element_name]","multiple":false,"linkType":"linkFromHref"}]}
To scrape all the videos posted by a Bilibili uploader, import the following code:
{"_id":"bilibili_videos","startUrl":["https://space.bilibili.com/927587/video?tid=0&pn=[1-42:1]&keyword=&order=pubdate"],"selectors":[{"id":"row","parentSelectors":["_root"],"type":"SelectorElement","selector":"li.small-item","multiple":true},{"id":"视频标题","parentSelectors":["row"],"type":"SelectorText","selector":"a.title","multiple":false,"regex":""},{"id":"视频链接","parentSelectors":["row"],"type":"SelectorElementAttribute","selector":"a.cover","multiple":false,"extractAttribute":"href"},{"id":"视频封面","parentSelectors":["row"],"type":"SelectorElementAttribute","selector":"a.cover div.b-img picture img","multiple":false,"extractAttribute":"src"},{"id":"视频播放量","parentSelectors":["row"],"type":"SelectorText","selector":".play span","multiple":false,"regex":""},{"id":"视频长度","parentSelectors":["row"],"type":"SelectorText","selector":" a.cover span.length","multiple":false,"regex":""},{"id":"发布时间","parentSelectors":["row"],"type":"SelectorText","selector":"span.time","multiple":false,"regex":""}]}
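The exported Excel data contains the video title, link, cover, play count, length, publish time and so on. From 2013 to 2023 this uploader published more than 1,200 videos.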
To scrape Bilibili's popular ranking, import the following code:
{"_id":"bilibili","startUrl":["https://www.bilibili.com/v/popular/rank/all"],"selectors":[{"id":"row","multiple":true,"parentSelectors":["_root"],"selector":"li.rank-item","type":"SelectorElement"},{"id":"视频排名","multiple":false,"parentSelectors":["row"],"regex":"","selector":"i.num","type":"SelectorText"},{"id":"视频标题","multiple":false,"parentSelectors":["row"],"regex":"","selector":"a.title","type":"SelectorText"},{"id":"播放量","multiple":false,"parentSelectors":["row"],"regex":"","selector":".detail-state > span:nth-of-type(1)","type":"SelectorText"},{"id":"弹幕数","multiple":false,"parentSelectors":["row"],"regex":"","selector":"span:nth-of-type(2)","type":"SelectorText"},{"id":"up主","multiple":false,"parentSelectors":["row"],"regex":"","selector":"a span","type":"SelectorText"},{"id":"视频链接","multiple":false,"parentSelectors":["row"],"selector":"a.title","type":"SelectorLink"},{"id":"点赞数","multiple":false,"parentSelectors":["视频链接"],"regex":"","selector":"span.like","type":"SelectorText"},{"id":"投币数","multiple":false,"parentSelectors":["视频链接"],"regex":"","selector":"span.coin","type":"SelectorText"},{"id":"收藏数","multiple":false,"parentSelectors":["视频链接"],"regex":"","selector":"span.collect","type":"SelectorText"}]}
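The ranking sitemap above is nested: `点赞数`, `投币数` and `收藏数` list `视频链接` as their parent, so Web Scraper follows each video link and reads those counts from the video page rather than from the ranking page. A one-line JSON blob makes that hard to see, so here is a small Python sketch that prints the selector hierarchy. The JSON in it is a trimmed copy of the sitemap above (only four selectors, for brevity), and the helper names `selector_tree` and `print_tree` are my own.

```python
import json
from collections import defaultdict

# Trimmed copy of the "bilibili" ranking sitemap: four selectors only.
SITEMAP_JSON = """
{"_id":"bilibili","startUrl":["https://www.bilibili.com/v/popular/rank/all"],
 "selectors":[
  {"id":"row","parentSelectors":["_root"],"type":"SelectorElement","selector":"li.rank-item","multiple":true},
  {"id":"视频标题","parentSelectors":["row"],"type":"SelectorText","selector":"a.title","multiple":false},
  {"id":"视频链接","parentSelectors":["row"],"type":"SelectorLink","selector":"a.title","multiple":false},
  {"id":"点赞数","parentSelectors":["视频链接"],"type":"SelectorText","selector":"span.like","multiple":false}
 ]}
"""


def selector_tree(sitemap: dict) -> dict[str, list[str]]:
    """Map each parent id to the ids of the selectors nested under it."""
    children = defaultdict(list)
    for selector in sitemap["selectors"]:
        for parent in selector["parentSelectors"]:
            children[parent].append(selector["id"])
    return dict(children)


def print_tree(children, node="_root", depth=0, seen=frozenset()):
    """Print the hierarchy, indenting each level by two spaces."""
    for child in children.get(node, []):
        print("  " * depth + child)
        # A selector may list itself as a parent (pagination links do this),
        # so descend into each id only once per branch.
        if child not in seen:
            print_tree(children, child, depth + 1, seen | {child})


tree = selector_tree(json.loads(SITEMAP_JSON))
print_tree(tree)
```

Pasting any of the full sitemaps from this article into `SITEMAP_JSON` should work the same way, which is a quick check that every selector hangs off the parent you intended before you import it.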
To scrape the Douban Movie Top 250 ranking, import the following code:
{"_id":"douban_movie_top_250","startUrl":["https://movie.douban.com/top250?start=0&filter="],"selectors":[{"id":"next_page","type":"SelectorLink","parentSelectors":["_root","next_page"],"selector":".next a","multiple":true,"delay":0},{"id":"container","type":"SelectorElement","parentSelectors":["_root","next_page"],"selector":".grid_view li","multiple":true,"delay":0},{"id":"title","type":"SelectorText","parentSelectors":["container"],"selector":"span.title:nth-of-type(1)","multiple":false,"regex":"","delay":0},{"id":"number","type":"SelectorText","parentSelectors":["container"],"selector":"em","multiple":false,"regex":"","delay":0}]}
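Unlike the page-range sitemaps, this one paginates by following the "next page" link: `next_page` lists itself as a parent, so Web Scraper keeps clicking through until the link runs out. To check that the scrape reached every page, it helps to know which URLs it should have visited. The sketch below lists them, assuming Douban shows 25 films per page and numbers pages through the `start` query parameter seen in `startUrl`. The page size and the helper name `top250_page_urls` are my assumptions, not something stated in the sitemap.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25  # assumption: 25 films are listed on each page
TOTAL_FILMS = 250


def top250_page_urls() -> list[str]:
    """Return the URL of every Top 250 page, in ranking order."""
    urls = []
    for start in range(0, TOTAL_FILMS, PAGE_SIZE):
        # Same query layout as the sitemap's startUrl: start=<offset>&filter=
        query = urlencode({"start": start, "filter": ""})
        urls.append(f"{BASE_URL}?{query}")
    return urls


for page_url in top250_page_urls():
    print(page_url)
```

Under the same assumption, the sitemap could use a stepped range in `startUrl` (`start=[0-225:25]`) instead of the `next_page` link selector. The link-following version has the advantage of not depending on the page size.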
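The same approach also covers all the video data of a Douyin account, including video date, title, link, likes, comments, favorites and shares.
It also covers all the article data of a WeChat official account: article date, title, link, summary, author, cover image, whether it is original, IP location, reads, "Wow" (在看) count, likes, comments, reward count, number of videos, number of audio clips and so on. For example, the articles published in 2022 by 深圳卫健委 all have 100,000+ reads. For the analysis, see my article "2022年过去,抓取公众号阅读数点赞数在看数留言数做数据分析", which uses 深圳卫健委 as its example.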