Descriptive analysis of data with skimr

Original | 阿越就是我 | 医学和生信笔记 (Medical and Bioinformatics Notes) | 2023-06-15


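skimr provides a frictionless approach to summary statistics. One line of code displays summary statistics that you can skim quickly to understand your data. It handles different data types and returns a skim_df object, which can be used inside a pipeline and is easy to read.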

skimr is part of rOpenSci, an organization that hosts many useful R packages. If you are interested, explore its website^[1].

Installation
Usage
Customization

Installation

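skimr is available on CRAN. The development version can also be installed from GitHub.

install.packages("skimr")                   # release version from CRAN
# install.packages("devtools")
devtools::install_github("ropensci/skimr")  # development version from GitHub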

Usage

library(skimr)

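Usage is simple. It essentially comes down to one function, skim(), which reports more descriptive statistics than summary(). The results are grouped by data type, and you can customize both the statistics shown and the output format.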

skim(iris)

Table: Data summary


Name	iris
Number of rows	150
Number of columns	5
_______________________
Column type frequency:
factor	1
numeric	4
________________________
Group variables	None

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
Species	0	1	FALSE	3	set: 50, ver: 50, vir: 50

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Sepal.Length	1	5.84	0.83	4.3	5.1	5.80	6.4	7.9	▆▇▇▅▂
Sepal.Width	1	3.06	0.44	2.0	2.8	3.00	3.3	4.4	▁▆▇▂▁
Petal.Length	1	3.76	1.77	1.0	1.6	4.35	5.1	6.9	▇▁▆▇▂
Petal.Width	1	1.20	0.76	0.1	0.3	1.30	1.8	2.5	▇▁▇▅▃
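To see what skim() adds over base R, here is a minimal sketch that runs summary() and skim() on the same data. It assumes skimr 2.x. Column selection inside skim() and skim_without_charts() are skimr features, but the particular columns picked here are only for illustration.

```r
library(skimr)

# Base R: min, quartiles, median, mean and max for numeric columns,
# level counts for factors
summary(iris)

# skimr: adds n_missing, complete_rate, sd and an inline histogram,
# grouped by column type; extra arguments select columns (tidyselect style)
skim(iris, Sepal.Length, Species)

# Same statistics without the inline histograms
# (useful when the console font cannot render the spark bars)
skim_without_charts(iris)
```

Selecting columns this way is mostly a convenience for wide data frames, where the full skim output would be long.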

skim(dplyr::starwars)

Table: Data summary


Name	dplyr::starwars
Number of rows	87
Number of columns	14
_______________________
Column type frequency:
character	8
list	3
numeric	3
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
name	0	1.00	3	21	87
hair_color	5	0.94	4	13	12
skin_color	0	1.00	3	19	31
eye_color	0	1.00	3	13	15
sex	4	0.95	4	14	4
gender	4	0.95	8	9	2
homeworld	10	0.89	4	14	48
species	4	0.95	3	14	37

Variable type: list

skim_variable	complete_rate	n_unique	min_length	max_length
films	1	24	1	7
vehicles	1	11	0	2
starships	1	17	0	5

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
height	6	0.93	174.36	34.77	66	167.0	180	191.0	264	▁▁▇▅▁
mass	28	0.68	97.31	169.46	15	55.6	79	84.5	1358	▇▁▁▁▁
birth_year	44	0.49	87.57	154.69	8	35.0	52	72.0	896	▇▁▁▁▁
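Because skim() returns a skim_df, which is a tibble with one row per variable, you can filter its results like any other data frame. Below is a minimal sketch, assuming skimr 2.x and dplyr. The 0.9 cutoff is an arbitrary example threshold. In a skim_df, n_missing and complete_rate are shared columns, while type-specific statistics carry a type prefix such as numeric.mean.

```r
library(skimr)
library(dplyr)

sw <- skim(starwars)

# Variables with more than 10% missing values (0.9 is an arbitrary cutoff)
sw %>%
  filter(complete_rate < 0.9) %>%
  pull(skim_variable)

# focus() keeps skim_type and skim_variable plus the columns you name
focus(sw, n_missing, complete_rate)

# yank() extracts the sub-table for one type, with unprefixed column names
yank(sw, "numeric")
```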

Pipes are supported as well, and a grouped data frame is summarized group by group:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

iris %>% 
  group_by(Species) %>% 
  skim()

Table: Data summary


Name	Piped data
Number of rows	150
Number of columns	5
_______________________
Column type frequency:
numeric	4
________________________
Group variables	Species

Variable type: numeric

skim_variable	Species	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Sepal.Length	setosa	1	5.01	0.35	4.3	4.80	5.00	5.20	5.8	▃▃▇▅▁
Sepal.Length	versicolor	1	5.94	0.52	4.9	5.60	5.90	6.30	7.0	▂▇▆▃▃
Sepal.Length	virginica	1	6.59	0.64	4.9	6.23	6.50	6.90	7.9	▁▃▇▃▂
Sepal.Width	setosa	1	3.43	0.38	2.3	3.20	3.40	3.68	4.4	▁▃▇▅▂
Sepal.Width	versicolor	1	2.77	0.31	2.0	2.52	2.80	3.00	3.4	▁▅▆▇▂
Sepal.Width	virginica	1	2.97	0.32	2.2	2.80	3.00	3.18	3.8	▂▆▇▅▁
Petal.Length	setosa	1	1.46	0.17	1.0	1.40	1.50	1.58	1.9	▁▃▇▃▁
Petal.Length	versicolor	1	4.26	0.47	3.0	4.00	4.35	4.60	5.1	▂▂▇▇▆
Petal.Length	virginica	1	5.55	0.55	4.5	5.10	5.55	5.88	6.9	▃▇▇▃▂
Petal.Width	setosa	1	0.25	0.11	0.1	0.20	0.20	0.30	0.6	▇▂▂▁▁
Petal.Width	versicolor	1	1.33	0.20	1.0	1.20	1.30	1.50	1.8	▅▇▃▆▁
Petal.Width	virginica	1	2.03	0.27	1.4	1.80	2.00	2.30	2.5	▂▇▆▅▇
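Grouped skims combine naturally with column selection. Below is a minimal sketch, assuming skimr 2.x and dplyr, that skims only Petal.Length within each species. As in the output above, the grouping variable comes back as its own Species column. The result is then read with plain `$` indexing so that nothing depends on how skim_df behaves under select().

```r
library(skimr)
library(dplyr)

by_species <- iris %>%
  group_by(Species) %>%
  skim(Petal.Length)

# One row per group x variable: 3 species x 1 variable = 3 rows
data.frame(
  species = by_species$Species,
  mean    = by_species$numeric.mean,
  sd      = by_species$numeric.sd
)
```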

Customization

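skimr ships with its own defaults, but it is highly customizable. You can specify your own statistics, change how results are formatted, define statistics for new classes, and more.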
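As an illustration of custom statistics, here is a minimal sketch using skim_with() and sfl() from skimr 2.x. The names iqr and p99, and the choice to drop the histogram, are our own examples. New entries in sfl() are appended to the defaults, and setting an existing statistic to NULL removes it.

```r
library(skimr)

# skim_with() returns a new skim-like function;
# sfl() lists the summary functions for one column type.
my_skim <- skim_with(
  numeric = sfl(
    iqr  = ~ IQR(.x, na.rm = TRUE),                                   # added
    p99  = ~ quantile(.x, probs = 0.99, na.rm = TRUE, names = FALSE),  # added
    hist = NULL                                                        # removed
  )
)

my_skim(iris)
```

Because my_skim is an ordinary function, you can define it once and reuse it across a project. Every later call then reports the same set of statistics.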

For details, see the skimr GitHub repository^[2]. Beyond that there is not much more to cover, since the heart of the package is the skim() function.

References

[1] rOpenSci website: https://ropensci.org/
[2] skimr on GitHub: https://github.com/ropensci/skimr

That's all for today. I hope it helps!
