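R Visualization 17 | ggstatsplot: statistics and plotting for SCI-grade figures in a few lines of code (Part 1)
Post No. 116 from "pythonic生物人"
The biggest highlight of ggstatsplot: it makes it easy to add statistical tests to a plot. Several test types can be switched with a single argument, and the plot displays p-values and other statistics for comparisons between data subsets.
The central problem ggstatsplot aims to solve: combining two stages of the data analysis workflow, data visualization and statistical modeling, so that data exploration becomes simple and fast.
The limitation of ggstatsplot: it supports only a limited set of plot types.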
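Contents
Plots currently supported by ggstatsplot
Statistical tests currently supported by ggstatsplot
Types of statistical tests
Presentation of statistical results
Main functions of ggstatsplot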
ggbetweenstats
grouped_ggbetweenstats
ggwithinstats
grouped_ggwithinstats
ggscatterstats
grouped_ggscatterstats
ggpiestats
grouped_ggpiestats
ggbarstats
grouped_ggbarstats
gghistostats
grouped_gghistostats
ggcorrmat
grouped_ggcorrmat
ggcoefstats
Plots currently supported by ggstatsplot
Statistical tests currently supported by ggstatsplot
Types of statistical tests
Currently, it supports only the most common types of statistical tests: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses.
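To make the test families above concrete, here is a minimal base R sketch of the classical tests that the parametric and nonparametric options correspond to when more than two groups are compared. It assumes that the parametric option maps to Welch's one-way ANOVA and the nonparametric option to the Kruskal-Wallis test; the robust and Bayesian versions need extra packages and are left out. All variable names are illustrative.

```r
# Compare sepal length across the three iris species using base R only.

# Parametric: Welch's one-way ANOVA (does not assume equal variances)
parametric <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)

# Nonparametric: Kruskal-Wallis rank sum test
nonparametric <- kruskal.test(Sepal.Length ~ Species, data = iris)

# Collect the test statistics and p-values side by side
results <- data.frame(
  test      = c("Welch ANOVA", "Kruskal-Wallis"),
  statistic = c(unname(parametric$statistic), unname(nonparametric$statistic)),
  p.value   = c(parametric$p.value, nonparametric$p.value)
)
print(results)
```

Both tests address the same question (do the groups differ?) but make different assumptions about the data, which is why being able to switch between them with one argument is convenient.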
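Presentation of statistical results
This is the biggest highlight: the statistical results displayed by ggstatsplot follow APA format (for details on APA format, see https://my.ilstu.edu/~jhkahn/apastats.html).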
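As a rough illustration of what an APA-style result line looks like, the sketch below formats a base R t-test as text. The helper names format_p_apa and format_t_apa are hypothetical and written only for this example; ggstatsplot builds its own subtitles internally, and this is not its code.

```r
# Format a p-value in APA style: no leading zero, "< .001" for tiny values
format_p_apa <- function(p) {
  if (p < 0.001) {
    return("p < .001")
  }
  paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}

# Format an htest object returned by t.test() as "t(df) = value, p = ..."
format_t_apa <- function(test) {
  sprintf(
    "t(%.2f) = %.2f, %s",
    test$parameter, test$statistic, format_p_apa(test$p.value)
  )
}

# Welch two-sample t-test on the built-in ToothGrowth data
res <- t.test(len ~ supp, data = ToothGrowth)
apa_line <- format_t_apa(res)
print(apa_line)
```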
Main functions of ggstatsplot
ggbetweenstats
Creates a violin plot, a box plot, or a combination of the two. ggbetweenstats has many parameters that can be customized; the defaults are shown below.
> ggstatsplot::ggbetweenstats # usage of ggbetweenstats
function (data, x, y, plot.type = "boxviolin",
type = "parametric",
# Type of statistic expected ("parametric" or "nonparametric" or "robust" or
# "bayes"). Corresponding abbreviations are also accepted: "p" (for parametric),
# "np" (nonparametric), "r" (robust), or "bf" (bayes), respectively.
pairwise.comparisons = TRUE, pairwise.display = "significant",
p.adjust.method = "holm",
effsize.type = "unbiased",
bf.prior = 0.707, # the prior width to use in calculating Bayes factors
bf.message = TRUE, results.subtitle = TRUE,
xlab = NULL, ylab = NULL, caption = NULL, title = NULL, subtitle = NULL,
sample.size.label = TRUE, k = 2L, var.equal = FALSE, conf.level = 0.95,
nboot = 100L, tr = 0.1, mean.plotting = TRUE, mean.ci = FALSE,
mean.point.args = list(size = 5, color = "darkred"),
mean.label.args = list(size = 3), notch = FALSE, notchwidth = 0.5,
outlier.tagging = FALSE, outlier.label = NULL, outlier.coef = 1.5,
outlier.shape = 19, outlier.color = "black", outlier.label.args = list(size = 3),
outlier.point.args = list(), point.args = list(position = ggplot2::position_jitterdodge(dodge.width = 0.6),
alpha = 0.4, size = 3, stroke = 0), violin.args = list(width = 0.5,
alpha = 0.2), ggsignif.args = list(textsize = 3, tip_length = 0.01),
ggtheme = ggplot2::theme_bw(), ggstatsplot.layer = TRUE,
package = "RColorBrewer", palette = "Dark2",
ggplot.component = NULL, output = "plot", ...)
{ # ... (function body omitted)
}
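The p.adjust.method = "holm" default in the signature above controls how the p-values of pairwise comparisons are corrected for multiple testing. The base R sketch below shows the adjustment itself with stats::p.adjust, then applies it to pairwise Welch t-tests on iris. It illustrates the correction only and is not necessarily the exact pairwise test ggstatsplot runs internally; the raw p-values are made up.

```r
# Three made-up raw p-values, purely for illustration
raw_p <- c(0.01, 0.02, 0.03)

holm_p <- p.adjust(raw_p, method = "holm")
bonferroni_p <- p.adjust(raw_p, method = "bonferroni")

# Holm is never more conservative than Bonferroni
print(rbind(raw = raw_p, holm = holm_p, bonferroni = bonferroni_p))

# The same adjustment applied to real pairwise comparisons
# (pool.sd = FALSE gives separate Welch t-tests for each pair)
pairwise <- pairwise.t.test(
  iris$Sepal.Length, iris$Species,
  p.adjust.method = "holm", pool.sd = FALSE
)
print(pairwise$p.value)
```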
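A simple example using all default parameters: with the iris dataset, compare sepal length across iris species. For a detailed introduction to the dataset, see: Python可视化|matplotlib10-绘制散点图scatter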
library(ggstatsplot)
set.seed(123)
ggstatsplot::ggbetweenstats(
data = iris,
x = Species,
y = Sepal.Length,
title = "Distribution of sepal length across Iris species"
)
library(ggstatsplot)
library(ggplot2)
library(ggthemes)
options(repr.plot.width = 4.5, repr.plot.height = 5, repr.plot.res = 300)
# same plot with a custom theme and color palette
set.seed(123)
ggstatsplot::ggbetweenstats(
data = iris,
x = Species,
y = Sepal.Length,
title = "Distribution of sepal length across Iris species",
ggtheme = ggthemes::theme_economist(), # The Economist theme from ggthemes
package = "wesanderson", # palette package; run View(paletteer::palettes_d_names) to list all available packages and their palettes
palette = "Darjeeling1" # choose the palette
)
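Switching the statistical approach only requires changing the type argument listed in the signature above. A minimal sketch, assuming ggstatsplot and its dependencies are installed; the object names p_parametric and p_nonparametric are illustrative.

```r
library(ggstatsplot)

set.seed(123)

# Default: parametric test
p_parametric <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  type = "parametric"
)

# Same plot with a nonparametric test and
# Bonferroni-adjusted pairwise comparisons
p_nonparametric <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  type = "nonparametric",
  p.adjust.method = "bonferroni"
)

print(p_nonparametric)
```

Both calls return ggplot objects, so they can be modified further with ordinary ggplot2 layers.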
grouped_ggbetweenstats
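ggstatsplot also has an important function, grouped_ggbetweenstats, which makes it easy to show distribution differences within subsets of a dataset. Some of its parameters: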
function (data, x, y, grouping.var, outlier.label = NULL, title.prefix = NULL,
output = "plot", ..., plotgrid.args = list(), title.text = NULL,
title.args = list(size = 16, fontface = "bold"), caption.text = NULL,
caption.args = list(size = 10), sub.text = NULL, sub.args = list(size = 12))
{ # ... (function body omitted)
}
A quick usage example:
options(repr.plot.width = 5.5, repr.plot.height = 13, repr.plot.res = 300)
set.seed(123)
ggstatsplot::grouped_ggbetweenstats(
data = dplyr::filter(
.data = ggstatsplot::movies_long,
genre %in% c("Action", "Action Comedy")
), # filter the data
x = mpaa,
y = length,
grouping.var = genre, # grouping variable
ggsignif.args = list(textsize = 4, tip_length = 0.01), # appearance of the p-value labels
#p.adjust.method = "bonferroni",
ggplot.component = list(ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())),
k = 3,
title.prefix = "Movie genre",
caption = substitute(paste(italic("Source"), ": IMDb (Internet Movie Database)")),
ggtheme = ggthemes::theme_economist(), # The Economist theme from ggthemes
package = "wesanderson", # palette package; run View(paletteer::palettes_d_names) to list all available options
palette = "Darjeeling1", # choose the palette
plotgrid.args = list(nrow = 2),
title.text = "Differences in movie length across MPAA ratings, by genre"
)
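Conceptually, a grouped_ function repeats one analysis for every level of grouping.var and arranges the resulting plots in a grid. The statistical half of that idea can be sketched in base R with split() and lapply(). The example uses the built-in mtcars data rather than movies_long, and the Kruskal-Wallis test is only a stand-in for whichever test the plot would report.

```r
# Split the data by the grouping variable
# (transmission type: 0 = automatic, 1 = manual)
by_group <- split(mtcars, mtcars$am)

# Run the same test within each subset:
# does mpg differ across cylinder counts?
tests <- lapply(by_group, function(d) {
  kruskal.test(x = d$mpg, g = factor(d$cyl))
})

# One row of results per level of the grouping variable
summary_table <- data.frame(
  am        = names(tests),
  statistic = sapply(tests, function(res) unname(res$statistic)),
  p.value   = sapply(tests, function(res) res$p.value),
  row.names = NULL
)
print(summary_table)
```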
ggwithinstats
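Similar to ggbetweenstats, but intended for repeated-measures (within-subjects) designs: it connects the groups with lines (the plot below connects the group means).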
options(repr.plot.width = 6.5, repr.plot.height = 6, repr.plot.res = 300)
Sys.setlocale('LC_ALL','C')
set.seed(123)
library(WRS2)
# plot
ggstatsplot::ggwithinstats(
data = WineTasting,
x = Wine,
y = Taste,
title = "Wine tasting",
caption = "Data source: `WRS2` R package",
ggtheme = ggthemes::theme_fivethirtyeight(),
ggstatsplot.layer = FALSE,
messages = FALSE,
ggsignif.args = list(textsize = 3, tip_length = 0.01)
)
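Because ggwithinstats is meant for repeated measurements on the same subjects, the relevant tests are paired rather than independent. The difference can be sketched in base R with the built-in sleep dataset (10 subjects, each measured under two drugs). This illustrates the paired idea only and is not ggstatsplot's own code.

```r
# Extra sleep (hours) for the same 10 subjects under drug 1 and drug 2
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Treating the two conditions as independent groups (between-subjects view)
unpaired <- t.test(drug1, drug2, paired = FALSE)

# Respecting the pairing (within-subjects view)
paired <- t.test(drug1, drug2, paired = TRUE)

cat("unpaired p-value:", format.pval(unpaired$p.value, digits = 3), "\n")
cat("paired p-value:  ", format.pval(paired$p.value, digits = 3), "\n")
```

On this dataset the paired test yields the smaller p-value, because pairing removes the variability between subjects.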
grouped_ggwithinstats
options(repr.plot.width = 6.5, repr.plot.height = 6, repr.plot.res = 300)
set.seed(123)
# plot
ggstatsplot::grouped_ggwithinstats(
data = dplyr::filter(
.data = ggstatsplot::bugs_long,
region %in% c("Europe", "North America"),
condition %in% c("LDLF", "LDHF")
),
x = condition,
y = desire,
xlab = "Condition",
ylab = "Desire to kill an arthropod",
grouping.var = region,
outlier.tagging = TRUE,
outlier.label = education,
ggtheme = hrbrthemes::theme_ipsum_tw(),
ggstatsplot.layer = FALSE,
messages = FALSE
)
ggscatterstats
Draws a scatter plot with marginal distributions in R. For the Python version of the marginal distribution plot, see: Python可视化24|seaborn绘制多变量分布图(jointplot|JointGrid)
ggstatsplot::ggscatterstats(
data = ggplot2::msleep,
x = sleep_rem,
y = awake,
xlab = "REM sleep (in hours)",
ylab = "Amount of time spent awake (in hours)",
title = "Understanding mammalian sleep",
messages = FALSE
)
# for reproducibility
set.seed(123)
# plot
ggstatsplot::ggscatterstats(
data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
x = budget,
y = rating,
type = "robust", # type of test that needs to be run
xlab = "Movie budget (in million/ US$)", # label for x axis
ylab = "IMDB rating", # label for y axis
label.var = "title", # variable for labeling data points
label.expression = "rating < 5 & budget > 100", # expression that decides which points to label
title = "Movie budget and IMDB rating (action)", # title text for the plot
caption = expression(paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")),
ggtheme = hrbrthemes::theme_ipsum_ps(), # choosing a different theme
ggstatsplot.layer = FALSE, # turn off `ggstatsplot` theme layer
marginal.type = "densigram", # type of marginal distribution to be displayed
xfill = "pink", # color fill for x-axis marginal distribution
yfill = "#009E73", # color fill for y-axis marginal distribution
centrality.parameter = "median", # central tendency lines to be displayed
messages = FALSE # turn off messages and notes
)
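The type argument of ggscatterstats selects the correlation analysis reported with the plot. As a rough base R counterpart, assuming the parametric option corresponds to Pearson's r and the nonparametric option to Spearman's rho, both can be computed with cor.test(). The mtcars data is used here only because it ships with R.

```r
# Parametric: Pearson product-moment correlation
pearson <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

# Nonparametric: Spearman rank correlation
# (exact = FALSE avoids the warning about ties in the data)
spearman <- cor.test(mtcars$wt, mtcars$mpg, method = "spearman", exact = FALSE)

correlations <- data.frame(
  method   = c("Pearson", "Spearman"),
  estimate = c(unname(pearson$estimate), unname(spearman$estimate)),
  p.value  = c(pearson$p.value, spearman$p.value)
)
print(correlations)
```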
grouped_ggscatterstats
There is likewise a grouped_ variant.
library(ggstatsplot)
# for reproducibility
set.seed(123)
options(repr.plot.width = 13.5, repr.plot.height = 10,repr.plot.res = 400)
ggstatsplot::grouped_ggscatterstats(
data = dplyr::filter(
.data = ggstatsplot::movies_long,
genre %in% c("Action", "Action Comedy", "Action Drama", "Comedy")
),
x = rating,
y = length,
grouping.var = genre, # grouping variable
label.var = title,
label.expression = length > 200,
xfill = "#E69F00",
yfill = "#8b3058",
xlab = "IMDB rating",
title.prefix = "Movie genre",
ggtheme = ggplot2::theme_grey(),
ggplot.component = list(
ggplot2::scale_x_continuous(breaks = seq(2, 9, 1), limits = (c(2, 9)))
),
plotgrid.args = list(nrow = 2),
title.text = "Relationship between movie length and IMDB rating for different genres"
)
ggpiestats
Draws a pie chart and tests whether the proportions of the slices differ.
ggthemr::ggthemr_reset() # reset any ggthemr theme set earlier (requires the ggthemr package)
set.seed(123)
# to speed up the process, let's use only half of the dataset
Titanic_full_50 <- dplyr::sample_frac(tbl = ggstatsplot::Titanic_full, size = 0.5)
# plot
ggstatsplot::ggpiestats(
data = Titanic_full_50,
x = Survived,
title = "Passenger survival on the Titanic", # title for the entire plot
caption = "Source: Titanic survival dataset", # caption for the entire plot
legend.title = "Survived?",
package = "ggthemr",
palette = "dust"
)
set.seed(123)
# to speed up the process, let's use only half of the dataset
Titanic_full_50 <- dplyr::sample_frac(tbl = ggstatsplot::Titanic_full, size = 0.5)
# plot
ggstatsplot::ggpiestats(
data = Titanic_full_50,
x = Survived,
y = Sex,
title = "Passenger survival on the Titanic by gender", # title for the entire plot
caption = "Source: Titanic survival dataset", # caption for the entire plot
legend.title = "Survived?", # legend title
ggtheme = ggplot2::theme_grey(), # changing plot theme
package = "ggthemr",
palette = "dust",
k = 3, # decimal places in result
perc.k = 1 # decimal places in percentage labels
) + # further modification with `ggplot2` commands
ggplot2::theme(
plot.title = ggplot2::element_text(
color = "black",
size = 14,
hjust = 0
)
)
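The two pie chart examples above correspond to two different chi-squared tests: with only x, a goodness-of-fit test on the proportions of one variable; with x and y, a test of association between two categorical variables. A base R sketch of both, using the built-in Titanic table (the full data, not the 50% sample used above). Whether ggstatsplot applies a continuity correction is not assumed here; correct = FALSE is simply the plain Pearson test.

```r
# Titanic is a 4-dimensional table: Class x Sex x Age x Survived
survived <- margin.table(Titanic, 4)               # counts of No / Yes
sex_by_survived <- margin.table(Titanic, c(2, 4))  # Sex x Survived counts

# One variable: goodness-of-fit test against equal proportions
gof <- chisq.test(survived)

# Two variables: Pearson's chi-squared test of independence
independence <- chisq.test(sex_by_survived, correct = FALSE)

print(prop.table(survived))
print(gof)
print(independence)
```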
If you have any comments, please share them in QQ group 629562529. Let's improve together!