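R Visualization 17 | ggstatsplot: SCI-grade statistics and plotting in a few lines of code (Part 1)

Original · pythonic生物人 · 2023-07-26

Part of the collection #R数据可视化 (R data visualization), 88 posts

This is the 116th post from "pythonic生物人".

「ggstatsplot's biggest highlight」: it makes adding statistical tests to a plot easy. Several kinds of test can be switched with a single argument, and the plot shows p-values and other statistics for comparisons between subsets of the data.
「The central problem ggstatsplot sets out to solve」: combining the data visualization and statistical modeling stages of a data-analysis workflow, so that data exploration becomes simple and fast.
「ggstatsplot's limitation」: it supports only a limited set of plot types.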

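「Contents」

Plot types currently supported by ggstatsplot
Statistical tests currently supported by ggstatsplot
    Types of statistical test
    How statistical results are displayed
Main ggstatsplot functions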
    ggbetweenstats
    grouped_ggbetweenstats
    ggwithinstats
    grouped_ggwithinstats
    ggscatterstats
    grouped_ggscatterstats
    ggpiestats
    grouped_ggpiestats 
    ggbarstats
    grouped_ggbarstats 
    gghistostats
    grouped_gghistostats 
    ggcorrmat
    grouped_ggcorrmat 
    ggcoefstats

Plot types currently supported by ggstatsplot

Statistical tests currently supported by ggstatsplot

Types of statistical test

At present it supports only the most common kinds of statistical test: parametric, nonparametric, robust, and Bayesian versions of t-test/ANOVA, correlation analyses, contingency table analysis, meta-analysis, and regression analyses.

The tests involved, including those used for the Bayesian analyses, are described in detail at: https://indrajeetpatil.github.io/statsExpressions/articles/stats_details.html

In every plot, arguments such as type, pairwise.comparisons, pairwise.display, effsize.type, p.adjust.method, mean.ci, and mean.point.args control the statistical modeling. For the full argument list see: https://cran.r-project.org/web/packages/ggstatsplot/index.html
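To make the effect of the type argument more concrete, here is a minimal base-R sketch of two tests that, according to the statsExpressions documentation linked above, sit behind a between-groups comparison of more than two groups: Welch's one-way ANOVA for type = "parametric" and the Kruskal-Wallis test for type = "nonparametric". It calls the stats package directly instead of ggstatsplot, so it only approximates what the package reports. The iris data and the comparison data frame are chosen purely for illustration.

```r
# Parametric route: Welch's one-way ANOVA (does not assume equal variances)
welch <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)

# Nonparametric route: Kruskal-Wallis rank sum test
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)

# Collect both results side by side (illustrative data frame, not a ggstatsplot object)
comparison <- data.frame(
  type      = c("parametric", "nonparametric"),
  test      = c("Welch one-way ANOVA", "Kruskal-Wallis"),
  statistic = c(unname(welch$statistic), unname(kw$statistic)),
  p.value   = c(welch$p.value, kw$p.value)
)
print(comparison)
```

Both routes test the same question (do the groups differ?) under different assumptions, which is why switching type changes the statistic shown in the plot subtitle.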

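How statistical results are displayed

This is the biggest highlight: ggstatsplot reports statistical results following APA style (for details on APA style see https://my.ilstu.edu/~jhkahn/apastats.html).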
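As a rough illustration of what an APA-style result string looks like, the sketch below formats a Welch one-way ANOVA by hand. The helper format_p and the exact layout of the string are assumptions made for this example. They are not ggstatsplot functions, and the package's own subtitle also includes effect sizes and confidence intervals.

```r
# Welch's one-way ANOVA on the built-in iris data
res <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = FALSE)

# Hypothetical helper: APA-style p-value text (no leading zero, floor at .001)
format_p <- function(p) {
  if (p < 0.001) {
    "p < .001"
  } else {
    paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
  }
}

# Assemble "F(df1, df2) = value, p ..." from the test object
apa <- sprintf(
  "F(%d, %.2f) = %.2f, %s",
  as.integer(res$parameter[1]),  # numerator degrees of freedom
  res$parameter[2],              # denominator degrees of freedom
  res$statistic,                 # F statistic
  format_p(res$p.value)
)
print(apa)
```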

Main ggstatsplot functions

ggbetweenstats

Creates a violin plot, a box plot, or a combination of the two. ggbetweenstats has many arguments you can set yourself; the defaults are shown below.

> ggstatsplot::ggbetweenstats  # print the function to see its arguments and defaults
function (data, x, y, plot.type = "boxviolin", 
    type = "parametric", 
    # Type of statistic expected ("parametric" or "nonparametric" or "robust" or
    # "bayes"). Corresponding abbreviations are also accepted: "p" (for parametric),
    # "np" (nonparametric), "r" (robust), or "bf", respectively.

    pairwise.comparisons = TRUE, pairwise.display = "significant", 
    p.adjust.method = "holm", 
    effsize.type = "unbiased", 
    bf.prior = 0.707, # the prior width to use in calculating Bayes factors
    bf.message = TRUE, results.subtitle = TRUE, 
    xlab = NULL, ylab = NULL, caption = NULL, title = NULL, subtitle = NULL, 
    sample.size.label = TRUE, k = 2L, var.equal = FALSE, conf.level = 0.95, 
    nboot = 100L, tr = 0.1, mean.plotting = TRUE, mean.ci = FALSE, 
    mean.point.args = list(size = 5, color = "darkred"), 
    mean.label.args = list(size = 3), notch = FALSE, notchwidth = 0.5, 
    outlier.tagging = FALSE, outlier.label = NULL, outlier.coef = 1.5, 
    outlier.shape = 19, outlier.color = "black", outlier.label.args = list(size = 3), 
    outlier.point.args = list(), point.args = list(position = ggplot2::position_jitterdodge(dodge.width = 0.6), 
        alpha = 0.4, size = 3, stroke = 0), violin.args = list(width = 0.5, 
        alpha = 0.2), ggsignif.args = list(textsize = 3, tip_length = 0.01), 
    ggtheme = ggplot2::theme_bw(), ggstatsplot.layer = TRUE, 
    package = "RColorBrewer", palette = "Dark2", 
    ggplot.component = NULL, output = "plot", ...) 
{ # ... function body omitted
}
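The p.adjust.method = "holm" default above controls how the p-values of the pairwise comparisons are corrected for multiple testing. The following sketch uses base R's p.adjust on three made-up p-values to show how Holm's correction compares with Bonferroni. The raw p-values are hypothetical and are not taken from any plot in this post.

```r
# Three hypothetical raw p-values from pairwise comparisons
p_raw <- c(0.01, 0.02, 0.04)

# Holm: step-down procedure, less conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")        # 0.03 0.04 0.04

# Bonferroni: multiply every p-value by the number of comparisons
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # 0.03 0.06 0.12

print(rbind(raw = p_raw, holm = p_holm, bonferroni = p_bonf))
```

With a 0.05 threshold, all three comparisons stay significant under Holm, while only the first survives Bonferroni.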

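A simple example with all default arguments, using the iris dataset to compare sepal length across iris species. For a detailed introduction to the dataset, see: Python Visualization | matplotlib 10: drawing scatter plots with scatter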

library(ggstatsplot)
set.seed(123)
ggstatsplot::ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  title = "Distribution of sepal length across Iris species"
)

「Changing the color palette and theme」

library(ggstatsplot)
library(ggplot2)
library(ggthemes)
options(repr.plot.width = 4.5, repr.plot.height = 5, repr.plot.res = 300)
# for reproducibility
set.seed(123)
ggstatsplot::ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  title = "Distribution of sepal length across Iris species",
  ggtheme = ggthemes::theme_economist(), # The Economist theme from ggthemes
  package = "wesanderson", # palette package; run View(paletteer::palettes_d_names) to list every available package and its palettes
  palette = "Darjeeling1" # palette to use
)

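There are more than 1,000 possible combinations for the package and palette arguments, including ggthemr and ggsci, which were covered in earlier posts. If you care about color, go and explore them.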

grouped_ggbetweenstats

ggstatsplot also has an important function, grouped_ggbetweenstats, which makes it easy to show distribution differences within subsets of a dataset. Its signature:

function (data, x, y, grouping.var, outlier.label = NULL, title.prefix = NULL, 
    output = "plot", ..., plotgrid.args = list(), title.text = NULL, 
    title.args = list(size = 16, fontface = "bold"), caption.text = NULL, 
    caption.args = list(size = 10), sub.text = NULL, sub.args = list(size = 12)) 
{ # ... function body omitted
}

A quick example:

options(repr.plot.width = 5.5, repr.plot.height = 13, repr.plot.res = 300)
set.seed(123)

ggstatsplot::grouped_ggbetweenstats(
  data = dplyr::filter(
    .data = ggstatsplot::movies_long,
    genre %in% c("Action", "Action Comedy")
  ), # filter the data
  x = mpaa,
  y = length,
  grouping.var = genre, # grouping variable
  ggsignif.args = list(textsize = 4, tip_length = 0.01), # appearance of the p-value annotations
  #p.adjust.method = "bonferroni", 
  ggplot.component = list(ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis())),
  k = 3,
  title.prefix = "Movie genre",
  caption = substitute(paste(italic("Source"), ": IMDb (Internet Movie Database)")), 
  ggtheme = ggthemes::theme_economist(), # The Economist theme from ggthemes
  package = "wesanderson", # palette package; run View(paletteer::palettes_d_names) to list every available one
  palette = "Darjeeling1", # palette to use
  plotgrid.args = list(nrow = 2),
  title.text = "Differences in movie length across MPAA ratings, by genre"
)
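Conceptually, a grouped_ function repeats the same analysis once per level of grouping.var and then arranges the resulting plots in a grid. The sketch below mimics only the statistical half of that idea in base R. It uses the built-in mtcars data (not movies_long) and a Kruskal-Wallis test as a stand-in, so both the dataset and the choice of test are assumptions made for illustration.

```r
# Split the data by the grouping variable (here: transmission type, am)
subsets <- split(mtcars, mtcars$am)

# Run the same between-groups test (mpg across cylinder counts) in each subset
results <- lapply(subsets, function(d) {
  kruskal.test(mpg ~ factor(cyl), data = d)
})

# One p-value per subset, named after the level of the grouping variable
p_values <- vapply(results, function(r) r$p.value, numeric(1))
print(p_values)
```

Each element of results corresponds to one panel of a grouped plot: same variables, same test, different subset of rows.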

ggwithinstats

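Similar to ggbetweenstats, but intended for repeated-measures (within-subjects) designs: it can join the boxes with connecting lines. In the plot below, the group means are connected.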

options(repr.plot.width = 6.5, repr.plot.height = 6, repr.plot.res = 300)
Sys.setlocale('LC_ALL','C')
set.seed(123)
library(WRS2)

# plot
ggstatsplot::ggwithinstats(
  data = WineTasting,
  x = Wine,
  y = Taste,
  title = "Wine tasting",
  caption = "Data source: `WRS2` R package",
  ggtheme = ggthemes::theme_fivethirtyeight(),
  ggstatsplot.layer = FALSE,
  messages = FALSE,
  ggsignif.args = list(textsize = 3, tip_length = 0.01)
)
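Why a separate within-subjects function matters can be seen in a small base-R sketch. It uses the built-in sleep data (the same 10 subjects measured under two drugs) instead of WineTasting, and a two-condition t-test instead of the repeated-measures analysis that ggwithinstats runs for three or more conditions. Both substitutions are assumptions made to keep the example short.

```r
# Built-in sleep data: rows are ordered by subject ID within each drug group
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Within-subjects analysis: each subject serves as their own control
paired <- t.test(drug1, drug2, paired = TRUE)

# Between-subjects analysis of the same numbers (Welch t-test), ignoring the pairing
unpaired <- t.test(drug1, drug2, paired = FALSE)

print(c(paired = paired$p.value, unpaired = unpaired$p.value))
```

The paired test gives p of about 0.003, while the unpaired test on the same values gives about 0.079, so treating repeated measures as independent groups can hide a real effect.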

grouped_ggwithinstats

options(repr.plot.width = 6.5, repr.plot.height = 6, repr.plot.res = 300)
set.seed(123)

# plot
ggstatsplot::grouped_ggwithinstats(
  data = dplyr::filter(
    .data = ggstatsplot::bugs_long,
    region %in% c("Europe", "North America"),
    condition %in% c("LDLF", "LDHF")
  ),
  x = condition,
  y = desire,
  xlab = "Condition",
  ylab = "Desire to kill an arthropod",
  grouping.var = region,
  outlier.tagging = TRUE,
  outlier.label = education,
  ggtheme = hrbrthemes::theme_ipsum_tw(),
  ggstatsplot.layer = FALSE,
  messages = FALSE
)

ggscatterstats

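This function draws a scatter plot with marginal distributions in R. For the Python version, see: Python Visualization 24 | multivariate distribution plots with seaborn (jointplot | JointGrid)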

ggstatsplot::ggscatterstats(
  data = ggplot2::msleep,
  x = sleep_rem,
  y = awake,
  xlab = "REM sleep (in hours)",
  ylab = "Amount of time spent awake (in hours)",
  title = "Understanding mammalian sleep",
  messages = FALSE
)
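The statistic reported on such a scatter plot is a correlation test. As a minimal sketch of what the parametric and nonparametric variants correspond to, the code below runs Pearson's and Spearman's tests with base R's cor.test. It uses the built-in mtcars data, not msleep, so that it runs without extra packages. That substitution is an assumption for illustration only.

```r
# Parametric: Pearson's product-moment correlation
pearson <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

# Nonparametric: Spearman's rank correlation
# (exact = FALSE avoids the warning about ties in the data)
spearman <- cor.test(mtcars$wt, mtcars$mpg, method = "spearman", exact = FALSE)

print(c(
  pearson_r    = unname(pearson$estimate),
  spearman_rho = unname(spearman$estimate)
))
```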

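The marginal plots can be drawn in 「five」 styles (histogram, boxplot, density, violin, or densigram); switch between them with the marginal.type argument.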

# for reproducibility
set.seed(123)

# plot
ggstatsplot::ggscatterstats(
  data = dplyr::filter(.data = ggstatsplot::movies_long, genre == "Action"),
  x = budget,
  y = rating,
  type = "robust", # type of test that needs to be run
  xlab = "Movie budget (in million/ US$)", # label for x axis
  ylab = "IMDB rating", # label for y axis
  label.var = "title", # variable for labeling data points
  label.expression = "rating < 5 & budget > 100", # expression that decides which points to label
  title = "Movie budget and IMDB rating (action)", # title text for the plot
  caption = expression(paste(italic("Note"), ": IMDB stands for Internet Movie DataBase")),
  ggtheme = hrbrthemes::theme_ipsum_ps(), # choosing a different theme
  ggstatsplot.layer = FALSE, # turn off `ggstatsplot` theme layer
  marginal.type = "densigram", # type of marginal distribution to be displayed
  xfill = "pink", # color fill for x-axis marginal distribution
  yfill = "#009E73", # color fill for y-axis marginal distribution
  centrality.parameter = "median", # central tendency lines to be displayed
  messages = FALSE # turn off messages and notes
)

grouped_ggscatterstats

This function has a grouped_ variant as well.

library(ggstatsplot)
# for reproducibility
set.seed(123)
options(repr.plot.width = 13.5, repr.plot.height = 10,repr.plot.res = 400)
ggstatsplot::grouped_ggscatterstats(
  data = dplyr::filter(
    .data = ggstatsplot::movies_long,
    genre %in% c("Action", "Action Comedy", "Action Drama", "Comedy")
  ),
  x = rating,
  y = length,
  grouping.var = genre, # grouping variable
  label.var = title,
  label.expression = length > 200,
  xfill = "#E69F00",
  yfill = "#8b3058",
  xlab = "IMDB rating",
  title.prefix = "Movie genre",
  ggtheme = ggplot2::theme_grey(),
  ggplot.component = list(
    ggplot2::scale_x_continuous(breaks = seq(2, 9, 1), limits = (c(2, 9)))
  ),
  plotgrid.args = list(nrow = 2),
  title.text = "Relationship between movie length and IMDB rating for different genres"
)

ggpiestats

Draws a pie chart and tests whether the proportions of the slices differ.

ggthemr::ggthemr_reset() # restore the default ggplot2 theme after using ggthemr
set.seed(123)

# to speed up the process, let's use only half of the dataset
Titanic_full_50 <- dplyr::sample_frac(tbl = ggstatsplot::Titanic_full, size = 0.5)

# plot
ggstatsplot::ggpiestats(
  data = Titanic_full_50,
  x = Survived,
  title = "Passenger survival on the Titanic", # title for the entire plot
  caption = "Source: Titanic survival dataset", # caption for the entire plot
  legend.title = "Survived?",
  package = "ggthemr",
  palette = "dust"
)
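For a single categorical variable, the question behind the pie chart is whether the slices are equally large, which is a chi-squared goodness-of-fit test. The sketch below runs that test in base R on the built-in Titanic contingency table. Using the full base-R table in place of the 50% sample of ggstatsplot::Titanic_full is an assumption made so the example needs no extra packages.

```r
# Collapse the 4-way Titanic table to the Survived dimension (dimension 4)
survived <- margin.table(Titanic, 4)
print(survived)

# Goodness-of-fit test against equal proportions (the chisq.test default)
gof <- chisq.test(survived)
print(gof)
```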

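When a second variable (y) is supplied, the data are split into groups and statistics are computed both between groups and within each group.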

set.seed(123)

# to speed up the process, let's use only half of the dataset
Titanic_full_50 <- dplyr::sample_frac(tbl = ggstatsplot::Titanic_full, size = 0.5)

# plot
ggstatsplot::ggpiestats(
  data = Titanic_full_50,
  x = Survived,
  y = Sex,
  title = "Passenger survival on the Titanic by gender", # title for the entire plot
  caption = "Source: Titanic survival dataset", # caption for the entire plot
  legend.title = "Survived?", # legend title
  ggtheme = ggplot2::theme_grey(), # changing plot theme
  package = "ggthemr",
  palette = "dust",
  k = 3, # decimal places in result
  perc.k = 1 # decimal places in percentage labels
) + # further modification with `ggplot2` commands
  ggplot2::theme(
    plot.title = ggplot2::element_text(
      color = "black",
      size = 14,
      hjust = 0
    )
  )
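With two categorical variables, the between-groups question becomes a test of independence on the contingency table, usually reported with Cramér's V as the effect size. Here is a minimal base-R sketch, again on the built-in Titanic table, not on Titanic_full. The variable names tab, ind and cramers_v are illustrative, and the effect size is computed by hand, not taken from ggstatsplot.

```r
# Sex x Survived contingency table (dimensions 2 and 4 of the Titanic table)
tab <- margin.table(Titanic, c(2, 4))
print(tab)

# Pearson's chi-squared test of independence, without continuity correction
ind <- chisq.test(tab, correct = FALSE)

# Cramér's V: sqrt(chi-squared / (n * (min(rows, cols) - 1)))
n <- sum(tab)
cramers_v <- sqrt(unname(ind$statistic) / (n * (min(dim(tab)) - 1)))

print(c(chi_squared = unname(ind$statistic), p = ind$p.value, V = cramers_v))
```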
