Tidy-Style Medical Statistics in R

阿越就是我 医学和生信笔记 2023-02-25

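rstatix provides a simple, intuitive, pipe-friendly framework, consistent with the tidy design philosophy, for performing basic statistical tests, including the t-test, Wilcoxon test, ANOVA, Kruskal-Wallis test, and correlation analysis. The output of each analysis is automatically converted into a tidy data frame to make visualization easier.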

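Additional functions are available for reshaping, reordering, manipulating, and visualizing correlation matrices. The package also supports the analysis of factorial experiments, including purely within-subjects (repeated-measures) designs, purely between-subjects designs, and mixed within-and-between-subjects designs.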

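Several effect size metrics can be computed, including eta squared for ANOVA, Cohen's d for t-tests, and Cramer's V for the association between categorical variables. The package also contains helper functions for identifying univariate and multivariate outliers and for assessing normality and homogeneity of variances.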

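Main functions

Descriptive statistics

  • get_summary_stats(): compute descriptive summary statistics;
  • freq_table(): frequency table for categorical variables;
  • get_mode(): compute the mode;
  • identify_outliers(): detect outliers using the boxplot method;
  • mahalanobis_distance(): compute the Mahalanobis distance and flag multivariate outliers;
  • shapiro_test() and mshapiro_test(): univariate and multivariate Shapiro-Wilk normality tests.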
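
As a rough illustration of the descriptive helpers listed above, the following sketch uses R's built-in ToothGrowth data set (an assumption made for the example; the original post does not use any particular data). It assumes rstatix and dplyr are installed, and the object names summ, out, and norm are hypothetical.

```r
library(rstatix)
library(dplyr)  # provides %>% and group_by()

# Mean and SD of tooth length within each dose group
summ <- ToothGrowth %>%
  group_by(dose) %>%
  get_summary_stats(len, type = "mean_sd")

# Rows flagged as outliers by the boxplot rule, per dose group
out <- ToothGrowth %>%
  group_by(dose) %>%
  identify_outliers(len)

# Shapiro-Wilk normality test of len within each dose group
norm <- ToothGrowth %>%
  group_by(dose) %>%
  shapiro_test(len)

print(summ)
print(norm)
```

Each call returns a data frame, so the results can be passed on to further dplyr or ggplot2 steps.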

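Comparing means

  • t_test(): one-sample, paired, and independent two-sample t-tests;
  • wilcox_test(): one-sample, paired, and independent two-sample Wilcoxon tests;
  • sign_test(): sign test;
  • anova_test(): a wrapper around car::Anova() for independent-measures, repeated-measures, and mixed ANOVA;
  • get_anova_table(): extract the ANOVA table from anova_test() results; can apply the sphericity correction automatically for repeated-measures designs;
  • welch_anova_test(): Welch one-way ANOVA test; a wrapper around stats::oneway.test();
  • kruskal_test(): Kruskal-Wallis rank sum test;
  • friedman_test(): Friedman rank sum test;
  • get_comparisons(): create the list of pairwise comparisons between groups;
  • get_pvalue_position: automatically compute the positions for adding p-values to a ggplot2 plot.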
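
A minimal sketch of how a few of the mean-comparison functions above might be called, again on the built-in ToothGrowth data (chosen here only for illustration). The object names df, tt, pw, aov_res, and kw are hypothetical.

```r
library(rstatix)
library(dplyr)

# Treat dose as a grouping factor rather than a number
df <- ToothGrowth %>% convert_as_factor(dose)

# Independent two-sample t-test (Welch by default): OJ vs VC
tt <- df %>% t_test(len ~ supp)

# All pairwise t-tests between dose levels, Bonferroni-adjusted
pw <- df %>% t_test(len ~ dose, p.adjust.method = "bonferroni")

# One-way ANOVA and its rank-based alternative
aov_res <- df %>% anova_test(len ~ dose)
kw <- df %>% kruskal_test(len ~ dose)

print(tt)
print(pw)
print(get_anova_table(aov_res))
print(kw)
```

With a grouping variable of more than two levels, t_test() performs all pairwise comparisons and adds adjusted p-values to the result.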

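Facilitating ANOVA in R

  • factorial_design(): build a factorial design for easy analysis with car::Anova(); especially helpful for repeated-measures ANOVA;
  • anova_summary(): create a tidy summary of ANOVA results from car::Anova() or stats::aov(); the output includes the ANOVA table, the generalized effect size, and assumption checks such as the sphericity test.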

Post-hoc tests

  • tukey_hsd(): Tukey post-hoc tests;
  • dunn_test(): pairwise comparisons following a Kruskal-Wallis test;
  • games_howell_test(): Games-Howell test;
  • emmeans_test(): pairwise comparisons of estimated marginal means.
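
The post-hoc functions above follow the same formula interface. The sketch below, which assumes the built-in ToothGrowth data and uses hypothetical object names (df, tk, dn), shows Tukey comparisons after a one-way ANOVA and Dunn comparisons after a Kruskal-Wallis test.

```r
library(rstatix)
library(dplyr)

# dose must be a factor for the group comparisons
df <- ToothGrowth %>% convert_as_factor(dose)

# Tukey HSD pairwise comparisons between dose levels
tk <- df %>% tukey_hsd(len ~ dose)

# Dunn's test with Holm adjustment, the usual follow-up to Kruskal-Wallis
dn <- df %>% dunn_test(len ~ dose, p.adjust.method = "holm")

print(tk)
print(dn)
```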

Comparing proportions

  • prop_test(), pairwise_prop_test() and row_wise_prop_test(). Performs one-sample and two-samples z-test of proportions. Wrappers around the R base function prop.test() but have the advantage of performing pairwise and row-wise z-test of two proportions, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
  • fisher_test(), pairwise_fisher_test() and row_wise_fisher_test(): Fisher's exact test for count data. Wrappers around the R base function fisher.test() but have the advantage of performing pairwise and row-wise fisher tests, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
  • chisq_test(), pairwise_chisq_gof_test(), pairwise_chisq_test_against_p(): Performs chi-squared tests, including goodness-of-fit, homogeneity and independence tests.
  • binom_test(), pairwise_binom_test(), pairwise_binom_test_against_p(): Performs exact binomial test and pairwise comparisons following a significant exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample size is small.
  • multinom_test(): performs an exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample size is small.
  • mcnemar_test(): performs McNemar chi-squared test to compare paired proportions. Provides pairwise comparisons between multiple groups.
  • cochran_qtest(): extension of the McNemar Chi-squared test for comparing more than two paired proportions.
  • prop_trend_test(): Performs chi-squared test for trend in proportion. This test is also known as Cochran-Armitage trend test
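
To make the proportion tests above more concrete, here is a small sketch on a 2x2 contingency table. The counts are made up purely for illustration, and the names xtab, chi, fis, and pt are hypothetical.

```r
library(rstatix)

# Hypothetical counts: smokers and non-smokers in two groups
xtab <- as.table(rbind(c(490, 10), c(400, 100)))
dimnames(xtab) <- list(
  group  = c("grp1", "grp2"),
  smoker = c("yes", "no")
)

# Chi-squared test of independence
chi <- chisq_test(xtab)

# Fisher's exact test on the same table
fis <- fisher_test(xtab)

# Two-sample z-test of proportions (wrapper around prop.test())
pt <- prop_test(xtab)

print(chi)
print(fis)
print(pt)
```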

Comparing variances

  • levene_test(): Pipe-friendly framework to easily compute Levene's test for homogeneity of variance across groups.
  • box_m(): Box's M-test for homogeneity of covariance matrices

Computing effect sizes

  • cohens_d(): Compute cohen's d measure of effect size for t-tests.
  • wilcox_effsize(): Compute Wilcoxon effect size (r).
  • eta_squared() and partial_eta_squared(): Compute effect size for ANOVA.
  • kruskal_effsize(): Compute the effect size for Kruskal-Wallis test as the eta squared based on the H-statistic.
  • friedman_effsize(): Compute the effect size of Friedman test using the Kendall's W value.
  • cramer_v(): Compute Cramer's V, which measures the strength of the association between categorical variables
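
As an illustration of the effect size functions above, the following sketch computes Cohen's d and eta squared on the built-in ToothGrowth data. The data set and the object names (d, fit, es) are assumptions made for the example.

```r
library(rstatix)
library(dplyr)

# Cohen's d for the OJ vs VC comparison, using the pooled standard deviation
d <- ToothGrowth %>% cohens_d(len ~ supp, var.equal = TRUE)

# Eta squared from a one-way ANOVA model of len on dose (as a factor)
fit <- aov(len ~ factor(dose), data = ToothGrowth)
es <- eta_squared(fit)

print(d)
print(es)
```

cohens_d() also reports a qualitative magnitude label next to the numeric estimate.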

Correlation analysis

Computing correlations

  • cor_test(): correlation test between two or more variables using Pearson, Spearman or Kendall methods.
  • cor_mat(): compute correlation matrix with p-values. Returns a data frame containing the matrix of the correlation coefficients. The output has an attribute named "pvalue", which contains the matrix of the correlation test p-values.
  • cor_get_pval(): extract a correlation matrix p-values from an object of class cor_mat().
  • cor_pmat(): compute the correlation matrix, but returns only the p-values of the correlation tests.
  • as_cor_mat(): convert a cor_test object into a correlation matrix format.

Reshaping correlation matrices

  • cor_reorder(): reorder correlation matrix, according to the coefficients, using the hierarchical clustering method.
  • cor_gather(): collapse (or melt) a correlation matrix into a long-format data frame (paired list).
  • cor_spread(): spread a long correlation data frame into wide format (correlation matrix).
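
The computing and reshaping functions above can be chained together. This sketch uses four columns of the built-in mtcars data, picked arbitrarily for illustration; ct, cm, pv, and long are hypothetical names.

```r
library(rstatix)
library(dplyr)

# Pearson correlation test between two variables
ct <- mtcars %>% cor_test(mpg, wt)

# Correlation matrix for a handful of variables
cm <- mtcars %>% cor_mat(mpg, disp, hp, wt)

# Matching matrix of p-values
pv <- cm %>% cor_get_pval()

# Long format: one row per pair of variables
long <- cm %>% cor_gather()

print(ct)
print(cm)
print(head(long))
```

The long format is convenient for filtering pairs or for plotting with ggplot2, and cor_spread() converts it back to the wide matrix layout.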

Subsetting correlation matrices

  • cor_select(): subset a correlation matrix by selecting variables of interest.
  • pull_triangle(), pull_upper_triangle(), pull_lower_triangle(): pull upper and lower triangular parts of a (correlation) matrix.
  • replace_triangle(), replace_upper_triangle(), replace_lower_triangle(): replace upper and lower triangular parts of a (correlation) matrix.

Visualizing correlation matrices

  • cor_as_symbols(): replaces the correlation coefficients, in a matrix, by symbols according to the value.
  • cor_plot(): visualize correlation matrix using base plot.
  • cor_mark_significant(): add significance levels to a correlation matrix

Adding p-values and significance symbols

  • adjust_pvalue(): add an adjusted p-values column to a data frame containing statistical test p-values
  • add_significance(): add a column containing the p-value significance level
  • p_round(), p_format(), p_mark_significant(): rounding and formatting p-values
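
A short sketch of how the p-value helpers above might be appended to a test pipeline, assuming the built-in ToothGrowth data; res is a hypothetical name.

```r
library(rstatix)
library(dplyr)

# One t-test per dose level, then adjust the three p-values together
res <- ToothGrowth %>%
  group_by(dose) %>%
  t_test(len ~ supp) %>%
  adjust_pvalue(method = "bonferroni") %>%  # adds the p.adj column
  add_significance("p.adj")                 # adds the p.adj.signif column

print(res)
```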

Extracting information from statistical tests

  • get_pwc_label(): Extract label from pairwise comparisons.
  • get_test_label(): Extract label from statistical tests.
  • create_test_label(): Create labels from user specified test results

Data manipulation helper functions

  • df_select(), df_arrange(), df_group_by(): wrappers around dplyr functions for supporting standard and non standard evaluations.
  • df_nest_by(): Nest a tibble data frame using grouping specification. Supports standard and non standard evaluations.
  • df_split_by(): Split a data frame by groups into subsets or data panel. Very similar to the function df_nest_by(). The only difference is that it adds a label to each data subset. Labels are the combination of the grouping variable levels.
  • df_unite(): Unite multiple columns into one.
  • df_unite_factors(): Unite factor columns. First, order factors levels then merge them into one column. The output column is a factor.
  • df_label_both(), df_label_value(): functions to label data frame rows by one or multiple grouping variables.
  • df_get_var_names(): Returns user specified variable names. Supports standard and non standard evaluation

Others

  • doo(): alternative to dplyr::do for doing anything. Technically it uses nest() + mutate() + map() to apply arbitrary computation to a grouped data frame.
  • sample_n_by(): sample n rows by group from a table
  • convert_as_factor(), set_ref_level(), reorder_levels(): Provides pipe-friendly functions to convert simultaneously multiple variables into a factor variable.
  • make_clean_names(): Pipe-friendly function to make syntactically valid column names (for input data frame) or names (for input vector).
  • counts_to_cases(): converts a contingency table or a data frame of counts into a data frame of individual observations


