Tidy-style medical statistics in R
rstatix
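rstatix provides a simple, intuitive, pipe-friendly framework, consistent with the tidyverse design philosophy, for performing basic statistical tests, including the t-test, Wilcoxon test, ANOVA, Kruskal-Wallis test, and correlation analyses. The output of each test is automatically converted into a tidy data frame to make visualization easier.
Additional functions are available for reshaping, reordering, manipulating, and visualizing correlation matrices. Functions are also included for analyzing factorial experiments, including repeated-measures (within-subjects), between-subjects, and mixed designs.
Several effect size metrics can be computed, including eta squared for ANOVA, Cohen's d for the t-test, and Cramer's V for the association between categorical variables. The package also contains helper functions for identifying univariate and multivariate outliers and for assessing normality and homogeneity of variances.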
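Main functions
Descriptive statistics
get_summary_stats(): computes descriptive summary statistics.
freq_table(): frequency table for categorical variables.
get_mode(): computes the mode.
identify_outliers(): detects outliers using the boxplot method.
mahalanobis_distance(): computes the Mahalanobis distance and flags multivariate outliers.
shapiro_test() and mshapiro_test(): univariate and multivariate Shapiro-Wilk normality tests.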
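The descriptive helpers listed above can be tried on a built-in data set. The following is a minimal sketch, assuming the rstatix and dplyr packages are installed; the choice of `ToothGrowth` and of the `"mean_sd"` summary type is only for illustration.

```r
library(rstatix)
library(dplyr)  # loaded explicitly for %>% and group_by()

# Mean and SD of tooth length within each dose group
summ <- ToothGrowth %>%
  group_by(dose) %>%
  get_summary_stats(len, type = "mean_sd")

# Rows flagged as outliers by the boxplot rule (may be empty)
outliers <- ToothGrowth %>% identify_outliers(len)

# Shapiro-Wilk normality test on tooth length
norm <- ToothGrowth %>% shapiro_test(len)

print(summ)
print(norm)
```

Each call returns a data frame, so the results can be passed on to further dplyr or ggplot2 steps.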
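Comparing means
t_test(): one-sample, paired-samples, and independent-samples t-tests.
wilcox_test(): one-sample, paired-samples, and independent-samples Wilcoxon tests.
sign_test(): sign test.
anova_test(): a wrapper around car::Anova() that handles independent-measures, repeated-measures, and mixed ANOVA.
get_anova_table(): extracts the ANOVA table from anova_test() results; it can automatically apply the sphericity correction for repeated-measures designs.
welch_anova_test(): Welch one-way ANOVA test; a wrapper around stats::oneway.test().
kruskal_test(): Kruskal-Wallis rank sum test.
friedman_test(): Friedman rank sum test.
get_comparisons(): creates the list of group pairs to compare.
get_pvalue_position: automatically computes the p-value positions when adding p-values to a ggplot2 plot.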
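As a rough illustration of the mean-comparison functions, the sketch below runs a two-group t-test, its rank-based counterpart, and a one-way ANOVA on the built-in `ToothGrowth` data. The data set and the variable name `df` are illustrative choices, not part of the original text.

```r
library(rstatix)

df <- ToothGrowth
df$dose <- factor(df$dose)  # treat dose as a grouping factor

# Independent-samples t-test: tooth length by supplement type
tt <- t_test(df, len ~ supp)

# Wilcoxon rank sum test on the same comparison
# (R may warn that exact p-values are unavailable with ties)
wt <- wilcox_test(df, len ~ supp)

# One-way between-subjects ANOVA, then the tidy ANOVA table
aov_res <- anova_test(df, len ~ dose)
aov_tab <- get_anova_table(aov_res)

# Kruskal-Wallis test as the non-parametric alternative
kw <- kruskal_test(df, len ~ dose)

print(tt)
print(aov_tab)
```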
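Facilitating ANOVA in R
factorial_design(): builds a factorial design so that the analysis can be run with car::Anova(); especially helpful for repeated-measures ANOVA.
anova_summary(): produces a clean summary of ANOVA results obtained from car::Anova() or stats::aov(). The output includes the ANOVA table, the generalized effect size, and some assumption checks, such as the sphericity test.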
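A minimal sketch of anova_summary() applied to a model fitted with stats::aov() is given here, assuming a two-way between-subjects design on the built-in `ToothGrowth` data. The model formula is an illustrative choice.

```r
library(rstatix)

df <- ToothGrowth
df$dose <- factor(df$dose)

# Fit a two-way ANOVA with base R, then tidy it with anova_summary()
fit <- aov(len ~ supp * dose, data = df)
res <- anova_summary(fit)

# One row per effect: supp, dose, and the supp:dose interaction
print(res)
```

Because this design has no within-subject factor, no sphericity check is expected in the output.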
Post-hoc tests
tukey_hsd(): Tukey post-hoc tests.
dunn_test(): pairwise comparisons following a Kruskal-Wallis test.
games_howell_test(): Games-Howell test.
emmeans_test(): pairwise comparisons of estimated marginal means.
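The post-hoc functions share the same data-plus-formula interface. The sketch below compares the three dose groups of `ToothGrowth` pairwise; the Bonferroni adjustment for dunn_test() is an arbitrary choice made for illustration.

```r
library(rstatix)

df <- ToothGrowth
df$dose <- factor(df$dose)

# Tukey HSD pairwise comparisons after a one-way ANOVA
tk <- tukey_hsd(df, len ~ dose)

# Dunn's test, typically used after a significant Kruskal-Wallis test
dn <- dunn_test(df, len ~ dose, p.adjust.method = "bonferroni")

# Games-Howell test, which does not assume equal variances
gh <- games_howell_test(df, len ~ dose)

print(tk)
print(dn)
print(gh)
```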
Comparing proportions
prop_test(), pairwise_prop_test() and row_wise_prop_test(): perform one-sample and two-sample z-tests of proportions. They wrap the base R function prop.test() and add pairwise and row-wise z-tests of two proportions, which serve as post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
fisher_test(), pairwise_fisher_test() and row_wise_fisher_test(): Fisher's exact test for count data. They wrap the base R function fisher.test() and add pairwise and row-wise Fisher tests, which serve as post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
chisq_test(), pairwise_chisq_gof_test() and pairwise_chisq_test_against_p(): perform chi-squared tests, including goodness-of-fit, homogeneity, and independence tests.
binom_test(), pairwise_binom_test() and pairwise_binom_test_against_p(): perform the exact binomial test and pairwise comparisons following a significant exact multinomial test. An alternative to the chi-square goodness-of-fit test when the sample size is small.
multinom_test(): performs an exact multinomial test. An alternative to the chi-square goodness-of-fit test when the sample size is small.
mcnemar_test(): performs the McNemar chi-squared test to compare paired proportions. Provides pairwise comparisons between multiple groups.
cochran_qtest(): extension of the McNemar chi-squared test for comparing more than two paired proportions.
prop_trend_test(): performs the chi-squared test for trend in proportions, also known as the Cochran-Armitage trend test.
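The proportion tests take a contingency table as input. The sketch below uses a made-up 2x2 table (the counts, group labels, and the `smoker` variable are hypothetical) to show the two-sample z-test, Fisher's exact test, and the chi-squared test side by side.

```r
library(rstatix)

# Hypothetical 2x2 table: smokers and non-smokers in two groups
xtab <- as.table(rbind(c(490, 10), c(400, 100)))
dimnames(xtab) <- list(
  group  = c("grp1", "grp2"),
  smoker = c("yes", "no")
)

pt <- prop_test(xtab)    # two-sample z-test of proportions
ft <- fisher_test(xtab)  # Fisher's exact test
ct <- chisq_test(xtab)   # chi-squared test of independence

print(pt)
print(ft)
print(ct)
```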
Comparing variances
levene_test(): pipe-friendly framework to compute Levene's test for homogeneity of variance across groups.
box_m(): Box's M-test for homogeneity of covariance matrices.
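Both variance checks can be sketched on built-in data: Levene's test on `ToothGrowth` and Box's M-test on the four numeric columns of `iris`. The data sets are illustrative choices.

```r
library(rstatix)

df <- ToothGrowth
df$dose <- factor(df$dose)

# Levene's test: are the variances of len equal across dose groups?
lv <- levene_test(df, len ~ dose)

# Box's M-test: are the covariance matrices equal across species?
bm <- box_m(iris[, -5], iris[, 5])

print(lv)
print(bm)
```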
Computing effect sizes
cohens_d(): computes Cohen's d effect size for t-tests.
wilcox_effsize(): computes the Wilcoxon effect size (r).
eta_squared() and partial_eta_squared(): compute effect sizes for ANOVA.
kruskal_effsize(): computes the effect size for the Kruskal-Wallis test as the eta squared based on the H-statistic.
friedman_effsize(): computes the effect size of the Friedman test using Kendall's W.
cramer_v(): computes Cramer's V, which measures the strength of the association between categorical variables.
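A short sketch of several effect-size functions follows. The `ToothGrowth` data and the 2x2 table are illustrative (the table counts are hypothetical); wilcox_effsize() is left out here because it relies on the additional coin package.

```r
library(rstatix)

df <- ToothGrowth
df$dose <- factor(df$dose)

# Cohen's d for the difference in len between supplement types
cd <- cohens_d(df, len ~ supp)

# Eta squared for a one-way ANOVA fitted with aov()
es <- eta_squared(aov(len ~ dose, data = df))

# Eta squared based on the Kruskal-Wallis H-statistic
ke <- kruskal_effsize(df, len ~ dose)

# Cramer's V for a hypothetical 2x2 contingency table
xtab <- as.table(rbind(c(490, 10), c(400, 100)))
cv <- cramer_v(xtab)

print(cd)
print(es)
print(ke)
print(cv)
```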
Correlation analysis
Computing correlations
cor_test(): correlation test between two or more variables using the Pearson, Spearman, or Kendall method.
cor_mat(): computes a correlation matrix with p-values. Returns a data frame containing the matrix of correlation coefficients. The output has an attribute named "pvalue", which contains the matrix of the correlation test p-values.
cor_get_pval(): extracts the p-value matrix from an object of class cor_mat.
cor_pmat(): computes the correlation matrix but returns only the p-values of the correlation tests.
as_cor_mat(): converts a cor_test object into correlation matrix format.
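The correlation functions can be sketched on a few columns of the built-in `mtcars` data; the selected variables are an arbitrary choice, and dplyr is loaded only for `select()` and the pipe.

```r
library(rstatix)
library(dplyr)

# Pearson correlation test between two variables
ct <- mtcars %>% cor_test(mpg, wt, method = "pearson")

# Correlation matrix for a subset of variables
cm <- mtcars %>%
  select(mpg, disp, hp, wt) %>%
  cor_mat()

# Matching matrix of p-values
pv <- cm %>% cor_get_pval()

print(ct)
print(cm)
print(pv)
```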
Reshaping correlation matrices
cor_reorder(): reorders a correlation matrix according to the coefficients, using hierarchical clustering.
cor_gather(): takes a correlation matrix and collapses (melts) it into a long-format data frame (paired list).
cor_spread(): spreads a long correlation data frame into wide format (correlation matrix).
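The round trip between wide and long correlation formats might look like the sketch below, assuming a small matrix computed from four `mtcars` columns chosen for illustration.

```r
library(rstatix)
library(dplyr)

cm <- mtcars %>%
  select(mpg, disp, hp, wt) %>%
  cor_mat()

# Reorder variables by hierarchical clustering of the coefficients
ordered <- cm %>% cor_reorder()

# Wide matrix -> long data frame with one row per variable pair
long <- cm %>% cor_gather()

# Long data frame -> wide correlation matrix again
wide <- long %>% cor_spread(value = "cor")

print(head(long))
print(wide)
```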
Subsetting correlation matrices
cor_select(): subsets a correlation matrix by selecting variables of interest.
pull_triangle(), pull_upper_triangle(), pull_lower_triangle(): pull the upper and lower triangular parts of a (correlation) matrix.
replace_triangle(), replace_upper_triangle(), replace_lower_triangle(): replace the upper and lower triangular parts of a (correlation) matrix.
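A brief sketch of the subsetting helpers, again on a small `mtcars` correlation matrix chosen for illustration; replacing the upper triangle with `NA` is just one possible choice for the `by` argument.

```r
library(rstatix)
library(dplyr)

cm <- mtcars %>%
  select(mpg, disp, hp, wt) %>%
  cor_mat()

# Keep only the variables of interest
sub <- cm %>% cor_select(mpg, wt)

# Keep the lower triangle; the rest is blanked out
lower <- cm %>% pull_lower_triangle()

# Replace the upper triangle with NA instead of blanks
masked <- cm %>% replace_upper_triangle(by = NA)

print(sub)
print(lower)
print(masked)
```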
Visualizing correlation matrices
cor_as_symbols(): replaces the correlation coefficients in a matrix with symbols according to their value.
cor_plot(): visualizes a correlation matrix using base plot.
cor_mark_significant(): adds significance levels to a correlation matrix.
Adjusting p-values and adding significance marks
adjust_pvalue(): adds an adjusted p-value column to a data frame containing statistical test p-values.
add_significance(): adds a column containing the p-value significance level.
p_round(), p_format(), p_mark_significant(): round and format p-values.
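These helpers work on any data frame that holds a p-value column. The sketch below uses a hypothetical table of three comparisons (the labels and p-values are made up) and a Bonferroni adjustment chosen for illustration.

```r
library(rstatix)

# Hypothetical raw p-values from three pairwise comparisons
pvals <- data.frame(
  comparison = c("A-B", "A-C", "B-C"),
  p = c(0.01, 0.04, 0.30)
)

res <- pvals %>%
  adjust_pvalue(p.col = "p", method = "bonferroni") %>%  # adds p.adj
  add_significance(p.col = "p.adj")                       # adds p.adj.signif

print(res)
```

With the default cut points, adjusted p-values above 0.05 are labelled "ns".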
Extracting labels from test results
get_pwc_label(): extracts a label from pairwise comparisons.
get_test_label(): extracts a label from statistical tests.
create_test_label(): creates labels from user-specified test results.
Data manipulation helpers
df_select(), df_arrange(), df_group_by(): wrappers around dplyr functions that support both standard and non-standard evaluation.
df_nest_by(): nests a tibble data frame using a grouping specification. Supports standard and non-standard evaluation.
df_split_by(): splits a data frame by groups into subsets or data panels. Very similar to df_nest_by(); the only difference is that it adds a label to each data subset. Labels are the combination of the grouping variable levels.
df_unite(): unites multiple columns into one.
df_unite_factors(): unites factor columns. It first orders the factor levels, then merges them into one column. The output column is a factor.
df_label_both(), df_label_value(): functions to label data frame rows by one or multiple grouping variables.
df_get_var_names(): returns user-specified variable names. Supports standard and non-standard evaluation.
Others
doo(): an alternative to dplyr::do for doing anything. Technically, it uses nest() + mutate() + map() to apply an arbitrary computation to a grouped data frame.
sample_n_by(): samples n rows by group from a table.
convert_as_factor(), set_ref_level(), reorder_levels(): pipe-friendly functions to convert multiple variables into factor variables simultaneously.
make_clean_names(): pipe-friendly function to make syntactically valid column names (for an input data frame) or names (for an input vector).
counts_to_cases(): converts a contingency table or a data frame of counts into a data frame of individual observations.
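A few of these utilities are sketched here. The 2x2 table of counts is hypothetical, and `ToothGrowth` is used only as a convenient built-in data set.

```r
library(rstatix)

# Convert the numeric dose column into a factor
df <- ToothGrowth %>% convert_as_factor(dose)

# Draw 2 random rows from each dose group
smp <- ToothGrowth %>% sample_n_by(dose, size = 2)

# Expand a hypothetical table of counts into one row per observation
xtab <- as.table(rbind(c(20, 5), c(30, 15)))
dimnames(xtab) <- list(
  group   = c("grp1", "grp2"),
  outcome = c("yes", "no")
)
cases <- counts_to_cases(xtab)

print(levels(df$dose))
print(smp)
print(head(cases))
```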