鉴定差异翻译效率基因之 Riborex

Original JunJunLab 老俊俊的生信笔记 2022-08-15

1、介绍

今天介绍一个 Ribo-seq 中用于快速鉴定差异翻译效率基因的软件：Riborex。文章在 2017 年发表在 Bioinformatics 上：

标题：

Riborex: fast and flexible identification of differential translation from Ribo-seq data^[1]

Riborex 基于在已有的差异分析软件 DESeq2、edgeR、Voom 的框架下，根据负二项模型（negative binomial distribution）和广义线性模型（GLM）来寻找翻译效率差异的基因。作者对不同的翻译效率计算软件和 Riborex 里 3 种不同的方法鉴定的基因进行 overlap，都有大部分的重叠：

此外对不同数据集的 Ribo-seq 数据使用不同软件分析，相对于之前的几款软件，所耗费的时间也是最快的：

不同的数据集之间，Riborex 的准确率相对于其它软件也有着良好的效果，此外 Xtail 软件也表现不错：

总之，Riborex 是作为一个开源的 R 包实现的，它的准确性与现有的最精确的方法不相上下，但是速度要快得多。

2、安装

包的信息及软件存放在 githup 网址上：

smithlabcode/riborex：^[2] https://github.com/smithlabcode/riborex

作者在 3 个月前还更新了 README 文件，此外在 vignetts 文件夹里有个 manual 使用帮助文档，可以参考使用。

该软件依赖 DESeq2、edgeR 和 fdrtool 3 个包，提前装好，作者强烈建议使用 conda 安装：

$ conda install -c bioconda r-riborex

R 里安装：

# 1、安装包
BiocManager::install("DESeq2")
BiocManager::install("edgeR")
install.packages('fdrtool')
# 从githup安装
library(devtools)
options(unzip='internal')
devtools::install_github('smithlabcode/riborex')

# 2、加载R包
library(riborex)

3、使用

输入文件需要 RNA-seq 和 Ribo-seq 的定量 count 表达矩阵，可以使用 HT-seq 或 featureCount 等软件得到，接下来使用软件来鉴定差异翻译效率的基因：

# 加载测试数据
data(riborexdata)
# 获取rna-seq和ribo-seq count
RNACntTable <- rna
RiboCntTable <- ribo
# 查看数据
head(RNACntTable,3)
                   BN_336 BN_337 BN_338 BN_339
ENSRNOG00000000017      7     11      4      4
ENSRNOG00000000024   2467   2478   3258   2316
ENSRNOG00000000033    206    282    330    244
head(RiboCntTable,3)
                   BN_341 BN_342 BN_343 BN_344
ENSRNOG00000000017     15      5     10      2
ENSRNOG00000000024   5206   5921   2864   1985
ENSRNOG00000000033     30     30     23     13
# 分组
rnaCond <- c("control", "control", "treated", "treated")
riboCond <- c("control", "control", "treated", "treated")
# 差异分析，使用DESeq2
res.deseq2 <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond, engine = "DESeq2")
DESeq2 mode selected
combining design matrix
applying DESeq2 to modified design matrix
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

# 查看结果
head(res.deseq2,3)
log2 fold change (MLE): EXTRA1 treated vs control
Wald test p-value: EXTRA1 treated vs control
DataFrame with 3 rows and 6 columns
                     baseMean log2FoldChange     lfcSE      stat    pvalue      padj
                    <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
ENSRNOG00000000017    7.69493       1.533227  1.566415  0.978813  0.327672  0.736753
ENSRNOG00000000024 3544.23807      -0.102151  0.202054 -0.505564  0.613163  0.901926
ENSRNOG00000000033  111.43945       0.265252  0.503397  0.526924  0.598246  0.896407

检查 p 值的分布：

良好的 p 值分布图应该是在 0 侧有个最高的峰，往后面则是均匀分布，具体可参考：什么，你算出的 P-value 看上去像齐天大圣变的庙？：

# 检查p值分布
hist(res.deseq2$pvalue, main = 'DESeq2 unadjusted p-values', xlab='Unadjusted p-values',
     col = '#F4C7AB')

统计差异结果：

# 统计差异结果
summary(res.deseq2)
out of 13916 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1107, 8%
LFC < 0 (down)     : 1217, 8.7%
outliers [1]       : 0, 0%
low counts [2]     : 540, 3.9%
(mean count < 6)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

保存差异结果：

# 保存结果
write.table(res.deseq2, "riborex_res_deseq2.txt", quote=FALSE,sep = '\t')

使用 edgeR 来差异分析：

# 使用edgeR来差异分析
res.edgeR <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond, "edgeR")
edgeR mode selected
combining design matrix
applying edgeR to modified design matrix

# 查看结果
head(res.edgeR$table,3)
                         logFC    logCPM         LR    PValue       FDR
ENSRNOG00000000017  1.36290149 -2.456968 1.80649827 0.1789289 0.4627345
ENSRNOG00000000024 -0.30127172  6.212404 2.13670654 0.1438103 0.4037250
ENSRNOG00000000033  0.07178854  1.313235 0.02631769 0.8711269 0.9636097

# 使用edgeRD来对RNA-seq和Ribo-seq分别评估离散度
res.edgeRD <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond, "edgeRD")

使用 limma 包的 Voom 来进行差异分析：

# 使用limma包的Voom来进行差异分析：
res.voom <- riborex(RNACntTable, RiboCntTable, rnaCond, riboCond, "Voom")
Voom mode selected
combining design matrix
applying Voom to modified design matrix

# 查看差异结果
head(res.voom,3)
                         logFC    AveExpr          t   P.Value adj.P.Val         B
ENSRNOG00000000017  1.39674189 -2.8437969  1.2847048 0.2268559 0.5243467 -4.928058
ENSRNOG00000000024 -0.30047073  6.0075571 -1.8737990 0.0893910 0.2991861 -5.682138
ENSRNOG00000000033  0.07661457  0.7070877  0.1687028 0.8692764 0.9540355 -6.320171

4、矫正 p 值

对于我们的 p 值分布如果不是那么理想，很有可能鉴定出来的差异基因数量会很少，作者还提供了矫正 p 值的解决办法，让 p 值分布尽可能 "correct" ：

# 操作基本一致
# 载入数据
RNACntTable.corrected <- rna.null
RiboCntTable.corrected <- ribo.null
# 分组
rnaCond.corrected <- c(rep('T0', 3), rep('T24',3))
riboCond.corrected <- rnaCond.corrected
# 差异分析
res.deseq2.corrected <- riborex(RNACntTable.corrected, RiboCntTable.corrected,
                                rnaCond.corrected, riboCond.corrected)
DESeq2 mode selected
combining design matrix
applying DESeq2 to modified design matrix
estimating size factors
estimating dispersions
gene-wise dispersion estimates
...

# 检查p值分布
hist(res.deseq2.corrected$pvalue, main = 'DESeq2 unadjusted p-values', xlab='Unadjusted p-values',
     col = '#F4C7AB')

可以看到这个 p 值分布是向右倾斜逐渐升高的，p < 0.05 基因就会很少，接下来用 fdrtool 进行矫正：

# 矫正
results.corrected <- correctNullDistribution(res.deseq2.corrected)
Step 1... determine cutoff point
Step 2... estimate parameters of null distribution and eta0
Step 3... compute p-values and estimate empirical PDF/CDF
Step 4... compute q-values and local fdr
# 检查矫正后的p值分布
hist(results.corrected$pvalue, main = 'DESeq2 unadjusted p-values after correction',
     xlab='Corrected unadjusted p-values',col = '#F4C7AB')

5、多因子实验设计的差异分析

此外，此软件还支持多个实验条件设计的差异翻译效率的分析：

# 多因子实验设计
rna <- RNACntTable[,c(1,2,3,4,1,2,3,4)]
ribo <- RiboCntTable[,c(1,2,3,4,1,2,3,4)]
# 分组
rnaCond <- data.frame(factor1=(c("control1", "control1", "treated1", "treated1",
                                 "control1", "control1", "control1", "control1")),
                      factor2=(c("control2", "control2", "control2", "control2",
                                 "control2", "control2", "treated2", "treated2")))
riboCond <- data.frame(factor1=(c("control1", "control1", "treated1", "treated1",
                                  "control1", "control1", "control1", "control1")),
                       factor2=(c("control2", "control2", "control2", "control2",
                                  "control2", "control2", "treated2", "treated2")))

进行差异分析需要提供指定的比较分组：

# 进行差异分析需要提供指定的比较分组
contrast = c("factor2", "control2", "treated2")
# 差异分析
res.deseq2 <- riborex(rna, ribo, rnaCond, riboCond, "DESeq2", contrast = contrast)
# 查看统计结果
summary(res.deseq2)
out of 13916 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1887, 14%
LFC < 0 (down)     : 1987, 14%
outliers [1]       : 0, 0%
low counts [2]     : 270, 1.9%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

# edgeR和edgeRD类似
res.edgeR <- riborex(rna, ribo, rnaCond, riboCond, "edgeR", contrast = contrast)
res.edgeRD <- riborex(rna, ribo, rnaCond, riboCond, "edgeRD", contrast = contrast)

6、绘制火山图

画个火山图对上下调基因可视化一下：

# 转换为data.frame
dat <- as.data.frame(res.deseq2)
# 定义上下调基因
dat$type <- ifelse(abs(dat$log2FoldChange)>1 & dat$pvalue < 0.05,
                   ifelse(dat$log2FoldChange >1 & dat$pvalue < 0.05,'UP','DOWN'),'nonSig')
# 查看上下调基因数量
table(dat$type)
DOWN nonSig     UP
   983  12011    922

# 绘图
library(ggplot2)
library(ggprism)

ggplot(dat,aes(x = log2FoldChange,y = -log10(pvalue))) +
  geom_point(aes(color = type,shape = type),alpha = .5,size = 3) +
  theme_prism() + ylim(0,80) + xlim(-6,6) +
  theme(legend.position = c(0.9,0.9)) +
  scale_color_manual(values = c('UP'= '#F54748','nonSig'='#F8A488','DOWN'='#5AA897')) +
  geom_hline(yintercept = -log10(0.05),linetype = 'dashed',size = 1) +
  geom_vline(xintercept = c(-1,1),linetype = 'dashed',size = 1)

7、实验没有 replicate 能分析吗？

对于实验没有 replicate 是不能做统计学检验分析的，但是 edgeR 是可以对 RNA-seq 在没有生物学重复的情况下进行差异分析，对于差异翻译效率分析的话，我也不知道 Riborex 是否能对于没有重复的条件下进行差异分析。

于是昨天在 githup 上问了作者这个问题，今天中午便很快得到了答复！

很不幸的是，不可以哟。

参考资料

[1]

Riborex: fast and flexible identification of differential translation from Ribo-seq data: https://academic.oup.com/bioinformatics/article/33/11/1735/2964727#118738289

[2]

smithlabcode/riborex：: https://github.com/smithlabcode/riborex

欢迎小伙伴留言评论！

今天的分享就到这里了，敬请期待下一篇！

最后欢迎大家分享转发，您的点赞是对我的鼓励和肯定！

如果觉得对您帮助很大，打赏一下吧！

观察｜官方通报陕西蒲城一职校学生坠亡：事发前与舍友发生口角和肢体冲突认定该生系高空坠落死亡

桐城一派｜倒在“跨年夜”的龚书记，13个字换来免职调查冤不冤？

比佟丽娅还恋爱脑，怀孕7次流产4次，目睹丈夫背叛却选择原谅

市管干部“龚书记”免职迷局

讣告！又一知名女星在家中去世，终年54岁，曾是无数人白月光…

鉴定差异翻译效率基因之 Riborex

参考资料

今天的分享就到这里了，敬请期待下一篇！

您可能也对以下帖子感兴趣

观察｜官方通报陕西蒲城一职校学生坠亡：事发前与舍友发生口角和肢体冲突 认定该生系高空坠落死亡

桐城一派｜倒在“跨年夜”的龚书记，13个字换来免职调查冤不冤？

比佟丽娅还恋爱脑，怀孕7次流产4次，目睹丈夫背叛却选择原谅

市管干部“龚书记”免职迷局

讣告！又一知名女星在家中去世，终年54岁，曾是无数人白月光…

生成图片，分享到微信朋友圈

鉴定差异翻译效率基因之 Riborex

参考资料

今天的分享就到这里了，敬请期待下一篇！

您可能也对以下帖子感兴趣

观察｜官方通报陕西蒲城一职校学生坠亡：事发前与舍友发生口角和肢体冲突认定该生系高空坠落死亡