Modern Statistics for Modern Biology

生信技能树


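I came across an interesting book: Modern Statistics for Modern Biology (the Chinese title 《现代生物学所需要的现代统计学》 is my own translation).

Many readers have written to the 生信技能树 backend asking how to fill gaps in their biology and statistics knowledge, and Modern Statistics for Modern Biology happens to cover both. It can be read online at https://www.huber.embl.de/msmb/index.html

The whole book also comes with companion code: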

source("https://www.huber.embl.de/msmb/install_packages.R")

Data

Zipped data directory (download the archive yourself): https://www.huber.embl.de/msmb/data.tar.gz

Code

Rfiles folder: https://www.huber.embl.de/msmb/code/

Chapter list:

Introduction
1 Generative Models for Discrete Data
2 Statistical Modeling
3 High Quality Graphics in R
4 Mixture Models
5 Clustering
6 Testing
7 Multivariate Analysis
8 High-Throughput Count Data
9 Multivariate methods for heterogeneous data
10 Networks and Trees
11 Image data
12 Supervised Learning
13 Design of High Throughput Experiments and their Analyses
Statistical Concordance
Acknowledgements
References

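The book is indeed very detailed, with plenty of figures and code. Chapter 8, for example, covers working with expression count matrices from high-throughput sequencing: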

Goals of this chapter
Some core concepts
Count data
Modeling count data
A basic analysis
Critique of default choices and possible modifications
Multi-factor designs and linear models
Generalized linear models
Two-factor analysis of the pasilla data
Further statistical concepts
Summary of this chapter
Further reading
Exercises

The chapter uses the fruit fly count matrix and sample annotation that ship with the R package pasilla:

fn = system.file("extdata", "pasilla_gene_counts.tsv",
                  package = "pasilla", mustWork = TRUE)
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))
 
annotationFile = system.file("extdata",
  "pasilla_sample_annotation.csv",
  package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno
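
Before the counts and the annotation are combined, it is worth checking that the columns of the count matrix and the rows of the annotation table describe the same samples in the same order. The sketch below shows the idea with base R's match() on toy data. The sample names, the "fb" suffix, and the objects toy_counts and toy_anno are made up for illustration; they are not the pasilla files themselves.

```r
# Toy count matrix: 3 genes x 4 samples (all values are made up)
toy_counts <- matrix(
  c(10, 0, 5,
    12, 1, 7,
    30, 2, 4,
    28, 0, 6),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("untreated1", "untreated2", "treated1", "treated2")))

# Toy annotation whose rows are in a different order and whose
# identifiers carry a hypothetical "fb" suffix
toy_anno <- data.frame(
  file      = c("treated1fb", "treated2fb", "untreated1fb", "untreated2fb"),
  condition = c("treated", "treated", "untreated", "untreated"),
  stringsAsFactors = FALSE)

# For each column of the matrix, find the matching annotation row
idx <- match(colnames(toy_counts), sub("fb$", "", toy_anno$file))

# Stop early if any sample has no annotation
stopifnot(!anyNA(idx))

# Reorder the annotation so that row i describes column i
toy_anno_ordered <- toy_anno[idx, ]
toy_anno_ordered$condition
```

Indexing the annotation with idx, rather than sorting both tables separately, keeps the count matrix untouched and makes a missing sample fail loudly instead of silently shifting the groups.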

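Then, with the groups defined, set up the comparison and run the differential expression analysis with the DESeq2 package as shown below: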

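library("dplyr")
# Turn the grouping columns into factors; "untreated" is the reference level
pasillaSampleAnno = mutate(pasillaSampleAnno,
  condition = factor(condition, levels = c("untreated", "treated")),
  type = factor(sub("-.*", "", type), levels = c("single", "paired")))

# Match annotation rows to the columns of the count matrix
# (the file names in the annotation carry an "fb" suffix)
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))

library("DESeq2")
pasilla = DESeqDataSetFromMatrix(
  countData = counts,
  colData   = pasillaSampleAnno[mt, ],
  design    = ~ condition)
class(pasilla)

# Fit the model and test for differential expression
pasilla = DESeq(pasilla)

# Show the genes with the smallest adjusted p-values
res = results(pasilla)
res[order(res$padj), ] %>% head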

Isn't that super convenient!
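
The padj column used for sorting the results table holds p-values adjusted for multiple testing. As far as I know, results() in DESeq2 applies the Benjamini-Hochberg method by default, and its independent filtering step can additionally set padj to NA for low-count genes. A minimal base R sketch of the adjustment itself, on made-up p-values for hypothetical genes:

```r
# Made-up raw p-values for four hypothetical genes
pvals <- c(geneA = 0.01, geneB = 0.20, geneC = 0.03, geneD = 0.005)

# Benjamini-Hochberg adjustment from base R's stats package
padj <- p.adjust(pvals, method = "BH")

# Sort genes by adjusted p-value, smallest first
padj[order(padj)]

# Genes passing a 5% false discovery rate cutoff
# (the 0.05 threshold is chosen only for illustration)
names(padj)[padj < 0.05]
```

The adjusted values are never smaller than the raw ones, which is why a gene that looks significant on its raw p-value can drop out after adjustment when thousands of genes are tested at once.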

Open courses can fill in the biology background

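Most people who move into bioinformatics engineering have at least four years of biology training, so biomacromolecules and the central dogma are no problem for them. Some students come over from a computer science background, though, and keep asking me how to build up their biology. For them I recommend two courses on the MOOC platform icourse163 (https://www.icourse163.org/):

Genomics, Fudan University: https://www.icourse163.org/course/FUDAN-1002839009#/info
Cell Biology, Sichuan University: https://www.icourse163.org/course/SCU-46011
For other courses, search on your own and study as needed, aiming to master the "100 Lectures on Bioinformatics Basics" (生信基础100讲): https://mp.weixin.qq.com/s/Gr_0H4-GaTYkgUkbNHcMcg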

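More NGS courses

Don't miss the free NGS data processing video courses on Bilibili. WeChat discussion groups have already been set up for a number of them.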
