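Miscellaneous
I came across an interesting book: Modern Statistics for Modern Biology (《现代生物学所需要的现代统计学》; the Chinese title is my own translation).
I am recommending it mainly because so many readers have written to the 生信技能树 (Bioinformatics Skill Tree) account asking how to fill gaps in both biology and statistics, and this book happens to cover the two together. It can be read online at: https://www.huber.embl.de/msmb/index.html
The book also comes with companion code. The following line installs the packages it needs: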
source("https://www.huber.embl.de/msmb/install_packages.R")
Data
Zipped data directory; download the archive yourself: https://www.huber.embl.de/msmb/data.tar.gz
Code
Rfiles folder: https://www.huber.embl.de/msmb/code/
Table of contents:
Introduction
1 Generative Models for Discrete Data
2 Statistical Modeling
3 High Quality Graphics in R
4 Mixture Models
5 Clustering
6 Testing
7 Multivariate Analysis
8 High-Throughput Count Data
9 Multivariate methods for heterogeneous data
10 Networks and Trees
11 Image data
12 Supervised Learning
13 Design of High Throughput Experiments and their Analyses
Statistical Concordance
Acknowledgements
References
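It really is very detailed, with plenty of figures and code. Chapter 8, for example, covers working with expression count matrices from high-throughput sequencing: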
Goals of this chapter
Some core concepts
Count data
Modeling count data
A basic analysis
Critique of default choices and possible modifications
Multi-factor designs and linear models
Generalized linear models
Two-factor analysis of the pasilla data
Further statistical concepts
Summary of this chapter
Further reading
Exercises
The chapter uses the Drosophila count matrix and sample grouping information that ship with the R package pasilla:
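# Locate the gene-level count table shipped with the pasilla package
fn = system.file("extdata", "pasilla_gene_counts.tsv",
                 package = "pasilla", mustWork = TRUE)
# Read it as a matrix: genes in rows, samples in columns
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))
# Locate and read the sample annotation table
annotationFile = system.file("extdata",
                             "pasilla_sample_annotation.csv",
                             package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno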
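Before the two tables can be combined, the rows of the annotation have to be in the same order as the columns of the count matrix. Here is a minimal base-R sketch of that alignment step on toy data. All sample names, counts, and the "fb" file-name suffix below are made up for illustration; they are assumptions, not the contents of the pasilla files.

```r
# Toy count matrix: 3 genes x 4 samples (all names are hypothetical)
counts_toy <- matrix(
  c(10L, 0L, 5L,
    12L, 1L, 7L,
    30L, 2L, 4L,
    28L, 0L, 6L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("untreated1", "untreated2", "treated1", "treated2")))

# Toy annotation, deliberately in a different order and with a "fb" suffix
anno_toy <- data.frame(
  file      = c("treated1fb", "treated2fb", "untreated1fb", "untreated2fb"),
  condition = c("treated", "treated", "untreated", "untreated"),
  stringsAsFactors = FALSE)

# For each count column, find the annotation row that describes it
idx <- match(colnames(counts_toy), sub("fb$", "", anno_toy$file))
stopifnot(!anyNA(idx))  # fail early if a sample has no annotation

# Row i of anno_aligned now describes column i of counts_toy
anno_aligned <- anno_toy[idx, ]
stopifnot(identical(sub("fb$", "", anno_aligned$file), colnames(counts_toy)))
```

Checking the alignment with stopifnot() is cheap, and it guards against silently assigning the wrong condition to a sample.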
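Then set up the comparison from the grouping information and run the differential expression analysis with the DESeq2 package, as in the code below: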
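library("dplyr")
# Turn condition and library type into factors; the first level is the reference
pasillaSampleAnno = mutate(pasillaSampleAnno,
  condition = factor(condition, levels = c("untreated", "treated")),
  type = factor(sub("-.*", "", type), levels = c("single", "paired")))
# Match annotation rows to the columns of the count matrix
# (the file names in the annotation carry a trailing "fb")
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))
library("DESeq2")
# Build the DESeq2 object with a one-factor design
pasilla = DESeqDataSetFromMatrix(
  countData = counts,
  colData   = pasillaSampleAnno[mt, ],
  design    = ~ condition)
class(pasilla)
# Fit the model, then extract the results table
pasilla = DESeq(pasilla)
res = results(pasilla)
# Show the genes with the smallest adjusted p-values
res[order(res$padj), ] %>% head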
Isn't that super convenient!
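To make the last step of the analysis more concrete: by default, DESeq2's results() reports Benjamini-Hochberg adjusted p-values in the padj column. The base-R sketch below shows that adjustment and the sort-then-filter pattern on a toy table. The gene names, p-values, and the 0.1 cutoff are all made up for illustration, and the sketch leaves out the independent filtering that DESeq2 also applies.

```r
# Hypothetical raw p-values for five genes (made-up numbers, not pasilla results)
res_toy <- data.frame(
  gene   = c("g1", "g2", "g3", "g4", "g5"),
  pvalue = c(0.001, 0.04, 0.03, 0.20, NA),
  stringsAsFactors = FALSE)

# Benjamini-Hochberg adjustment; NA p-values stay NA
res_toy$padj <- p.adjust(res_toy$pvalue, method = "BH")

# order() puts NA last by default, so the strongest hits come first
res_sorted <- res_toy[order(res_toy$padj), ]

# Keep genes below a 10% false discovery rate (the cutoff is an arbitrary choice)
sig <- subset(res_sorted, padj < 0.1)
sig$gene
```

subset() drops rows where padj is NA, so genes without a usable p-value never end up in the significant list.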
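Biology background can also be picked up from open courses
Most people who move into bioinformatics engineering have at least four years of biology training, so biological macromolecules and the central dogma are no problem for them. Some students coming from a computer science background, however, keep asking me how to catch up on biology. For them I recommend two courses on the MOOC site (https://www.icourse163.org/):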
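Genomics, Fudan University: https://www.icourse163.org/course/FUDAN-1002839009#/info
Cell Biology, Sichuan University: https://www.icourse163.org/course/SCU-46011
Please search for other courses yourself and study what you need, with the aim of mastering the 100 Lectures on Bioinformatics Basics (生信基础100讲): https://mp.weixin.qq.com/s/Gr_0H4-GaTYkgUkbNHcMcg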
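More NGS courses
Don't miss the free NGS data-processing video courses on Bilibili. So far, WeChat discussion groups have been set up for the following: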