查看原文
其他

寻找生存分析的最佳基因表达分组阈值

生信技能树 生信技能树 2022-06-06

昨天我们提到了任意更改基因表达分组阈值生存分析结果大不一样:

集思广益-生存分析可以随心所欲根据表达量分组吗

看到

https://www.proteinatlas.org/ENSG00000111801-BTN3A3/pathology/tissue/breast+cancer


文字版是:

Based on the FPKM value of each gene, we classified the patients into two groups and examined their prognoses.

In the analysis, we excluded genes with low expression, i.e., those with a median expression among samples less than FPKM 1.

The prognosis of each group of patients was examined by Kaplan-Meier survival estimators, and the survival outcomes of the two groups were compared by log-rank tests.

To choose the best FPKM cut-offs for grouping the patients most significantly, all FPKM values from the 20th to 80th percentiles were used to group the patients, significant differences in the survival outcomes of the groups were examined and the value yielding the lowest log-rank P value is selected.

得到的K-M图如下:

image-20190516092236980

如果是在 http://www.oncolnc.org/ 出图如下:


a=read.csv('BRCA_10384_50_50.csv')
head(a)
a$event=ifelse(a$Status=='Alive',0,1)
library(survival)
library(survminer)
sfit <- survfit(Surv(Days, event)~Group, data=a) 
ggsurvplot(sfit, conf.int=F, pval=TRUE)

phe=a
phe$time=phe$Days/365

## 批量生存分析 使用  logrank test 方法
mySurv=with(phe,Surv(time, event))
log_rank_p <- lapply(2:8, function(i){
  thr=sort(phe$Expression)[round(nrow(phe)*i/10)]
  phe$group=ifelse(phe$Expression > thr,'high','low') 
  print(table( phe$group ))
  data.survdiff=survdiff(mySurv~group,data=phe)
  p.val = 1 - pchisq(data.survdiff$chisq, length(data.survdiff$n) - 1)
  return(p.val)
}) 
log_rank_p=unlist(log_rank_p)
log_rank_p

i=8
thr=sort(phe$Expression)[round(nrow(phe)*i/10)]
phe$group=ifelse(phe$Expression > thr,'high','low') 
print(table( phe$group ))
sfit <- survfit(Surv(time, event)~group, data=phe) 
ggsurvplot(sfit, conf.int=F, pval=TRUE)

遗憾的是,因为数据源不一样,使用oncolnc的数据也不太可能画出 proteinatlas 一模一样的图:

见: TCGA数据库生存分析的网页工具哪家强

您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存