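Building an atlas of normal human breast epithelial cells with single-cell sequencing
When your talent cannot yet support your ambition, settle down, stay grounded, and make steady progress with us. We have been sharing knowledge in single-cell transcriptomics for almost two years now, and through this Literature Digest column we have been lucky to gather fellow learners who move forward and grow together with us.
The Literature Digest column broadens your knowledge with brief introductions. Follow it daily, and we hope you will learn something too!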
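Abstract
This paper performed single-cell RNA sequencing on a total of 25,790 normal breast epithelial cells from 7 women. Clustering defined three cell populations: secretory luminal cells, hormone-responsive luminal cells, and basal cells. Pseudotime analysis then reconstructed the differentiation trajectory connecting these three cell types. The study systematically characterizes breast epithelial cells and helps in understanding the lineage differentiation of breast cancer cells.
Background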
normal breast histology
normal breast cells to breast cancer
breast cancer classifications
Main text
Sequencing data (microfluidics-enabled)
Platform: Fluidigm C1 (Protocol 100-7168 I1, https://www.fluidigm.com/binaries/content/documents/fluidigm/resources/c1-mrna%E2%80%90seq-pr-100%E2%80%907168/c1-mrna%E2%80%90seq-pr-100%E2%80%907168/fluidigm%3Afile)
Data: GSE113197 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE113197)
Code: GitHub (https://github.com/kessenbrocklab/Nguyen_Pervolarakis_Nat_Comm_2018)
Basic workflow
Isolation: basal and luminal epithelial cells sorted separately by FACS
Alignment & quantification: RSEM / bowtie2 (GRCh38 RefSeq genome) => FPKM matrix
Quality control: cells: >900 genes/cell; genes: >3 cells/gene (*)
Output: 703 single cells from 3 women, ~4,500 genes/cell
Clustering: Seurat pipeline
Quality control (*) => log(FPKM) => PCA (from highly variable genes) => tSNE (top 2 PCs), clustering => marker genes selection (FindAllMarkers)
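The microfluidics workflow above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code (their scripts are in the GitHub repository linked above): FPKM is computed from plain gene lengths rather than RSEM's effective lengths, cells are filtered before genes, "log" is taken as log(x + 1) so that zeros are tolerated, and highly variable genes are simply those with the largest variance of log values, a simplification of Seurat's dispersion-based selection. The function names `fpkm`, `qc_masks` and `log_hvg_pca` are hypothetical.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """FPKM = fragments / (gene length in kb) / (mapped fragments in millions).
    counts: genes x cells; lengths_bp: per-gene length in base pairs
    (assumption: plain length; RSEM itself uses effective length)."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    millions = counts.sum(axis=0, keepdims=True) / 1e6   # per-cell library size
    return counts / lengths_kb / millions

def qc_masks(expr, min_genes_per_cell=900, min_cells_per_gene=3):
    """Keep cells with > min_genes_per_cell detected genes, then genes
    detected (> 0) in > min_cells_per_gene of the kept cells."""
    detected = np.asarray(expr) > 0
    keep_cells = detected.sum(axis=0) > min_genes_per_cell
    keep_genes = detected[:, keep_cells].sum(axis=1) > min_cells_per_gene
    return keep_genes, keep_cells

def log_hvg_pca(expr, n_hvg=2000, n_pcs=10):
    """log1p-transform, keep the n_hvg genes with the largest variance of the
    log values (simplified HVG rule), center, and return cell PC scores."""
    logx = np.log1p(np.asarray(expr, dtype=float))
    hvg = np.argsort(logx.var(axis=1))[::-1][: min(n_hvg, logx.shape[0])]
    x = logx[hvg].T                      # cells x HVGs
    x = x - x.mean(axis=0)               # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_pcs, s.size)
    return u[:, :k] * s[:k], hvg         # PC scores (cells x k), HVG row indices
```

The tSNE and clustering steps are omitted here; the PC scores could, for example, be passed to `sklearn.manifold.TSNE` for the 2-D embedding.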
Analysis
scRNAseq reveals three cell types in the breast epithelium
Identification of three major epithelial cell types and their markers using scRNAseq. a Overview of the scRNAseq approach using primary human breast tissue samples that were processed into a single cell suspension, followed by FACS isolation of basal (CD49f-hi, EPCAM+) and luminal (CD49f+, EPCAM-hi) cells, and scRNAseq analysis using microfluidics-enabled scRNAseq. b Combined tSNE projection of cells from all three microfluidics-enabled scRNAseq datasets. The major basal cluster is highlighted in red; Luminal1 (L1) in green; Luminal2 (L2) in blue. c Heatmap displaying the scaled expression patterns of top marker genes within each cell type with selected marker genes highlighted; yellow indicating high expression of a particular gene, and purple indicating low expression. d Feature plots showing the scaled expression of TCF4 and ZEB1 marking a subpopulation of basal cells and gene plot showing co-expression of TCF4 and ZEB1 in the same cells. See Supplementary Fig. 1 for capture site imaging, gene detection, individual principal component analysis, tSNE plots colored by individual-derived cells and feature plots of cell type-specific markers
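Integrating the sample data from the three women and clustering them revealed three major populations.
Using the highly expressed genes of each cluster as its signature, the three clusters corresponded to one basal population (KRT14+) and two luminal populations (L1: KRT18+/SLPI+; L2: KRT18+/ANKRD30A+). All three populations were present in each of the three women.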
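Choosing each cluster's highly expressed genes as its signature can be illustrated with a one-vs-rest ranking. Seurat's `FindAllMarkers` applies a statistical test (Wilcoxon rank-sum by default) together with fold-change and detection filters; the hypothetical `top_markers` below is only a simplified stand-in that ranks genes by the difference in mean log-expression and skips genes detected in fewer than `min_pct` of the cluster's cells. The toy expression values in the test are invented for illustration.

```python
import numpy as np

def top_markers(logx, labels, genes, k=3, min_pct=0.1):
    """One-vs-rest marker ranking by difference in mean log-expression.
    logx: genes x cells (log scale); labels: cluster id of each cell."""
    logx = np.asarray(logx, dtype=float)
    labels = np.asarray(labels)
    markers = {}
    for cluster in np.unique(labels):
        inside = labels == cluster
        diff = logx[:, inside].mean(axis=1) - logx[:, ~inside].mean(axis=1)
        pct = (logx[:, inside] > 0).mean(axis=1)   # detection fraction in cluster
        diff[pct < min_pct] = -np.inf              # skip rarely detected genes
        order = np.argsort(diff)[::-1][:k]         # largest differences first
        markers[str(cluster)] = [genes[i] for i in order if diff[i] > 0]
    return markers
```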
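Comparing these signatures with public microarray data from FACS-sorted cells established how the basal, L1 and L2 populations correspond to cell-surface marker expression. A subset of basal cells showed upregulated expression of the stem cell markers ZEB1 and TCF4.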
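One simple way to relate scRNAseq clusters to bulk profiles of FACS-sorted populations is to correlate each cluster's centroid with each reference profile over the shared genes and pick the best match. The sketch below assumes Pearson correlation on log-scale vectors; the function name and the reference population labels are hypothetical, and the paper's exact comparison may differ.

```python
import numpy as np

def match_to_reference(centroids, reference):
    """Map each cluster to the reference population with the highest Pearson r.
    centroids / reference: dicts of name -> vector over the same shared genes."""
    assignments = {}
    for cluster, v in centroids.items():
        r = {pop: np.corrcoef(v, ref)[0, 1] for pop, ref in reference.items()}
        assignments[cluster] = max(r, key=r.get)   # best-correlated population
    return assignments
```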
Sequencing data (droplet-mediated)
Platform: 10X Genomics, v2 / v1 chemistry (v2 documentation: https://assets.ctfassets.net/an68im79xiti/6EX4Qbv2LYMW4A8kk4qGsE/7c71611089466264e1463ce552652851/20180618_CG00052_SingleCell3_v2UG_RevisionSummary_RevDtoRevE.pdf)
Data: GSE113197 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE113197)
Code: GitHub (https://github.com/kessenbrocklab/Nguyen_Pervolarakis_Nat_Comm_2018)
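Sample collection: breast tissue was obtained from reduction mammoplasties of 7 anonymous individuals and dissociated into single cells by mechanical cutting and enzymatic digestion. Epithelial cells were isolated by flow cytometry (FACS), libraries were prepared on the Fluidigm C1 and 10X Genomics platforms, and sequenced on an Illumina HiSeq2500 (1.6 million reads per cell) and a HiSeq4000 (50,000 reads per cell), respectively. A total of 25,790 cells went into downstream analysis.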
Basic workflow
Isolation: basal and luminal epithelial cells sorted together by FACS
Alignment & quantification: 10X Cell Ranger (GRCh38 RefSeq genome) => Cell Ranger Count => Cell Ranger Aggr (normalization)
Quality control: cells: 500-6000 genes/cell; mitochondrial gene fraction: ≤10% (*)
Output: 24,646 single cells from 4 women, ~60,000 reads/cell; each sample processed separately
Clustering: Seurat pipeline
Quality control (*) => log(counts) => regress out UMI number & mitochondrial gene percent => PCA (from highly variable genes) => tSNE (top 10 PCs), clustering => marker gene selection (FindAllMarkers)
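The droplet workflow differs from the microfluidics one mainly in its QC and in regressing out technical covariates before PCA. A minimal NumPy sketch, assuming mitochondrial genes are those whose names start with `MT-` (the human GRCh38 naming convention), inclusive QC bounds, and an ordinary least-squares fit per gene followed by z-scoring of the residuals. This is similar in spirit to Seurat's `ScaleData` with `vars.to.regress`, but not identical to it; `qc_10x` and `regress_and_scale` are hypothetical names.

```python
import numpy as np

def qc_10x(counts, gene_names, min_genes=500, max_genes=6000, max_mito=0.10):
    """Boolean mask of cells passing the QC thresholds (inclusive bounds assumed).
    counts: genes x cells UMI matrix; mitochondrial genes = names starting 'MT-'."""
    counts = np.asarray(counts, dtype=float)
    n_genes = (counts > 0).sum(axis=0)
    is_mito = np.array([g.upper().startswith("MT-") for g in gene_names])
    total = counts.sum(axis=0)
    mito_frac = np.divide(counts[is_mito].sum(axis=0), total,
                          out=np.zeros_like(total), where=total > 0)
    return (n_genes >= min_genes) & (n_genes <= max_genes) & (mito_frac <= max_mito)

def regress_and_scale(logx, covariates):
    """Per gene: fit expression ~ intercept + covariates by least squares,
    keep the residuals, then z-score them. logx: genes x cells;
    covariates: cells x k (e.g. UMI count, percent mitochondrial)."""
    y = np.asarray(logx, dtype=float).T                    # cells x genes
    cov = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), cov])      # add intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sd = resid.std(axis=0)
    sd[sd < 1e-12] = 1.0                                  # guard numerically constant genes
    return ((resid - resid.mean(axis=0)) / sd).T          # back to genes x cells
```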
Analysis
Droplet-mediated scRNAseq reveals subpopulation diversity
High throughput droplet-mediated scRNAseq reveals additional epithelial cell states. a Overview of the droplet-enabled scRNAseq approach as described above; basal and luminal epithelial cells were sorted together and subjected to combined scRNAseq analysis using droplet-based scRNAseq. b Data from Individual 4 were analyzed using Seurat and the distinct clusters (0–10) are displayed in a tSNE projection with a selected marker gene for each cluster, and the main epithelial cell types (Basal, L1, L2) are outlined. Feature plots of characteristic markers for the three main cell types are shown on the right, with expression levels shown as a gradient of purple. c Heatmap showing the top ten marker genes for each cluster as determined by Seurat analysis, with three selected genes per cluster highlighted on the right. See Supplementary Fig. 2 for individual clustering and marker gene analyses for Individuals 5–7
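Each of the three cell types (L1, L2, Basal) can be further divided into subgroups (cell states). L1 shows features related to milk production, while L2 expresses hormone-responsive genes. Every individual has all 3 cell types, but the subgroups that make them up differ between individuals.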
Combined droplet based RNAseq data to identify generalizable cell types and states. a–c Heatmaps showing gene scoring results using marker genes for Ind4 clusters (0–10; on bottom of heatmap) in all clusters from Ind5 (a), Ind6 (b), and Ind7 (c). Individual-specific cluster IDs are shown in different colors on the right and bottom, and cell type IDs for Basal (b), L1, L2, X are indicated for every cluster. Data shown as Z scores from purple (low) to yellow (high). Two distinct cell states L1.1 and L1.2 were found within L1 in all pairwise comparisons as highlighted by colored boxes on heatmap. d Combined tSNE projection of all individual datasets (outlined) is shown including the cell state identity marked by different colors. e Heatmap showing the expression pattern of the top ten markers per cell state with selected markers indicated (yellow = high expression; purple = low expression). See Supplementary Fig. 4 for separate basal cell Seurat analysis, summary of cell state designations and Ingenuity Pathway Analysis
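Using a cell scoring method, the subgroups of each of the other individuals were mapped onto the subgroups of a reference individual (Ind4). All four individuals shared two states within L1 (L1.1 and L1.2) and two states within the basal compartment.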
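The cross-individual mapping (the heatmaps in panels a–c above) can be approximated with a simple signature score: z-score each gene across cells, average the z-scores of a reference cluster's marker genes for every cell, then average those cell scores within each query cluster. This is an assumed, simplified scoring scheme, not necessarily the exact method used in the paper, and `score_clusters` is a hypothetical name. Each query cluster can then be labeled with the reference cluster that scores highest in its column.

```python
import numpy as np

def score_clusters(logx, genes, query_labels, marker_sets):
    """Rows: reference clusters (marker sets); columns: query clusters.
    Entry = mean over the query cluster's cells of the average per-gene
    z-score of that reference cluster's marker genes."""
    logx = np.asarray(logx, dtype=float)
    sd = logx.std(axis=1, keepdims=True)
    z = (logx - logx.mean(axis=1, keepdims=True)) / np.where(sd > 0, sd, 1.0)
    row_of = {g: i for i, g in enumerate(genes)}
    labels = np.asarray(query_labels)
    clusters = [str(c) for c in np.unique(labels)]
    refs = list(marker_sets)
    scores = np.full((len(refs), len(clusters)), np.nan)
    for r, ref in enumerate(refs):
        rows = [row_of[g] for g in marker_sets[ref] if g in row_of]
        if not rows:
            continue                          # no marker measured: leave NaN
        cell_score = z[rows].mean(axis=0)     # one score per cell
        for c, cl in enumerate(clusters):
            scores[r, c] = cell_score[labels == cl].mean()
    return refs, clusters, scores
```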
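Pooling the basal cells from the 4 individuals and re-clustering them yielded a myoepithelial cell signature, which was used to split the basal cells into two populations: basal and myoepithelial.
In total, six generalizable cell state subgroups were obtained: Basal, Myoepithelial, Luminal1.1, Luminal1.2, Luminal2, and unclassified.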
IPA (Ingenuity Pathway Analysis) and transcription factor (TF) enrichment analyses of each state's markers linked each state to specific functions: Myo, physical integrity; Basal, stemness; L1.1, sentinel/immune-related functions; L1.2, steroid hormone signaling.
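IPA is commercial software, so its scoring cannot be reproduced here. As a swapped-in illustration of the same idea, a one-sided hypergeometric over-representation test asks whether a state's marker genes overlap a pathway's gene set more than expected by chance, given a gene universe (for example, all detected genes). The gene sets and function names below are hypothetical.

```python
from math import comb

def hypergeom_pvalue(n_overlap, n_markers, n_pathway, n_universe):
    """P(X >= n_overlap) for X ~ Hypergeometric: drawing n_markers genes
    from n_universe genes, of which n_pathway belong to the pathway."""
    upper = min(n_markers, n_pathway)
    total = comb(n_universe, n_markers)
    tail = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_markers - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

def enrich(markers, pathways, universe):
    """Return {pathway: one-sided p-value} for one state's marker set."""
    universe = set(universe)
    markers = set(markers) & universe
    out = {}
    for name, genes in pathways.items():
        genes = set(genes) & universe
        overlap = len(markers & genes)
        out[name] = hypergeom_pvalue(overlap, len(markers), len(genes), len(universe))
    return out
```

With many pathways tested per state, the p-values would normally be corrected for multiple testing (for example, Benjamini-Hochberg) before interpretation.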
Spatial integration of cell types and states
Characterization and spatial integration of basal cell states. a Immunofluorescence analysis of ZEB1 protein expression (red) in combination with basal marker KRT14 (green) and DNA stain using DAPI (blue) within tissue sections from primary human reduction mammoplasty samples showing ZEB1 expression in a subpopulation of basal (KRT14+) cells. Scale bar = 15 µm. b Heatmap showing that genes previously shown to be up- (red) or down-regulated (blue) in a population of PROCR+ mammary stem cells correlate with ZEB1+ cells in scRNAseq. c Immunofluorescence analysis of TCF4 protein expression (red) in combination with basal marker SMA (green) and DNA stain using DAPI (blue) within tissue sections from primary human reduction mammoplasty samples revealed that TCF4 is expressed in a subpopulation of basal (SMA+) cells. Scale bar = 25 µm. d Violin plot for expression of KRT14 by cell state showing highest expression in the myoepithelial (Myo) cells. e KRT14 and KRT8 double immunostaining revealed highest expression of KRT14 in ductal basal cells, while lobular basal cells show more diverse KRT14 positivity. Scale bar = 75 µm. See Supplementary Fig. 4 for violin plots displaying selected myoepithelial gene expression and identification of KRT8/KRT14 double positive cells
Validation and spatial integration of two distinct luminal cell types. a Immunofluorescence analysis of NY-BR-1 protein expression (green) in combination with luminal marker SLPI (red) and DNA stain using DAPI (blue) within tissue sections from primary human reduction mammoplasty samples revealed that NY-BR-1 and SLPI are markers for distinct luminal subpopulations. b–e Immunofluorescence analysis of NY-BR-1 and SLPI (red) protein expression with the estrogen (b), progesterone (c), and androgen (d) receptors and the proliferation marker Ki67 (e) in green. f Summary of hormone receptor and proliferation marker expression in L1 and L2 cells. g Violin plot showing expression of KRT8 in the luminal subpopulations; higher expression is seen in the luminal L1.1 and L1.2 subpopulations. h Sample frame for detection of KRT8 protein content from individual cells using single cell Western blot, followed by detection with a microarray scanner. i Population summary showing cell number per fluorescence intensity confirmed a bimodal distribution of KRT8 expression on the protein level. See Supplementary Fig. 5 for violin plots displaying expression of relevant hormone receptors as well as proliferation and luminal progenitor markers. All scale bars = 25 µm
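Immunofluorescence and single-cell Western blot (antibodies on chip) were used to validate the spatial expression of the marker genes identified by scRNAseq, as well as of other genes indicating cell state (Ki67, p27, ELF5, KIT). The main conclusions are:
At the protein level, the markers of each cell state are indeed expressed only in a subset of cells at the corresponding tissue locations, consistent with the scRNAseq results.
L1 and L2 show no significant difference in lobular versus ductal distribution, which provides no evidence for separate lineages underlying IDC (invasive ductal carcinoma) and ILC (invasive lobular carcinoma).
Basal, L1 and L2 each contain a fraction of proliferative cells, which may be lineage-restricted progenitors of their respective lineages; L1: secretory functions, L2: hormone-sensing functions.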
Reconstructing lineage hierarchies within the epithelium
Reconstruction of differentiation and relation of cell states to breast cancer subtypes. a Monocle-generated pseudotemporal trajectory of a subsampled population of cells (n = 4000) from four individuals analyzed using droplet-mediated scRNAseq is shown colored by cell state designation. b Pseudotime is shown colored in a gradient from dark to light blue and start of pseudotime is indicated. See Supplementary Fig. 6 for summary list of discovered cell states, Monocle analysis of microfluidics-enabled scRNAseq results and gene scoring for breast cancer subtypes
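4,000 cells were subsampled (1,000 cells per individual) for pseudotime analysis, with Basal cells set as the starting point. The resulting trajectory agrees with the previously proposed two-tier differentiation model of breast epithelial cells. Within it, L1.2 has the features of a luminal-restricted, bipotent progenitor, whereas L1.1 is a more mature cell type rather than the progenitor population previously reported.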
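The subsampling step, and the idea of ordering cells by their distance from a chosen root along a tree, can be sketched as follows. Monocle builds its trajectory with a more elaborate method, so `mst_pseudotime` (a hypothetical helper) is only a simplified stand-in: pseudotime is the path length from the root cell along a Euclidean minimum spanning tree of the cells, built with Prim's algorithm. The random seed and the input representation (for example, PC coordinates) are assumptions.

```python
import numpy as np

def subsample_per_individual(individual, n_per=1000, seed=0):
    """Indices of up to n_per randomly drawn cells from each individual."""
    rng = np.random.default_rng(seed)
    individual = np.asarray(individual)
    picked = []
    for ind in np.unique(individual):
        idx = np.flatnonzero(individual == ind)
        picked.append(rng.choice(idx, size=min(n_per, idx.size), replace=False))
    return np.sort(np.concatenate(picked))

def mst_pseudotime(x, root):
    """Path length from the root cell along a Euclidean minimum spanning
    tree (Prim's algorithm). x: cells x dims, e.g. PC coordinates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sq = (x ** 2).sum(axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0.0))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[root] = True
    best = d[root].copy()            # distance from each cell to the tree
    parent = np.full(n, root)        # tree cell realising that distance
    time = np.zeros(n)
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[j] = True
        time[j] = time[parent[j]] + d[parent[j], j]   # extend the parent's path
        closer = ~in_tree & (d[j] < best)
        best[closer] = d[j, closer]
        parent[closer] = j
    return time
```

The dense distance matrix for 4,000 cells holds 16 million doubles (about 128 MB), which is manageable; larger datasets would call for a sparse k-nearest-neighbour graph instead.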
Subpopulations correspond to breast cancer subtypes
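Scoring METABRIC samples by similarity to the newly identified cell state signature genes showed how the breast cancer subtypes relate to the normal epithelial subgroups: Luminal A/B tumors resemble L2.
Conversely, scoring the normal epithelial subgroups with TNBC subtype signature genes showed that, relative to the other subgroups, Myo is closest to the TNBC mesenchymal subtype, while L1.1 is closest to the TNBC basal-like 1 (BL1) subtype.
Taken together, this kind of similarity analysis may reveal which normal epithelial cells each breast cancer subtype originates from.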
总结
Proposed cellular heterogeneity and lineage hierarchies within the human breast. a Schematic summary of discovered cell states within the basal and luminal compartment of the human breast epithelium with proposed function, key transcription factors (in white), selected markers (in black) and similarities to breast cancer subtypes indicated in boxes. b Proposed model summarizing the lineage hierarchies within the breast epithelium based on one continuous differentiation trajectory from basal stem cells to three distinct differentiated cell types with overlaid marker genes of interest shown (black on gray bars)
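scRNAseq of normal breast epithelial cells from multiple women identified 6 cell state-specific subgroups.
Differential expression and enrichment analyses annotated these 6 subgroups with marker genes, marker transcription factors, and (putative) specific functions. Marker gene expression was validated at the protein level by immunofluorescence staining, which also revealed the spatial distribution of each subtype and marker gene in the tissue.
Pseudotime analysis revealed the likely differentiation lineage hierarchies of breast epithelial cells.
Integrating RNAseq data from clinical tumor samples identified the relative similarities between these 6 normal cell states and breast cancer subtypes, which may help with the early diagnosis and prevention of breast cancer.
One-sentence summary: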
A de novo investigation of cellular heterogeneity of human breast epithelial cells
Reference
Nguyen, Q. H., Pervolarakis, N., Blake, K., Ma, D., Davis, R. T., James, N., … Kessenbrock, K. (2018). Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity. Nature Communications, 9(1), 2028. doi:10.1038/s41467-018-04334-1