Clustering and Embedding Methods for scRNA-Seq
RNA-Seq Workflow
- Read quality assessment, filtering and trimming
- Map reads against the reference genome
- Perform read counting for required ranges (e.g. exonic gene ranges)
- Normalization of read counts
- Identification of differentially expressed genes (DEGs)
- Clustering of gene expression profiles
- Gene set enrichment analysis
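The normalization step can be illustrated with one simple scheme: library-size scaling followed by a log2 transform with a pseudo-count of 1. This choice is an assumption made for brevity. It is similar in spirit to library-size factors in scater, whereas DEG tools such as edgeR (TMM) or DESeq2 (median-of-ratios) and scran (pooled size factors) use more robust estimators. The function names are hypothetical. The input is a genes × cells (or genes × samples) count matrix given as nested lists.

```python
import math
from typing import List


def library_size_factors(counts: List[List[float]]) -> List[float]:
    """Per-column size factors: library size divided by the mean library size."""
    n_cols = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_cols)]
    if any(s == 0 for s in lib_sizes):
        raise ValueError("every cell/sample needs at least one count")
    mean_lib = sum(lib_sizes) / n_cols
    return [s / mean_lib for s in lib_sizes]


def log_normalize(counts: List[List[float]], pseudo_count: float = 1.0) -> List[List[float]]:
    """Divide each count by its column's size factor, then take log2(x + pseudo_count)."""
    sf = library_size_factors(counts)
    return [
        [math.log2(x / sf[j] + pseudo_count) for j, x in enumerate(row)]
        for row in counts
    ]


# Toy matrix: 3 genes (rows) x 2 cells (columns); cell 2 has twice the sequencing depth.
counts = [
    [10, 20],
    [0, 0],
    [30, 60],
]
for gene, values in zip(["geneA", "geneB", "geneC"], log_normalize(counts)):
    print(gene, [round(v, 3) for v in values])
```

The second cell differs from the first only in sequencing depth. After scaling, both cells therefore end up with (up to floating-point error) the same normalized value for each gene.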
Challenge Project
- Run the above workflow from start to finish (steps 1-7) on the full RNA-Seq data set from Howard et al. (2013).
- Challenge project tasks:
  - Groups 1 and 2 compare the partitioning performance of at least 3 clustering methods and 3 embedding methods, respectively, on high-dimensional gene expression data from single-cell RNA-Seq (scRNA-Seq) experiments.
  - The clustering methods can include SC3, TSCAN, Seurat, PCAkmeans, etc. (for additional methods, see Table 3 in Duò et al., 2018).
  - The dimensionality reduction (embedding) methods can include PCA, MDS, SC3, Isomap, t-SNE, FIt-SNE, UMAP, runUMAP from the scater Bioconductor package, etc.
  - To obtain meaningful test results, choose an scRNA-Seq data set (here pre-processed count data) where the correct cell clustering is known (ground truth). For simplicity, the data can be obtained from the scRNAseq package (Risso and Cole, 2020) or loaded from GEO (e.g. Shulse et al., 2019). For learning purposes, organize the data in a SingleCellExperiment object. The scran package tutorial provides an excellent introduction to working with SingleCellExperiment objects and embedding methods such as t-SNE.
  - Optional: plot the (partitioning) performance in the form of ROC curves and/or record the corresponding AUC values.
  - Compare your test results with published performance evaluations, e.g. Sun et al. (2019) or Duò et al. (2018).
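Because the ground-truth cell labels are known, the partition produced by each method can be scored against them. One widely used score for this is the Adjusted Rand Index (ARI). It is 1 for identical partitions, regardless of how the cluster labels are named, and close to 0 for the agreement expected by chance. In R it is available, for example, as `adjustedRandIndex()` in the mclust package. The dependency-free sketch below implements the standard pair-counting formula. The function name is hypothetical, and the code is not taken from any of the packages mentioned on this page.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two partitions of the same n items (pair-counting form)."""
    if len(truth) != len(pred):
        raise ValueError("label vectors must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Sum over contingency-table cells n_ij of C(n_ij, 2): pairs grouped together in both partitions.
    index = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Same quantity for the truth clusters (row sums a_i) and predicted clusters (column sums b_j).
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are a single cluster or both are all singletons.
        return 1.0
    return (index - expected) / (max_index - expected)


# Toy example: ground-truth cell types vs. cluster IDs from some clustering method.
truth = ["T", "T", "B", "B"]
clusters = [0, 0, 1, 2]
print(adjusted_rand_index(truth, clusters))
```

In this toy example, the method splits the "B" cells into two clusters, which gives an ARI of 4/7 (about 0.571). Relabelling the clusters, for example swapping 0 and 1 throughout, leaves the score unchanged. This is why ARI suits comparisons against annotated cell types.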
References
- Duò A, Robinson MD, Soneson C (2018) A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 7: 1141.
- Howard BE, et al. (2013) High-throughput RNA sequencing of Pseudomonas-infected Arabidopsis reveals hidden transcriptome complexity and novel splice variants. PLoS One 8(10): e74183.
- Kiselev VY, Kirschner K, Schaub MT, Andrews T, Yiu A, Chandra T, Natarajan KN, Reik W, Barahona M, Green AR, et al (2017) SC3: consensus clustering of single-cell RNA-seq data. Nat Methods 14: 483–486.
- van der Maaten LJP, Hinton GE (2008) Visualizing High-Dimensional Data Using t-SNE. J Mach Learn Res 9 (Nov): 2579–2605.
- Linderman GC, Rachh M, Hoskins JG, Steinerberger S, Kluger Y (2019) Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nat Methods 16: 243–245. (Note: this could be used as a more recent publication on t-SNE; the speed-improved version is also available for R with a C)
- McInnes L, Healy J, Melville J (2018) UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv.
- Risso D, Cole M (2020) scRNAseq: Collection of Public Single-Cell RNA-Seq Datasets. R package version 2.4.0. (Note: choose one scRNA-Seq data set from this Bioconductor data package for testing embedding methods.)
- Senabouth A, Lukowski SW, Hernandez JA, Andersen SB, Mei X, Nguyen QH, Powell JE (2019) ascend: R package for analysis of single-cell RNA-seq data. GigaScience. doi: 10.1093/gigascience/giz087.
- Shulse CN, Cole BJ, Ciobanu D, Lin J, Yoshinaga Y, Gouran M, Turco GM, Zhu Y, O’Malley RC, Brady SM, et al (2019) High-Throughput Single-Cell Transcriptome Profiling of Plant Cell Types. Cell Rep 27: 2241–2247.e4.
- Sun S, Zhu J, Ma Y, Zhou X (2019) Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 20: 269.
- Sun S, Zhu J, Zhou X (2020) Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies. Nat Methods. doi: 10.1038/s41592-019-0701-7.
Last modified 2024-05-27: some edits (b92443af0)