Embedding Methods for scRNA-Seq

RNA-Seq Workflow

  1. Read quality assessment, filtering and trimming
  2. Map reads against reference genome
  3. Perform read counting for required ranges (e.g. exonic gene ranges)
  4. Normalization of read counts
  5. Identification of differentially expressed genes (DEGs)
  6. Clustering of gene expression profiles
  7. Gene set enrichment analysis

Challenge Projects

1. Embedding Methods for scRNA-Seq

  • Run the workflow from start to finish (steps 1-7) on the RNA-Seq data set from Howard et al. (2013)
  • Challenge project tasks
    • Compare the partitioning performance of at least three embedding methods for high-dimensional gene expression data, using single-cell RNA-Seq data. The dimensionality reduction methods can include PCA, MDS, SC3, Isomap, t-SNE, FIt-SNE, UMAP, runUMAP from the scater Bioconductor package, etc.
    • To obtain meaningful test results, choose an scRNA-Seq data set (here, pre-processed count data) for which the correct cell clustering is known (ground truth). For simplicity, the data can be obtained from the scRNAseq package (Risso and Cole, 2020) or downloaded from GEO (e.g. Shulse et al., 2019). For learning purposes, organize the data in a SingleCellExperiment object. The tutorial of the scran package provides an excellent introduction to working with SingleCellExperiment objects and embedding methods such as t-SNE.
    • Optional: plot the (partitioning) performance in the form of ROC curves and/or record their AUC values.
    • Compare your test results with published performance test results, e.g. Sun et al. (2019) or Duò et al. (2018).
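
The tasks above do not prescribe a specific performance measure, so the following is only a minimal sketch of two possible choices, under stated assumptions. It assumes that cluster labels and low-dimensional coordinates have already been exported from the analysis as plain lists. The adjusted Rand index (ARI) compares a clustering with the ground truth. The pairwise AUC treats every pair of cells as a binary classification problem (same true cluster or not) scored by closeness in the embedding, which is one way to read the optional ROC/AUC task. The function names `adjusted_rand_index` and `pairwise_auc` and the toy data are hypothetical and written in pure Python for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import comb, dist


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("label vectors must have the same length")
    if len(truth) < 2:
        raise ValueError("need at least two cells")
    # Number of cell pairs that fall together in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    # Number of cell pairs that fall together in each labeling separately.
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = together_truth * together_pred / comb(len(truth), 2)
    max_index = (together_truth + together_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings place all cells in one cluster.
        return 1.0
    return (together_both - expected) / (max_index - expected)


def pairwise_auc(truth, embedding):
    """AUC for separating same-cluster from different-cluster cell pairs,
    using a smaller Euclidean distance in the embedding as the score."""
    same, different = [], []
    for i, j in combinations(range(len(truth)), 2):
        d = dist(embedding[i], embedding[j])
        (same if truth[i] == truth[j] else different).append(d)
    if not same or not different:
        raise ValueError("need both same-cluster and different-cluster pairs")
    # Mann-Whitney form of the AUC: probability that a same-cluster pair is
    # closer than a different-cluster pair, counting ties as one half.
    # Quadratic in the number of pairs, so only suitable for small examples.
    wins = sum((s < d) + 0.5 * (s == d) for s in same for d in different)
    return wins / (len(same) * len(different))


# Toy data (made up for illustration): six cells, two true cell types.
truth = ["a", "a", "a", "b", "b", "b"]

# Cluster labels as they might come from clustering two different embeddings.
labels_good = [0, 0, 0, 1, 1, 1]
labels_poor = [0, 1, 0, 1, 0, 1]
ari_good = adjusted_rand_index(truth, labels_good)
ari_poor = adjusted_rand_index(truth, labels_poor)

# Two-dimensional coordinates as they might come from two embedding methods.
embedding_good = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
embedding_mixed = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
auc_good = pairwise_auc(truth, embedding_good)
auc_mixed = pairwise_auc(truth, embedding_mixed)

print(f"ARI: good={ari_good:.3f} poor={ari_poor:.3f}")
print(f"pairwise AUC: good={auc_good:.3f} mixed={auc_mixed:.3f}")
```

For a comparison of several embedding methods, each method would contribute its own coordinates and cluster labels, and the resulting scores could be tabulated side by side. Published benchmarks may use different measures, so the choice of metric should be checked before comparing numbers with them.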

