RNA-Seq Workflow

  1. Read quality assessment, filtering and trimming
  2. Map reads against reference genome
  3. Perform read counting for required ranges (e.g. exonic gene ranges)
  4. Normalization of read counts
  5. Identification of differentially expressed genes (DEGs)
  6. Clustering of gene expression profiles
  7. Gene set enrichment analysis

Challenge Project: Comparison of DEG analysis methods

  • Run workflow from start to finish (steps 1-7) on RNA-Seq data set from Howard et al. (2013)
  • Challenge project tasks
    • Run at least 2 RNA-Seq DEG analysis methods (e.g. edgeR, DESeq2, limma/voom) and compare the results as follows:
      • Analyze the similarities and differences in the DEG lists obtained from the two methods
      • Does it affect the results from the downstream gene set enrichment analysis?
      • Plot the performance of the DEG methods in form of an ROC curve. The DEG set from the Howard et al., 2013 paper could be used as benchmark (true result).

References

  • Howard, B.E. et al., 2013. High-throughput RNA sequencing of pseudomonas-infected Arabidopsis reveals hidden transcriptome complexity and novel splice variants. PloS one, 8(10), p.e74183. PubMed
  • Guo Y, Li C-I, Ye F, Shyr Y (2013) Evaluation of read count based RNAseq analysis methods. BMC Genomics 14 Suppl 8: S2 PubMed
  • Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140 PubMed
  • Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550 PubMed
  • Zhou X, Lindsay H, Robinson MD (2014) Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Res 42: e91 PubMed