RNA-Seq - DEG Analysis Methods
RNA-Seq Workflow
- Read quality assessment, filtering and trimming
- Map reads against reference genome
- Perform read counting for required ranges (e.g. exonic gene ranges)
- Normalization of read counts
- Identification of differentially expressed genes (DEGs)
- Clustering of gene expression profiles
- Gene set enrichment analysis
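As a rough illustration of the normalization step in the workflow above, the following sketch scales raw read counts to counts per million (CPM). It is not the method used by any particular DEG package: edgeR and DESeq2, for example, apply their own normalization schemes (TMM and median-of-ratios, respectively). The function name `cpm`, the sample names and the counts are all made up for this example.

```python
# Counts-per-million (CPM) scaling: a simple library-size normalization.
# Sample names, gene names and counts are hypothetical illustration data.

def cpm(counts):
    """Scale each sample's counts so that they sum to one million.

    counts: dict mapping sample name -> dict of gene -> raw read count.
    """
    normalized = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no counted reads")
        normalized[sample] = {
            gene: count * 1_000_000 / library_size
            for gene, count in gene_counts.items()
        }
    return normalized


raw = {
    "mock_rep1": {"geneA": 10, "geneB": 30, "geneC": 60},
    "infected_rep1": {"geneA": 40, "geneB": 60, "geneC": 100},
}
norm = cpm(raw)
for sample, values in norm.items():
    print(sample, {gene: round(value) for gene, value in values.items()})
```

CPM only corrects for differences in sequencing depth between samples; it does not account for gene length or for composition effects.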
Challenge Projects
1. Comparison of DEG analysis methods
- Run the workflow from start to finish (steps 1-7) on the full RNA-Seq data set from Howard et al. (2013).
- Challenge project tasks
- Compare the DEG analysis method chosen for the paper presentation with at least one or two additional methods (e.g. one student compares edgeR vs. baySeq, and the other student DESeq2 vs. limma/voom). Assess the results as follows:
- Analyze the similarities and differences in the DEG lists obtained from the different methods using intersect matrices, Venn diagrams and/or UpSet plots.
- Assess the impact of the DEG method on the downstream gene set enrichment analysis.
- Plot the performance of the DEG methods in the form of ROC curves and record their AUC values. A consensus DEG set or the one from the Howard et al. (2013) paper could be used as the ‘pseudo’ ground truth result.
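The DEG list comparison described in the tasks above can be sketched as follows. This is a minimal example, assuming each method's DEGs have already been exported as a set of gene identifiers; the gene IDs and the helper names `intersect_matrix` and `jaccard` are hypothetical. It builds a pairwise intersect matrix, with set sizes on the diagonal, and adds a Jaccard index as one possible similarity summary.

```python
from itertools import product


def intersect_matrix(deg_sets):
    """Pairwise overlap counts between DEG sets; the diagonal holds set sizes."""
    names = list(deg_sets)
    return {
        (a, b): len(deg_sets[a] & deg_sets[b])
        for a, b in product(names, repeat=2)
    }


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical DEG sets; real ones would come from each method's result table
# after applying the same FDR and fold-change cutoffs.
deg_sets = {
    "edgeR": {"g1", "g2", "g3", "g4"},
    "DESeq2": {"g2", "g3", "g4", "g5"},
    "limma_voom": {"g3", "g4", "g6"},
}

matrix = intersect_matrix(deg_sets)
names = list(deg_sets)
print("\t".join([""] + names))
for a in names:
    print("\t".join([a] + [str(matrix[(a, b)]) for b in names]))

print("Jaccard edgeR vs DESeq2:", jaccard(deg_sets["edgeR"], deg_sets["DESeq2"]))
```

The same sets can serve as input for a Venn diagram or UpSet plot in whichever plotting library is preferred.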
2. Comparison of DEG analysis methods
- Similar to the above, but with a different combination of DEG methods and/or a different performance testing approach.
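For the ROC-based performance testing mentioned in these projects, a curve and its AUC can be computed from a per-gene score and a pseudo ground truth set. The sketch below is one possible approach, not a prescribed one: it assumes that `1 - adjusted p-value` is used as the score, and the gene IDs, p-values and helper names `roc_points` and `auc` are invented for illustration.

```python
def roc_points(scores, truth):
    """Return (FPR, TPR) points, sweeping the threshold from high to low score.

    scores: dict of gene -> score, where a higher score means stronger evidence.
    truth:  set of genes treated as the (pseudo) ground truth DEGs.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pos = sum(1 for gene in ranked if gene in truth)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Genes with identical scores are added together, so ties
        # contribute a single diagonal segment to the curve.
        j = i
        while j < len(ranked) and scores[ranked[j]] == scores[ranked[i]]:
            if ranked[j] in truth:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical adjusted p-values from one DEG method.
padj = {"g1": 0.001, "g2": 0.01, "g3": 0.2, "g4": 0.03, "g5": 0.6, "g6": 0.9}
# Hypothetical pseudo ground truth, e.g. a consensus DEG set.
truth = {"g1", "g2", "g3"}

scores = {gene: 1 - p for gene, p in padj.items()}
curve = roc_points(scores, truth)
area = auc(curve)
print("ROC points:", curve)
print("AUC:", round(area, 4))
```

Because the ground truth here is only a consensus or published DEG set, the resulting AUC values measure agreement with that reference rather than absolute accuracy.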
References
- Howard BE, et al. (2013) High-throughput RNA sequencing of Pseudomonas-infected Arabidopsis reveals hidden transcriptome complexity and novel splice variants. PLoS ONE 8(10): e74183 PubMed
- Guo Y, Li C-I, Ye F, Shyr Y (2013) Evaluation of read count based RNAseq analysis methods. BMC Genomics 14 Suppl 8: S2 PubMed
- Hardcastle TJ, Kelly KA (2010) baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics 11: 422 PubMed
- Liu R, Holik AZ, Su S, Jansz N, Chen K, Leong HS, Blewitt ME, Asselin-Labat M-L, Smyth GK, Ritchie ME (2015) Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res. doi: 10.1093/nar/gkv412. PubMed
- Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550 PubMed
- Zhou X, Lindsay H, Robinson MD (2014) Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Res 42: e91 PubMed
Last modified 2024-05-24: some edits (a73f918c0)