HW7 - RNA-Seq Analysis
EXCEPTION: Due to the cluster outage (May 7-8th), the read count tables required for HW7 have been made available for download from this GEN242_Data repository created for this purpose. To work with this data set, clone this repository (same way as HW repos), and then cd into the rnaseq subdirectory located under GEN242/HW7_data/ of this repository, or set your RStudio session to this directory. Next, open the provided HW07.Rmd file under HW7_data/rnaseq and follow the instructions given in this file. It contains quite a bit of additional helper code, well beyond what is given below.
A. Unstranded and strand-specific read counting
- Task 1: Rerun or reload the RNA-Seq workflow with the toy data sets up to the read quantification step here. Note, the toy data set is loaded automatically when initializing a workflow environment (directory structure) with the genWorkenvir function (see tutorial here). In the read quantification step with summarizeOverlaps, generate count tables for exons by genes (eByg) for the following three strand modes:
  - Unstranded
  - Strand-specific for positive (sense) strand
  - Strand-specific for negative (antisense) strand
The solution for generating the unstranded read counts is given below. Note, the upstream steps of the RNA-Seq workflow only need to be rerun to generate the proper inputs for the read counting. Thus, they do not need to be included in the homework results (see HW7.R below).
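# Count reads over exons by genes (eByg), ignoring the strand of the reads
unstranded <- summarizeOverlaps(eByg, bfl, mode="Union",
                                ignore.strand=TRUE,
                                # preprocess.reads=invertStrand,
                                inter.feature=FALSE,
                                singleEnd=FALSE)
# Extract the count matrix and inspect the first four rows
unstranded <- assays(unstranded)$counts
unstranded[1:4,]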
Before attempting to solve this homework task, please read the vignette Counting reads with summarizeOverlaps (here) from the GenomicAlignments package, which defines the summarizeOverlaps function. In addition, the help file ?summarizeOverlaps provides useful information.
- Task 2: Provide R code demonstrating that the two strand-specific count tables sum to values very similar to those of the unstranded count table.
- Task 3: Explain the utility (biological relevance) of the different strand counting modes used under Task 1. Include your explanation as comment text in your homework script (see HW7.R below).
Note, for Tasks 1-3 only the code and/or text needs to be included in the homework submission (no data/result files). For details see below.
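To make the three strand modes concrete, the following minimal sketch counts a handful of made-up reads against two made-up features, entirely in memory. The feature names (geneA, geneB), the coordinates, and the reads are hypothetical and not part of the workflow data; with the real data, eByg and bfl would take the place of features and reads. Whether the same-strand or the opposite-strand table corresponds to sense depends on the library preparation protocol, so the variable names below are deliberately neutral.

```r
library(GenomicAlignments)

# Hypothetical features: one gene on the plus strand, one on the minus strand
features <- GRanges("chr1",
                    IRanges(start=c(1, 101), end=c(50, 150)),
                    strand=c("+", "-"))
names(features) <- c("geneA", "geneB")

# Hypothetical single-end reads: one plus and one minus read over each gene
reads <- GAlignments(seqnames=Rle(factor(rep("chr1", 4))),
                     pos=c(10L, 20L, 110L, 120L),
                     cigar=rep("20M", 4),
                     strand=strand(c("+", "-", "+", "-")))

# Unstranded: the strand of reads and features is ignored
unstranded <- assays(summarizeOverlaps(features, reads, mode="Union",
                                       ignore.strand=TRUE,
                                       inter.feature=FALSE))$counts

# Strand-specific: only reads on the same strand as the feature are counted
same_strand <- assays(summarizeOverlaps(features, reads, mode="Union",
                                        ignore.strand=FALSE,
                                        inter.feature=FALSE))$counts

# Strand-specific after flipping the read strand: counts opposite-strand reads
opposite_strand <- assays(summarizeOverlaps(features, reads, mode="Union",
                                            ignore.strand=FALSE,
                                            preprocess.reads=invertStrand,
                                            inter.feature=FALSE))$counts

# Compare the sum of the two strand-specific tables with the unstranded table
cbind(unstranded, same_strand, opposite_strand,
      stranded_sum=same_strand + opposite_strand)
```

In this toy case the two strand-specific tables add up exactly to the unstranded one. For the real count tables, the same kind of element-wise or column-sum comparison applies, and Task 2 only expects the values to be very similar.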
B. Read counting for different feature types
- Task 4: Compute strand-specific count tables for the positive (sense) strand of the following feature types. The help files of ?exonsBy and ?transcripts provide useful information for solving these tasks.
  - Genes
  - Exons
  - Exons by genes
  - Introns by transcripts
  - 5’-UTRs by transcripts
Note, for Task 4 include only the code and/or text in your homework submission (no data/result files).
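The feature types listed under Task 4 correspond to extractor functions in the GenomicFeatures package. The sketch below applies them to the small sample annotation database that ships with GenomicFeatures (hg19_knownGene_sample.sqlite). That file is an assumption made here only to keep the example self-contained; in the homework, the TxDb object created in the workflow would be used instead. The list element names are likewise chosen for illustration.

```r
library(GenomicFeatures)

# Sample TxDb shipped with GenomicFeatures (stand-in for the workflow's TxDb)
txdb_file <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

# One range object per feature type
feature_types <- list(
    genes           = genes(txdb),                 # GRanges
    exons           = exons(txdb),                 # GRanges
    exons_by_gene   = exonsBy(txdb, by="gene"),    # GRangesList
    introns_by_tx   = intronsByTranscript(txdb),   # GRangesList
    five_utrs_by_tx = fiveUTRsByTranscript(txdb)   # GRangesList
)

# Number of features (or feature groups) per type
sapply(feature_types, length)
```

Each of these objects can be passed as the features argument of summarizeOverlaps, in the same way eByg is used in the unstranded example.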
C. DEG analysis
- Task 5: Perform the DEG analysis with edgeR as outlined under section 6 of the RNA-Seq workflow here. Run the DEG analysis twice: once with the unstranded count table (from Task 1.1) as input and once with the sense strand count table (from Task 1.2). Compare the DEG results of the two runs in two separate 4-way Venn diagrams for the same sample comparisons used in the workflow example here.
  - 4-way Venn diagram for unstranded count table
  - 4-way Venn diagram for sense strand count table
Note, for Task 5 include both the code and the resulting images in your homework submission.
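A 4-way Venn diagram summarizes, for four sets of DEG identifiers, how many genes fall into each combination of sets. As a plain base-R sketch of that tabulation (independent of the plotting functions used in the workflow), the example below uses four hypothetical comparisons (cmpA to cmpD) and made-up gene IDs; the helper venn_regions is also hypothetical and not part of any package.

```r
# Hypothetical DEG sets for four sample comparisons (made-up gene IDs)
deg_sets <- list(
    cmpA = c("g1", "g2", "g3", "g4", "g8"),
    cmpB = c("g2", "g3", "g5", "g8"),
    cmpC = c("g3", "g4", "g6"),
    cmpD = c("g3", "g7")
)

# Count how many IDs fall into each region of the Venn diagram
venn_regions <- function(sets) {
    ids <- sort(unique(unlist(sets)))
    # Logical membership matrix: rows are IDs, columns are sets
    member <- do.call(cbind, lapply(sets, function(s) ids %in% s))
    # Label each ID by the combination of sets that contain it
    region <- apply(member, 1, function(x) {
        paste(colnames(member)[x], collapse="&")
    })
    table(region)
}

regions <- venn_regions(deg_sets)
regions
```

Running such a tabulation once on the DEG sets from the unstranded count table and once on those from the sense strand count table gives two sets of region counts that can be compared side by side with the two Venn diagrams.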
Homework submission
Please submit the homework results in one well-structured and annotated R script to your private GitHub repository under Homework/HW7/HW7.R. Instead of an R script, the homework can be submitted as an R Markdown (*.Rmd) file.
Due date
This homework is due on Tue, May 14th at 6:00 PM.
Homework Solutions
See here.