HW7 - RNA-Seq Analysis

4 minute read



Source code downloads:   [ .R ]

A. Unstranded and strand-specific read counting

  • Task 1: Rerun the RNA-Seq workflow with the toy data sets up to the read quantification step here. Note, the toy data set gets automatically loaded when intializing a workflow environment (directory structure) with the genWorkenvir function (see tutorial here). In the read quantification step with summarizeOverlaps generate count tables for exons by genes (eByg) of the following three strand modes:

    1. Unstranded
    2. Strand-specific for positive (sense) strand
    3. Strand-specific for negative (antisense) strand

    The solution for generating the unstranded read counts is given below. Note, the upstream steps of the RNA-Seq workflow only need to be rerun to generate the proper inputs for the read counting. Thus, they are not required to be included in the homework results (see HW7.R below).

summarizeOverlaps(eByg, bfl, mode="Union", 
                    ignore.strand=TRUE, 
                    # preprocess.reads=invertStrand,
                    inter.feature=FALSE, 
                    singleEnd=TRUE)

Before attempting to solve this homework task please read the vignette Counting reads with summarizeOverlaps (here) from the GenomicAlignments package that defines the summarizeOverlap function. In addition, the help file for ?summarizeOverlaps provides useful information.

  • Task 2: Provide R code that demonstrates that the two strand-specific count tables sum up to the values of the unstranded count table.

  • Task 3: Explain the utility (biological relevance) of the different strand counting modes used under Task 1. Include your explanation as comment text in your homework script (see HW7.R below).

Note, for Tasks 1-3 only the code and/or text needs to be included in the homework submission (no data/result files). For details see below.

B. Read counting for different feature types

  • Task 4: Compute strand-specific count tables for the positive (sense) strand of the following feature types. The help files of ?exonsBy and ?transcripts provide useful information for solving these tasks.

    1. Genes
    2. Exons
    3. Exons by genes
    4. Introns by transcripts
    5. 5’-UTRs by transcripts

Note, for Tasks 4 only include the code and/or text in your homework submission (no data/result files). For details see below.

C. DEG analysis

  • Task 5: Perform the DEG analysis with edgeR as outlined under section 6 of the RNA-Seq workflow here. Use in one case for the DEG analysis the unstranded count table as input (from Task 1.1) and in another the sense strand count table (from Task 1.2). Compare the DEG result of the two methods in two separate 4-way Venn diagrams for the same sample comparisons used in the workflow example here.

    1. 4-way Venn diagram for unstranded count table
    2. 4-way Venn diagram for sense strand count table

Note, for Tasks 5 include both the code and the resulting images in your homework submission. For deteails see below.

Homework submission

Accept the project repos on Github Classroom

Starting with this homework, you will use a classroom repository called project. This repository will be used until the end of this class for all remaining homework assignments as well as the course project. You are responsible to maintain this project repository including its README.md file.

  1. To start with HW7, accept the assignment and git clone to your user account on the HPCC cluster or your local computer.
  2. Create a directory “hw7” and upload all your HW7 script and result files to this directory. This includes the following files:
    • Your script file either as R or R Markdown file, here hw7.R or hw7.Rmd.
    • Two Venn diagram plots: unstranded.png and hw7/sense.png

Auto-grading

You will complete most of your HW7 on the HPCC cluster. In this and the following homeworks auto-grading will not be used.
Your TA will manually check your homework solutions.

Details

  • If you wish then you can submit your homeworks as an R Markdown (Rmd) file. However, an R file will be sufficient. Due to dependencies of large input/result files there is also no need to make sure your code can be sourced with source().
  • Another option will be to add your homework code to the systemPipeRNAseq.Rmd file used in the corresponding workflow template and rename it to hw7.Rmd when you upload it to GitHub.
  • If possible please use only one of the above format options.

For the graphis files, you can upload them either as .png or .jpg (.jpeg) files.

Grading

  • Task1
    • Unstranded: 1
    • Positive: 1
    • Negative: 1
  • Task2: 1
  • Task3: 1
  • Task4
    • Genes: 0.5
    • Exons: 0.5
    • Exons by genes: 0.5
    • Introns by transcripts: 0.5
    • 5’-UTRs by transcripts: 0.5
  • Task5
    • DEG: 1
    • Venn1: 1
    • Venn2: 1

Total: 10.5

Due date

This homework is due in one week on Thu, May 6th at 6:00 PM. As discussed in today’s class, the due date of this homework has been moved to Mon, May 10th at 6:00 PM.

Homework Solutions

See here.

Last modified 2021-05-12: some edits (58559518d)