Read counting with summarizeOverlaps in parallel mode using multiple cores
Reads overlapping with annotation ranges of interest are counted for
each sample using the summarizeOverlaps function (Lawrence et al., 2013). The read counting is
performed for exonic gene regions in a non-strand-specific manner while
ignoring overlaps among different genes. Subsequently, the expression
count values are normalized to reads per kilobase per million mapped reads
(RPKM). The raw read count table (countDFeByg.xls) and the corresponding
RPKM table (rpkmDFeByg.xls) are written
to separate files in the directory of this project. Parallelization is
achieved with the BiocParallel package, here using 8 CPU cores.
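The RPKM normalization described above can be illustrated with a minimal base R sketch. It is not the workflow's own code: the count matrix, the gene lengths, and the helper name rpkm are made up for illustration, and the number of mapped reads per sample is assumed to be the column sum of the count table.

```r
# Toy count matrix (hypothetical values): rows are genes, columns are samples
counts <- matrix(c(100, 200, 300, 400), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("S1", "S2")))

# Hypothetical total exonic length of each gene in base pairs
gene_len_bp <- c(geneA = 1000, geneB = 2000)

# Hypothetical helper: RPKM = count / (gene length in kb) / (mapped reads in millions)
rpkm <- function(counts, gene_len_bp) {
  # Assumption: mapped reads per sample = sum of all counts in that sample
  millions_mapped <- colSums(counts) / 1e6
  # Divide each column by its library size in millions
  per_million <- sweep(counts, 2, millions_mapped, "/")
  # Divide each row by its gene length in kilobases
  sweep(per_million, 1, gene_len_bp / 1e3, "/")
}

rpkm_tab <- rpkm(counts, gene_len_bp)
print(rpkm_tab)
```

Because each value is scaled by gene length, RPKM values can be compared among genes within a sample, which raw counts do not allow.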
Sample data slice of the count table
Sample data slice of the RPKM table
Note that most statistical methods for differential expression or abundance
analysis, such as edgeR or DESeq2, expect raw count values as input. The
usage of RPKM values should be restricted to specialty applications
required by some users, e.g. manually comparing the expression levels
among different genes or features.
Sample-wise correlation analysis
The following computes the sample-wise Spearman correlation coefficients from
the rlog-transformed expression values generated with the DESeq2 package. After
transformation to a distance matrix, hierarchical clustering is performed with
the hclust function and the result is plotted as a dendrogram
(also see file sample_tree.pdf).
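The correlation and clustering steps can be sketched in base R as follows. This is an illustration only: random numbers stand in for the rlog-transformed expression values, the object names are hypothetical, and converting the correlations to distances as 1 - correlation is one common choice that is assumed here, not necessarily the conversion used in the workflow.

```r
set.seed(1)

# Stand-in for rlog-transformed expression values (random toy data):
# 10 genes in rows, 6 samples in columns
expr <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("S", 1:6)))

# Sample-wise Spearman correlation (cor() correlates the columns)
cor_mat <- cor(expr, method = "spearman")

# Assumption: turn correlations into distances as 1 - correlation
d <- as.dist(1 - cor_mat)

# Hierarchical clustering with the hclust default (complete linkage)
hc <- hclust(d)

# Plot the result as a dendrogram
plot(as.dendrogram(hc), main = "Sample-wise clustering (toy data)")
```

Spearman correlation is rank-based, so the resulting tree is less sensitive to a few extreme expression values than one built from Pearson correlation.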