Generate list of CEL names defining treatment vs. control comparisons
The sampleList function extracts the sample comparisons (contrasts) from the
CMAP annotation table and stores them as a list.
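As a rough illustration of what such a comparison list can look like, the sketch below groups treatment CEL names with their control CEL names per comparison. It is written in Python for a self-contained example and is not the workflow's R code. The annotation rows, the drug/cell-line key and the function name sample_list are hypothetical and do not reflect the actual CMAP annotation columns or the sampleList implementation.

```python
# Hypothetical annotation rows:
# (treatment CEL name, drug, cell line, control CEL names)
annotation = [
    ("T1", "drugA", "MCF7", ["C1", "C2"]),
    ("T2", "drugA", "MCF7", ["C1", "C2"]),
    ("T3", "drugB", "PC3", ["C3"]),
]


def sample_list(rows):
    """Group CEL names into one treatment ('t') vs. control ('c') entry per comparison."""
    comparisons = {}
    for cel, drug, cell, controls in rows:
        key = f"{drug}_{cell}"  # assumed naming scheme for a comparison
        entry = comparisons.setdefault(key, {"t": [], "c": []})
        entry["t"].append(cel)
        for ctrl in controls:
            if ctrl not in entry["c"]:  # keep each control only once
                entry["c"].append(ctrl)
    return comparisons


comparisons = sample_list(annotation)
```

Each entry then names the arrays that go into one treatment vs. control contrast.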
Load normalized expression data
The following loads the MAS5 normalized expression data into a single data.frame.
To accelerate the import, the data is read from rds files.
The next step generates gene-level expression values. If a gene is represented by several
probe sets, the mean of their intensities is used. This is necessary because
the U133 chip contains many genes with duplicated probe sets. Probe sets not matching
any gene are removed.
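The collapsing step can be sketched as follows. This is a minimal Python illustration, not the workflow's R code. The function name gene_level_means, the probe set IDs and the gene names are made up for the example, and intensities are assumed to be stored as one list of per-sample values per probe set.

```python
from collections import defaultdict


def gene_level_means(probe_values, probe_to_gene):
    """Average probe-set intensities per gene; drop probe sets without a gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # probe set matches no gene: removed
            continue
        grouped[gene].append(values)
    # zip(*rows) walks the samples column by column across a gene's probe sets
    return {
        gene: [sum(col) / len(col) for col in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical input: two samples, four probe sets, one of them unmapped
probe_values = {
    "p1": [2.0, 4.0],
    "p2": [4.0, 8.0],
    "p3": [1.0, 1.0],
    "p4": [9.0, 9.0],
}
probe_to_gene = {"p1": "GENE1", "p2": "GENE1", "p3": "GENE2"}

gene_values = gene_level_means(probe_values, probe_to_gene)
```

Here GENE1 receives the per-sample mean of p1 and p2, while p4 is discarded because it has no gene assignment.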
DEG analysis with limma
The analysis of differentially expressed genes (DEGs) is performed with the limma package.
Genes meeting the chosen cutoff criteria are reported as DEGs (set below to an FDR of 10% and
a minimum fold change of 2). The DEG matrix is written to a file named degMA.xls.
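To make the cutoff concrete: limma reports fold changes on the log2 scale, so a minimum fold change of 2 corresponds to an absolute log2 fold change of at least 1. The Python sketch below applies both filters to a small made-up table. The function name call_degs, the gene labels and the inclusive comparisons (<= and >=) are assumptions for illustration, not the workflow's actual filtering code.

```python
import math


def call_degs(stats, fdr_cutoff=0.10, min_fold_change=2.0):
    """Return genes passing both the FDR and the fold-change cutoff.

    stats maps gene -> (log2 fold change, FDR).
    """
    min_lfc = math.log2(min_fold_change)  # fold change of 2 -> |log2FC| of 1
    return [
        gene
        for gene, (lfc, fdr) in stats.items()
        if fdr <= fdr_cutoff and abs(lfc) >= min_lfc
    ]


# Hypothetical per-gene results for one drug treatment
stats = {
    "A": (1.5, 0.01),   # passes both cutoffs
    "B": (-2.0, 0.05),  # down-regulated, passes both cutoffs
    "C": (0.5, 0.001),  # significant, but fold change too small
    "D": (3.0, 0.50),   # large fold change, but FDR too high
}

degs = call_degs(stats)
```

Taking the absolute value keeps both up- and down-regulated genes.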
Number of DEGs across drug treatments
The following plots the number of drug treatments (y-axis) falling into bins of increasing
DEG counts (x-axis).
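The counting behind such a plot can be sketched as below, again in Python for illustration only. The bin edges and the DEG counts per treatment are hypothetical, since the bins used in the original plot are not stated here.

```python
from bisect import bisect_right

# Hypothetical lower bin edges; each bin is [edge, next edge), the last is open-ended
edges = [0, 1, 10, 100, 1000]
labels = ["0", "1-9", "10-99", "100-999", ">=1000"]


def bin_treatments(deg_counts, edges):
    """Count how many treatments fall into each DEG-count bin."""
    tally = [0] * len(edges)
    for n in deg_counts:
        tally[bisect_right(edges, n) - 1] += 1
    return tally


# Hypothetical number of DEGs for seven drug treatments
deg_counts = [0, 3, 12, 250, 7, 0, 1500]
tally = bin_treatments(deg_counts, edges)
binned = dict(zip(labels, tally))
```

The values of binned would then be drawn as bar heights, one bar per bin.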
Identify DEG overlaps with Peters et al. (2015)
Peters et al. (2015) reported 1,497 age-related gene expression
signatures. The intersectStats function computes their intersects with each
of the 3,318 drug-responsive DEG sets from CMAP. The result includes the
Jaccard index as a simple similarity metric for gene sets as well as the raw
and adjusted p-values based on the hypergeometric distribution expressing how
likely it is to obtain the observed intersect sizes just by chance. The
results for the 20 top scoring drugs are given below and the full data set is
written to a file named degOL_PMID26490707.xls.
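The two statistics named above can be written out in a few lines. The following is a minimal Python sketch, not the intersectStats implementation: the function names are made up, and the one-sided upper-tail test as well as the choice of gene universe are assumptions about how the p-value is set up.

```python
from math import comb


def jaccard(a, b):
    """Jaccard index: size of the intersect divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hypergeom_pvalue(overlap, n_query, n_ref, n_universe):
    """P(X >= overlap) when n_query genes are drawn at random from a universe
    of n_universe genes, n_ref of which belong to the reference set."""
    total = comb(n_universe, n_query)
    upper = min(n_query, n_ref)
    tail = sum(
        comb(n_ref, k) * comb(n_universe - n_ref, n_query - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10-gene universe, 4 reference genes, 3 DEGs, 2 of them shared
similarity = jaccard({"a", "b", "c"}, {"b", "c", "d", "e"})
p_value = hypergeom_pvalue(overlap=2, n_query=3, n_ref=4, n_universe=10)
```

In the toy example the intersect has 2 genes and the union 5, and the tail probability is (36 + 4) / 120, the chance of sharing 2 or 3 genes purely at random.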
Identify DEG overlaps with Sood et al. (2015)
Sood et al. (2015) reported 150 age-related gene expression signatures.
The intersectStats function computes their intersects with each of the 3,318
drug-responsive DEG sets from CMAP. The result includes the Jaccard index as a simple
similarity metric for gene sets as well as the raw and adjusted p-values based on the
hypergeometric distribution expressing how likely it is to obtain the observed intersect
sizes just by chance. The results for the 20 top scoring drugs are given below and the full
data set is written to a file named degOL_PMID26343147.xls.
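Because thousands of DEG sets are tested against the same signature, the raw p-values need a multiple-testing adjustment. The text does not say which method is used. The Python sketch below shows the Benjamini-Hochberg procedure purely as one common choice, so treat both the method and the function name bh_adjust as assumptions.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest to the smallest p-value
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for step, i in enumerate(order):
        rank = m - step  # rank of this p-value in ascending order (1-based)
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min  # enforces monotone adjusted values
    return adjusted


# Hypothetical raw p-values for four drug DEG sets
raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)

# Indices of the DEG sets ordered from most to least significant
ranking = sorted(range(len(adj)), key=lambda i: adj[i])
```

Sorting by the adjusted values, as in ranking, is one way to arrive at a list of top scoring drugs.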
Drugs affecting known longevity genes
The following identifies CMAP drugs affecting the expression of the IGF1 or IGF1R genes.
The final result is written to a file named deg_IGF1.xls.
Now the final data.frame can be sorted by increasing mean FDR values.
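The sorting step amounts to averaging the FDR values per drug and ordering by that mean. A small Python sketch with made-up values follows. The column names IGF1_FDR and IGF1R_FDR and the drug labels are hypothetical and may not match the columns in deg_IGF1.xls.

```python
# Hypothetical result rows: one per drug, with an FDR for each of the two genes
rows = [
    {"drug": "drugA", "IGF1_FDR": 0.20, "IGF1R_FDR": 0.02},
    {"drug": "drugB", "IGF1_FDR": 0.01, "IGF1R_FDR": 0.03},
    {"drug": "drugC", "IGF1_FDR": 0.50, "IGF1R_FDR": 0.40},
]


def sort_by_mean_fdr(rows):
    """Return rows ordered by increasing mean FDR across both genes."""
    def mean_fdr(row):
        return (row["IGF1_FDR"] + row["IGF1R_FDR"]) / 2
    return sorted(rows, key=mean_fdr)


ranked = sort_by_mean_fdr(rows)
ranked_drugs = [row["drug"] for row in ranked]
```

Drugs with consistently low FDR values for both genes end up at the top of the table.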
Plot structures of compounds