## Generate list of CEL names defining treatment vs. control comparisons

The sampleList function extracts the sample comparisons (contrasts) from the CMAP annotation table and stores them as a list.
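To make the shape of such a list concrete, here is a minimal Python sketch. It is not the workflow's R code. The annotation rows, the comparison labels and the `build_sample_list` helper are hypothetical and only illustrate how treatment and control CEL names could be grouped per comparison.

```python
from collections import defaultdict

# Hypothetical annotation rows: (comparison label, treatment CEL, control CELs).
# This layout is an assumption, not the real CMAP annotation format.
annotation = [
    ("drugA_MCF7", "t1.CEL", ["c1.CEL", "c2.CEL"]),
    ("drugA_MCF7", "t2.CEL", ["c1.CEL", "c2.CEL"]),
    ("drugB_PC3", "t3.CEL", ["c3.CEL"]),
]

def build_sample_list(rows):
    """Group CEL names per comparison into {'t': [...], 'c': [...]} entries."""
    comparisons = defaultdict(lambda: {"t": [], "c": []})
    for label, treatment_cel, control_cels in rows:
        entry = comparisons[label]
        if treatment_cel not in entry["t"]:
            entry["t"].append(treatment_cel)
        for cel in control_cels:
            if cel not in entry["c"]:  # keep each control only once
                entry["c"].append(cel)
    return dict(comparisons)

sample_list = build_sample_list(annotation)
```

Each entry then names the treatment (`t`) and control (`c`) samples that define one contrast.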

The following loads the MAS5-normalized expression data into a single data.frame. To accelerate the import, the data are read from rds files.

## Transform probe set to gene level data

The next step generates gene-level expression values. If a gene is represented by several probe sets, the mean of their intensities is used. This is necessary because the U133 chip contains many genes with duplicated probe sets. Probe sets that do not match any gene are removed.
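The averaging step can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the workflow's R implementation: the probe identifiers, the `probe2gene` mapping and the `probes_to_genes` helper are made up for the example.

```python
from collections import defaultdict
from statistics import mean

def probes_to_genes(expr, probe2gene):
    """Collapse probe-set rows to gene rows by averaging per sample.

    expr: {probe_id: [intensity per sample]}
    probe2gene: {probe_id: gene_id}; probe sets missing here are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in expr.items():
        gene = probe2gene.get(probe)
        if gene is None:
            continue  # probe set does not match any gene
        grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

# Hypothetical toy data: p1 and p2 both map to G1, p4 has no gene.
expr = {"p1": [2.0, 4.0], "p2": [4.0, 8.0], "p3": [1.0, 1.0], "p4": [9.0, 9.0]}
probe2gene = {"p1": "G1", "p2": "G1", "p3": "G2"}
gene_expr = probes_to_genes(expr, probe2gene)
```

In this toy case `G1` receives the per-sample mean of `p1` and `p2`, while `p4` is discarded.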

## DEG analysis with limma

The analysis of differentially expressed genes (DEGs) is performed with the limma package. Genes meeting the chosen cutoff criteria are reported as DEGs (set below to an FDR of 10% and a minimum fold change of 2). The DEG matrix is written to a file named degMA.xls.
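The cutoff logic can be illustrated with a short Python sketch. It does not reproduce limma. It assumes that log2 fold changes and FDR values are already available, that a fold change of 2 corresponds to an absolute log2 fold change of 1, and that both cutoffs are inclusive. The `call_degs` helper and the toy statistics are hypothetical.

```python
import math

def call_degs(stats, fdr_cutoff=0.10, min_fold=2.0):
    """Flag genes as DEG (1) or not (0).

    stats: {gene: (log2_fold_change, fdr)}
    Assumes inclusive cutoffs; the workflow may treat boundaries differently.
    """
    min_lfc = math.log2(min_fold)  # fold change 2 -> |log2FC| of 1
    return {
        gene: int(abs(lfc) >= min_lfc and fdr <= fdr_cutoff)
        for gene, (lfc, fdr) in stats.items()
    }

# Hypothetical statistics for one treatment vs. control comparison.
stats = {
    "G1": (1.5, 0.01),   # passes both cutoffs
    "G2": (-2.0, 0.05),  # down-regulated, passes both
    "G3": (0.4, 0.001),  # significant but fold change too small
    "G4": (3.0, 0.5),    # large change but FDR too high
}
deg_flags = call_degs(stats)
```

Collecting such 0/1 columns for all comparisons gives a binary matrix in the spirit of the DEG matrix described above.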

## Number of DEGs across drug treatments

The following plots the number of drug treatments (y-axis) against bins of increasing DEG counts (x-axis).
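The counts behind such a plot can be sketched in Python as below. The bin edges, the per-treatment DEG counts and the `bin_deg_counts` helper are assumptions chosen for illustration and do not come from the workflow.

```python
from bisect import bisect_right

def bin_deg_counts(deg_counts, edges):
    """Count treatments per DEG-count bin.

    Bin i covers edges[i] <= n < edges[i + 1]; the last bin is open-ended.
    """
    tallies = [0] * len(edges)
    for n in deg_counts.values():
        idx = bisect_right(edges, n) - 1
        if idx >= 0:  # ignore values below the first edge
            tallies[idx] += 1
    return tallies

# Hypothetical numbers of DEGs per drug treatment.
deg_counts = {"drugA": 0, "drugB": 7, "drugC": 120, "drugD": 15, "drugE": 2500}
edges = [0, 10, 100, 1000]  # assumed lower bin boundaries
tallies = bin_deg_counts(deg_counts, edges)
```

The resulting `tallies` list holds the bar heights, one per bin.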

## Identify DEG overlaps with Peters et al. (2015)

Peters et al. (2015) reported 1,497 age-related gene expression signatures. The intersectStats function computes their intersects with each of the 3,318 drug-responsive DEG sets from CMAP. The result includes the Jaccard index as a simple similarity metric for gene sets as well as the raw and adjusted p-values based on the hypergeometric distribution expressing how likely it is to obtain the observed intersect sizes just by chance. The results for the 20 top scoring drugs are given below and the full data set is written to a file named degOL_PMID26490707.xls.
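As a rough illustration of the two statistics, the following Python sketch computes a Jaccard index and an upper-tail hypergeometric p-value for one pair of gene sets. It is not the `intersectStats` implementation. The gene names and the universe size of 10 are invented, and the choice of universe is an assumption that strongly affects the p-value.

```python
from math import comb

def jaccard(a, b):
    """Size of the intersect divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from a universe of N,
    of which K belong to the reference signature."""
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)

# Hypothetical toy sets in an assumed universe of 10 genes.
signature = {"g1", "g2", "g3", "g4"}
drug_degs = {"g3", "g4", "g5"}
overlap = len(signature & drug_degs)

ji = jaccard(signature, drug_degs)
pval = hypergeom_upper_tail(overlap, len(signature), len(drug_degs), 10)
```

Here two of the three drug DEGs fall into the signature, giving a Jaccard index of 2/5 and a p-value of 40/120.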

## Identify DEG overlaps with Sood et al. (2015)

Sood et al. (2015) reported 150 age-related gene expression signatures. The intersectStats function computes their intersects with each of the 3,318 drug-responsive DEG sets from CMAP. The result includes the Jaccard index as a simple similarity metric for gene sets as well as the raw and adjusted p-values based on the hypergeometric distribution expressing how likely it is to obtain the observed intersect sizes just by chance. The results for the 20 top scoring drugs are given below and the full data set is written to a file named degOL_PMID26343147.xls.
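Because one test is performed per drug-responsive DEG set, the raw p-values need a multiple-testing correction. The text does not state which adjustment is used, so the Python sketch below assumes the Benjamini-Hochberg procedure purely for illustration. The `bh_adjust` helper and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: BH is used here only as a common example of an adjustment.
    """
    m = len(pvalues)
    # Walk from the largest to the smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min  # enforce monotonicity
    return adjusted

# Hypothetical raw p-values for four drug DEG sets.
raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)

# Indices of the drug sets ordered from most to least significant.
top = sorted(range(len(raw)), key=lambda i: adj[i])
```

Sorting by the adjusted values and keeping the first 20 entries would yield a "top scoring" table of the kind described above.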

## Drugs affecting known longevity genes

The following identifies CMAP drugs affecting the expression of the IGF1 or IGF1R genes. The final result is written to a file named deg_IGF1.xls.

Now the final data.frame can be sorted by increasing mean FDR values.
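The sorting step can be sketched in Python as follows. The drug names and FDR values are invented, and the nested dictionary stands in for the data.frame used in the workflow.

```python
from statistics import mean

# Hypothetical FDR values per drug for the two genes of interest.
fdr = {
    "drugA": {"IGF1": 0.02, "IGF1R": 0.30},
    "drugB": {"IGF1": 0.05, "IGF1R": 0.07},
    "drugC": {"IGF1": 0.50, "IGF1R": 0.01},
}

# Mean FDR across IGF1 and IGF1R for each drug.
mean_fdr = {drug: mean(list(values.values())) for drug, values in fdr.items()}

# Drugs ordered by increasing mean FDR (most consistent evidence first).
ranked = sorted(mean_fdr, key=mean_fdr.get)
```

Note that a drug with one very small and one large FDR (like `drugC` here) ranks below a drug with moderately small values for both genes.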