Download data from Connectivity Map project site
The drug-related expression data are downloaded from the CMAP web site. The
getCmap function downloads the CMAP rank matrix along with the compound
annotations, and getCmapCEL downloads the corresponding 7,056 CEL files. The
functions write the downloaded files to the data and data/CEL directories
within the current working directory of the user's R session. Since some of
the raw data sets are large, the functions only rerun the download if the
argument rerun is set to TRUE. If the raw data are not needed, users can skip
this time-consuming download step and work with the preprocessed data
obtained in the next section.
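The rerun behavior described above can be illustrated with a minimal sketch. The helper fetch_file, its arguments, and the placeholder URL are hypothetical and only show the general guard pattern; this is not the actual implementation of getCmap or getCmapCEL.

```r
# Hypothetical helper (not part of the package): download a file only
# when rerun = TRUE, otherwise skip the potentially large transfer.
fetch_file <- function(url, dest, rerun = FALSE) {
  if (!rerun) {
    message("Skipping download of ", basename(dest),
            "; set rerun = TRUE to fetch it.")
    return(invisible(FALSE))
  }
  # Create the target directory (e.g. data or data/CEL) if it is missing
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  download.file(url, destfile = dest, mode = "wb")
  invisible(TRUE)
}

# With the default rerun = FALSE nothing is downloaded or written.
# The URL is a placeholder, not a real CMAP address.
fetch_file("https://example.org/placeholder.txt",
           file.path("data", "placeholder.txt"))
```

Defaulting rerun to FALSE means that re-executing a script or vignette does not trigger the large download again by accident.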
Overview of CMAP data
The experimental design of the CMAP project is defined in the file
cmap_instances_02.xls. Note that this file required some cleaning in
LibreOffice (Excel would work for this too). It was then saved as a
tab-delimited text file named cmap_instances_02.txt. The following count
statistics are extracted from this file.
The panel of cell lines used by CMAP includes MCF7, ssMCF7, HL60, PC3 and
SKMEL5.
Each cell type was subjected to the following number of total treatments and number
of distinct drugs, respectively. The total number of drugs used by CMAP is 1,309.
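As a rough sketch of how such counts could be computed, the example below uses a small made-up table in place of cmap_instances_02.txt. The column names cell and drug and all values are assumptions for illustration only; the real file's column names may differ.

```r
# Toy stand-in for the instance table; values and column names are made up.
toy <- data.frame(
  cell = c("MCF7", "MCF7", "MCF7", "PC3", "PC3", "HL60"),
  drug = c("drugA", "drugA", "drugB", "drugA", "drugC", "drugB"),
  stringsAsFactors = FALSE
)
# With the real file one would instead start from something like:
# toy <- read.delim("data/cmap_instances_02.txt", check.names = FALSE)

# Total number of treatments (rows) per cell line
n_treatments <- table(toy$cell)

# Number of distinct drugs per cell line
n_drugs <- tapply(toy$drug, toy$cell, function(x) length(unique(x)))

# Combine both counts into one summary table
counts <- data.frame(
  cell = names(n_treatments),
  treatments = as.integer(n_treatments),
  distinct_drugs = as.integer(n_drugs[names(n_treatments)]),
  stringsAsFactors = FALSE
)
counts

# Number of distinct drugs across all cell lines
length(unique(toy$drug))
```

The same two steps, counting rows per cell line and counting unique drug names per cell line, apply unchanged once the toy table is replaced by the real instance file and its actual column names.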
The number of Affymetrix chips used in the experiments is plotted here for
each of the three chip types used by CMAP: