Determine chip type from CEL files
The CMAP data set is based on three different Affymetrix chip types (HG-U133A,
HT_HG-U133A and U133AAofAv2). The following extracts the chip type information
from the CEL files and stores the result in an rds file with the path
./data/chiptype.rds. Users who skipped the download of the CEL files can
download this file here.
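As a minimal sketch of this step, the chip type can be read from each CEL header and cached as an rds file. It assumes that the CEL files are stored under ./data/CEL (a hypothetical location) and that the Bioconductor package affxparser is installed. The helper name `read_chiptypes` is illustrative only and does not belong to any package.

```r
# Illustrative helper (not part of any package): returns a named character
# vector with one chip type per CEL file found in cel_dir.
# header_reader is any function returning a list with a $chiptype element,
# for example affxparser::readCelHeader.
read_chiptypes <- function(cel_dir, header_reader) {
  celfiles <- list.files(cel_dir, pattern = "\\.CEL$", full.names = TRUE)
  chiptype <- vapply(celfiles, function(f) header_reader(f)$chiptype, character(1))
  names(chiptype) <- basename(celfiles)
  chiptype
}

# Run only if affxparser and the assumed CEL directory are available.
if (requireNamespace("affxparser", quietly = TRUE) && dir.exists("./data/CEL")) {
  chiptype <- read_chiptypes("./data/CEL", affxparser::readCelHeader)
  saveRDS(chiptype, "./data/chiptype.rds")
  print(table(chiptype))  # number of CEL files per chip type
}
```

Passing the header reader as an argument keeps the file handling separate from the CEL parsing, so the helper can be tried with a stub function when no CEL files are at hand.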
Normalization of CEL files
The following processes the CEL files from each chip type separately using the
MAS5 normalization algorithm. The results will be written to 3 subdirectories
under data that are named after the chip type names. To save time, the
processing is parallelized with BiocParallel to run on 100 CPU cores of a
computer cluster with a scheduler (e.g. Torque). The numbers of CEL files per
chip type are: 807 CEL files from HG-U133A, 6029 CEL files from
HT_HG-U133A, and 220 CEL files from U133AAofAv2. Note, these numbers are slightly
different from those reported in the cmap_instances_02.txt file. The MAS5
normalized data sets can be downloaded here: HG-U133A, HT_HG-U133A, U133AAofAv2.
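The per-chip-type processing could be sketched roughly as follows. This is not the workflow's own implementation: the helper names (`split_by_chiptype`, `normalize_batch`), the batch size, the output file names and the ./data/CEL location are all assumptions made for illustration. The sketch uses a local multicore backend with 4 workers instead of a cluster scheduler, and it does not handle U133AAofAv2, for which a CDF environment would have to be supplied separately. Processing in batches is reasonable here because MAS5 summarizes and scales each array on its own.

```r
# Illustrative helper: group CEL file names by chip type and cut each group
# into batches of at most batch_size files.
# chiptype: named character vector (names = CEL file names, values = chip types).
split_by_chiptype <- function(chiptype, batch_size = 100) {
  by_chip <- split(names(chiptype), as.character(chiptype))
  lapply(by_chip, function(files) {
    unname(split(files, ceiling(seq_along(files) / batch_size)))
  })
}

# MAS5-normalize one batch of CEL files and write a tab-delimited table.
# Requires the Bioconductor packages affy and Biobase.
normalize_batch <- function(files, cel_dir, out_file) {
  ab <- affy::ReadAffy(filenames = file.path(cel_dir, files))
  eset <- affy::mas5(ab)
  write.table(Biobase::exprs(eset), out_file, sep = "\t", quote = FALSE, col.names = NA)
  out_file
}

# Run only if the packages and the cached chip type file are available.
if (requireNamespace("affy", quietly = TRUE) &&
    requireNamespace("BiocParallel", quietly = TRUE) &&
    file.exists("./data/chiptype.rds")) {
  batches <- split_by_chiptype(readRDS("./data/chiptype.rds"))
  for (chip in names(batches)) {
    out_dir <- file.path("./data", chip)  # one subdirectory per chip type
    dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
    out_files <- file.path(out_dir, sprintf("mas5_batch_%03d.txt", seq_along(batches[[chip]])))
    jobs <- Map(function(f, o) list(files = f, out_file = o), batches[[chip]], out_files)
    # Assumption: local multicore backend; swap BPPARAM for a cluster backend
    # when a scheduler is available.
    BiocParallel::bplapply(
      jobs,
      function(job, cel_dir, fun) fun(job$files, cel_dir, job$out_file),
      cel_dir = "./data/CEL", fun = normalize_batch,
      BPPARAM = BiocParallel::MulticoreParam(workers = 4)
    )
  }
}
```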
Combine results from same chip type in single data frame
This step also deletes the intermediate files. Before executing it, please make sure that this is what you want.
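A minimal base R sketch of such a combine step is shown here. It assumes that the intermediate results are tab-delimited tables with probe set ids in the first column and one column per CEL file. The function name `combine_batches` and its arguments are hypothetical, and deleting the intermediate files is off by default.

```r
# Illustrative helper: column-bind per-batch expression tables of one chip type.
# batch_files: paths to tab-delimited tables (probe set ids in column 1).
# remove_batches: if TRUE, the intermediate batch files are deleted afterwards.
combine_batches <- function(batch_files, out_file, remove_batches = FALSE) {
  tabs <- lapply(batch_files, read.delim, row.names = 1, check.names = FALSE)
  ids <- rownames(tabs[[1]])
  # All batches of one chip type must list the same probe sets in the same order.
  same_ids <- vapply(tabs, function(x) identical(rownames(x), ids), logical(1))
  stopifnot(all(same_ids))
  combined <- do.call(cbind, unname(tabs))
  write.table(combined, out_file, sep = "\t", quote = FALSE, col.names = NA)
  if (remove_batches) unlink(batch_files)  # irreversible: deletes intermediates
  combined
}
```

Calling it once per chip type subdirectory would give one data frame per chip type. Leaving `remove_batches = FALSE` until the combined file has been checked avoids losing the intermediate results by accident.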
The following generates annotation information for the Affymetrix probe set
identifiers. Note, the three different Affymetrix chip types used by CMAP
share most probe set ids (>95%), meaning it is possible to combine the data
after normalization and use the same annotation package for all of them. The
annotation libraries for the chip types HG-U133A and HT_HG-U133A are
hgu133a.db and hthgu133a.db, respectively. However, there is no annotation
library (e.g. CDF) available for U133AAofAv2. The annotation file can be downloaded
from here: myAnnot.xls.
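The two ideas in this paragraph can be sketched as follows: checking how many probe set ids two chip types share, and looking up annotations for the ids in hgu133a.db. The helper `shared_fraction` and the selected annotation columns are assumptions made for illustration, and the layout of the resulting table is not meant to match myAnnot.xls.

```r
# Illustrative helper: fraction of probe set ids in ids_a that also occur in ids_b.
shared_fraction <- function(ids_a, ids_b) {
  mean(ids_a %in% ids_b)
}

# Annotation lookup; runs only if the Bioconductor packages are installed.
if (requireNamespace("hgu133a.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  db <- hgu133a.db::hgu133a.db
  probe_ids <- AnnotationDbi::keys(db, keytype = "PROBEID")
  # A probe set can map to several genes, so the result may contain more rows
  # than there are probe set ids.
  myAnnot <- AnnotationDbi::select(db, keys = probe_ids,
    columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")
  print(head(myAnnot))
}
```

Applied to the row names of two normalized expression tables, `shared_fraction` gives a quick check of the overlap before the data sets are combined.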