## Determine chip type from CEL files

The CMAP data set is based on three different Affymetrix chip types (HG-U133A, HT_HG-U133A and U133AAofAv2). This step extracts the chip type information from the CEL files and stores the result in an rds file at ./data/chiptype.rds. Users who skipped the download of the CEL files can download this file here.
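As a minimal sketch of this step, the chip type can be read from each CEL file header and collected into a named vector. The helper name `get_chiptypes`, its `read_header` argument and the `./data/CEL` directory are assumptions made for illustration. The default reader is `affxparser::readCelHeader()`, whose result includes a `chiptype` element.

```r
# Sketch: collect the chip type of every CEL file in a directory.
# `get_chiptypes` is a hypothetical helper, not part of any package.
# `read_header` defaults to affxparser::readCelHeader(); it is an argument
# so that a different header reader can be substituted.
get_chiptypes <- function(celdir, read_header = affxparser::readCelHeader) {
  celfiles <- list.files(celdir, pattern = "\\.CEL$", full.names = TRUE)
  chiptype <- vapply(
    celfiles,
    function(f) as.character(read_header(f)$chiptype),
    character(1)
  )
  # Name each entry by its file name rather than its full path
  names(chiptype) <- basename(celfiles)
  chiptype
}

# Usage, assuming the CEL files are stored in ./data/CEL (hypothetical path):
# chiptype <- get_chiptypes("./data/CEL")
# saveRDS(chiptype, "./data/chiptype.rds")
# table(chiptype)  # number of CEL files per chip type
```

Storing the result as a vector named by CEL file makes it straightforward to split the file names by chip type later on.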

## Normalization of CEL files

This step processes the CEL files from each chip type separately using the MAS5 normalization algorithm. The results are written to three subdirectories under data, each named after its chip type. To save time, the processing is parallelized with BiocParallel to run on 100 CPU cores of a computer cluster with a scheduler (e.g. Torque). The numbers of CEL files per chip type are: 807 from HG-U133A, 6029 from HT_HG-U133A, and 220 from U133AAofAv2. Note that these numbers differ slightly from those reported in the cmap_instances_02.txt file. The MAS5 normalized data sets can be downloaded here: HG-U133A, HT_HG-U133A, U133AAofAv2.
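The per-chip-type processing could be organized roughly as follows. This is a sketch under stated assumptions, not the workflow's actual implementation: the helper names, the chunk size, the `./data/CEL` input directory and the `mas5.rds` output file name are all hypothetical. It relies on `affy::ReadAffy()`, `affy::mas5()` and `BiocParallel::bplapply()`, and it assumes that a CDF package matching each chip type is installed. Because MAS5 treats each array independently, the CEL files of one chip type can be split into chunks and processed in parallel.

```r
# Group CEL file names by chip type (input: named vector, names = CEL files)
split_by_chiptype <- function(chiptype) {
  split(names(chiptype), as.character(chiptype))
}

# Split a vector of file names into chunks of at most `chunk_size` files
chunk_files <- function(files, chunk_size = 200) {
  split(files, ceiling(seq_along(files) / chunk_size))
}

# MAS5-normalize one chunk; returns a probe set x sample expression matrix.
# Requires the affy package and a CDF package for the chip type (assumption).
mas5_chunk <- function(files, celdir) {
  batch <- affy::ReadAffy(filenames = file.path(celdir, files))
  Biobase::exprs(affy::mas5(batch))
}

# Normalize all CEL files of one chip type in parallel and save the result
normalize_chiptype <- function(files, celdir, outdir, BPPARAM) {
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  res <- BiocParallel::bplapply(chunk_files(files), mas5_chunk,
                                celdir = celdir, BPPARAM = BPPARAM)
  mas5mat <- do.call(cbind, res)
  saveRDS(mas5mat, file.path(outdir, "mas5.rds"))  # hypothetical file name
  invisible(mas5mat)
}

# Usage (paths and worker count are illustrative):
# chiptype <- readRDS("./data/chiptype.rds")
# param <- BiocParallel::MulticoreParam(workers = 4)
# for (ct in names(split_by_chiptype(chiptype))) {
#   normalize_chiptype(split_by_chiptype(chiptype)[[ct]], "./data/CEL",
#                      file.path("./data", ct), param)
# }
```

On a cluster with a scheduler, the `MulticoreParam` object would be replaced by a BiocParallel backend configured for that scheduler.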

## Clean-up of intermediate files

This step deletes intermediate files. Before running it, please make sure that this is what you want.
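One cautious way to do this is to list the files first and delete them only after checking the list. The helper `remove_intermediates`, the directory and the file name pattern in the usage lines are hypothetical, since which files count as intermediate depends on how the previous steps were run.

```r
# Sketch: list files matching `pattern` in `dir` and delete them only
# when dry_run = FALSE. Returns the matched paths in both cases.
remove_intermediates <- function(dir, pattern, dry_run = TRUE) {
  targets <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (!dry_run) {
    unlink(targets, recursive = TRUE)
  }
  targets
}

# Usage (directory and pattern are illustrative assumptions):
# remove_intermediates("./data", "^tmp_")                   # inspect only
# remove_intermediates("./data", "^tmp_", dry_run = FALSE)  # delete
```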

## Obtain annotation information

This step generates annotation information for the Affymetrix probe set identifiers. Note that the three Affymetrix chip types used by CMAP share most probe set IDs (>95%), so the data can be combined after normalization and annotated with the same annotation package. The annotation libraries for the chip types HG-U133A and HT_HG-U133A are hgu133a.db and hthgu133a.db, respectively. However, there is no annotation library (e.g. CDF) available for U133AAofAv2. The annotation file can be downloaded from here: myAnnot.xls.
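A probe set annotation table of this kind can be assembled along the following lines. This is a sketch, not the code that produced myAnnot.xls: the helper `collapse_annotation` and the choice of columns are assumptions. The query in the usage lines uses `AnnotationDbi::select()` on hgu133a.db, which can return several rows per probe set, so the helper collapses them into one row per probe set ID.

```r
# Sketch: collapse a long annotation table to one row per probe set.
# Multiple values for the same probe set are joined with ", ";
# missing values are dropped (yielding an empty string if none remain).
collapse_annotation <- function(annot, key = "PROBEID") {
  cols <- setdiff(colnames(annot), key)
  collapsed <- aggregate(
    annot[cols],
    by = annot[key],
    FUN = function(x) paste(unique(x[!is.na(x)]), collapse = ", ")
  )
  rownames(collapsed) <- collapsed[[key]]
  collapsed
}

# Usage, assuming hgu133a.db is installed (column choice is illustrative):
# library(hgu133a.db)
# annot <- AnnotationDbi::select(
#   hgu133a.db,
#   keys = AnnotationDbi::keys(hgu133a.db),
#   columns = c("SYMBOL", "ENTREZID", "GENENAME"),
#   keytype = "PROBEID"
# )
# myAnnot <- collapse_annotation(annot)
```

Keeping one row per probe set ID lets the table be joined directly to the rows of the normalized expression matrices.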