Introduction
Paper
Report
Source Code
Slides
FAQ
People
Links
Tufts CAMDA Group Home Page
Introduction
The CAMDA'03 data sets focus on lung cancer. Four microarray data sets were released as the CAMDA'03 data challenge package. All four (Harvard, Michigan, Ontario, and Stanford) were acquired independently to ask the same questions in lung cancer biology. This year's challenge is to integrate information from the different data sets.

We investigate clustering and dimension-reduction techniques on two of the four CAMDA data sets, which contain gene expression values and survival times of patients with lung adenocarcinomas. We chose the Michigan and Harvard data because of their reasonably large sample sizes (n = 86 and 84, respectively) and lack of missing values. We use ADC maps to project the data into one or two dimensions so that very simple clustering techniques can be applied, then follow this with Nearest Shrunken Mean to reduce the number of genes used to predict the clusters. We contrast this approach with the more classical techniques of variance ratios and hierarchical clustering.
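As a rough illustration of the simpler pieces described above, the following sketch clusters samples that have already been projected to one dimension and then ranks genes by a between-cluster to within-cluster variance ratio. It is not the group's code: the 1-D coordinates stand in for the output of some projection (they are not produced by an ADC map here), the largest-gap cut is only one possible "very simple" clustering rule, and the gene names and expression values are made up for the example.

```python
from statistics import mean


def split_at_largest_gap(coords):
    """Assign each 1-D coordinate to cluster 0 or 1 by cutting at the widest gap."""
    ordered = sorted(coords)
    # Pair each gap width with the index of its left endpoint.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, i = max(gaps)
    cut = (ordered[i] + ordered[i + 1]) / 2.0
    return [0 if c < cut else 1 for c in coords]


def variance_ratio(values, labels):
    """Between-cluster over within-cluster sum of squares for one gene."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for k in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == k]
        centre = mean(group)
        between += len(group) * (centre - overall) ** 2
        within += sum((v - centre) ** 2 for v in group)
    return between / within if within > 0 else float("inf")


def rank_genes(expr, labels):
    """Return gene names sorted by decreasing variance ratio.

    expr maps a gene name to its list of expression values, one per sample.
    """
    return sorted(expr, key=lambda g: variance_ratio(expr[g], labels), reverse=True)


# Hypothetical 1-D projected coordinates for six samples (assumption, not real data).
coords = [0.1, 0.3, 0.2, 2.1, 2.4, 2.2]
labels = split_at_largest_gap(coords)

# Made-up expression values: only gene_a differs between the two clusters.
expr = {
    "gene_a": [1.0, 1.2, 0.9, 6.0, 6.1, 5.9],
    "gene_b": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],
    "gene_c": [2.0, 8.0, 5.0, 3.0, 7.0, 5.0],
}
ranking = rank_genes(expr, labels)
print(labels, ranking[0])
```

On this toy input the cut falls between the two groups of coordinates, and the gene whose values differ between the groups ranks first. A shrinkage-based selector such as Nearest Shrunken Mean would replace the ranking step with a procedure that also decides how many genes to keep.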

