Tufts CAMDA Group Home Page
Introduction
The CAMDA'03 data sets focus on lung cancer. Four microarray data sets
(Harvard, Michigan, Ontario, and Stanford) were released as the CAMDA'03
data challenge package. All four were acquired independently to address
the same questions in lung cancer biology. This year's challenge is to
integrate information from the different data sets.
We investigate clustering and dimension-reduction techniques on two of the
four CAMDA data sets, which contain gene expression values and survival
times for patients with lung adenocarcinomas. We chose the Michigan and
Harvard data because of their reasonably large sample sizes (n = 86 and 84,
respectively) and lack of missing values. We use ADC maps to project the
data into one or two dimensions so that very simple clustering techniques
can be applied, and then use Nearest Shrunken Mean to reduce the number of
genes used to predict the clusters. We contrast this approach with the more
classical techniques of variance ratios and hierarchical clustering.
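
The project-then-cluster idea described above can be illustrated with a small sketch. It is not the group's actual code. It assumes that "ADC map" means an approximate-distance-clustering style projection, in which each sample is mapped to its distance from a small witness set; the text does not define the term. The function names, the single-sample witness set, the largest-gap clustering rule, and the synthetic data are all hypothetical choices made for illustration. The gene-reduction step is not shown.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_witnesses(samples, witnesses):
    """Map each sample to one number: its distance to the nearest witness.

    Assumption: this stands in for a one-dimensional ADC-style projection.
    """
    return [min(euclidean(s, w) for w in witnesses) for s in samples]


def split_at_largest_gap(values):
    """Label two clusters by cutting the sorted 1-D values at the widest gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]]
            for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for i in order[cut + 1:]:
        labels[i] = 1
    return labels


# Synthetic stand-in for an expression matrix (rows = samples, columns = genes).
# These numbers are made up; they are not CAMDA data.
rng = random.Random(0)
group_a = [[rng.gauss(0.0, 0.3) for _ in range(5)] for _ in range(6)]
group_b = [[rng.gauss(4.0, 0.3) for _ in range(5)] for _ in range(6)]
samples = group_a + group_b

witnesses = [samples[0]]  # hypothetical witness set: one sample
projection = distance_to_witnesses(samples, witnesses)
labels = split_at_largest_gap(projection)
print(labels)
```

Once the samples are reduced to one number each, a clustering rule as simple as a single cut point is enough to separate well-separated groups, which is the motivation for projecting before clustering.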