A recently published article by Dr.
Henry Thompson et al. (Cancer Prevention Laboratory)
made the front cover of the June 2004 issue of Cancer
Epidemiology, Biomarkers & Prevention. The cover
depicts pairwise correlation graphs and histograms
of nuclear morphometric parameters superimposed
over Feulgen-stained lung epithelial cells obtained
from cell culture.
Wolfe P, Murphy J, McGinley J, Zhu Z, Jiang W, Gottschall
EB, Thompson HJ. (2004) Using nuclear morphometry to
discriminate the tumorigenic potential of cells: a comparison
of statistical methods. Cancer Epidemiol Biomarkers
Prev. Jun;13(6):976-88.
Abstract
Despite interest in the use of nuclear morphometry for
cancer diagnosis and prognosis as well as to monitor
changes in cancer risk, no generally accepted statistical
method has emerged for the analysis of these data. To
evaluate different statistical approaches, Feulgen-stained
nuclei from a human lung epithelial cell line, BEAS-2B,
and a human lung adenocarcinoma (non-small cell) cancer
cell line, NCI-H522, were subjected to morphometric
analysis using a CAS-200 imaging system. The morphometric
characteristics of these two cell lines differed significantly.
Therefore, we proceeded to address the question of which
statistical approach was most effective in classifying
individual cells into the cell lines from which they
were derived. The statistical techniques evaluated ranged
from simple, traditional, parametric approaches to newer
machine learning techniques. The multivariate techniques
were compared based on a systematic cross-validation
approach using 10 fixed partitions of the data to compute
the misclassification rate for each method. For comparisons
across cell lines at the level of each morphometric
feature, we found little to distinguish nonparametric
from parametric approaches. Among the linear models
applied, logistic regression had the highest percentage
of correct classifications; among the nonlinear and
nonparametric methods applied, the Classification and
Regression Trees model provided the highest percentage
of correct classifications. Classification and Regression
Trees has appealing characteristics: there are no assumptions
about the distribution of the variables to be used,
there is no need to specify which interactions to test,
and there is no difficulty in handling complex, high-dimensional
data sets containing mixed data types.
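
The comparison scheme described in the abstract, in which every method is scored on the same 10 fixed partitions of the data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic (a single made-up feature drawn from two Gaussians), and the two classifiers are simple stand-ins, a class-mean midpoint rule for the parametric side and a one-split "stump" for the tree-based side. It does not reproduce the study's methods or results.

```python
import random

def make_fixed_partitions(n_items, n_folds=10, seed=0):
    """Shuffle the indices once and deal them into n_folds fixed folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]

def fit_midpoint(xs, ys):
    """Parametric stand-in: threshold at the midpoint of the two class means.
    Assumes both classes occur in the training data."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    cut = (mean0 + mean1) / 2
    upper = 1 if mean1 > mean0 else 0
    return lambda x: upper if x > cut else 1 - upper

def fit_stump(xs, ys):
    """Nonparametric stand-in: the single split with the fewest training
    errors (a one-node tree, far simpler than a full CART model)."""
    best = None
    for cut in sorted(set(xs)):
        for upper in (0, 1):
            errors = sum((upper if x > cut else 1 - upper) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, cut, upper)
    _, cut, upper = best
    return lambda x: upper if x > cut else 1 - upper

def cv_misclassification_rate(fit, xs, ys, folds):
    """Hold out each fold in turn, train on the rest, count held-out errors."""
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        wrong += sum(model(xs[i]) != ys[i] for i in fold)
    return wrong / len(xs)

# Synthetic data (NOT from the study): one hypothetical feature, two classes.
rng = random.Random(42)
xs = [rng.gauss(10, 2) for _ in range(100)] + [rng.gauss(14, 2) for _ in range(100)]
ys = [0] * 100 + [1] * 100

folds = make_fixed_partitions(len(xs))  # same 10 partitions for every method
rates = {name: cv_misclassification_rate(fit, xs, ys, folds)
         for name, fit in [("midpoint", fit_midpoint), ("stump", fit_stump)]}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Reusing one set of partitions for every method means that differences in misclassification rate reflect the methods themselves and not a lucky or unlucky split of the cells.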