Gene selection for multiclass prediction of microarray data

D. Chen, D. Hua, J. Reifman, X. Cheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations


Gene expression data from microarrays have been successfully applied to class prediction, where the purpose is to classify and predict the diagnostic category of a sample by its gene expression profile. A typical microarray dataset consists of expression levels for a large number of genes on a relatively small number of samples. As a consequence, one basic and important question associated with class prediction is: how do we identify a small subset of informative genes contributing the most to the classification task? Many methods have been proposed, but most focus on two-class problems, such as discrimination between normal and disease samples. This paper addresses the selection of informative genes for multiclass prediction problems by considering all classes jointly. Our approach is based on the power of the genes in discriminating among the different classes (e.g., tumor types) and the existing correlation between genes. We model the expression levels of a given gene with a one-way analysis of variance (ANOVA) model that allows heterogeneous variances across classes, and determine the discriminatory power of the gene by a test statistic designed to test the equality of the class means. In other words, the discriminatory power of a gene is associated with a Behrens-Fisher problem. Informative genes are chosen such that each selected gene has a high discriminatory power and the correlation between any pair of selected genes is low. Test statistics considered in this paper include the ANOVA F test statistic, the Brown-Forsythe test statistic, the Cochran test statistic, and the Welch test statistic. Their performance is evaluated with several classification methods applied to two publicly available microarray datasets. The results show that the Brown-Forsythe test statistic achieves the best performance.
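The selection idea described in the abstract can be illustrated with a minimal sketch. It scores each gene with the Brown-Forsythe statistic for equality of class means, F* = Σ n_i (ȳ_i − ȳ)² / Σ (1 − n_i/N) s_i², and then keeps high-scoring genes whose correlation with every already selected gene is low. The abstract does not state the exact selection rule, so the greedy ranking procedure, the use of absolute Pearson correlation, the `max_abs_corr` threshold, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def brown_forsythe_stat(x, labels):
    """Brown-Forsythe statistic for equality of class means.

    Assumes every class has at least two samples and that the
    within-class variances are not all zero.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n_total = x.size
    grand_mean = x.mean()
    between = 0.0  # sum of n_i * (class mean - grand mean)^2
    within = 0.0   # sum of (1 - n_i / N) * class sample variance
    for c in np.unique(labels):
        xc = x[labels == c]
        n_c = xc.size
        between += n_c * (xc.mean() - grand_mean) ** 2
        within += (1.0 - n_c / n_total) * xc.var(ddof=1)
    return between / within


def select_genes(expr, labels, n_genes=10, max_abs_corr=0.5):
    """Greedy selection (an assumed procedure, not taken from the paper).

    expr is a samples-by-genes matrix. Genes are visited in order of
    decreasing score and kept only if their absolute Pearson correlation
    with every previously kept gene is at most max_abs_corr.
    """
    expr = np.asarray(expr, dtype=float)
    scores = np.array(
        [brown_forsythe_stat(expr[:, j], labels) for j in range(expr.shape[1])]
    )
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene correlation matrix
    selected = []
    for j in np.argsort(-scores):
        if all(abs(corr[j, k]) <= max_abs_corr for k in selected):
            selected.append(int(j))
        if len(selected) == n_genes:
            break
    return selected, scores
```

Calling `select_genes(expr, labels, n_genes=20)` on a samples-by-genes matrix returns the indices of the kept genes and the per-gene scores. Swapping `brown_forsythe_stat` for another statistic (for example the ANOVA F statistic) would leave the selection loop unchanged.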

Original language: English
Title of host publication: Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 4
ISBN (Electronic): 0769520006, 9780769520001
State: Published - 2003
Event: 2nd International IEEE Computer Society Computational Systems Bioinformatics Conference, CSB 2003 - Stanford, United States
Duration: 11 Aug 2003 to 14 Aug 2003

Publication series

Name: Proceedings of the 2003 IEEE Bioinformatics Conference, CSB 2003


Conference: 2nd International IEEE Computer Society Computational Systems Bioinformatics Conference, CSB 2003
Country/Territory: United States

