TY - JOUR
T1 - Logistic support vector machines and their application to gene expression data
AU - Liu, Zhenqiu
AU - Chen, Dechang
AU - Xu, Ying
AU - Liu, Jian
PY - 2005
Y1 - 2005
AB - An important feature of gene expression data is that the number of genes m far exceeds the number of samples n. Standard statistical methods do not work well on such data when n < m, so new methodologies, or modifications of existing ones, are needed for the analysis of microarray data. Support vector machines (SVMs) have been applied to gene expression data classification. In traditional SVM classification, the classifier is usually built from a small subset of the samples, called support vectors. This may cause a loss of available information, since the number of samples in a gene expression dataset is usually very small. In this paper, we introduce a logistic support vector machine (LSVM) algorithm for the classification task. In LSVM, all the samples are used as support vectors, and the parameters are estimated via maximum a posteriori (MAP) estimation. The proposed algorithm also has the advantage of providing an estimate of the underlying probability. The algorithm was applied to five different gene expression datasets. Computational results show that, compared with popular classification methods such as traditional SVM, our algorithm usually leads to an improvement in classification accuracy.
KW - gene expression
KW - kernels
KW - maximum a posteriori estimation
KW - support vector machines
KW - tumour classification
UR - http://www.scopus.com/inward/record.url?scp=33846977847&partnerID=8YFLogxK
U2 - 10.1504/IJBRA.2005.007576
DO - 10.1504/IJBRA.2005.007576
M3 - Article
C2 - 18048128
AN - SCOPUS:33846977847
SN - 1744-5485
VL - 1
SP - 169
EP - 182
JO - International Journal of Bioinformatics Research and Applications
JF - International Journal of Bioinformatics Research and Applications
IS - 2
ER -