Abstract—The main aim of this paper is to find the smallest set of genes that can ensure highly accurate classification of cancer from microarray data using supervised machine learning algorithms. The significance of finding the minimum gene subset is threefold: 1) It greatly reduces the computational burden and the noise arising from irrelevant genes. 2) It simplifies gene expression tests so that they include only a very small number of genes rather than thousands, which can significantly reduce the cost of cancer testing. 3) It calls for further investigation into the possible biological relationship between this small number of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we select important genes using a two-way Analysis of Variance (ANOVA) ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes using a strong classifier such as a Support Vector Machine (SVM). Our approach obtained very high accuracy with only two genes.
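The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: genes are ranked with a one-way ANOVA F statistic instead of the two-way scheme named above, a leave-one-out nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library, and the function names (`f_statistic`, `rank_genes`, `loo_accuracy`, `best_gene_subset`) and the toy expression matrix are hypothetical.

```python
from itertools import combinations
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic of one gene across the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(X, labels, top_k):
    """Step 1: indices of the top_k genes, highest F statistic first."""
    n_genes = len(X[0])
    scores = [f_statistic([row[j] for row in X], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:top_k]


def loo_accuracy(X, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for y in sorted(set(labels)):
            rows = [X[r] for r in range(len(X)) if r != i and labels[r] == y]
            if not rows:  # class has no remaining training samples
                continue
            centroids[y] = [mean(row[g] for row in rows) for g in genes]
        point = [X[i][g] for g in genes]
        pred = min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(point, centroids[y])),
        )
        correct += pred == labels[i]
    return correct / len(X)


def best_gene_subset(X, labels, top_k=2, subset_size=2):
    """Step 2: score every combination of the top-ranked genes."""
    candidates = rank_genes(X, labels, top_k)
    scored = [
        (loo_accuracy(X, labels, subset), subset)
        for subset in combinations(candidates, subset_size)
    ]
    return max(scored, key=lambda t: t[0])


# Hypothetical toy data: rows are samples, columns are genes.
# Genes 0 and 2 separate the classes; genes 1 and 3 are noise.
X = [
    [1.0, 5.0, 0.9, 3.0],
    [1.2, 4.0, 1.1, 2.0],
    [0.8, 6.0, 1.0, 4.0],
    [3.0, 5.1, 3.1, 3.1],
    [3.2, 4.2, 2.9, 2.2],
    [2.8, 5.9, 3.0, 3.9],
]
labels = ["A", "A", "A", "B", "B", "B"]

accuracy, genes = best_gene_subset(X, labels, top_k=2, subset_size=2)
print(accuracy, genes)
```

The exhaustive search in step 2 is only feasible because step 1 shrinks the candidate pool: the number of subsets tested grows combinatorially with `top_k`, so the ranking cut-off controls the cost of the whole procedure.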
Index Terms—Gene expressions, Cancer classification, Neural networks, Support vector machines
A. Bharathi, Dr. A. M. Natarajan, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu (email: email@example.com, firstname.lastname@example.org)
Cite: A. Bharathi, A. M. Natarajan, "Cancer Classification of Bioinformatics data using ANOVA," International Journal of Computer Theory and Engineering, vol. 2, no. 3, pp. 369-373, 2010.