(A. Das, S. Pati, H. Huang, C. Chen) Cancer Classification by Gene Subset Selection from Microarray Dataset

Abstract: A microarray dataset contains a huge number of genes, many of which are irrelevant to cancer classification and therefore reduce classification accuracy. The dataset should thus be pre-processed to filter out these irrelevant and redundant genes. In this paper, a Pareto-optimality-based multi-objective genetic algorithm is proposed, in which non-linear cellular automata are employed to generate the initial population in the high-dimensional space, overcoming the demerits of random initialization. To select the informative genes, the fitness functions are defined based on the log-likelihood ratio and on both attribute dependency and boundary region exploration from rough set theory. The chromosomes are hybridized by applying multi-point crossover, whereas proximity mutation, a slight modification of flip-bit mutation, is applied to produce the fittest offspring. Finally, the gene subset with strong biological significance in cancer treatment is obtained from the Pareto-dominant solutions. Performance is evaluated on publicly available microarray cancer datasets and compared with state-of-the-art methods to demonstrate the effectiveness of the proposed method.
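
The genetic operators named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: chromosomes are assumed to be bit lists (1 means the gene is selected), both objectives are assumed to be maximized, and plain flip-bit mutation stands in for the proximity mutation, whose modification the abstract does not specify. The function names `dominates`, `pareto_front`, `multipoint_crossover`, and `flip_bit_mutation` are hypothetical.

```python
import random


def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (maximization assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the indices of the non-dominated objective vectors."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def multipoint_crossover(p1, p2, points):
    """Exchange every second segment between two equal-length parents."""
    c1, c2 = list(p1), list(p2)
    swap = False
    prev = 0
    for cut in sorted(points) + [len(p1)]:
        if swap:
            c1[prev:cut], c2[prev:cut] = p2[prev:cut], p1[prev:cut]
        swap = not swap
        prev = cut
    return c1, c2


def flip_bit_mutation(chrom, rate, rng):
    """Flip each bit independently with probability `rate` (plain flip-bit)."""
    return [1 - g if rng.random() < rate else g for g in chrom]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for repeatability
    child_a, child_b = multipoint_crossover([1] * 6, [0] * 6, points=[2, 4])
    print(child_a, child_b)
    print(flip_bit_mutation(child_a, rate=0.1, rng=rng))
    print(pareto_front([(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]))
```

In a complete algorithm of this kind, each objective vector would hold the fitness values of one chromosome (for example, a rough-set-based score and a log-likelihood-ratio score), and the final gene subset would be chosen from the chromosomes on the returned front.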

Keywords: cellular automata, gene selection, log-likelihood ratio, multi-objective genetic algorithm, proximity mutation, rough set theory

Categories: F.1.1, F.4.1, G.1.6, H.0, I.2.4, M.7