Abstract—In recent years, machine learning techniques have attracted significant attention in molecular biology and genomics. They have become increasingly important for solving real-world problems such as elucidating protein function. An important step toward understanding a protein's function is predicting its cellular localization sites. Many computational methods have been developed over the years to address this problem, but the imbalanced distribution of proteins across cellular locations strongly affects their behavior. Hence, the performance and efficiency of existing prediction methods still need to be improved, and a computational method that predicts protein cellular localization efficiently is still needed. In this paper, we explore the use of supervised machine learning algorithms for predicting the cellular localization sites of proteins from primary sequence information. Our experiments were performed using Naïve Bayesian, k-Nearest Neighbor and feed-forward Neural Network classifiers. These individual classifiers (experts) were evaluated with and without cross-validation on the E. coli and Yeast benchmark datasets, and then combined using the majority voting rule to improve classification accuracy on each dataset. The experimental results show that the proposed combination system significantly outperforms the best individual classifier.
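The majority voting rule mentioned above can be illustrated with a short sketch. It assumes each trained classifier has already produced one predicted label per protein; the function names `majority_vote` and `accuracy` are hypothetical, and the tie-breaking policy (prefer the label proposed by the classifier listed first) is an assumption, since the abstract does not state how ties are resolved. The label strings are E. coli localization class abbreviations, but the predictions are invented toy values, not results from the paper.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-classifier label lists into one label per sample.

    predictions[i][j] is the label classifier i assigns to sample j.
    Ties are broken in favour of the classifier listed first
    (an assumption for this sketch, not a rule taken from the paper).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")

    combined = []
    for j in range(n_samples):
        votes = [p[j] for p in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # Among the labels with the highest vote count, keep the one
        # proposed by the earliest classifier in the list.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    # Toy labels only: each hypothetical classifier errs on different samples.
    truth = ["cp", "im", "pp", "om", "im", "cp"]
    nb = ["im", "pp", "pp", "om", "im", "cp"]
    knn = ["cp", "im", "cp", "omL", "im", "cp"]
    nn = ["cp", "im", "pp", "om", "pp", "im"]

    for name, pred in [("NB", nb), ("kNN", knn), ("NN", nn)]:
        print(name, round(accuracy(pred, truth), 3))  # each prints 0.667
    combined = majority_vote([nb, knn, nn])
    print(combined)  # ['cp', 'im', 'pp', 'om', 'im', 'cp']
    print(accuracy(combined, truth))  # 1.0
```

The toy data shows why voting can help: when the individual classifiers make errors on different proteins, the two correct votes outvote the single wrong one. With an odd number of voters and two-class problems ties cannot occur, but with several localization classes a three-way split is possible, so some tie-breaking policy is required.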
Index Terms—Protein localization, naïve Bayesian classifier, k-nearest neighbor classifier, neural network classifier, combination of classifiers, E. coli, yeast.
The authors are with USTO-MB University, BP 1505 El Mnaouer, 3100 Oran, Algeria (e-mail: h_bouziane@univ-usto.dz, messabih@univ-usto.dz, chouarfia@univ-usto.dz).
Cite: Hafida Bouziane, Belhadri Messabih, and Abdallah Chouarfa, "A Voting-Based Combination System for Protein Cellular Localization Sites Prediction," International Journal of Computer Theory and Engineering, vol. 5, no. 4, pp. 585-592, 2013.