Abstract—Data engineering is one of several knowledge elicitation and analysis methods. Within it, feature selection plays an important role in data mining, especially in classification tasks. Filtering is an important pre-processing step for any classification process: selecting appropriate variables not only reduces computational time and cost but also increases classification accuracy. In this paper, knowledge about Thalassemia was elicited using data engineering techniques (PCA, Pearson’s chi-square and machine learning). This knowledge is presented as a comparison of the classification performance of machine learning techniques when Principal Component Analysis (PCA) and Pearson’s chi-square are used for screening the genotypes of β-Thalassemia patients. With PCA, the classification results show that the Multi-Layer Perceptron (MLP) is the best algorithm, with an accuracy of 86.61%, followed by K-Nearest Neighbors (KNN), Naive Bayes, Bayesian Networks (BNs) and Multinomial Logistic Regression, with accuracies of 85.83%, 85.04%, 85.04% and 82.68%, respectively. These results were then compared with those obtained using Pearson’s chi-square and presented that…. In future work, we will investigate other feature selection techniques, such as hybrid and filtering methods, in order to improve classification performance.
Index Terms—Knowledge elicitation, data engineering, feature selection, principal component analysis (PCA), Pearson’s chi-square, machine learning, β-thalassemia.
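As a rough illustration of the filter-style feature selection named in the abstract, the sketch below computes Pearson’s chi-square statistic between each categorical feature and the class label, then ranks the features by that score. It is a minimal sketch under stated assumptions and not the paper's implementation: the function names `chi_square_score` and `rank_features`, the feature names and the toy data are all hypothetical, and the abstract does not state which indicators or tools were actually used.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson's chi-square statistic for one categorical feature vs. class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # counts per (feature value, class) cell
    feature_totals = Counter(feature)         # row totals of the contingency table
    label_totals = Counter(labels)            # column totals of the contingency table
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = f_count * c_count / n
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Toy, made-up data (NOT from the paper): two hypothetical categorical indicators.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
columns = {
    "indicator_1": ["low", "low", "low", "low", "high", "high", "high", "high"],
    "indicator_2": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
ranking, scores = rank_features(columns, labels)
print(ranking, scores)
```

In this toy case `indicator_1` separates the two classes perfectly and `indicator_2` is independent of them, so a filter that keeps only the top-ranked features would retain `indicator_1` before passing the data to a classifier.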
Patcharaporn Paokanta is with the Development, System Analysis and Design, and Information Technology at the College of Arts, Media and Technology, Chiang Mai University (CMU), Thailand.
Cite: P. Paokanta, "β-Thalassemia Knowledge Elicitation Using Data Engineering: PCA, Pearson’s Chi Square and Machine Learning," International Journal of Computer Theory and Engineering, vol. 4, no. 5, pp. 702-706, 2012.