Abstract—Machine learning has increasingly attracted the attention of healthcare service providers because of its capacity to collect and analyze huge volumes of data to support effective prediction and treatment. Disease data are high-dimensional and contain noise and irrelevant attributes, so the computational time and accuracy of the classification algorithms used in machine learning have been a major concern. Various methods, such as step-wise backward selection and subspace clustering, have been proposed to reduce data dimensionality by removing redundant attributes. However, removing redundant attributes may affect the accuracy of the algorithm. Based on the observation that finding hidden features, such as the relationships among attributes, can improve classification efficiency, this paper proposes a new hidden subspace clustering model. Experimental results show that the proposed method can reduce data dimensionality and improve the accuracy of the classification method.
Index Terms—High-dimensional data, subspace clustering, random projection, classification.
V. D. Minh and M. Kimura are with the Department of Information Science and Engineering, Shibaura Institute of Technology, Tokyo, Japan (e-mail: masaomi@sic.shibaura-it.ac.jp, nb17502@shibaura-it.ac.jp).
Cite: Vu-Dinh Minh and Masaomi Kimura, "Subspace-Based Method to Improve Classification Accuracy of High-Dimensional Data," International Journal of Computer Theory and Engineering, vol. 10, no. 6, pp. 180-184, 2018.