Abstract—The growth of worldwide business has led to offices distributed across geographical locations. As a result, data are loosely distributed across large-scale regional databases held at these offices. To perform data mining, the distributed data must first be merged so that a data mining algorithm can be run on them. Cloud computing poses a variety of challenges for data mining operations, arising from the dynamic structure of data distribution, in contrast to the typical database scenarios of conventional architectures. This paper presents a way to implement the Hierarchical Agglomerative Clustering algorithm so that it is suitable for large data sets and runs more efficiently by executing tasks in parallel. The results show that execution time grows linearly with the size of the data set.
Index Terms—Star cluster, hierarchical agglomerative clustering, virtual k-means, cloud computing.
Kriti Srivastava is with the D. J. Sanghvi College of Engineering, Mumbai, India (e-mail: kriti.srivastava@djsce.ac.in).
R. Shah is with Capgemini, Mumbai, India (e-mail: ronshah123@gmail.com).
D. Valia is with Sokrati, Pune, India (e-mail: dwalia@gmail.com).
Cite: Kriti Srivastava, R. Shah, D. Valia, and H. Swaminarayan, "Data Mining Using Hierarchical Agglomerative Clustering Algorithm in Distributed Cloud Computing Environment," International Journal of Computer Theory and Engineering, vol. 5, no. 3, pp. 520-522, 2013.