Abstract—With advances in information technology and the rapid adoption of smart devices, users can easily produce, distribute, and consume data over the network anytime and anywhere. As a result, the volume of user-generated data has increased dramatically. Companies now need to collect large volumes of logs, and they are actively researching and developing big data collection technologies. In this paper, we study a big data collection technique based on Apache Flume for bulk log collection. The proposed structure for bulk log processing pairs each web server with its own Flume agent. These web-server agents then forward their logs to a separate Flume agent whose role is to store them in the Hadoop Distributed File System (HDFS). This makes the collection of big data logs more efficient.
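To make the two-tier structure concrete, the sketch below generates Apache Flume agent properties for it: one agent per web server (an `exec` source tailing the access log, a memory channel, and an `avro` sink) and one collector agent (an `avro` source feeding an `hdfs` sink). This is an illustrative sketch, not the authors' configuration. The host name, port, log paths, HDFS URI, and agent names are hypothetical placeholders, and the choice of source and channel types is an assumption.

```python
"""Generate Flume properties for a two-tier log-collection topology (sketch).

All host names, ports, paths, and agent names below are hypothetical.
"""

COLLECTOR_HOST = "collector.example.com"  # hypothetical collector host
COLLECTOR_PORT = 4545                     # hypothetical Avro port
HDFS_PATH = "hdfs://namenode:8020/flume/weblogs/%Y-%m-%d"  # hypothetical HDFS target


def _join(lines):
    """Render a list of 'key = value' lines as a properties file body."""
    return "\n".join(lines) + "\n"


def web_agent_config(name, log_file):
    """Tier 1: one agent per web server, tailing its log and forwarding over Avro."""
    return _join([
        f"{name}.sources = r1",
        f"{name}.channels = c1",
        f"{name}.sinks = k1",
        # exec source runs a command and turns each output line into an event
        f"{name}.sources.r1.type = exec",
        f"{name}.sources.r1.command = tail -F {log_file}",
        f"{name}.sources.r1.channels = c1",
        f"{name}.channels.c1.type = memory",
        # avro sink ships events to the collector agent's avro source
        f"{name}.sinks.k1.type = avro",
        f"{name}.sinks.k1.hostname = {COLLECTOR_HOST}",
        f"{name}.sinks.k1.port = {COLLECTOR_PORT}",
        f"{name}.sinks.k1.channel = c1",
    ])


def collector_config(name="collector"):
    """Tier 2: a single agent that receives all web-server events and writes to HDFS."""
    return _join([
        f"{name}.sources = r1",
        f"{name}.channels = c1",
        f"{name}.sinks = k1",
        f"{name}.sources.r1.type = avro",
        f"{name}.sources.r1.bind = 0.0.0.0",
        f"{name}.sources.r1.port = {COLLECTOR_PORT}",
        f"{name}.sources.r1.channels = c1",
        f"{name}.channels.c1.type = memory",
        f"{name}.sinks.k1.type = hdfs",
        f"{name}.sinks.k1.hdfs.path = {HDFS_PATH}",
        f"{name}.sinks.k1.hdfs.fileType = DataStream",
        # needed so the %Y-%m-%d escapes resolve without a timestamp header
        f"{name}.sinks.k1.hdfs.useLocalTimeStamp = true",
        f"{name}.sinks.k1.channel = c1",
    ])


def build_topology(web_servers):
    """Map each agent name to its config: one per web server plus the collector."""
    configs = {name: web_agent_config(name, log) for name, log in web_servers.items()}
    configs["collector"] = collector_config()
    return configs


if __name__ == "__main__":
    servers = {  # hypothetical web servers and log paths
        "web1": "/var/log/httpd/access_log",
        "web2": "/var/log/httpd/access_log",
    }
    for agent, body in build_topology(servers).items():
        print(f"# {agent}.properties")
        print(body)
```

Each generated file would be started with the agent name matching its key prefix, for example `flume-ng agent --conf conf --conf-file web1.properties --name web1`. A memory channel keeps the sketch simple; Flume's `file` channel type trades some throughput for durability if an agent restarts.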
Index Terms—Big data, big data collection technology, Apache Flume, Apache Chukwa, Hadoop distributed file system.
Sooyong Jung and Yongtae Shin are with Dept. of Computer Science Graduate School, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, Korea (06978) (e-mail: kevinhaha777@gmail.com, sooyong.jung@gmail.com, shin@ssu.ac.kr).
Cite: Sooyong Jung and Yongtae Shin, "Study of the Big Data Collection Scheme Based Apache Flume for Log Collection," International Journal of Computer Theory and Engineering, vol. 10, no. 3, pp. 97-100, 2018.