IJCTE 2012 Vol.4(5): 726-730 ISSN: 1793-8201
DOI: 10.7763/IJCTE.2012.V4.566

A Failure Detection and Prediction Mechanism for Enhancing Dependability of Data Centers

Qiang Guan, Ziming Zhang, and Song Fu
Abstract—Modern data centers continue to grow in scale and complexity. They also change dynamically due to the addition and removal of system components, changing execution environments, frequent updates and upgrades, online repairs, and more. Classical reliability theory and conventional methods rarely consider the actual state of a system and therefore cannot reflect the dynamics of runtime systems and failure processes. In this paper, we present an unsupervised failure detection and prediction method using an ensemble of Bayesian models. It characterizes the normal execution states of the system and detects anomalous behaviors. We implement a prototype of our failure detection and prediction mechanism and evaluate its performance on a data center test platform. Experimental results show that the proposed method can forecast failure dynamics with high accuracy.

Index Terms—Data centers, failure detection, failure management, dependable computing.
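
The abstract does not specify the Bayesian models or how they are combined, so the following is only an illustrative sketch of the general idea (learn what normal execution states look like from unlabeled runtime metrics, then flag states that an ensemble finds unlikely). Everything in it is an assumption rather than the authors' method: the independent-Gaussian density per model, the random feature subsets, the averaged negative log-likelihood score, the quantile threshold, and the names `GaussianModel` and `EnsembleDetector` are all hypothetical.

```python
import math
import random
from statistics import mean, pstdev


class GaussianModel:
    """Independent-Gaussian density over a subset of metric indices (assumed model)."""

    def __init__(self, feature_ids):
        self.feature_ids = list(feature_ids)
        self.params = {}

    def fit(self, samples):
        for j in self.feature_ids:
            column = [s[j] for s in samples]
            # Floor the standard deviation so a constant metric cannot divide by zero.
            self.params[j] = (mean(column), max(pstdev(column), 1e-6))
        return self

    def neg_log_likelihood(self, sample):
        # Average per-feature negative log-likelihood; higher means less "normal".
        total = 0.0
        for j, (mu, sigma) in self.params.items():
            z = (sample[j] - mu) / sigma
            total += 0.5 * z * z + math.log(sigma) + 0.5 * math.log(2 * math.pi)
        return total / len(self.params)


class EnsembleDetector:
    """Averages the scores of several models fitted on random feature subsets."""

    def __init__(self, n_models=5, features_per_model=2, quantile=0.99, seed=0):
        self.n_models = n_models
        self.features_per_model = features_per_model
        self.quantile = quantile
        self.seed = seed
        self.models = []
        self.threshold = None

    def fit(self, samples):
        # Unsupervised: the samples are unlabeled and assumed to be mostly normal.
        rng = random.Random(self.seed)
        n_features = len(samples[0])
        k = min(self.features_per_model, n_features)
        self.models = [
            GaussianModel(rng.sample(range(n_features), k)).fit(samples)
            for _ in range(self.n_models)
        ]
        # Threshold is the chosen quantile of the scores seen during fitting.
        scores = sorted(self.score(s) for s in samples)
        index = min(len(scores) - 1, int(self.quantile * len(scores)))
        self.threshold = scores[index]
        return self

    def score(self, sample):
        return mean(m.neg_log_likelihood(sample) for m in self.models)

    def is_anomalous(self, sample):
        return self.score(sample) > self.threshold
```

In use, each sample would be a tuple of runtime metrics (for example CPU, memory, and I/O utilization, which are placeholders here): call `EnsembleDetector().fit(samples)` on a window of observed states, then `is_anomalous(new_sample)` on incoming ones. This sketch only covers detection; forecasting failures ahead of time would need a temporal component that the abstract does not describe.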

Q. Guan, Z. Zhang, and S. Fu are with the Department of Computer Science and Engineering, University of North Texas, Denton, Texas 76203 USA (e-mail: QiangGuan@my.unt.edu; ZimingZhang@my.unt.edu; Song.Fu@unt.edu; tel.: +1-940-565-2341; fax: +1-940-565-2799).


Cite: Qiang Guan, Ziming Zhang, and Song Fu, "A Failure Detection and Prediction Mechanism for Enhancing Dependability of Data Centers," International Journal of Computer Theory and Engineering, vol. 4, no. 5, pp. 726-730, 2012.