Abstract—In the pursuit of knowledge discovery in a world of
ever-growing data collection, it is important that even if a
dataset is altered to preserve people’s privacy, the information
in the dataset retains as much quality as possible. In this
context, “quality” refers to the accuracy or usefulness of the
information retrievable from a dataset. However, defining and
measuring the loss of information incurred by meeting privacy
requirements proves difficult. Techniques have been developed to
measure the information quality of a dataset for a variety of
anonymization techniques including Generalization,
Suppression, and Randomization. Some measures analyze the
data itself, while others analyze the results of data mining
tasks such as Clustering and Classification. This survey
discusses a collection of information quality measures and the
issues surrounding their usage and limitations.
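As a concrete illustration of a measure that analyzes the data itself, the following Python sketch computes the Discernibility Metric, a widely used cost measure for datasets anonymized by generalization and suppression. It is a sketch under stated assumptions, not necessarily the formulation used in this survey: the function name `discernibility_metric` and the input layout (one tuple of generalized quasi-identifier values per record) are hypothetical. Each equivalence class of at least k identical records costs the square of its size, while records in smaller classes are treated as suppressed and each costs the size of the whole dataset; lower totals indicate less information loss.

```python
from collections import Counter


def discernibility_metric(quasi_ids, k):
    """Discernibility Metric for a generalized/suppressed dataset.

    quasi_ids: list of tuples (hypothetical layout), one per record, holding
    that record's generalized quasi-identifier values.
    k: the anonymity threshold; classes smaller than k count as suppressed.
    """
    n = len(quasi_ids)  # total number of records |D|
    cost = 0
    # Records with identical quasi-identifier tuples form one equivalence class.
    for size in Counter(quasi_ids).values():
        if size >= k:
            cost += size * size  # each record is indistinguishable from |E| records
        else:
            cost += n * size  # suppressed records are indistinguishable from all |D|
    return cost


# Toy example: equivalence classes of sizes 3, 2 and 1.
records = [("30-39", "M")] * 3 + [("40-49", "F")] * 2 + [("20-29", "M")]
print(discernibility_metric(records, k=2))  # 19
```

For the six example records and k = 2, the cost is 3² + 2² + 6 × 1 = 19; with k = 1 nothing is suppressed and the cost falls to 14.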
Index Terms—Anonymization, data mining, data quality,
privacy preserving data mining.
S. Fletcher and M. Z. Islam are with the Center for Research in Complex
Systems (CRiCS), School of Computing and Mathematics, Charles Sturt
University, Bathurst NSW
Cite: Sam Fletcher and Md Zahidul Islam, "Measuring Information Quality for Privacy Preserving Data Mining," International Journal of Computer Theory and Engineering, vol. 7, no. 1, pp. 21-28, 2015.