Abstract—We introduce LG-encoding, a novel approach to
text encoding that rearranges the letters of a text in
anticipation of improved compression performance. Our
technique brings together the repeating letters in a word so as
to increase the redundancy that the subsequent compression
algorithm can exploit. The encoding process introduces no
significant overhead, and it is easily reversible because it
only involves repositioning
the letters in a text. We evaluate LG-encoding on text from four
source languages (English, French, German, and Spanish),
followed by a set of well-known compression algorithms:
Arithmetic Coding, Huffman Coding, BWT, and PPM. Our results
are promising: we achieve substantially better compression
rates for Arithmetic Coding and Huffman Coding when they
follow LG-encoding. We
also propose the use of our method in large data repositories,
such as the cloud, as it also provides a significant level of
security by shuffling the letters of words in the text.
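
As an illustration of the idea of bringing repeated letters together within a word, the following is a minimal sketch and not the authors' algorithm. The abstract does not specify the exact reordering or how the inverse is recovered, so this sketch assumes a grouping by order of first appearance and keeps the original letter positions as side information. The names group_repeats and ungroup are hypothetical.

```python
def group_repeats(word):
    """Reorder a word so that equal letters become adjacent.

    Assumption: groups are ordered by each letter's first appearance.
    Returns the reordered word and the source index of every output
    letter, which this sketch keeps so the step can be undone.
    """
    first_seen = {}
    for i, ch in enumerate(word):
        first_seen.setdefault(ch, i)
    # sorted() is stable, so equal letters keep their relative order.
    order = sorted(range(len(word)), key=lambda i: first_seen[word[i]])
    encoded = "".join(word[i] for i in order)
    return encoded, order


def ungroup(encoded, order):
    """Invert group_repeats by putting each letter back at its source index."""
    original = [""] * len(encoded)
    for pos, src in enumerate(order):
        original[src] = encoded[pos]
    return "".join(original)


encoded, order = group_repeats("letter")
print(encoded)                  # leettr
print(ungroup(encoded, order))  # letter
```

The runs "ee" and "tt" in the output are the kind of local redundancy the abstract refers to. How the real method stays reversible without the stored positions used here is not stated in the abstract.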
Index Terms—Text encoding, lossless text compression.
The authors are with the University of Texas at Dallas, Department of
Computer Science, Richardson, TX USA (e-mail: {exc067000,
hxv121530}@utdallas.edu).
Cite: Ebru Celikel Cankaya and Hina Vinayak, "A Novel Text Processing for Better Compression and Security in Cloud," International Journal of Computer Theory and Engineering, vol. 8, no. 1, pp. 1-6, 2016.