IJCTE 2015 Vol.7(5): 362-365 ISSN: 1793-8201
DOI: 10.7763/IJCTE.2015.V7.986

Character Analysis Scheme for Compressing Text Files

Sunday Eric Adewumi
Abstract—This scheme considers a text document made up of characters such as letters of the alphabet, punctuation marks, and special characters or symbols. If we represent the characters that make up the document as c1, c2, …, cn, compression proceeds one character at a time. For the character under consideration, we first find the position of its last occurrence and the number of digits in that position; then, starting from the beginning of the text file, we note every position where the character occurs. Each position is made equal in length to the last position by padding it with zeroes to the left of its most significant digit, if need be. The padded positions are concatenated and the concatenated string is converted to a decimal value. This value is divided successively by 2 until the result is at least one and less than two. The quotient obtained from these divisions is stored together with the number of divisions carried out, as an index k. Decompression reverses these steps: for each character, we obtain its quotient q, index k, and length li. To recover the decimal value of the concatenated positions, we multiply the quotient q by 2^k. We then use the length li of the character to split this value into the positions where the character occurred. This scheme, which is lossless, has a compression ratio that tends to zero when the text file is very large.
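
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on assumptions the abstract does not state: positions are counted from 1, the quotient q is kept as an exact rational (`fractions.Fraction`) because a rounded floating-point quotient would not reproduce the positions, and the function names `compress` and `decompress` are hypothetical.

```python
from fractions import Fraction


def compress(text):
    """Map each distinct character to a triple (q, k, l)."""
    positions = {}
    for pos, ch in enumerate(text, start=1):  # assumption: 1-based positions
        positions.setdefault(ch, []).append(pos)

    table = {}
    for ch, pos_list in positions.items():
        l = len(str(pos_list[-1]))  # digit length of the last occurrence
        # Left-pad every position with zeroes to length l, then concatenate.
        n = int("".join(str(p).zfill(l) for p in pos_list))
        k = n.bit_length() - 1      # number of halvings until 1 <= q < 2
        q = Fraction(n, 2 ** k)     # assumption: quotient stored exactly
        table[ch] = (q, k, l)
    return table


def decompress(table):
    """Rebuild the text from the per-character (q, k, l) triples."""
    slots = {}
    for ch, (q, k, l) in table.items():
        n = q * 2 ** k              # recover the concatenated decimal value
        digits = str(n.numerator)
        # Restore leading zeroes dropped by the integer conversion.
        digits = digits.zfill(-(-len(digits) // l) * l)
        for i in range(0, len(digits), l):
            slots[int(digits[i:i + l])] = ch
    return "".join(slots[p] for p in sorted(slots))
```

For the text "aba", the character "a" occurs at positions 1 and 3, so the concatenated value is 13, which is halved three times to give q = 13/8 and k = 3. Counting positions from 1 keeps the first padded chunk non-zero, which is what lets `decompress` restore the dropped leading zeroes unambiguously.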

Index Terms—Compression, compression ratio, decompression, lossless, scheme, text file.

Sunday Eric Adewumi is with the Federal University Lokoja, Nigeria (email: sunday.adewumi@fulokoja.edu.ng).


Cite: Sunday Eric Adewumi, "Character Analysis Scheme for Compressing Text Files," International Journal of Computer Theory and Engineering, vol. 7, no. 5, pp. 362-365, 2015.
