IJCTE 2015 Vol.7(5): 362-365 ISSN: 1793-8201
DOI: 10.7763/IJCTE.2015.V7.986

Character Analysis Scheme for Compressing Text Files

Sunday Eric Adewumi

Abstract—This scheme considers a text document made up of characters such as letters of the alphabet, punctuation marks, and special characters or symbols. If we represent each character that makes up the document as c1, c2, …, cn, compression proceeds one character at a time. For the character under consideration, we first find the position of its last occurrence and the number of digits in that position. Then, starting from the beginning of the text file, we note every position where the character occurs. Each position is brought to the same number of digits as the last occurrence by padding it with zeroes to the left of the most significant digit, if need be. We concatenate the padded positions and convert the concatenated string into a decimal value. This value is divided successively by 2 until the result is at least one and less than two. We store the quotient q obtained from these divisions and the number of divisions carried out as an index k. Decompression reverses these steps: for each character, we obtain its quotient q, index k, and length li. To recover the decimal value of the concatenated positions, we multiply the quotient q by 2^k. We then use the length li to split this value into the positions where the character occurred. The scheme is lossless, and its compression ratio tends to zero when the text file is very large.
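The steps in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on several assumptions the abstract does not state: positions are counted from 1, the quotient q is held as an exact rational (`fractions.Fraction`) so that multiplying by 2^k recovers the integer exactly, and the total text length is stored alongside the per-character table. The function names `compress_char`, `decompress_char`, `compress`, and `decompress` are hypothetical.

```python
from fractions import Fraction


def compress_char(text, ch):
    """Return (q, k, width) for one character, following the abstract's steps."""
    # Assumption: positions are 1-based, so the first position is never zero.
    positions = [i for i, c in enumerate(text, start=1) if c == ch]
    if not positions:
        raise ValueError(f"{ch!r} does not occur in the text")
    # Number of digits in the position of the last occurrence.
    width = len(str(positions[-1]))
    # Left-pad every position with zeroes to that width, then concatenate.
    concatenated = "".join(str(p).zfill(width) for p in positions)
    # Divide by 2 until 1 <= q < 2, counting the divisions in k.
    q = Fraction(int(concatenated))
    k = 0
    while q >= 2:
        q /= 2
        k += 1
    return q, k, width


def decompress_char(q, k, width):
    """Recover the list of positions from (q, k, width)."""
    value = q * 2**k
    if value.denominator != 1:
        raise ValueError("q * 2**k is not an integer; q was not stored exactly")
    digits = str(value.numerator)
    # Leading zeroes of the first position were lost in the integer
    # conversion; restore them by padding to a multiple of width.
    padded_len = -(-len(digits) // width) * width
    digits = digits.zfill(padded_len)
    return [int(digits[i:i + width]) for i in range(0, padded_len, width)]


def compress(text):
    """Build the per-character table; the text length is kept separately."""
    return len(text), {ch: compress_char(text, ch) for ch in set(text)}


def decompress(length, table):
    """Rebuild the text by writing each character back at its positions."""
    out = [""] * length
    for ch, (q, k, width) in table.items():
        for p in decompress_char(q, k, width):
            out[p - 1] = ch
    return "".join(out)
```

For the text "abracadabra", the character "a" occurs at positions 1, 4, 6, 8, and 11, so the padded string is "0104060811" and the decimal value is 104060811. The sketch keeps q exact on purpose: a floating-point quotient would drop low-order digits once the concatenated position string grows long, and the round trip would no longer be lossless.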

Index Terms—Compression, compression ratio, decompression, lossless, scheme, text file.

Sunday Eric Adewumi is with the Federal University Lokoja, Nigeria (email: sunday.adewumi@fulokoja.edu.ng).


Cite: Sunday Eric Adewumi, "Character Analysis Scheme for Compressing Text Files," International Journal of Computer Theory and Engineering, vol. 7, no. 5, pp. 362-365, 2015.


Copyright © 2008-2024. International Association of Computer Science and Information Technology. All rights reserved.