Abstract—Retrieving keywords requires speed and
compactness. A trie is one of the data structure to retrieve
keywords, and the double array is one of the implementation
methods for the trie. The retrieval algorithm for the double
array is fast, and its data structure has compactness. An edge of
the trie is represented by a character in previous researches
related to the double array, but there are no researches
discussing if the edge is represented n-gram. Therefore, this
paper proposes the data structure and the retrieval algorithm
for the double array which represents an edge by n-byte. This
paper also proposes a method to compress CODE array. From
the experimental results comparing with the original double
array by using single-byte and multi-byte character sets, the
size and the retrieval speed of the proposed method became
62-64% and 1.18-1.3 times, respectively. When the CODE is
compressed, the sizes of the proposed method became 41-59%.
Index Terms—Compression, double array, n-gram, trie.
The authors are with the Department of Information Science and
Intelligent, University of Tokushima, Tokushima, Japan (e-mail:
fuketa@is.tokushima-u.ac.jp, kam@is.tokushima-u.ac.jp,
aoe@is.tokushima-u.ac.jp).
[PDF]
Cite:Masao Fuketa, Kazuhiro Morita, and Jun-Ichi Aoe, "A Retrieval Method for Double Array Structures by Using Byte N-Gram," International Journal of Computer Theory and Engineering vol. 6, no. 2, pp. 155-159, 2014.