Abstract—Search results clustering techniques help end users find relevant results more easily. For these techniques, both producing correct cluster contents and assigning descriptive, meaningful labels to the clusters are crucial. Lingo is one of the most popular algorithms that considers both, and it is known as a description-comes-first algorithm. Lingo succeeds at assigning descriptive, human-readable cluster labels, but it has a minor drawback in assigning documents to the clusters, which causes low recall values. In this paper, we propose two main modifications to the Cluster Content Discovery and Cluster Label Induction phases of the Lingo algorithm. The experimental evaluation shows that, although it causes a slight decrease in precision, our modified Lingo algorithm yields noticeably higher recall and F-measure values.
Index Terms—Information retrieval, search results clustering, cluster content discovery, cluster labeling.
The authors are with Hacettepe University, Turkey (e-mail: seyfullahdemir@gmail.com).
Cite: Seyfullah Demir, Ebru A. Sezer, and Hayri Sever, "Modifications for the Cluster Content Discovery and the Cluster Label Induction Phases of the Lingo Algorithm," International Journal of Computer Theory and Engineering, vol. 6, no. 2, pp. 86-90, 2014.