Abstract—The possibility to face pattern recognition
problems directly on structured domains (e.g., multimedia data,
strings, graphs) is fundamental to the effective solution of many
interesting applications. In this paper, we deal with a clustering
problem defined in the string domain, focusing on the problem
of cluster representation in data domains where only a
dissimilarity measure can be fixed. To this aim, we adopt the
MinSOD (Minimum Sum of Distances) cluster representation
technique, which defines the representative as the element of
the cluster minimizing the sum of dissimilarities from all the
other elements in the considered set. Since the precise
computation of the MinSOD have a high computational cost, we
propose a suboptimal procedure consisting in computing the
representative of the cluster considering only a reduced pool of
samples, instead of the whole set of objects in the cluster. We
have carried out some tests in order to ascertain the sensitivity
of the clustering procedure with respect to the number of
samples in the pool used to compute the MinSOD. Results show
a good robustness of the proposed procedure. The
implementations are available as part of the SPARE library,
which is available as an open source project.
Index Terms—Clustering strings, MinSOD representative,
software library.
The authors are with Department of Information Engineering, Electronics
and Telecommunications, SAPIENZA University of Rome, Via Eudossiana
18, 00184 Rome, Italy (e-mail: delvescovo, livi@diet.uniroma1.it,
antonello.rizzi@uniroma1.it, mascioli@infocom.uniroma1.it).
[PDF]
Cite:Guido Del Vescovo, Lorenzo Livi, Fabio Massimo Frattale Mascioli, and Antonello Rizzi, "On the Problem of Modeling Structured Data with the MinSOD Representative," International Journal of Computer Theory and Engineering vol. 6, no. 1, pp. 9-14, 2014.