Compression and Data Mining

Dan A. Simovici, Ping Chen, Tong Wang, and Dan Pletea
Univ. of Massachusetts Boston, Boston, USA

Abstract—Data compression plays an important role in data mining, both in assessing the minability of data and as a means of evaluating similarities between complex objects. We discuss various mining applications, ranging from the compressibility of strings of symbols and of languages to graph compressibility and the compression of market basket data. We also examine the role of compression in computing similarity in text corpora, and we propose a novel approach for assessing the quality of text summarization.

Index Terms—Compression ratio, Thue-Morse sequence, lossless compression, stemming, lemmatizing

Cite: Dan A. Simovici, Ping Chen, Tong Wang, and Dan Pletea, "Compression and Data Mining," Journal of Communications, vol. 10, no. 9, pp. 677-684, 2015. Doi: 10.12720/jcm.10.9.677-684