Abstract
The rapid growth of user-generated content (UGC) provides a rich source for reputation management of entities, products, and services. Taking online product reviews as a concrete example, customers usually give opinions on multiple attributes of a product; the challenge is therefore to automatically extract and cluster the attributes that are mentioned. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapped scheme that trains the extraction models by leveraging a small quantity of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a clustering by committee (CBC) approach to cluster attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
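To make the bootstrapped sequence labeling idea concrete, the following is a minimal self-training sketch, not the models used in the paper. It assumes a BIO-style tag set (`B-ATTR`, `O`), and it swaps in a toy count-based tagger for a real sequence model; the function names `train`, `predict`, and `bootstrap`, the confidence threshold, and the example sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Toy tagger (an assumption, standing in for a real sequence model).

    Counts how often each token, and each preceding token, co-occurs with a tag.
    """
    word_counts = defaultdict(Counter)     # token -> tag counts
    context_counts = defaultdict(Counter)  # previous token -> tag counts
    for sentence in tagged_sentences:
        previous = None
        for token, tag in sentence:
            token = token.lower()
            word_counts[token][tag] += 1
            if previous is not None:
                context_counts[previous][tag] += 1
            previous = token
    return word_counts, context_counts


def predict(model, tokens):
    """Return (tags, confidence); confidence is the weakest per-token estimate."""
    word_counts, context_counts = model
    tags, confidence = [], 1.0
    previous = None
    for token in tokens:
        token = token.lower()
        # Prefer evidence about the word itself; back off to the previous word.
        evidence = word_counts.get(token) or context_counts.get(previous)
        if evidence:
            tag, count = evidence.most_common(1)[0]
            confidence = min(confidence, count / sum(evidence.values()))
        else:
            tag, confidence = "O", 0.0  # no evidence at all: do not trust this sentence
        tags.append(tag)
        previous = token
    return tags, confidence


def bootstrap(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: repeatedly promote confidently tagged sentences to training data."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        remaining, added = [], 0
        for tokens in pool:
            tags, confidence = predict(model, tokens)
            if confidence >= threshold:
                labeled.append(list(zip(tokens, tags)))
                added += 1
            else:
                remaining.append(tokens)
        pool = remaining
        if added == 0:  # nothing new was learned, so stop early
            break
    return train(labeled), pool


# Hypothetical data: two labeled reviews and three unlabeled ones.
labeled = [
    [("the", "O"), ("battery", "B-ATTR"), ("is", "O"), ("great", "O")],
    [("the", "O"), ("screen", "B-ATTR"), ("is", "O"), ("sharp", "O")],
]
unlabeled = [
    ["the", "lens", "is", "great"],
    ["lens", "is", "sharp"],
    ["zoom", "feels", "slow"],
]
model, leftover = bootstrap(labeled, unlabeled)
print(predict(model, ["lens", "is", "sharp"])[0])
print(leftover)
```

In this toy run, "lens" is first tagged from its context ("the ..."), and only in the next round is the sentence that starts with "lens" confident enough to be added; the sentence with no known words stays in the unlabeled pool. A real system would replace the count-based tagger with a proper sequence labeler and a calibrated confidence estimate.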
Author information
Qingliang Miao is a researcher at Fujitsu Research & Development Center. He received his PhD in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, China, in 2010. His primary research interests include knowledge management and opinion mining, with broad applications on the web and in e-commerce.
Qiudan Li is an associate professor in the Laboratory of Complex Systems and Intelligence Science at the Institute of Automation, Chinese Academy of Sciences. She received her PhD in computer science from Dalian University of Technology, China, in 2004. Her research interests include web mining and mobile commerce applications.
Daniel Zeng is a research professor at the Institute of Automation, Chinese Academy of Sciences. He received his PhD in industrial administration in 1998 from the Graduate School of Industrial Administration (since renamed the Tepper School of Business) and The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA. His research interests include collaborative information and knowledge management, recommender systems, and intelligence and security informatics.
Yao Meng is a senior researcher at Fujitsu Research & Development Center. She received her PhD in computer science from Harbin Institute of Technology, China. Her research focuses on natural language processing, with broad applications in web information processing and machine translation.
Shu Zhang is a researcher at Fujitsu Research & Development Center. She received her PhD in computer science from Harbin Institute of Technology, China, in 2008. Her research focuses on document summarization, opinion mining, and entity analysis. She has published around 20 papers in journals and conferences.
Hao Yu is a general manager at Ricoh Software Research Center (Beijing) Co., Ltd. He received his PhD in computer science from Harbin Institute of Technology, China, in 1998. His research focuses on natural language processing, machine translation, machine learning, and intelligent computing.
About this article
Cite this article
Miao, Q., Li, Q., Zeng, D. et al. Entity attribute discovery and clustering from online reviews. Front. Comput. Sci. 8, 279–288 (2014). https://doi.org/10.1007/s11704-014-3043-8
DOI: https://doi.org/10.1007/s11704-014-3043-8