Entity attribute discovery and clustering from online reviews

  • Research Article
Frontiers of Computer Science

Abstract

Rapidly growing user-generated content (UGC) is a rich source for the reputation management of entities, products, and services. Take online product reviews as a concrete example: customers usually give opinions on multiple attributes of a product, so the challenge is to automatically extract and cluster the attributes that are mentioned. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapped scheme that trains the extraction models by leveraging a small quantity of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a clustering by committee (CBC) approach to cluster attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
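
As a rough illustration of the bootstrapped sequence-labeling idea summarized in the abstract, the following Python sketch runs a self-training loop: a tagger is trained on a few labeled sentences, confidently tagged unlabeled sentences are added to the training set, and the tagger is retrained. The abstract does not specify the model, the tag set, or the selection rule, so everything here is an assumption made for illustration: the frequency-based toy tagger (a stand-in for a real sequence model), the `A`/`O` tags, the confidence threshold, and all function names. It is not the authors' method.

```python
from collections import Counter, defaultdict

def train(labeled):
    """Toy tagger (hypothetical): count tags per word and per previous word."""
    word, ctx = defaultdict(Counter), defaultdict(Counter)
    for tokens, tags in labeled:
        prev = "<s>"  # sentence-start marker
        for tok, tag in zip(tokens, tags):
            word[tok][tag] += 1
            ctx[prev][tag] += 1
            prev = tok
    return word, ctx

def predict(model, tokens):
    """Tag each token; sentence confidence is the weakest per-token confidence."""
    word, ctx = model
    tags, confidence, prev = [], 1.0, "<s>"
    for tok in tokens:
        # Pool evidence from the word itself and from its left neighbor.
        votes = word.get(tok, Counter()) + ctx.get(prev, Counter())
        if votes:
            tag, n = votes.most_common(1)[0]
            token_conf = n / sum(votes.values())
        else:
            tag, token_conf = "O", 0.0  # no evidence at all
        tags.append(tag)
        confidence = min(confidence, token_conf)
        prev = tok
    return tags, confidence

def bootstrap(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Self-training: move confidently tagged sentences into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        remaining, added = [], 0
        for tokens in pool:
            tags, conf = predict(model, tokens)
            if conf >= threshold:
                labeled.append((tokens, tags))
                added += 1
            else:
                remaining.append(tokens)
        pool = remaining
        if added == 0:  # nothing new was learned; stop early
            break
    return train(labeled)

# Tiny made-up data: "A" marks an attribute token, "O" everything else.
seed = [
    (["the", "battery", "is", "great"], ["O", "A", "O", "O"]),
    (["the", "screen", "is", "sharp"], ["O", "A", "O", "O"]),
]
unlabeled = [["the", "lens", "is", "great"]]

before = predict(train(seed), ["my", "lens", "broke"])[0]
after = predict(bootstrap(seed, unlabeled), ["my", "lens", "broke"])[0]
print(before, after)
```

In this toy run the seed model has never seen "lens", but the unlabeled sentence places it in a context the model already associates with attributes, so after one bootstrapping round the retrained tagger recognizes "lens" as an attribute in a new context. A real system would replace the toy tagger with a proper sequence model and a calibrated confidence estimate.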

Author information

Corresponding author

Correspondence to Qingliang Miao.

Additional information

Qingliang Miao is a researcher at the Fujitsu Research & Development Center. He received his PhD in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, China, in 2010. His primary research interests include knowledge management and opinion mining, with broad applications to the web and e-commerce.

Qiudan Li is an associate professor in the Laboratory of Complex Systems and Intelligence Science at the Institute of Automation, Chinese Academy of Sciences. She received her PhD in computer science from Dalian University of Technology, China, in 2004. Her research interests include web mining and mobile commerce applications.

Daniel Zeng is a research professor at the Institute of Automation, Chinese Academy of Sciences. He received his PhD in industrial administration in 1998 from the Graduate School of Industrial Administration (since renamed the Tepper School of Business) and The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA. His research interests include collaborative information and knowledge management, recommender systems, intelligence, and security informatics.

Yao Meng is a senior researcher at the Fujitsu Research & Development Center. She received her PhD in computer science from Harbin Institute of Technology, China. Her research focuses on natural language processing, with broad applications to web information processing and machine translation.

Shu Zhang is a researcher at the Fujitsu Research & Development Center. She received her PhD in computer science from Harbin Institute of Technology, China, in 2008. Her research focuses on document summarization, opinion mining, and entity analysis. She has published around 20 papers in journals and conferences.

Hao Yu is a general manager at Ricoh Software Research Center (Beijing) Co., Ltd. He received his PhD in computer science from Harbin Institute of Technology, China, in 1998. His research focuses on natural language processing, machine translation, machine learning, and intelligent computing.

About this article

Cite this article

Miao, Q., Li, Q., Zeng, D. et al. Entity attribute discovery and clustering from online reviews. Front. Comput. Sci. 8, 279–288 (2014). https://doi.org/10.1007/s11704-014-3043-8

  • DOI: https://doi.org/10.1007/s11704-014-3043-8
