Abstract
Web opinion feeds have become one of the most popular information sources users consult before buying products or contracting services. Negative opinions about a product can have a high impact on its sales figures. As a consequence, companies are increasingly concerned with how to integrate opinion data into their business intelligence models so that they can predict sales figures or define new strategic goals. After analysing the requirements of this new application, this paper proposes a multidimensional data model to integrate sentiment data extracted from opinion posts into a traditional corporate data warehouse. Then, a new sentiment data extraction method is presented that applies semantic annotation as a means to facilitate the integration of both types of data. In this method, Wikipedia is used as the main knowledge resource, together with some well-known lexicons of opinion words and other corporate data and metadata stores describing the company's products, such as technical specifications and user manuals. The resulting information system allows users to perform new analysis tasks by using the traditional OLAP-based data warehouse operators. We have developed a case study over a set of real opinions about digital devices offered by a wholesale dealer. In this case study, the quality of the extracted sentiment data is evaluated, and some query examples that illustrate the potential uses of the integrated model are provided.
Acknowledgements
This work has been partially funded by the “Ministerio de Economía y Competitividad” with contract number TIN2011-24147, and the Fundació Caixa Castelló project P1-1B2010-49.
Cite this article
García-Moya, L., Kudama, S., Aramburu, M.J. et al. Storing and analysing voice of the market data in the corporate data warehouse. Inf Syst Front 15, 331–349 (2013). https://doi.org/10.1007/s10796-012-9400-y
DOI: https://doi.org/10.1007/s10796-012-9400-y