Abstract
Today, the Web is the largest source of information worldwide. Decision-making applications such as Data Warehousing (DW) and Business Intelligence (BI) are increasingly moving onto the Web, especially into the cloud. Integrating data into DW/BI applications is a critical and time-consuming task. To support better decisions in DW/BI applications, next-generation data integration imposes new requirements on data integration systems beyond those of traditional data integration. In this paper, we propose a generic, metadata-based, service-oriented, and event-driven approach for integrating Web data in a timely and autonomous manner. Besides handling data heterogeneity, distribution, and interoperability, our approach satisfies near real-time requirements and realizes active data integration. To this end, we design and develop a framework that uses Web standards (e.g., XML and Web services) to tackle data heterogeneity, distribution, and interoperability issues. Moreover, our framework uses Active XML (AXML) to warehouse passive data together with services that integrate active and dynamic data on the fly. AXML embedded services and change detection services ensure near real-time data integration. Furthermore, integrating Web data actively and autonomously relies on mining the events logged by the data integration environment. We therefore propose an incremental XML-based algorithm for mining association rules from logged events. We then dynamically define active rules over the mined data to automate and reactivate integration tasks. Finally, as a proof of concept, we implement a prototype of the framework as a Web application using open-source tools.
Acknowledgments
The authors thank the anonymous reviewers of this paper for their thoughtful comments, which greatly helped improve the present work.
Cite this article
Salem, R., Boussaïd, O. & Darmont, J. Active XML-based Web data integration. Inf Syst Front 15, 371–398 (2013). https://doi.org/10.1007/s10796-012-9405-6
DOI: https://doi.org/10.1007/s10796-012-9405-6