Abstract
Many applications rely on distributed databases. However, only a few discovery methods can extract patterns without centralizing the data. In fact, this centralization is often less expensive than communicating the patterns extracted from the different nodes. To circumvent this difficulty, this paper revisits the problem of pattern mining in distributed databases by relying on pattern sampling. Specifically, we propose the algorithm DDSampling, which randomly draws a pattern from a distributed database with a probability proportional to its interest. We demonstrate the soundness of DDSampling and analyze its time complexity. Finally, experiments on benchmark datasets highlight its low communication cost and its robustness. We also illustrate its usefulness on real-world data from the Semantic Web by detecting outlier entities in DBpedia and Wikidata.
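To make the idea of drawing a pattern with a probability proportional to its interest more concrete, here is a minimal sketch in Python. It is not the DDSampling algorithm itself, whose details are not given in this abstract. It rests on simplifying assumptions: the interest measure is plain frequency (support), patterns are itemsets (the empty itemset included), and the data is horizontally partitioned so that each transaction lives entirely on one node. The names `node_weight`, `draw_local_pattern` and `draw_pattern` are hypothetical. Only one number per node (its total weight) reaches the coordinator, which is the kind of low communication cost the abstract alludes to.

```python
import random
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def node_weight(fragment: List[Transaction]) -> int:
    """Total weight of a node: a transaction t contains 2^|t| itemsets."""
    return sum(2 ** len(t) for t in fragment)


def draw_local_pattern(fragment: List[Transaction], rng: random.Random) -> FrozenSet[str]:
    """Two-step draw on one node (assumed interest measure: frequency).

    Step 1: pick a transaction with probability proportional to 2^|t|.
    Step 2: return a uniformly random subset of that transaction.
    """
    weights = [2 ** len(t) for t in fragment]
    t = rng.choices(fragment, weights=weights, k=1)[0]
    # Keeping each item independently with probability 1/2 yields a uniform subset.
    return frozenset(item for item in sorted(t) if rng.random() < 0.5)


def draw_pattern(nodes: Dict[str, List[Transaction]], rng: random.Random) -> FrozenSet[str]:
    """Coordinator side: choose a node by its weight, then delegate the draw."""
    names = sorted(nodes)
    weights = [node_weight(nodes[name]) for name in names]  # one integer per node
    chosen = rng.choices(names, weights=weights, k=1)[0]
    return draw_local_pattern(nodes[chosen], rng)
```

Under these assumptions, an itemset X is drawn with probability supp(X) / Z, where supp(X) is the number of transactions containing X over all nodes and Z is the sum of the node weights. A call such as `draw_pattern({"n1": [frozenset({"x", "y"})], "n2": [frozenset({"x"})]}, random.Random(0))` returns one sampled itemset.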
Acknowledgements
This work has been partly supported by CEAMITIC (Centre d’Excellence Africain en Mathématiques, Informatique et TIC).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Diop, L., Diop, C.T., Giacometti, A., Soulet, A. (2020). Pattern Sampling in Distributed Databases. In: Darmont, J., Novikov, B., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2020. Lecture Notes in Computer Science, vol 12245. Springer, Cham. https://doi.org/10.1007/978-3-030-54832-2_7
DOI: https://doi.org/10.1007/978-3-030-54832-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54831-5
Online ISBN: 978-3-030-54832-2
eBook Packages: Computer Science (R0)