Abstract
This paper studies the discovery of approximate rules in property graphs. First, we propose a semantically meaningful measure of error for mining graph entity dependencies (GEDs) that almost hold, to tolerate errors and inconsistencies that exist in real-world graphs. Second, we present a new characterisation of GED satisfaction, and devise a depth-first search strategy to traverse the search space of candidate GEDs efficiently. Further, we perform experiments to demonstrate the feasibility and scalability of our solution, FastAGEDs, with three real-world graphs. The results show FastAGEDs is effective and efficient for mining approximate GEDs in noisy and erroneous real-world graphs.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
simply H when the context is clear.
- 2.
the max. |adom(X)| of variable literals is \(2\), whereas that of constant literals is \(1\).
- 3.
we adopt this metric due to its efficiency and anti-monotonic property.
- 4.
for brevity, we represent items by their lowercase letters, e.g. \(a_1\) is \(A_1[x;EA]\), \(a_3^1\) is \(A_3[y;Soccer]\).
- 5.
IMDB dataset. http://www.imdb.com/interfaces.
- 6.
- 7.
DBLP dataset. https://dblp.uni-trier.de/xml/.
References
Aberger, C.R., Lamb, A., Tu, S., Nötzli, A., Olukotun, K., Ré, C.: Emptyheaded: a relational engine for graph processing. ACM Trans. Database Syst. (TODS) 42(4), 1–44 (2017)
Alipourlangouri, M., Chiang, F.: Keyminer: discovering keys for graphs. In: VLDB Workshop TD-LSG (2018)
Bleifuß, T., et al.: Approximate discovery of functional dependencies for large datasets. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1803–1812 (2016)
Bringmann, B., Nijssen, S.: What is frequent in a single graph? In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 858–863. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-68125-0_84
Caruccio, L., Deufemia, V., Polese, G.: Mining relaxed functional dependencies from data. Data Min. Knowl. Disc. 34(2), 443–477 (2020)
Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: GRAMI: frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endowment 7(7), 517–528 (2014)
Fan, W., Fan, Z., Tian, C., Dong, X.L.: Keys for graphs. Proc. VLDB Endowment 8(12), 1590–1601 (2015)
Fan, W., Chunming, H., Liu, X., Ping, L.: Discovering graph functional dependencies. ACM Trans. Database Syst. (TODS) 45(3), 1–42 (2020)
Fan, W., Lu, P.: Dependencies for graphs. In: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 403–416 (2017)
Fan, W., Ping, L.: Dependencies for graphs. ACM Trans. Database Syst. (TODS) 44(2), 1–40 (2019)
Fan, W., Ping, L., Tian, C., Zhou, J.: Deducing certain fixes to graphs. Proc. VLDB Endowment 12(7), 752–765 (2019)
Fan, W., Wu, Y., Xu, J.: Functional dependencies for graphs. In: Proceedings of the 2016 International Conference on Management of Data, pp. 1843–1857 (2016)
Giannella, C., Robertson, E.: On approximation measures for functional dependencies. Inf. Syst. 29(6), 483–507 (2004)
Golab, L., Karloff, H., Korn, F., Srivastava, D., Bei, Yu.: On generating near-optimal tableaux for conditional functional dependencies. Proc. VLDB Endowment 1(1), 376–390 (2008)
Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100–111 (1999)
Kivinen, J., Mannila, H.: Approximate inference of functional dependencies from relations. Theoret. Comput. Sci. 149(1), 129–149 (1995)
Koudas, N., Saha, A., Srivastava, D., Venkatasubramanian, S.: Metric functional dependencies. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 1275–1278. IEEE (2009)
Kramer, S., Pfahringer, B.: E cient search for strong partial determinations. In: Proceedings of the International Conference on Knowledge Discover and Data Mining, pp. 371–378. Citeseer (1996)
Kruse, S., Naumann, F.: Efficient discovery of approximate dependencies. Proc. VLDB Endowment 11(7), 759–772 (2018)
Kwashie, S., Liu, J., Li, J., Ye, F.: Mining differential dependencies: a subspace clustering approach. In: Wang, H., Sharaf, M.A. (eds.) ADC 2014. LNCS, vol. 8506, pp. 50–61. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08608-8_5
Kwashie, S., Liu, J., Li, J., Ye, F.: Conditional differential dependencies (CDDs). In: Morzy, T., Valduriez, P., Bellatreche, L. (eds.) ADBIS 2015. LNCS, vol. 9282, pp. 3–17. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23135-8_1
Kwashie, S., Liu, J., Li, J., Ye, F.: Efficient discovery of differential dependencies through association rules mining. In: Sharaf, M.A., Cheema, M.A., Qi, J. (eds.) ADC 2015. LNCS, vol. 9093, pp. 3–15. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19548-3_1
Kwashie, S., Liu, L., Liu, J., Stumptner, M., Li, J., Yang, L.: Certus: an effective entity resolution approach with graph differential dependencies (GDDs). Proc. VLDB Endowment 12(6), 653–666 (2019)
Lin, P., Song, Q., Yinghui, W.: Fact checking in knowledge graphs with ontological subgraph patterns. Data Sci. Eng. 3(4), 341–358 (2018)
Liu, D., et al.: An efficient approach for discovering graph entity dependencies (GEDs). arXiv preprint arXiv:2301.06264 (2023)
Liu, J., Kwashie, S., Li, J., Ye, F., Vincent, M.: Discovery of approximate differential dependencies. arXiv preprint arXiv:1309.3733 (2013)
Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data-a review. IEEE Trans. Knowl. Data Eng. 24(2), 251–264 (2010)
Ma, H., Alipourlangouri, M., Yinghui, W., Chiang, F., Pi, J.: Ontology-based entity matching in attributed graphs. Proc. VLDB Endowment 12(10), 1195–1207 (2019)
Mandros, P., Boley, M., Vreeken, J.: Discovering reliable approximate functional dependencies. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 355–363 (2017)
Mannila, H., Räihä, K.J.: Dependency inference. In Proceedings of the 13th International Conference on Very Large Data Bases, pp. 155–158 (1987)
Mannila, H., Räihä, K.-J.: Algorithms for inferring functional dependencies from relations. Data Knowl. Eng. 12(1), 83–99 (1994)
Medina, R., Nourine, L.: A unified hierarchy for functional dependencies, conditional functional dependencies and association rules. In: Ferré, S., Rudolph, S. (eds.) ICFCA 2009. LNCS (LNAI), vol. 5548, pp. 98–113. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01815-2_9
Mhedhbi, A., Salihoglu, S.: Optimizing subgraph queries by combining binary and worst-case optimal joins. Proc. VLDB Endowment 12(11)
Piatetsky-Shapiro, G.: Probabilistic data dependencies. In: Machine Discovery Workshop (Aberdeen, Scotland) (1992)
Reimherr, M., Nicolae, D.L.: On quantifying dependence: a framework for developing interpretable measures (2013)
Teixeira, C.H., Fonseca, A.J., Serafini, M., Siganos, G., Zaki, M.J., Aboulnaga, A.: Arabesque: a system for distributed graph mining. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 425–440 (2015)
Weytjens, S.: Approximate functional dependencies: a comparison of measures and a relevance focused tool for discovery (2021)
Acknowledgment
This research project was supported in part by the following grant schemes: the Major Project of Hubei Hongshan Laboratory under Grant 2022HSZD031; the Innovation fund of Chinese Marine Defense Technology Innovation Center under Grant JJ-2021-722-04; the open funds of Hubei Three Gorges Laboratory, the Fundamental Research Funds for the Chinese Central Universities under Grant 2662023XXPY004, 2662022JC004; the open funds of the National Key Laboratory of Crop Genetic Improvement under Grant ZK202203, Huzhong Agricultural University; and the Inner Mongolia Key Scientific and Technological Project under Grant 2021SZD0099.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, G. et al. (2023). FastAGEDs: Fast Approximate Graph Entity Dependency Discovery. In: Zhang, F., Wang, H., Barhamgi, M., Chen, L., Zhou, R. (eds) Web Information Systems Engineering – WISE 2023. WISE 2023. Lecture Notes in Computer Science, vol 14306. Springer, Singapore. https://doi.org/10.1007/978-981-99-7254-8_35
Download citation
DOI: https://doi.org/10.1007/978-981-99-7254-8_35
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7253-1
Online ISBN: 978-981-99-7254-8
eBook Packages: Computer ScienceComputer Science (R0)