Abstract
Interactions that involve a group of people or objects are omnipresent in practice. Some examples include the list of recipients of an email, the group of co-authors of a publication, and the users participating in online discussion threads. These interactions are modeled as hypergraphs in which each hyperedge is a set of nodes constituting an interaction. In a hypergraph, the k-core is the sub-hypergraph within which the degree of each node is at least k. Investigating the k-core structures is valuable in revealing some properties of the hypergraph, one of which is the network behavior when facing attacks. Networks in practice are often prone to attacks by which the attacker removes a portion of the nodes or hyperedges to weaken some properties of the networks. The resilience of the k-cores is an indicator of the robustness of the network against such attacks. In this work, we investigate the core resilience of real-world hypergraphs against deletion attacks. How robust are the core structures of real-world hypergraphs in these attack scenarios? Given the complexity of a real-world hypergraph, how should we supplement the hypergraph with augmented hyperedges to enhance its core resilience? In light of several empirical observations regarding core resilience, we present a two-step method that preserves and strengthens the core structures of the hypergraphs.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
In this study, for the sake of simplicity, we choose to exclude self-loops (i.e., hyperedges of size 1) as they are not significantly relevant to robustness.
Identical rank values are each assigned the fractional rank equal to the average of their positions in the ascending order of the rank values.
The nodes having no incident hyperedges left are assigned core number 0 in this case.
References
Akoglu L, Faloutsos C (2013) Anomaly, event, and fraud detection in large network datasets. In: Proceedings of the 6th ACM international conference on web search and data mining (WSDM), pp 773–774, https://doi.org/10.1145/2433396.2433496
Amburg I, Veldt N, Benson A (2020) Clustering in graphs and hypergraphs with categorical edge labels. In: Proceedings of the web conference 2020 (WWW), pp 706–717. https://doi.org/10.1145/3366423.3380152
Aridhi S, Brugnara M, Montresor A, et al (2016) Distributed k-core decomposition and maintenance in large dynamic graphs. In: Proceedings of the 10th ACM international conference on distributed and event-based systems (DEBS), pp 161–168. https://doi.org/10.1145/2933267.2933299
de Arruda GF, Petri G, Moreno Y (2020) Social contagion models on hypergraphs. Phys Rev Res 2(2):023,032. https://doi.org/10.1103/PhysRevResearch.2.023032
Benson AR, Abebe R, Schaub MT, et al (2018a) Simplicial closure and higher-order link prediction. Proc Natl Acad Sci 115(48):E11,221–E11,230. https://doi.org/10.1073/pnas.1800683115doi:
Benson AR, Kumar R, Tomkins A (2018b) Sequences of sets. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 1148–1157. https://doi.org/10.1145/3219819.3220100
Bhawalkar K, Kleinberg J, Lewi K et al (2015) Preventing unraveling in social networks: the anchored k-core problem. SIAM J Discret Math 29(3):1452–1475. https://doi.org/10.1137/14097032X
Chen C, Zhu Q, Sun R et al (2021) Edge manipulation approaches for k-core minimization: metrics and analytics. IEEE Trans Knowl Data Eng 32(1):390–403. https://doi.org/10.1109/TKDE.2021.3085570
Chien E, Pan C, Peng J, et al (2022) You are AllSet: a multiset function framework for hypergraph neural networks. In: Proceedings of the 10th international conference on learning representations (ICLR). https://doi.org/10.48550/arXiv.2106.13264
Do MT, Yoon Se, Hooi B, et al (2020) Structural Patterns and Generative Models of Real-world Hypergraphs. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 176–186. https://doi.org/10.1145/3394486.3403060
Feng Y, You H, Zhang Z, et al (2019) Hypergraph Neural Networks. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), pp 3558–3565. https://doi.org/10.1609/aaai.v33i01.33013558
Freitas S, Yang D, Kumar S, et al (2022) Graph vulnerability and robustness: A survey. IEEE Transactions on Knowledge and Data Engineering pp 5915–5934. https://doi.org/10.1109/TKDE.2022.3163672doi:
Gabert K, Pinar A, Çatalyürek ÜV (2021a) Shared-memory scalable k-core maintenance on dynamic graphs and hypergraphs. In: Proceedings of the 2021 IEEE international parallel and distributed processing symposium workshops (IPDPSW), IEEE, pp 998–1007. https://doi.org/10.1109/IPDPSW52791.2021.00158
Gabert K, Pinar A, Çatalyürek ÜV (2021b) A unifying framework to identify dense subgraphs on streams: graph nuclei to hypergraph cores. In: Proceedings of the 14th ACM international conference on web search and data mining (WSDM), pp 689–697. https://doi.org/10.1145/3437963.3441790
Giatsidis C, Thilikos DM, Vazirgiannis M (2011) Evaluating cooperation in communities with the k-Core Structure. In: Proceedings of the 2021 international conference on advances in social networks analysis and mining (ASONAM), pp 87–93. https://doi.org/10.1109/ASONAM.2011.65
Giroire F, Nisse N, Trolliet T et al (2022) Preferential attachment hypergraph with high modularity. Network Science 10(4):400–429. https://doi.org/10.1017/nws.2022.35
Huang Z, Chung W, Ong TH, et al (2002) A Graph-Based Recommender System for Digital Library. In: Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), pp 65–73. https://doi.org/10.1145/544220.544231
Hwang H, Lee S, Park C, et al (2022) AHP: Learning to Negative Sample for Hyperedge Prediction. In: Proceedings of the 45th International ACM SIGIR conference on research and development in information retrieval (SIGIR), pp 2237–2242. https://doi.org/10.1145/3477495.3531836
Iacopini I, Petri G, Barrat A et al (2019) Simplicial models of social contagion. Nat Commun 10(1):1–9. https://doi.org/10.1038/s41467-019-10431-6
Kim S, Choe M, Yoo J, et al (2022) Reciprocity in directed hypergraphs: measures, findings, and generators. In: Proceedings of the 2022 IEEE international conference on data mining (ICDM), pp 1005–1010, https://doi.org/10.1109/ICDM54844.2022.00122
Kim S, Bu F, Choe M, et al (2023) How transitive are real-world group interactions?–Measurement and reproduction. arXiv preprint arXiv:2306.02358
Kitsak M, Gallos LK, Havlin S et al (2010) Identification of influential spreaders in complex networks. Nat Phys 6(11):888–893. https://doi.org/10.1038/nphys1746
Klimt B, Yang Y (2004) The enron corpus: a new dataset for email classification research. In: Proceedings of the 15th European conference on machine learning (ECML), Springer, pp 217–226, https://doi.org/10.1007/978-3-540-30115-8_22
Kook Y, Ko J, Shin K (2020) Evolution of real-world hypergraphs: patterns and models without oracles. In: Proceedings of the 2020 IEEE international conference on data mining (ICDM), https://doi.org/10.1109/ICDM50108.2020.00036
Kumar T, Darwin K, Parthasarathy S, et al (2020) PHPRA: hyperedge prediction using resource allocation. In: Proceedings of the 12th ACM conference on web science, pp 135–143, https://doi.org/10.1145/3394231.3397903
Laishram R (2020) The resilience of k-cores in graphs. PhD thesis, Syracuse University
Laishram R, Sariyüce A, Eliassi-Rad T, et al (2018) Measuring and Improving the Core Resilience of Networks. In: Proceedings of the web conference 2018 (WWW), pp 609–618, https://doi.org/10.1145/3178876.3186127
Laishram R, Erdem Sar A, Eliassi-Rad T, et al (2020) Residual core maximization: an efficient algorithm for maximizing the size of the k-core. In: Proceedings of the 2020 SIAM international conference on data mining (SDM), pp 325–333. https://doi.org/10.1137/1.9781611976236.37
Lee G, Ko J, Shin K (2020) Hypergraph motifs: concepts, algorithms, and discoveries. Proc VLDB Endow 13(11):2256–2269. https://doi.org/10.14778/3407790.3407823
Lee G, Choe M, Shin K (2021) How do hyperedges overlap in real-world hypergraphs?-Patterns, measures, and generators. In: Proceedings of the web conference 2021 (WWW), pp 3396–3407. https://doi.org/10.1145/3442381.3450010
Lei S, Maniu S, Mo L, et al (2015) Online influence maximization. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 645–654. https://doi.org/10.1145/2783258.2783271
Leng M, Sun L, Jn Bian et al (2013) An O(m) algorithm for cores decomposition of undirected hypergraph. J Chin Comput Syst 34(11):2568–2573
Li P, Milenkovic O (2017) Inhomogeneous hypergraph clustering with applications. In: Proceedings of the 31st international conference on neural information processing systems, pp 2305–2315. https://doi.org/10.48550/arXiv.1709.01249s
Li RH, Yu JX, Mao R (2013) Efficient core maintenance in large dynamic graphs. IEEE Trans Knowl Data Eng 26(10):2453–2465. https://doi.org/10.1109/TKDE.2013.158
Lin Z, Zhang F, Lin X et al (2021) Hierarchical core maintenance on large dynamic graphs. Proc VLDB Endow 14(5):757–770. https://doi.org/10.14778/3446095.3446099
Linghu Q, Zhang F, Lin X, et al (2020) Global reinforcement of social networks: the anchored coreness problem. In: Proceedings of the 2020 ACM SIGMOD international conference on management of data (SIGMOD), pp 2211–2226. https://doi.org/10.1145/3318464.3389744
Liu Q, Huang Y, Metaxas DN (2011) Hypergraph with sampling for image retrieval. Pattern Recogn 44(10–11):2255–2262. https://doi.org/10.1016/j.patcog.2010.07.014
Ma X, Ma F, Yin J et al (2018) Cascading failures of k uniform hyper-network based on the hyper adjacent matrix. Physica A 510:281–289. https://doi.org/10.1016/j.physa.2018.06.122
Medya S, Ma T, Silva A, et al (2020) A game theoretic approach for core resilience. In: Proceedings of the 29th international joint conference on artificial intelligence (IJCAI), pp 3473–3479. https://doi.org/10.24963/ijcai.2020/480
Mei G, Tu J, Xiao L et al (2021) An efficient graph clustering algorithm by exploiting k-core decomposition and motifs. Comput Electric Eng 96(107):564. https://doi.org/10.1016/j.compeleceng.2021.107564
Ouyang M, Toulouse M, Thulasiraman K et al (2002) Multilevel cooperative search for the circuit/hypergraph partitioning problem. IEEE Trans Comput Aided Des Integr Circuits Syst 21(6):685–693. https://doi.org/10.1109/TCAD.2002.1004312
Peng H, Qian C, Zhao D, et al (2022) Targeting attack hypergraph networks. Chaos Interdiscip J Nonlinear Sci 32(7):073,121. https://doi.org/10.1063/5.0090626
Peng Y, Zhang Y, Zhang W, et al (2018) Efficient Probabilistic K-Core Computation on Uncertain Graphs. In: Proceedings of the IEEE 34th international conference on data engineering (ICDE), pp 1192–1203. https://doi.org/10.1109/ICDE.2018.00110
Rota Bulò S, Pelillo M (2013) A game-theoretic approach to hypergraph clustering. IEEE Trans Pattern Anal Mach Intell 35(6):1312–1327. https://doi.org/10.1109/TPAMI.2012.226
Seidman SB (1983) Network structure and minimum degree. Soc Netw 5(3):269–287. https://doi.org/10.1016/0378-8733(83)90028-X
Shin K, Eliassi-Rad T, Faloutsos C (2016) Corescope: Graph Mining Using k-Core Analysis-Patterns, Anomalies and Algorithms. In: Proceedings of the IEEE 16th international conference on data mining (ICDM), pp 469–478. https://doi.org/10.1109/ICDM.2016.0058
Sinha A, Shen Z, Song Y, et al (2015) An overview of microsoft academic service (MAS) and applications. In: Proceedings of the web conference 2015 (WWW), pp 243–246. https://doi.org/10.1145/2740908.2742839
Sun B, Chan THH, Sozio M (2020) Fully dynamic approximate k-core decomposition in hypergraphs. ACM Trans Knowl Discov Data 14(4):1–21. https://doi.org/10.1145/3385416
Tan S, Guan Z, Cai D, et al (2014) Mapping users across networks by manifold alignment on hypergraph. In: Proceedings of the 28th AAAI conference on artificial intelligence (AAAI), pp 159–165. https://doi.org/10.1609/aaai.v28i1.8720
Vitter JS (1987) An efficient algorithm for sequential random sampling. ACM Trans Math Softw 13(1):58–67. https://doi.org/10.1145/21465.21474
Yadati N, Nimishakavi M, Yadav P, et al (2019) HyperGCN: a new method of training graph convolutional networks on hypergraphs. In: Proceedings of the 33rd international conference on neural information processing systems (NeurIPS), pp 1511–1522. https://doi.org/10.48550/arXiv.1809.02589
Yadati N, Nitin V, Nimishakavi M, et al (2020) NHP: neural hypergraph link prediction. In: Proceedings of the 29th ACM international conference on information and knowledge management (CIKM), pp 1705–1714. https://doi.org/10.1145/3340531.3411870
Yang D, Qu B, Yang J, et al (2019) Revisiting user mobility and social relationships in LBSNs: a hypergraph embedding approach. In: Proceedings of the web conference 2019 (WWW), pp 2147–2157. https://doi.org/10.1145/3308558.3313635
Yin H, Benson AR, Leskovec J, et al (2017) Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 555–564. https://doi.org/10.1145/3097983.3098069
Zhang F, Zhang Y, Qin L, et al (2017) Finding critical users for social network engagement: the collapsed k-core problem. In: Proceedings of the 31st AAAI conference on artificial intelligence (AAAI), pp 245–251. https://doi.org/10.1609/aaai.v31i1.10482
Zhou Z, Zhang F, Lin X, et al (2019) K-core maximization: an edge addition approach. In: Proceedings of the 28th international joint conference on artificial intelligence (IJCAI), pp 4867–4873. https://doi.org/10.24963/ijcai.2019/676
Zhu W, Chen C, Wang X, et al (2018) K-Core Minimization: An Edge Manipulation Approach. In: Proceedings of the 27th ACM international conference on information and knowledge management (CIKM), pp 1667–1670. https://doi.org/10.1145/3269206.3269254
Zhu Y, Guan Z, Tan S et al (2016) Heterogeneous hypergraph embedding for document recommendation. Neurocomputing 216:150–162. https://doi.org/10.1016/j.neucom.2016.07.030
Funding
This work was supported by National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1C1C1008296) and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00871, Development of AI Autonomy and Knowledge Enhancement for AI Agent Collaboration) (No. 2019–0-00075, Artificial Intelligence Graduate School Program (KAIST)).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Responsible editor: Charalampos Tsourakakis.
Appendices
Appendix A: Datasets
Throughout the paper, we use 10 real-world hypergraph datasets (Benson et al. 2018a). The basic statistics are provided in Table 4. Their domains are:
-
Co-authorship (coauth-MAG-Geology and coauth-MAG-History): each node is an author, and each hyperedge is the list of coauthors in a publication.
-
Contact (contact-high-school and contact-primary-school): each node is an individual, and each hyperedge is a group of people in contact at a high/primary school.
-
Email (email-Enron and email-Eu): each node is an email address, and each hyperedge consists of the sender and recipients of an email.
-
Drugs (NDC-classes and NDC-substances): each node represents a drug class/substance, and each hyperedge represents a set of classifications/substances of a drug.
-
Threads (threads-ask-ubuntu and threads-math): each node is a user in an online forum, and each hyperedge is the list of users in a question thread.
Appendix B: Core decomposition algorithm
The Core Decomposition process is outlined in Algorithm 3.
Appendix C: Algorithm for SIR on hypergraphs
The SIR model generalized to the hypergraph setting is outlined in Algorithm 4.
For the results reported in Sects. 4 and 7.6, we set \(r = 1\) and \(t=0.025\) in all datasets. Similar conclusions regarding the applicability of hypergraph core numbers (Sect. 4) and how the hyperedges augmented by COREA help support the applications of core numbers (Sect. 7.6) are drawn with different values of t in \(\{0.05, 0.025, 0.01, 0.005\}\).
Appendix D: Results on node-deletion attacks
We present the results of node-deletion attacks. The five observations, similar to those in Sect. 5.3, are reported for all attack strategies in Figs. 20, 21, 22, and 23. The results in Figs. 21, 22, and 23 are of the cases where \(25 \%\) of nodes are deleted.
The method evaluation results reported in Figs. 24, 25, 26, 27, 28, 29 and 30 are of Core Strength Attack. Similarly to Sect. 7, we also report the mean of 10 trials together with the standard deviations indicated by the vertical bars. Overall, our full-fledged method COREA significantly outperforms and provides a better time-performance trade-off than the baselines and simplified variants. The statistical significance of the gap is also verified by the one-tailed Student’s t-test, as in Sect. 7.2, at \(95 \%\) confidence (p-values \(< 0.05\) in all cases).
For all other attack strategies, we draw the same conclusion about the superiority in performance and time-performance trade-off of the full-fledged version of COREA compared to the baselines and simplified variants. Due to the large number of figures, we present the results of all other attack strategies in the supplementary material.
When switching from the real hyperedge size distribution, heavy-tailed, to the uniform distribution, COREA achieves a better performance as more larger-size hyperedges are augmented. However, assuming a uniform hyperedge distribution is both unrealistic and violative of the constraints of Problem 1.
Appendix E: Theoretical results and proofs
In this section, we present detailed theoretical results with the accompanying proofs to support the soundness of COREA.
We first define a valid deletion order in a hypergraph \(\textbf{G}= (\textbf{V}, \textbf{G})\) as a particular permutation \(\mathbb {O}= [v_{i_1}, v_{i_2},...,v_{i_n}]\) of the nodes in \(\textbf{V}= \{v_1,...,v_n\}\) such that the nodes in \(\textbf{V}\) are removed exactly in the order of \(\mathbb {O}\) in an execution of the core decomposition process. Different tie-breaking schemes \(\textsf{T}\), described in Sect. 6.2.1 determine differently which node to delete first when several nodes are up for removal, resulting in different executions of the core decomposition process of \(\textbf{G}\) and in turn different valid deletion orders. In addition, we refer to any augmentation method that augments several hyperedges of its choice to hypergraph \(\textbf{G}\) while preserving all core numbers of \(\textbf{G}\) as a feasible augmentation of \(\textbf{G}\).
1.1 E.1: Feasibility of COREA
In this section, we prove that COREA is a feasible augmentation method, i.e., the hyperedges augmented by COREA to \(\textbf{G}\) are guaranteed to preserve all core numbers of \(\textbf{G}\).
Lemma 1
Assuming that after applying F, a feasible augmentation of \(\textbf{G}\), a subsequence of a valid deletion order in the core decomposition process for the nodes having core number k is: \(S_k = [a_1,...,a_q]\). Without F, \(S_k\) is still a subsequence of a valid deletion order for the nodes having core number k in the pruning process of obtaining the \((k+1)\)-core in the original hypergraph \(\textbf{G}\).
Proof
Let \(\textbf{G}'\) denote the result of applying F to \(\textbf{G}\). For \(i=1,...,q\), let \(\textbf{E}_F(a_i)\) be the set of hyperedges augmented by F, each of which has \(\{a_i\} \cup s\), with \(s \subseteq \{a_{i+1},...,a_q\}\) (s may be an empty set) as the set of anchors. Let \(F(a_i) = |\textbf{E}_{F}(a_i)|\). Following the core decomposition process, all the hyperedges in \(\textbf{E}_F(a_i)\) are removed when \(a_i\) is removed.
For \(a_1\), as \(a_1\) can be removed first in the process of obtaining the \((k+1)\)-core from the k-core of \(\textbf{G}'\), its degree at the k-core of \(\textbf{G}'\) is \(d_{\textbf{G}'}^k(a_1) \le k\). As the degree of \(a_1\) in the k-core of \(\textbf{G}\) is equal to \(d_{\textbf{G}}^k(a_1) = d_{\textbf{G}'}^k(a_1) - F(a_1) \le k\), \(a_1\) can also the first node of core number k to be deleted in the core decomposition process of \(\textbf{G}\).
For any \(a_i, i>1\), during the pruning process of obtaining the \((k+1)\)-core, after nodes \(a_1,...,a_{i-1}\) have been removed along with their incident hyperedges, the degree of \(a_i\) in \(\textbf{G}'\) must be lower than or equal to k. In other words, the degree of \(a_i\) at this point is equal to \(k - g(a_i) \le k\) with \(g(a_i) \ge 0\). This value is equal to \(F(a_i)\) plus the degree of \(a_i\) in a sub-hypergraph of \(\textbf{G}\), obtained by removing \(a_1,...,a_{i-1}\) along with their incident hyperedges. If the order \(S_k\) is followed in the core decomposition process of the original hypergraph \(\textbf{G}\) (without the augmentation F), the degree of \(a_i\) at this point, after nodes \(a_1,...,a_{i-1}\) have been removed along with their incident hyperedges, would be: \(k - g(a_i) - F(a_i) \le k\), which also qualifies \(a_i\) for deletion.
Therefore, without the hyperedges augmented by F, \(S_k\) is still a subsequence of a valid deletion order for the nodes having core number k in the pruning process of obtaining the \((k+1)\)-core in the original hypergraph \(\textbf{G}\).
Theorem 1
(Feasibility of COREA) Step 1 of COREA guarantees to construct a pool P of candidate hyperedges that do not change the core number of any node when they are added together to \(\textbf{G}\).
Proof
We show that after COREA augments all the candidate hyperedges in P, the pool of candidate hyperedges constructed in Step 1 of COREA (Sect. 6.2), to \(\textbf{G}=(\textbf{V}, \textbf{E})\) to form \(\textbf{G}'=(\textbf{V}, \textbf{E}')\), the original deletion order \(\mathbb {O}=[v_{i_1}, v_{i_2},...,v_{i_n}]\) of an execution of the core decomposition process on \(\textbf{G}\) in Algorithm 2 is still a valid deletion order in \(\textbf{G}'\) and returns the original core numbers.
We prove by induction on the elements \(v_{i_1},...,v_{i_n}\) in \(\mathbb {O}\) that in \(\textbf{G}'\), \(\mathbb {O}\) is still a valid deletion order and \(N_{\textbf{G}}(v_{i_j}) = N_{\textbf{G}'}(v_{i_j}), j = 1,...,n\).
- Base case: As \(v_{i_1}\) is the first node deleted in \(\textbf{G}\), immediately prior to the removal of \(v_{i_1}\), \(d_{\textbf{G}}(v_{i_1})<N_{\textbf{G}}(v_{i_1}) + 1\), and \(d_{\textbf{G}}(v_{i_1})\ge N_{\textbf{G}}(v_{i_1})\) (no hyperedges have been removed at this point and the degree of \(v_{i_1}\) must be sufficient for the core number of \(v_{i_1}\)). Therefore, \(d_{\textbf{G}}(v_{i_1}) = N_{\textbf{G}}(v_{i_1})\), so the anchor availability \(c(v_{i_1})\) realized for \(v_{i_1}\) is \(c(v) = N_{\textbf{G}}(v_{i_1}) - d_{\textbf{G}}(v_{i_1}) = 0\). As a result, in \(\textbf{G}'\), \(d_{\textbf{G}'}(v_{i_1})=d_{\textbf{G}}(v_{i_1})\), so \(v_{i_1}\) can also be the first node deleted in the core decomposition process in \(\textbf{G}'\) and \(N_{\textbf{G}}(v_{i_1}) = N_{\textbf{G}'}(v_{i_1})\).
- Inductive hypothesis: Assume that in an execution of the core decomposition process on \(\textbf{G}'\), the nodes \(v_{i_1},...,v_{i_{h-1}}\) have been deleted exactly in this order (same order as in \(\textbf{G}\)) and \(N_{\textbf{G}}(v_{i_j}) = N_{\textbf{G}'}(v_{i_j}), j = 1,...,h-1\). We need to show that \(v_{i_h}\) can now also be deleted and \(N_{\textbf{G}}(v_{i_h}) = N_{\textbf{G}'}(v_{i_h})\). Indeed, suppose \(N_{\textbf{G}}(v_{i_h})=k\) and \(c(v_{i_h})\) is the anchor availability realized for \(v_{i_h}\) by COREA. COREA constructs \(c(v_{i_h})\) hyperedges, formed by grouping \(v_{i_h}\) with other nodes from \(\{v_{i_{h+1}},...,v_{i_{n}}\}\) (Line 11 of Algorithm 1) and augments those \(c(v_{i_h})\) hyperedges to \(\textbf{G}\).
Firstly, these \(c(v_{i_h})\) hyperedges do not affect the core numbers of the nodes that have been deleted before v in \(\mathbb {O}\), which are \(v_{i_1},...,v_{i_{h-1}}\).
In addition, as \(\textbf{E}\subseteq \textbf{E}'\), \(N_{\textbf{G}'}(v_{i_j}) \ge N_{\textbf{G}}(v_{i_j})\) for \(j = h,...,n\). Moreover, after \(v_{i_1},...,v_{i_{h-1}}\) have been removed in the core decomposition process of \(\textbf{G}'\), the degree of \(v_{i_j}\) in \(\textbf{G}'\) is no less than the degree of \(v_{i_j}\) in \(\textbf{G}\) after \(v_{i_1},...,v_{i_{h-1}}\) have been removed in the core decomposition process of \(\textbf{G}\), for \(j = h,...,n\).
Following the removal order \(\mathbb {O}\) in the pruning process of obtaining the \((k+1)\)-core from the k-core in \(\textbf{G}\), after the nodes \(v_{i_1},...,v_{i_{h-1}}\) have been removed, the degree of \(v_{i_h}\) in \(\textbf{G}\) immediately prior to its removal is \(k - c(v_{i_h})\). Therefore, in \(\textbf{G}'\), at this point of the core decomposition process when \(v_{i_1},...,v_{i_{h-1}}\) along with their incident hyperedges have been deleted, the degree of \(v_{i_h}\) is \(d_{\textbf{G}'}(v_{i_h}) = k - c(v_{i_h}) + c(v_{i_h}) = k < k+1\), so \(N_{\textbf{G}'}(v_{i_h}) \le k\). Therefore, \(N_{\textbf{G}'}(v_{i_h}) = k = N_{\textbf{G}}(v_{i_h})\), and the degree of \(v_{i_h}\) at this point is equal to k. Thus, in \(\textbf{G}'\), \(v_{i_h}\) can be removed immediately after \(v_{i_1},...,v_{i_{h-1}}\) have been removed. Such removal deletes \(v_{i_h}\) and all of its incident hyperedges, including the newly augmented \(c(v_{i_h})\) hyperedges, thus having no impacts on \(v_{i_{h+1}},...,v_{i_n}\).
By the principle of mathematical induction, in \(\textbf{G}'\), \(\mathbb {O}\) is still a valid deletion order and \(N_{\textbf{G}}(v_{i_j}) = N_{\textbf{G}'}(v_{i_j}), j = 1,...,n\). Thus, Step 1 of COREA guarantees to construct a pool P of candidate hyperedges that do not change the core number of any node when they are added together to \(\textbf{G}\).
When the given budget B is tight, which is usually true in practice, only a subset of P is chosen to augment to \(\textbf{G}\) in Step 2 of COREA 6.3. Whether all hyperedges in P are augmented or only a subset of P is augmented to \(\textbf{G}\), in all cases, the hyperedges augmented by COREA are guaranteed to preserve all the original core numbers.
1.2 E2: Invariance of COREA
Lemma 2
Let \(\mathbb {S}= \{a_1,...,a_n\}\) be a set of n elements, \(\mathcal {F}(\mathbb {S})\) be the set of all subsets of S, and \(t:\mathcal {F}(\mathbb {S}) \mapsto \mathbb {N}\) be a function that maps each subset of \(\mathbb {S}\) to a natural number. Denote \(S^{(i)} \in \mathcal {F}(\mathbb {S})\) as the set of all subsets of \(\mathbb {S}\) contaning the element \(a_i\). Then, the following equality holds:
Proof
It suffices to show the sum on the left-hand side of Equation (2) only involves the subsets of \(\mathbb {S}\) whose cardinalities are no less than 2 and that each term t(s) for each \(s \subseteq \mathbb {S}, |s |\ge 2\), appears exactly \((|s |-1)\) times in this sum.
Indeed, the set \(\bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})\) is the set of all subsets of \(\mathbb {S}\) that contain \(a_i\) and at least 1 element among \(a_1,..., a_{i-1}\). In addition, on the left-hand side of Equation (2), the sum only involves all subsets of \(\mathbb {S}\) having at least 2 elements. It is because each subset needs to involve at least 2 distinct elements and for any subset \(s' \subseteq \mathbb {S}\) such that \(|s' |\ge 2\), take 2 elements \(a_p, a_q \in s', p<q\), then \(s' \in \bigcup \limits _{j=1}^{q-1} (S^{(q)} \cap S^{(j)})\).
Let \(s = \{a_{k_1},...,a_{k_m}\}\) with \(k_1<...<k_m\) and \(|s |=m \ge 2\) be a subset of \(\mathbb {S}\). For each \(i = k_2,...,k_m\), s appears exactly once in \(\bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})\) because for each of those \(i=k_2,...,k_m\), the set s is a subset of \(\mathbb {S}\) that contains \(a_i\) and \(a_{k_1} (k_1 < i)\).
For each \(i \in \mathbb {S} \setminus \{k_2,...,k_m\}\), s does not appear in \(\bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})\) as s fails to contain both \(a_i\) and an element \(a_j (j<i)\).
Therefore, the term t(s) corresponding each set s, \(s \subseteq \mathbb {S}, |s |\ge 2\), appears exactly \((|s |-1)\) times on the left-hand side of Eq. (2). Since both sides of Eq. (2) involve exactly all the subsets of \(\mathbb {S}\) whose cardinalities are greater than or equal to 2, the two sides of Equation (2) are equal to each other.
Lemma 3
(Invariance of COREA in each k-core) For \(k=k_0,...,N_{\textbf{G}}^{*}\), with \(k_0\) as the minimum core number of a node in \(\textbf{G}\), the total number of anchor availabilities of nodes having core number k, realized by COREA, remains unchanged regardless of the order of nodes removed in the core decomposition.
Proof
Without loss of generality, assume a particular order in which the nodes are deleted in the pruning process of obtaining the \((k+1)\)-core from the k-core is \([a_1,..., a_q]\), and their respective anchor availabilities realized by COREA are \(c(a_1),..., c(a_q)\). Note that \(\mathbb {S} = \{a_1,...,a_q\}\) is the set of all nodes having core number k. Denote \(S^{(i)}\) as the set of all subsets of \(\mathbb {S}\) containing \(a_i\). For each subset \(s \subseteq \mathbb {S}\), \(|s |\ge 1\), let t(s) be the number of hyperedges that have s as the set of anchors: \(t(s) = |\{e \in \textbf{E}\mid \textbf{A}_{\textbf{G}}(e) = s \}|\). Denote the set of all subsets of \(\mathbb {S}\) that contain \(a_i\) as \(S^{(i)}\). The set of subsets of \(\mathbb {S}\) that contain \(a_i\) and at least one element among \(a_1,...,a_{i-1}\) is \(\bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})\).
At the k-core, the degree \(d_{\textbf{G}}(v_{i_1})\) of node \(a_i\in \mathbb {S}\) is \(k + R(a_i)\) with \(R(a_i) \ge 0\) since the degree of each node among \(\{a_1,...,a_q\}\) has to be at least k. It should be noticed that \(R(a_i)=d_{\textbf{G}}(v_{i_1})-k\) is independent of the order of node deletions.
Assuming that Algorithm 2 is now at the k-core and undertakes the pruning process to obtain the \((k+1)\)-core while simultaneously obtaining the anchor availability for each node that has core number k. As node \(a_1\) is the first node to delete, its degree is \(\le k\). However, since no hyperedges at the k-core have been deleted yet, the degree of \(a_1\) at this point is \(k + R(a_1) \le k\). Therefore, \(R(a_1) = 0\), and according to COREA, the anchor availability realized for \(a_1\) is \(c(a_1)=0\).
For each \(i=2,...,q\), after nodes \(a_1,..., a_{i-1}\) have been removed, all of the hyperedges anchored at any of those nodes have also been removed from the network. Among those hyperedges, the ones that affect the degree of \(a_i\) are the ones co-anchored by \(a_i\) and at least 1 among \(a_1,..., a_{i-1}\). The number of such hyperedges is: \(\sum _{s \in \bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})} t(s)\). Due to the removals of these hyperedges, the degree of \(a_i\) immediately prior to its deletion is \(k + R(a_i) - \sum _{s \in \bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})} t(s)\). To qualify for deletion, the degree of \(a_i\) must be lower than \((k+1)\). In other words, \(k + R(a_i) - \sum _{s \in \bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})} t(s) \le k\), or \(- R(a_i) + \sum _{s \in \bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})} t(s) \ge 0\). The anchor availability realized for node \(a_i\) by COREA is then equal to: \(c(a_i) = - R(a_i) + \sum _{s \in \bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})} t(s)\).
The sum of anchor availabilities realized by COREA for all nodes in the k-core is: \(c_k = \sum _{i=1}^q c(a_i) = -\sum _{j=1}^{n}R(a_j) + \sum _{i=2}^{q} \sum _{s \in \bigcup \limits _{j=1}^{i-1} (S^{(i)} \cap S^{(j)})} t(s)\). Lemma 2 implies the following equality:
Thus, \(c(k) = -\sum _{j=1}^{q}R(a_j) + \sum _{s \subset \mathbb {S}, |s |\ge 2}(|s |-1)t(s)\). This value is symmetric with respect to each of \(a_1,...,a_q\), which is independent of any particular ordering of \(\mathbb {S} = \{a_1,...,a_q\}\).
Therefore, the total number of anchor availabilities realized by COREA for the nodes in the k-core is constant regardless of the order of deletions.
Lemma 4
For each \(k=k_0,...,N_{\textbf{G}}^{*}\), with \(k_0\) as the minimum core number of a node in \(\textbf{G}\), in the pruning process of obtaining the \((k+1)\)-core from the k-core, assume that in two different valid deletion orders \(\mathbb {O}\) and \(\mathbb {O}'\), \(a_p\) is the p-th node having core number k deleted and \(a_1,...,a_{p-1}\) are the \((p-1)\) nodes of core number k deleted before \(a_p\) (with different orders). The anchor availability realized for \(a_p\) is the same in both \(\mathbb {O}\) and \(\mathbb {O}'\).
Proof
We employ all the notations as in Lemma 3. According to the proof of Lemma 3, the anchor availability realized for \(a_p\) in either \(\mathbb {O}\) or \(\mathbb {O}'\) is \(- R(a_p) + \sum _{s \in \bigcup \limits _{j=1}^{p-1} (S^{(p-1)} \cap S^{(j)})} t(s)\), which is symmetric with respect to \(a_1,...,a_{p-1}\) and does not depend on any particular ordering or \(a_1,...,a_{p-1}\). This demonstrates that the anchor availability realized for each node is only dependent on the set of nodes deleted before it and independent of the deletion order by which those nodes are deleted.
Theorem 2
(Invariance of COREA) The total number of anchor availabilities \(\mathcal {C} = \sum _{v \in \textbf{V}}c(v)\) realized by COREA is always constant with respect to \(\textbf{G}\).
Proof
According to Lemma 3, for each \(k=k_0,...,N_{\textbf{G}}^{*}\), with \(k_0\) as the minimum core number of a node in \(\textbf{G}\), the total anchor availabilities realized by COREA for the nodes having core number k is the same regardless of the order \(\mathbb {O}\) of deletion (\(\mathbb {O}\) is a valid deletion order).
Since the total anchor availabilities realized by Algorithm 1 can be obtained by summing up all anchor availabilities realized at each core level, the total number of anchor availabilities realized by COREA is constant regardless of the order of deletions.
1.3 E3: Exhaustiveness of COREA
Lemma 5
(Exhaustiveness of COREA in each k-core) For \(k=k_0,...,N_{\textbf{G}}^{*}\), with \(k_0\) as the minimum core number of a node in \(\textbf{G}\), the total anchor availabilities for the nodes having core number k realized by COREA always is the maximum number of hyperedges anchored at the nodes of core number k that can be augmented, subject to the constraint of preserving the core number k of those nodes.
Proof
According to Lemma 3, the total anchor availabilities realized by Algorithm 1 for the nodes having core number k, is always the same regardless of the order of node deletions, and let \(T_k\) be such total number.
Assume the contradiction that \(T_k\) is not the maximum number of hyperedges anchored at the nodes of core number k that can be augmented to \(\textbf{G}\) while conserving all core numbers. As a result, there is a feasible augmentation method I that augments \(I_k\) hyperedges anchored at the nodes having core number k-core that preserve all core numbers with \(I_k \ge T_k + 1\). Without loss of generality, assume that with I, in a valid deletion order of the core decomposition process, all nodes having core number k are deleted in the order \([a_1,...,a_q]\) in the pruning process to obtain the \((k+1)\)-core from the k-core. Immediately before the deletion of \(a_i\), its degree is \(k - x(a_i) \le k\), with \(x(a_i) \ge 0\). Denote \(I_k^{(i)}\) as the number of hyperedges augmented by I whose anchors involve \(a_i\) and a subset s of \(\{a_{i+1},...,a_q\}\) (s maybe an empty subset). We then have \(\sum _{i=1}^{q}I_k^{(i)}= I_k\)
By Lemma 1, we know that without I, \([a_1,...,a_q]\) is still a subsequence, involving all the nodes having core number k, of a valid deletion order in the core decomposition of the original hypergraph \(\textbf{G}\). That is, without I, in the original hypergraph \(\textbf{G}\), the pruning process can still delete nodes \(a_1,...,a_q\) in this particular order to obtain the \((k+1)\)-core from the k-core. The degree of \(a_i\) immediately prior to its deletion is \(k - x(a_i) - I_k^{(i)}\). As a result, COREA realizes the anchor availability \(c(a_i) = x(a_i) + I_k^{(i)}\) for node \(a_i\). Lemma 3 proves that the value of \(T_k\) is always equal to: \(T_k = \sum _{i=1}^{q}c(a_i) = \sum _{i=1}^{q} [x(a_i) + I_k^{(i)}] = \sum _{i=1}^{q}I_k^{(i)} + \sum _{i=1}^{q} x(a_i) = I_k + \sum _{i=1}^{q} x(a_i) \ge I_k \ge T_k + 1\), which is a contradiction.
Therefore, the initial assumption is false, which proves that COREA returns the maximum number of hyperedges anchored at the nodes having core number k, subject to the constraint of preserving all core numbers.
Theorem 3
(Exhaustiveness of COREA) There is a maximum number \(\mathcal {M}\) of hyperedges that can be augmented to \(\textbf{G}\) while conserving all core numbers, and the total number of anchor availabilities \(\mathcal {C}\) realized by COREA is equal to \(\mathcal {M}\).
Proof
Lemma 5 shows that COREA always returns the maximum total number of anchor availabilities \(T_k\) of nodes having core number k for \(k=k_0,...,N_{\textbf{G}}^{*}\), with \(k_0\) as the minimum number of core number of a node in \(\textbf{G}\). According to Theorem 2, the total anchor availabilities realized by COREA is always \(\mathcal {C} = \sum _{v \in \textbf{V}}c(v)= \sum _{k=k_0}^{N_{\textbf{G}}^{*}}T_k\). Below, we prove that \(\mathcal {C}\) is actually the maximum number of hyperedges that can be augmented to \(\textbf{G}\) while conserving all core numbers.
Indeed, assume that a feasible augmentation method F augments \(I_k\) hyperedges, anchored at the nodes having core number k, for each \(k=k_0,...,N_{\textbf{G}}^{*}\), without changing any core numbers of the nodes in \(\textbf{G}\). The total anchor availability realized by F is \(I = \sum _{k=1}^{N_{\textbf{G}}^{*}}I_k\). According to Lemma 5, \(I_k \le T_k\), so: \(I = \sum _{k=k_0}^{D}I_k \le \sum _{k=k_0}^{N_{\textbf{G}}^{*}}T_k = \mathcal {C}\). In other words, the total number of hyperedges augmented by F is \(\le \mathcal {C}\).
Thus, the total anchor availabilities \(\mathcal {C}\) found by COREA is the maximum number of hyperedges that any feasible augmentation method can add to \(\textbf{G}\), subject to the constraint of preserving all core numbers. Theorem 2 states that \(\mathcal {C}\) is always constant with respect to \(\textbf{G}\), indicating that \(\mathcal {C}\) is the maximum number \(\mathcal {M}\) of hyperedges that can be augmented to \(\textbf{G}\) while conserving all core numbers, and \(\mathcal {M}=\mathcal {C}\).
1.4 E4: Time complexity of COREA
Theorem 4
(Time Complexity of COREA) Given the hypergraph \(\textbf{G}=(\textbf{V}, \textbf{E})\) with maximum hyperedge cardinality m, the budget B, the total number of anchor availabilities \(\mathcal {C}\) of all nodes (constant with respect to each dataset), and the batch size c by which COREA augments c hyperedges at a time in Step 2, the time complexity of COREA is \(\mathcal {O}\left[ |\textbf{V}|\text {log}|\textbf{V}|+ \mathcal {C}m \log |\textbf{V}|+ (|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ \mathcal {C}m^2)\frac{b}{c}\right]\), where \(b = \text {min}\{B, \mathcal {C}\}\).
Proof
As described in Sect. 5.1, computing the core influences of all nodes requires initializing the value 1 for each node and iterating through each node in each hyperedge once, so the time complexity of computing core influences is \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |)\).
Step 1–1 of COREA, presented in Algorithm 1, undertakes the core decomposition process and computes the anchor availability of each node. The core decomposition process requires iterating through each node v for its removal and each hyperedge e for its removal and updating the degrees of its constituent nodes. The total time complexity for these operations is \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |)\). Computing the anchor availability of each node v, \(N_{\textbf{G}}(v) = k\), requires some primitive operations (subtracting the degree immediately prior to the removal from k), so the time complexity of removing nodes, along with their incident hyperedges, and computing anchor availabilities for all nodes is \(\mathcal {O}(|\textbf{V}|)\), which is dominated by \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |)\).
If the tie-breaking scheme \(\textsf{T}\) is being proportional to \(\mathcal{C}\mathcal{S}_{\textbf{G}}/\mathcal{C}\mathcal{I}_{\textbf{G}}\) (or \(1/\mathcal{C}\mathcal{I}_{\textbf{G}}\)), the core strength and core influence of each node v can be computed when v becomes qualified for removal in the core decomposition process. The reason is that the core influences of all the nodes in the k-core can be computed by the time Algorithm 2 completes finding the \((k-1)\)-core (the core influence of v only depends on the hyperedges incident to v having lower core numbers than that of v). Also, when a node v becomes qualified for removal in Algorithm 2, its core strength can be updated with constant time (based on its degree at the beginning of the k-core and its core number, which is determined to be k at this point already). Therefore, computing core strengths and core influences of all nodes for the scheme \(\textsf{T}\) does not affect the time complexity. For each \(k=k_0,...,N_{\textbf{G}}^{*}\), with \(k_0\) as the minimum core number of a node in \(\textbf{G}\), denote \(N_k\) as the number of nodes in \(\textbf{G}\) that have core number k. For each k, in the pruning process of obtaining the \((k+1)\)-core from the k-core, at each step, among the nodes in \(\mathbb{T}\mathbb{D}\) (Line 5 in Algorithm 2) that are the nodes qualified for removal, the tie-breaking scheme \(\textsf{T}\) needs to conduct weighted sampling to select a node to delete first. For each node v among \(N_k\) nodes of core number k, according to Vitter (1987), adding v to \(\mathbb{T}\mathbb{D}\) takes \(\mathcal {O}(1)\) time, sampling v from \(\mathbb{T}\mathbb{D}\) takes \(\mathcal {O}(\text {log}|\mathbb{T}\mathbb{D}|)\) time, and removing v after sampling from \(\mathbb{T}\mathbb{D}\) takes \(\mathcal {O}(\text {log}|\mathbb{T}\mathbb{D}|)\) time. Since \(|\mathbb{T}\mathbb{D}|\le N_k\), the total time complexity the tie-breaking scheme \(\textsf{T}\) to decide the order of nodes to delete in the k-core is \(\mathcal {O}(N_k \text {log}N_k)\). Therefore, the total time complexity for \(\textsf{T}\) to decide the deletion order \(\mathbb {O}\) for \(\textbf{G}\) is \(\sum _{k=k_0}^{N_{\textbf{G}}^{*}}\mathcal {O}(N_k \text {log}N_k)\). We have: \(\sum _{k=k_0}^{N_{\textbf{G}}^{*}}N_k \log N_k \le \sum _{k=k_0}^{N_{\textbf{G}}^{*}}N_k \log |\textbf{V}|= |\textbf{V}|\log |\textbf{V}|\). Therefore, \(\sum _{k=k_0}^{N_{\textbf{G}}^{*}}\mathcal {O}(N_k \text {log}N_k) = \mathcal {O}(|\textbf{V}|\text {log}|\textbf{V}|)\).
As a result, the total time complexity of Step 1–1 of COREA is \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |) + \mathcal {O}(|\textbf{V}|\text {log}|\textbf{V}|) = \mathcal {O}(|\textbf{V}|\text {log}|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |)\).
In Step 1-2 of COREA, for each node v, we need to construct c(v) hyperedges anchored at v. For each hyperedge e among those c(v) hyperedges, this requires sampling a hyperedge size (constant time) and sampling other nodes from \(\mathbb {O}[i+1:]\) (as shown in Line 11 of Algorithm 1). For the sampling scheme \(\textsf{S}\) described in Sect. 6.2.2, the sampling step of other nodes to fill up e takes \(\mathcal {O}(m\log |\textbf{V}|)\) time, according to Vitter (1987). Therefore, the total time complexity of Step 1-2 is \(\mathcal {O}(\sum _{v \in \textbf{V}}c(v)m\log |\textbf{V}|) = \mathcal {O}(\mathcal {C}m \log |\textbf{V}|)\).
In Step 2, we go through b/c iterations, and in each iteration, we add c hyperedges to \(\textbf{G}_{\text {cur}}\). At each iteration, before choosing the hyperedges to augment to \(\textbf{G}_{\text {cur}}\), for each candidate hyperedge e in the pool P, COREA needs to evaluate how much augmenting e improves the term \(f(\textbf{G}_{\text {cur}}) = \sum _{v \in \textbf{V}}\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(v)\mathcal{C}\mathcal{S}_{\textbf{G}_{\text {cur}}}(v)\), with \(\textbf{G}_{\text {cur}}\) as the current hypergraph snapshot. To do this, we maintain a measurement g(v) for each node v, quantifying how much \(f(\textbf{G}_{\text {cur}})\) increases if \(\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(v)\) is incremented by 1 unit. Particularly, if \(\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(v)\) increases by 1 unit, \(f(\textbf{G}_{\text {cur}})\) increases by g(v). In order to achieve this, we reverse the process of calculating all core influences. In the formula of core influence in Sect. 5.1, suppose \(\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(v) = 1 + \sum _{e \in \textbf{E}_{\textbf{G}_{\text {cur}}}^<(v)}(1 + \frac{\Delta }{N_{\textbf{G}_{\text {cur}}}(v) - 1})\left[ (1 - \frac{\mathcal{C}\mathcal{S}_{\textbf{G}_{\text {cur}}}(t)-1}{|\textbf{E}_{\textbf{G}_{\text {cur}}}^=(t) |})\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(t)\right]\), for each \(e \in \textbf{E}_{\textbf{G}_{\text {cur}}}^<(v)\), if \(\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(t)\) increases by 1 unit, \(\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(v)\) increases by \((1 + \frac{\Delta }{N_{\textbf{G}_{\text {cur}}}(v) - 1})\left[ (1 - \frac{\mathcal{C}\mathcal{S}_{\textbf{G}_{\text {cur}}}(t)-1}{|\textbf{E}_{\textbf{G}_{\text {cur}}}^=(t) |})\right]\) units. As a result, g(t) needs to increase by \((1 + \frac{\Delta }{N_{\textbf{G}_{\text {cur}}}(v) - 1})\left[ (1 - \frac{\mathcal{C}\mathcal{S}_{\textbf{G}_{\text {cur}}}(t)-1}{|\textbf{E}_{\textbf{G}_{\text {cur}}}^=(t) |})\right] g(v)\) units. To compute such value g(v) for each node v, we first initialize \(g(v) = \mathcal{C}\mathcal{S}_{\textbf{G}_{\text {cur}}}(v)\), start from the nodes with the highest core number, update the values g(.) until reaching the nodes with the lowest core number. The whole process requires iterating through each node once and each node in each hyperedge once, accounting for the total time complexity of \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}_{\text {cur}}}|e |)\).
Once the values g(.) are up-to-date, for each candidate hyperedge e anchored at \(\{v_1,...,v_a\}\), and the other nodes in e that are not anchors of e are \(\{u_1,...,u_b\}\). Suppose adding e increases the core influences of \(u_1,...,u_b\) by \(\beta _1,...,\beta _b\), respectively, which can be calculated in \(\mathcal {O}(|e |^2)\) time that is upper-bounded by \(\mathcal {O}(m^2)\). The contribution of e into \(f(\textbf{G}_{\text {cur}})\) if augmented is then: \(\sum _{i=1}^{a}\mathcal{C}\mathcal{I}_{\textbf{G}_{\text {cur}}}(v_i) + \sum _{j=1}^{b}\beta _j \times g(u_j)\), which can be calculated in \(\mathcal {O}(m)\) time. Assume that we are at iteration t, for \(t=1,...,b/c\), when \((t-1)c\) candidate hyperedges have been added to \(\textbf{G}\), there are \(\mathcal {C} - (t-1)c\) hyperedges remaining in P. The time complexity of calculating the scores for the candidate hyperedges and choosing c hyperedges with the highest scores is then \(\mathcal {O}([\mathcal {C} - (t-1)c]m^2)\).
At each iteration t, for \(t=1,...,b/c\), of Step 2, after calculating the g(.) values and the score of each candidate hyperedge in P, we add c candidate hyperedges with the highest scores to the hypergraph. Once we augment c more hyperedges into \(\textbf{G}_{\text {cur}}\), we need to update all core strengths, core influences, and the values g(.), whose complexity is \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}_{\text {cur}}}|e |)\).
At each iteration t, since tc hyperedges have been added to \(\textbf{G}\), \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}_{\text {cur}}}|e |)=\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ tcm)\) holds. Thus, the total time complexity of iteration t, for \(t=1,...,b/c\), of Step 2 is: \(\mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ tcm)+\mathcal {O}([\mathcal {C} - (t-1)c]m^2) + \mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ tcm) = \mathcal {O}(|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ [\mathcal {C} - (t-1)c]m^2 + tcm)\).
Summing over all iterations \(t=1,...,b/c\), the total time complexity of Step 2 of COREA is: \(\sum _{t=1}^{b/c}\mathcal {O}\left[ |\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ [\mathcal {C} - (t-1)c]m^2 + tcm \right] = \mathcal {O}\left[ (|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ \mathcal {C}m^2)\frac{b}{c}\right]\).
Summing up the time complexities of Steps 1-1, 1-2, and 2, the total time complexity of COREA is \(\mathcal {O}\left[ |\textbf{V}|\text {log}|\textbf{V}|+ \mathcal {C}m \log |\textbf{V}|+ (|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ \mathcal {C}m^2)\frac{b}{c}\right]\).
1.5 E5: Maximum anchor availability of a node
In this section, we discuss the cases when COREA cannot guarantee to afford maximum anchor availabilities for all nodes and the sufficient conditions to achieve the maximum anchor availability of a particular node v. While Theorem 2 shows that the sum of anchor availabilities of all nodes, realized by COREA, is always constant with respect to \(\textbf{G}\), different deletion orders in Step 1 of COREA, governed by the tie-breaking scheme \(\textsf{T}\) in Line 6 of Algorithm 2, may result in different anchor availabilities for each node.
In the pruning process of obtaining the \((k+1)\)-core form the k-core, at any point, there might be several nodes qualified for removal, i.e., they all have degrees \(\le k\). We first show that, deferring the removal of v, while choosing another node to delete first, potentially helps afford a higher anchor availability for v, as stated in Lemma 6.
Lemma 6
In the pruning process of obtaining the \((k+1)\)-core from the k-core, assume that both u and v are up for removal, and a valid deletion order \(\mathbb {O}\) chooses to remove v immediately before u. If we obtain a valid deletion order \(\mathbb {O}'\) by switching the positions of nodes v and u in \(\mathbb {O}\), the anchor availability realized by COREA for v remains the same or increases.
Proof
Assume that by the ordering of \(\mathbb {O}\), immediately prior to the deletion of v, the degrees of u and v are d(u) and d(v), respectively, with \(d(u), d(v) \le k\). Also assume that there remain \(t(\{u,v\}) \ge 0\) hyperedges anchored by both u and v. In \(\mathbb {O}\), we remove v then u and the deletion of v will remove all of its incident hyperedges, along with those \(t(\{u,v\})\) hyperedges anchored by u and v, so the respective degrees of v and u immediately prior to removals are d(v) and \(d(u) - t(\{u,v\})\). As a result, COREA realizes the respective anchor availabilities for v and u as \(c(v) = k - d(v)\) and \(c(u) = k - d(u) + t(\{u,v\})\), respectively.
Switching the positions of v and u in \(\mathbb {O}\), we obtain another valid deletion order \(\mathbb {O}'\). In \(\mathbb {O}'\), the deletion of u will remove all of its incident hyperedges, along with those \(t(\{u,v\})\) hyperedges anchored by u and v, so the respective degrees prior to removals of u and v are d(u) and \(d(v) - t(\{u,v\})\). As a result, the afforded anchor availabilities of u and v become \(c'(u) = k - d_{\textbf{G}}(u)\) and \(c'(v) = k - d(v) + t(\{u,v\})\).
As \(t(\{u,v\}) \ge 0\), \(c'(v) \ge c(v)\). Therefore, if we switch the positions of nodes v and u to obtain another valid deletion order, the anchor availability realized by COREA for v remains the same or increases.
In the proof for Lemma 6, in the case that \(t(\{u,v\}) > 0\), if we swap from a valid deletion order, deleting v first then deleting u, to obtain another valid deletion order, deleting u first then deleting v, the anchor availability for u decreases and that of v increases. Since COREA needs to remove one node at a time, it is clear that if u is deleted before v, u is certainly not afforded its maximum anchor availability, and the same holds for v in the case when v is removed before u. Therefore, if there are several nodes up for deletion and there are hyperedges co-anchored by them, those nodes cannot be afforded their respective maximum anchor availabilities simultaneously.
Lemma 7
In the pruning process of obtaining the \((k+1)\)-core from the k-core, in 2 different valid deletion orders \(\mathbb {O}\) and \(\mathbb {O}'\) where the removal of node v is deferred until a point when v is the only node up for removal, the anchor availabilities of v realized by Algorithm 2 in both \(\mathbb {O}\) and \(\mathbb {O}'\) are the same.
Proof
For each \(x \in \textbf{V}\) and \(N_{\textbf{G}}(x)=k\), refer to the degree of x at the beginning of the pruning process to obtain the \((k+1)\)-core from the k-core, when no nodes of core number k have been deleted, as the core degree of x, denoted as d(x). Denote \(S_{\mathbb {O}}(x)\) and \(S_{\mathbb {O}'}(x)\) as the sets of nodes that have core number k and get removed before x in \(\mathbb {O}\) and \(\mathbb {O}'\), respectively.
We first show that in both \(\mathbb {O}\) and \(\mathbb {O}'\), the sets of nodes deleted before v, denoted as \(S_{\mathbb {O}}(v)\) and \(S_{\mathbb {O}'}(v)\) respectively, are the same.
If \(S_{\mathbb {O}}(v) = \emptyset\), starting at the k-core, \(\mathbb {O}\) has to begin with v in the pruning process of obtaining the \((k+1)\)-core from the k-core. It implies that among the nodes of core number k, v is the only node whose core degree is equal to k. As a result, in \(\mathbb {O}'\), v also has to be the first node of core number k to delete, i.e., \(S_{\mathbb {O}'}(v) = \emptyset\). Therefore, \(S_{\mathbb {O}}(v) = S_{\mathbb {O}'}(v)\). A similar argument is made for the case in which \(S_{\mathbb {O}'}(v) = \emptyset\).
Assume the case that both \(S_{\mathbb {O}}(v)\) and \(S_{\mathbb {O}'}(v)\) are non-empty sets. Note that starting at the k-core, both \(\mathbb {O}\) and \(\mathbb {O}'\) need to begin with a node, other than v, whose core degree is exactly equal to k. Furthermore, all of the nodes, of core number k and other than v, whose core degrees are exactly equal to k must belong to both \(S_{\mathbb {O}}(v)\) and \(S_{\mathbb {O}'}(v)\) as these nodes are always qualified for removal at the beginning of the pruning process. It implies that \(S_{\mathbb {O}}(v) \cap S_{\mathbb {O}'}(v) \ne \emptyset\).
Assume by contradiction that there exists \(u \in S_{\mathbb {O}}(v)\) such that \(u \notin S_{\mathbb {O}'}(v)\). In other words, in \(\mathbb {O}'\), u is deleted after v, so \(d(u) > k\). For u to be deleted before v in \(\mathbb {O}\), the necessary and sufficient condition is that the removals of the nodes in \(S_{\mathbb {O}}(u)\), along with their incident hyperedges, result in the degree of u dropping lower than \(k+1\). We have \(S_{\mathbb {O}}(u) \subset S_{\mathbb {O}}(v)\). In \(\mathbb {O}'\), as we defer removing v to the point when v is the only node up for removal and u is deleted after v, the degree of u never drops lower than \(k+1\) before v is removed. If all nodes in \(S_{\mathbb {O}}(u)\) are also in \(S_{\mathbb {O}'}(v)\), u can be qualified for removal before v is removed in \(\mathbb {O}'\). Therefore, \(\exists t \in S_{\mathbb {O}}(u)\) and \(t \notin S_{\mathbb {O}'}(v)\), which also implies that \(t \in S_{\mathbb {O}}(v)\) and \(d(t)>k\). t is removed before u in \(\mathbb {O}\), \(t \ne u, t \in S_{\mathbb {O}}(v)\), and \(t \notin S_{\mathbb {O}'}(v)\). We now repeat the argument for u on t to derive that \(\exists y \in S_{\mathbb {O}}(v)\), y is removed before t in \(\mathbb {O}\), \(d(y)>k\), \(y \ne u, y \ne t\), and \(y \notin S_{\mathbb {O}'}(v)\). Applying the same argument on y and so on, we can repeat it infinitely many times. However, that is impossible because \(S_{\mathbb {O}}(v)\) has a finite number of elements. Therefore, the assumption that \(u \notin S_{\mathbb {O}'}(v)\) is false, i.e., \(u \in S_{\mathbb {O}'}(v)\).
Thus \(S_{\mathbb {O}}(v) \subseteq S_{\mathbb {O}'}(v)\). Similarly, we can also show \(S_{\mathbb {O}'}(v) \subseteq S_{\mathbb {O}}(v)\). It implies that \(S_{\mathbb {O}}(v) = S_{\mathbb {O}'}(v)\)
According to Lemma 4, even though the orders of the nodes preceding v are different in \(\mathbb {O}\) and \(\mathbb {O}'\), since they are the same set of nodes, the anchor availabilities of v in both \(\mathbb {O}\) and \(\mathbb {O}'\), are the same.
Theorem 5
5 [Maximum anchor availability of a node] If the tie-breaking scheme \(\textsf{T}\) in Algorithm 2 always defers the removal of node v, \(N_{\textbf{G}}(v)=k\), until the point when v is the only node qualified for removal during the pruning process to obtain the \((k+1)\)-core, COREA achieves the maximum anchor availability \(c^*(v)\) for v. For all tie-breaking schemes, the anchor availability c(v) realized for v, in Algorithm 2, is always \(\le c^*(v)\).
Proof
Denote \(S_1\) as a valid deletion order resulting from a tie-breaking scheme \(\textsf{T}\) that always defers the removal of v, \(N_{\textbf{G}}(v) = k\), in the core decomposition process until the point when v is the only node qualified for removal.
According to lemma 7, in all valid deletion orders that defer removing v to the point when v is the only node qualified for removal, the anchor availability realized for v by COREA is always the same, and equal to \(c_{S_1}(v)\), the anchor availability realized by following \(S_1\). If there exists a valid deletion order \(\mathbb {O}_0\) such that when v and at least another node are qualified for removal, v is chosen to be deleted first and afforded anchor availability \(c_{\mathbb {O}_o}(v)\), we can always form another valid deletion order by deferring the removal of v and deleting the other node first until v is the only node qualified for removal. According to Lemma 6, each time we do so, the new anchor availability for v is higher than or equal to the previous value, so \(c_{\mathbb {O}_o}(v) \le c_{S_1}(v)\). Therefore, \(c_{S_1}(v)\) is the maximum anchor availability for v that can be realized by COREA in any valid deletion order.
Thus, if the tie-breaking scheme \(\textsf{T}\) in Algorithm 2 always defers the removal of v until the point when v is the only node qualified for removal, COREA achieves the maximum anchor availability for \(c^*(v)\) for v.
Given a particular valid deletion order \(\mathbb {O}\) of nodes in the core decomposition, governed by the tie-breaking scheme \(\textsf{T}\), the anchor availability for each node is either the maximum possible or sub-optimal. While not guaranteeing to afford the maximum anchor availabilities for all nodes, in Theorem 5, we provide sufficient conditions to achieve the maximum anchor availability for a particular node v. That is, in the core decomposition process, COREA needs to always defer the deletion of v until the point when v is the only node qualified for removal.
However, as previously mentioned, it is important to note that, regardless of whether the availability for each node is sub-optimal, the sum \(\mathcal {C}\) of all anchor availabilities realized by COREA is always constant with respect to each hypergraph (Theorem 2) and equal to the maximum number of hyperedges any method can augment to the hypergraph without altering any core numbers (Theorem 3). Therefore, given the constraint of preserving all core numbers, no feasible augmentation method can augment more than \(\mathcal {C}\) hyperedges.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Do, M.T., Shin, K. Improving the core resilience of real-world hypergraphs. Data Min Knowl Disc 37, 2438–2493 (2023). https://doi.org/10.1007/s10618-023-00958-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-023-00958-0