Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications | Data Mining and Knowledge Discovery Skip to main content
Log in

Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Hypergraphs are a powerful abstraction for modeling high-order relations, which are ubiquitous in many fields. A hypergraph consists of nodes and hyperedges (i.e., subsets of nodes); and there have been a number of attempts to extend the notion of \({\varvec{k}}\)-cores, which proved useful with numerous applications for pairwise graphs, to hypergraphs. However, the previous extensions are based on an unrealistic assumption that hyperedges are fragile, i.e., a high-order relation becomes obsolete as soon as a single member leaves it.In this work, we propose a new substructure model, called \({\varvec{(k,t)}}\)-hypercore, based on the assumption that high-order relations remain as long as at least t fraction of the members remains. Specifically, it is defined as the maximal subhypergraph where (1) every node is contained in at least \({\varvec{k}}\) hyperedges in it and (2) at least \({\varvec{t}}\) fraction of the nodes remain in every hyperedge. We first prove that, given \({\varvec{t}}\) (or \({\varvec{k}}\)), finding the \({\varvec{(k,t)}}\)-hypercore for every possible \({\varvec{k}}\) (or \({\varvec{t}}\)) can be computed in time linear w.r.t the sum of the sizes of hyperedges. Then, we demonstrate that real-world hypergraphs from the same domain share similar \({\varvec{(k,t)}}\)-hypercore structures, which capture different perspectives depending on \({\varvec{t}}\). Lastly, we show the successful applications of our model in identifying influential nodes, dense substructures, and vulnerability in hypergraphs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. A multiset is a set al.lowing duplicate elements.

  2. In this work, the maximal subgraph (subhypergraph) satisfying some conditions means that every other graph (hypergraph) satisfying such conditions is a subgraph (subhypergraph) of the maximal one.

  3. Similar to clique expansion, we can also have weighted star expansion, which is, however, not used in this work.

  4. We assume that the input hypergraph is in the memory and thus do not count the complexity of loading the hypergraph, which is \(O(\sum _{e \in E} |e|)\).

  5. Recall that \(\ell\)-hypercoreness with \(\ell = 2\) is include in t-hypercoreness with \(t = 0\). For each dataset, we apply min-max normalization to all the possible \(\ell\) values with \(\ell \ge 3\) so that t-hypercoreness and \(\ell\)-hypercoreness can fit in the same x-axis with the range [0, 1].

  6. The average performance over five independent trials is reported.

  7. We count \(\ell\)-hypercoreness with each \(\ell\) value as a separate method (\(\ell = 2\) is not counted since it is already included in the concept of t-hypercoreness with \(t = 0\)).

  8. Simplicial complexes can be seen as a special class of hypergraphs.

  9. The case \(\ell = 2\) is included in the proposed concept of t-hypercoreness with \(t = 0\).

References

  • Adamic LA, Lukose RM, Puniyani AR et al. (2001) Search in power-law networks. Phys Rev E 64(4):046–135

    Article  Google Scholar 

  • Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47

    Article  MathSciNet  MATH  Google Scholar 

  • Alvarez-Hamelin JI, Dall’Asta L, Barrat A, et al. (2006) Large scale networks fingerprinting and visualization using the k-core decomposition. In: NeurIPS

  • Alvarez-Hamelin JI, Dall’Asta L, Barrat A et al. (2008) K-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases. Netw Heterog Media 3(2):371

    Article  MathSciNet  MATH  Google Scholar 

  • Antelmi A, Cordasco G, Spagnuolo C et al. (2021) Social influence maximization in hypergraphs. Entropy 23(7):796

    Article  MathSciNet  Google Scholar 

  • Arafat NA, Khan A, Rai AK, et al. (2023) Neighborhood-based hypergraph core decomposition. PVLDB 16

  • Arya D, Gupta DK, Rudinac S, et al. (2020) Hypersage: Generalizing inductive representation learning on hypergraphs. arXiv preprint arXiv:2010.04558

  • Bai S, Zhang F, Torr PH (2021) Hypergraph convolution and hypergraph attention. Pattern Recogn 110(107):637

    Google Scholar 

  • Batagelj V, Zaversnik M (2003) An \(o(m)\) algorithm for cores decomposition of networks. In: arXiv

  • Benson AR (2019) Three hypergraph eigenvector centralities. SIAM J Math Data Sci 1(2):293–312

    Article  MathSciNet  MATH  Google Scholar 

  • Benson AR, Abebe R, Schaub MT et al. (2018) Simplicial closure and higher-order link prediction. PNAS 115(48):E11221–E11230

    Article  Google Scholar 

  • Benson AR, Kumar R, Tomkins A (2018b) Sequences of sets. In: KDD

  • Blanco R, Lioma C (2012) Graph-based term weighting for information retrieval. Inf Retr 15(1):54–92

    Article  Google Scholar 

  • Bodó Á, Katona GY, Simon PL (2016) Sis epidemic propagation on hypergraphs. Bull Math Biol 78(4):713–735

    Article  MathSciNet  MATH  Google Scholar 

  • Bonacich P, Lloyd P (2001) Eigenvector-like measures of centrality for asymmetric relations. Social Netw 23(3):191–201

    Article  Google Scholar 

  • Bonchi F, Khan A, Severini L (2019) Distance-generalized core decomposition. In: SIGMOD

  • Bu F, Lee G, Shin K (2023) Code, datasets, and supplementary materials. https://github.com/bokveizen/non-fragile-hypercore

  • Chein M, Mugnier ML (2008) Graph-based knowledge representation: computational foundations of conceptual graphs. Springer

    MATH  Google Scholar 

  • Chen Z, Yuan L, Han L, et al. (2021) Higher-order truss decomposition in graphs. In: TKDE

  • Chien E, Pan C, Peng J, et al. (2021) You are allset: a multiset function framework for hypergraph neural networks. arXiv preprint arXiv:2106.13264

  • Corominas-Murtra B, Fuchs B, Thurner S (2014) Detection of the elite structure in a virtual multiplex social system by means of a generalised k-core. PLoS ONE 9(12):e11,2606

    Article  Google Scholar 

  • Cui H, Lu Z, Li P, et al. (2022) On positional and structural node features for graph neural networks on non-attributed graphs. In: CIKM

  • Dai Q, Li RH, Qin L, et al. (2021) Scaling up distance-generalized core decomposition. In: CIKM

  • Debnath S, Ganguly N, Mitra P (2008) Feature weighting in content based recommendation system using social network analysis. In: WWW

  • Do MT, Yoon Se, Hooi B, et al. (2020) Structural patterns and generative models of real-world hypergraphs. In: KDD

  • Feng Y, You H, Zhang Z, et al. (2019) Hypergraph neural networks. In: AAAI

  • Gabert K, Pinar A, Çatalyürek ÜV (2021a) Shared-memory scalable k-core maintenance on dynamic graphs and hypergraphs. In: IPDPSW

  • Gabert K, Pinar A, Çatalyürek ÜV (2021b) A unifying framework to identify dense subgraphs on streams: Graph nuclei to hypergraph cores. In: WSDM

  • Gao Y, Feng Y, Ji S, et al. (2022) Hgnn\(^{+}\): General hypergraph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence

  • Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp 855–864

  • Han Z, Zheng X, Chen C, et al. (2023) Intra and inter domain hypergraph convolutional network for cross-domain recommendation. In: WWW

  • He T, Ong YS, Bai L (2021) Learning conjoint attentions for graph neural nets. In: NeurIPS

  • Hua QS, Zhang X, Jin H et al. (2023) Revisiting core maintenance for dynamic hypergraphs. IEEE Trans Parallel Distrib Syst 34:981–994

    Article  Google Scholar 

  • Huang J, Yang J (2021) Unignn: a unified framework for graph and hypergraph neural networks. arXiv preprint arXiv:2105.00956

  • Jiang J, Wei Y, Feng Y, et al. (2019) Dynamic hypergraph neural networks. In: IJCAI

  • Kim H, Ko J, Bu F, et al. (2023) Characterization of simplicial complexes by counting simplets beyond four nodes. In: WWW

  • Kim J, Oh S, Cho S, et al. (2022) Equivariant hypergraph neural networks. In: ECCV

  • Kitsak M, Gallos LK, Havlin S et al. (2010) Identification of influential spreaders in complex networks. Nat Phys 6(11):888–893

    Article  Google Scholar 

  • Ko J, Kook Y, Shin K (2022) Growth patterns and models of real-world hypergraphs. KAIS 64(11):2883–2920

    Google Scholar 

  • Konstantinova EV, Skorobogatov VA (2001) Application of hypergraph theory in chemistry. Discret Math 235(1–3):365–383

    Article  MathSciNet  MATH  Google Scholar 

  • Lee D, Shin K (2023) I’m me, we’re us, and i’m us: Tri-directional contrastive learning on hypergraphs. In: AAAI

  • Lee G, Shin K (2021) Thyme+: Temporal hypergraph motifs and fast algorithms for exact counting. In: ICDM

  • Lee G, Ko J, Shin K (2020) Hypergraph motifs: concepts, algorithms, and discoveries. PVLDB 13(11):2256–2269

    Google Scholar 

  • Lee G, Choe M, Shin K (2021) How do hyperedges overlap in real-world hypergraphs? - patterns, measures, and generators. In: WWW

  • Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. TKDD 1(1):2–es

  • Li P, Wang H, Li K, et al. (2023) Influence without authority: Maximizing information coverage in hypergraphs. In: SDM

  • Liao X, Xu Y, Ling H (2021) Hypergraph neural networks for hypergraph matching. In: ICCV

  • Limnios S, Dasoulas G, Thilikos DM, et al. (2021) Hcore-init: Neural network initialization based on graph degeneracy. In: ICPR

  • Liu B, Yuan L, Lin X et al. (2020) Efficient (\(\alpha\), \(\beta\))-core computation in bipartite graphs. VLDB J 29(5):1075–1099

    Article  Google Scholar 

  • Lotito QF, Musciotto F, Montresor A et al. (2022) Higher-order motif analysis in hypergraphs. Commun Phys 5(1):79

    Article  Google Scholar 

  • Lu Z, Zhu Y, Zhong M, et al. (2022) On time-optimal (k, p)-core community search in dynamic graphs. In: ICDE

  • Luo F, Li B, Wan XF, et al. (2009) Core and periphery structures in protein interaction networks. In: BMC bioinformatics

  • Luo Q, Yu D, Cai Z, et al. (2021) Hypercore maintenance in dynamic hypergraphs. In: ICDE

  • Luo Q, Yu D, Cai Z et al. (2022) Toward maintenance of hypercores in large-scale dynamic hypergraphs. VLDB J 32:1–18

    Google Scholar 

  • Malliaros FD, Giatsidis C, Papadopoulos AN et al. (2020) The core decomposition of networks: theory, algorithms and applications. VLDB J 29:61–92

    Article  Google Scholar 

  • Mastrandrea R, Fournet J, Barrat A (2015) Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PloS one 10(9):e0136497

    Article  Google Scholar 

  • McGlohon M, Akoglu L, Faloutsos C (2008) Weighted graphs and disconnected components: patterns and a generator. In: KDD

  • Mihalcea R, Radev D (2011) Graph-based Natural Language Processing and Information Retrieval. Cambridge University Press

    Book  MATH  Google Scholar 

  • Peng C, Kolda TG, Pinar A (2014) Accelerating community detection by using k-core subgraphs. In: arXiv

  • Peng Y, Zhang Y, Zhang W, et al. (2018) Efficient probabilistic k-core computation on uncertain graphs. In: ICDE

  • Preti G, De Francisci Morales G, Bonchi F (2021) Strud: Truss decomposition of simplicial complexes. In: WWW

  • Qu C, Tao M, Yuan R (2018) A hypergraph-based blockchain model and application in internet of things-enabled smart homes. Sensors 18(9):2784

    Article  Google Scholar 

  • Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106

    Article  Google Scholar 

  • Rossi MEG, Malliaros FD, Vazirgiannis M (2015) Spread it good, spread it fast: Identification of influential nodes in social networks. In: WWW

  • Sarıyüce AE, Pinar A (2018) Peeling bipartite networks for dense subgraph discovery. In: WSDM

  • Seidman SB (1983) Network structure and minimum degree. Social Netw 5(3):269–287

    Article  MathSciNet  Google Scholar 

  • Shin K, Eliassi-Rad T, Faloutsos C (2018) Patterns and anomalies in k-cores of real-world graphs with applications. KAIS 54(3):677–710

    Google Scholar 

  • Shin K, Hooi B, Faloutsos C (2018) Fast, accurate, and flexible algorithms for dense subtensor mining. TKDD 12(3):1–30

    Article  Google Scholar 

  • Silva NB, Tsang R, Cavalcanti GD, et al. (2010) A graph-based friend recommendation system using genetic algorithm. In: CEC

  • Sinha A, Shen Z, Song Y, et al. (2015) An overview of microsoft academic service (mas) and applications. In: WWW

  • Sun B, Chan THH, Sozio M (2020) Fully dynamic approximate k-core decomposition in hypergraphs. TKDD 14(4):1–21

    Article  Google Scholar 

  • Torres L, Blevins AS, Bassett DS et al. (2021) The why, how, and when of representations for complex systems. SIAM Rev 63:435–485

    Article  MathSciNet  MATH  Google Scholar 

  • Tudisco F, Higham DJ (2021) Node and edge nonlinear eigenvector centrality for hypergraphs. Commun Phys 4(1):1–10

    Google Scholar 

  • Victor F, Akcora CG, Gel YR, et al. (2021) Alphacore: data depth based core decomposition. In: KDD

  • Vogiatzis D (2013) Influence study on hyper-graphs. In: AAAI Symposia

  • Wang K, Cao X, Lin X, et al. (2018) Efficient computing of radius-bounded k-cores. In: ICDE

  • Watts DJ, Strogatz SH (1998) Collective dynamics of small-world networks. Nature 393(6684):440–442

    Article  MATH  Google Scholar 

  • Wood CI, Hicks IV (2015) The minimal k-core problem for modeling k-assemblies. J Math Neurosci 5(1):1–19

    Article  MathSciNet  MATH  Google Scholar 

  • Wu T, Ling Q (2023) Self-supervised heterogeneous hypergraph network for knowledge tracing. Inf Sci 624:200–216

    Article  Google Scholar 

  • Xia L, Huang C, Xu Y, et al. (2022) Hypergraph contrastive collaborative filtering. In: SIGIR

  • Xie M, Zhan X, Liu C, et al. (2023) An efficient adaptive degree-based heuristic algorithm for influence maximization in hypergraphs. Inf Process Manage 60(2):103161

    Article  Google Scholar 

  • Yang C, Wang R, Yao S, et al. (2022) Semi-supervised hypergraph node classification on hypergraph line expansion. In: CIKM

  • Yin H, Benson AR, Leskovec J, et al. (2017) Local higher-order graph clustering. In: KDD

  • Zhang C, Zhang F, Zhang W, et al. (2020) Exploring finer granularity within the cores: efficient (k, p)-core computation. In: ICDE

  • Zhang F, Zhang Y, Qin L, et al. (2017a) Finding critical users for social network engagement: the collapsed k-core problem. In: AAAI

  • Zhang F, Zhang Y, Qin L, et al. (2017b) When engagement meets similarity: efficient (k, r)-core computation on social networks. In: PVLDB

  • Zhang Y, Parthasarathy S (2012) Extracting analyzing and visualizing triangle k-core motifs within networks. In: ICDE

  • Zhu J, Zhu J, Ghosh S et al. (2018) Social influence maximization in hypergraph in social networks. TNSE 6(4):801–811

    MathSciNet  Google Scholar 

  • Zhu W, Chen C, Wang X, et al. (2018b) K-core minimization: an edge manipulation approach. In: CIKM

  • Zhu W, Zhang M, Chen C, et al. (2019) Pivotal relationship identification: the k-truss minimization problem. In: IJCAI

  • Zien JY, Schlag MD, Chan PK (1999) Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Trans Comput Aided Des Integr Circuits Syst 18(9):1389–1399

    Article  Google Scholar 

Download references

Funding

This work was supported by National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1C1C1008296) and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00871, Development of AI Autonomy and Knowledge Enhancement for AI Agent Collaboration) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kijung Shin.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Responsible editor: Charalampos Tsourakakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 4972 KB)

Appendices

A. Proofs

1.1 A.1 Proof of proposition 1

Proof

Since H is finite, the number of subhypergraphs of H is also finite. Therefore, there exists one subhypergraph with maximal total size (which is possibly an empty hypergraph) where each node has degree at least k and at least t proportion of the constituent nodes remain in each hyperedge, completing the proof of existence. To show the uniqueness, suppose the opposite, and let \(C^1 = (V^1, E^1)\) and \(C^2 = (V^2, E^2)\) be two distinct (kt)-hypercores of H. Then we consider the hypergraph \(C' = (V', E')\) with \(E' = \{e^1_i \cup e^2_i: i \in I_{E^1} \cup I_{E^2}\}\). Clearly, \(C'\) is a subhypergraph of H with a larger total size that satisfies the node-degree and hyperedge-fraction conditions, which contradicts the maximality and completes the proof. \(\square\)

1.2 A.2 Proof of proposition 2

Proof

Suppose that \(C_{k, t_2}(H)\) is not a subhypergraph of \(C_{k, t_1}(H)\). Then we take the union \(C_{k, t_2}(H) \cup C_{k, t_1}(H)\) and we obtain a hypergraph that is strictly larger than \(C_{k, t_1}(H)\) and satisfies the conditions of \((k, t_1)\)-hypercore, which contradicts with the maximality, completing the proof. The second statement can be proved similarly. \(\square\)

1.3 A.3 Proof of lemma 1

Proof

This equivalence is immediate by two facts. First, for each node \(v \in V\), the degree of v in H is equal to the degree of v in \(G_{bp}(H)\). Second, for each hyperedge \(e \in E\), the number of nodes in e is equal to the degree of e in \(G_{bp}(H)\). With the above two facts, this equivalence immediately follows. \(\square\)

1.4 A.4 Proof of lemma 2

Proof

By Definition  5, when \(t = 0\), the definition of \(C_{k; t = 0}(H)\) is the maximal subhypergraph of H where (1) every node in \(C_{k, t=0}(H)\) has degree at least k and (2) at least two nodes remain in every hyperedge of \(C_{k, t=0}(H)\). Such a definition exactly coincides with \({\tilde{C}}_{k; \ell = 2}(H)\), completing the proof. \(\square\)

1.5 A.5 Proof of lemma 3

We can understand the differences between \((k; \ell )\)-hypercores and (kt)-hypercores by two intuitions. When we obtain the \((k; \ell )\)-hypercore of a given H with \(\ell > 2\), all hyperedges of cardinality 2 are removed in the first place. Therefore, if we want to find an \(\ell\) such that \({\tilde{C}}_{k; \ell } = C_{k, t}\) where \(C_{k, t}\) contains any hyperedge of cardinality 2, the only possible \(\ell\) value is \(\ell = 2\). Since the threshold in the (kt)-hypercore is proportional, it imposes different absolute cardinality thresholds for hyperedges of different sizes. On the contrary, the \((k; \ell )\)-hypercore imposes the same absolute cardinality threshold for each hyperedge. We shall show two counterexamples from the two intuitions above.

Proof

Consider \(H = (V, E)\) with \(E = \{\{1, 2\}, \{1, 3\} \{1, 2, 3, 4\}, \{1, 3, 4, 5, 6\}\}.\) The \((k=2, t=3/4)\)-hypercore of H consists of the hyperedges \(\{\{1,2\}, \{1,3\}, \{1, 2, 3\}\},\) where the hyperedge \(\{1,3,4,5,6\}\) is totally removed since only \(3/5 < t = 3/4\) of the constituent nodes remain by the node-degree threshold \(k = 2\). For the \((k; \ell )\)-hypercore, the \((k=2; \ell =2)\)-hypercore of H consists of the hyperedges \(\{\{1, 2\}, \{1, 3\}, \{1, 2, 3, 4\}, \{1, 3, 4\}\}\); the \((k=2; \ell =3)\)-hypercore of H consists of the hyperedges \(\{\{1, 3, 4\}, \{1, 3, 4\}\}\); when \(\ell \ge 4\), the \((k=2;\ell )\)-hypercore of H is empty, completing the proof.

We show another counterexample. Consider \(H = (V, E)\) with \(E = \{\{1,2,3,4\}, \{1,2,5,6\}, \{5,6,7,8\},\{3,4,9,10,11\},\{1,2,3,4,5,6,7,8\}\}.\) The \((k=3, t=1/2)\)-hypercore of H consists of the hyperedges \(\{\{1,2\}, \{1,2,5,6\}, \{5,6\}, \{1,2,5,6\}\}.\) For the \((k; \ell )\)-hypercore, when \(\ell = 2\), the \((k=3; \ell =2)\)-hypercore of H consists of the hyperedges \(\{\{1,2,3,4\},\{1,2,5,6\},\{5,6\},\{3,4\},\{1,2,3,4,5,6\}\};\) when \(\ell \ge 3\), the \((k=3; \ell )\)-hypercore of H is empty, completing the proof. \(\square\)

Remark 1

Our proposed (kt)-hypercore allows arbitrarily fine-grained adjustment since the value of t is continuous in [0, 1], while \(\ell\) must be an integer. In real-world hypergraphs, many hyperedges are of cardinality 2. Therefore, the \((k; \ell )\)-hypercore with \(\ell > 2\) is significantly less meaningful than the (kt)-hypercore since many hyperedges are not taken into consideration at all. See Tbl. 5 for the detailed number of hyperedges of different cardinality in each dataset we have used. See Figs. 8 and 9 for the performance of hypercoreness w.r.t \((k; \ell )\)-hypercore to indicate the influence of nodes. Note again that the \((k; \ell =2)\)-hypercore is included in our proposed concept as the \((k, t=0)\)-hypercore. We observe that in most datasets, the \((k; \ell )\)-hypercores become less meaningful and fail to indicate the influence of nodes when \(\ell\) becomes large, as expected.

1.6 A.6 Proof of theorem 1

Proof

Correctness. The size of a hyperedge changes only when some node in \({\mathcal {R}}\) is removed from it, and the degree of a node changes only when some incident hyperedge is removed. Therefore, when Algorithm 1 ends, each node has degree at least k, otherwise it must have been included in \({\mathcal {R}}\) and removed, and each hyperedge satisfies the hyperedge-fraction condition, otherwise it must have been removed. This implies that the output of Algorithm 1 satisfies both the node-degree and hyperedge-fraction conditions w.r.t H, k, and t. We now show the maximality. Suppose not, and let \((v, e_i)\) be the first node-hyperedge pair that appears during the process of Algorithm 1 with \(v \in e_i \in E(C_{k, t})\) but \(v \notin e_i' \in E'\), where \(C' = (V', E')\) is the returned hypergraph. This implies that v is removed from e, and thus v is included in \({\mathcal {R}}\) because its degree has been below k. However, by the definition of the (kt)-hypercore and the assumption that \((v, e_i)\) is the first pair, before the deletion, the degree of v is at least k, which completes the proof by contradiction.

Time complexity. We assume the input hypergraph has been loaded in the memory and thus do not count the complexity of loading the hypergraph. Checking the initial degrees (Line 1) takes \(O(|V|)\). In the while loop, each node is added to the set of nodes to be removed at most once since each node is added exactly when its degree decreases from k to \(k - 1\). Therefore, this process takes \(O(|V|)\). By checking the incident edges of each node in \({\mathcal {R}}\), we find all \(e'_i\)s intersecting with \({\mathcal {R}}\), which takes \(O(|{\mathcal {R}}|) = O(|V|)\). Hash tables are used to implement the sets. Before a hyperedge \(e \in E\) is totally removed, at least \(\max \left( \lceil t|e| \rceil , 2\right)\) nodes remain in it (otherwise it has been removed earlier), and thus it can be visited at most \(|e| - \max \left( \lceil t|e| \rceil , 2\right) + 1\) times (because one node is removed at each time). This process takes \(O(\sum _{e \in E} (|e| - \max \left( \lceil t|e| \rceil , 2\right) + 1)) = O(|E| + (1 - t) \sum _{e \in E} |e|)\). Therefore, the total time complexity is \(O(|V|) + O(|E| + (1 - t) \sum _{e \in E} |e|) = O(|E| + (1 - t) \sum _{e \in E} |e|)\). \(\square\)

1.7 A.7 Proof of theorem 2

Proof

Correctness. For each node v, the assignment of \(c_t(v)\) happens only once when \(v \in {\mathcal {R}}\), i.e., before its deletion. By Theorem 1, \(c_t(v) = k - 1\) implies that v is not in the (kt)-hypercore but in the previous \((k', t)\)-hypercore where each node has degree at least \(k - 1\), i.e., v is in the \((k-1, t)\)-hypercore.

Time complexity. The values of k increases \(O(c_t^*)\) times, thus the process in Lines 4 and 5 is repeated for \(O(c_t^*)\) times and takes \(O(c_t^* |V|)\). The assignment of t-hypercoreness of each node (Line 7) takes \(O(|V|)\). As shown in the proof of Theorem 1, each hyperedge is visited at most \(|e| - \max (\lceil t|e| \rceil , 2) + 1\) times before being deleted and each node is added to the set of nodes to be removed only once. Therefore, the remaining process takes \(O(|V| + |E| + (1 - t) \sum _{e \in E} |e|)\). \(\square\)

1.8 A.8 Proof of theorem 3

Proof

Correctness. For each node v, the assignment of \(f_k(v)\) happens only once when \(v \in {\mathcal {R}}\), i.e., before its deletion. By Theorem 1, \(f_k(v) = t\) implies that v is in the (kt)-hypercore with degree k and is in at least one hyperedge that is in the (kt)-hypercore but not in any \((k, t')\)-hypercore with \(t' > t\). Therefore, v is not in any \((k, t')\)-hypercore with \(t' > t\).

Time complexity. Recording the hyperedge sizes (Line 1) takes \(O(|E|)\). By Theorem 1, computing \(C_{k, 0}\) (Line 2) takes \(O(\sum _{e \in E} |e|)\). As shown in the previous proofs, the while loop (Lines 4 to 12) takes \(O(|V| + |E| + \sum _{e \in E} |e|) = O(\sum _{e \in E} |e|)\). \(\square\)

Details of datasets

In this section, we provide more details of the datasets used in our experiments.

  • Coauth-DBLP/Geology. In these two coauthorship hypergraphs, each hyperedge represents a publication, and the constituent nodes of a hyperedge represent the authors of the corresponding publication.

  • NDC-classes/substances. In these two hypergraphs from the National Drug Code (NDC) Directory, each hyperedge represents a drug (with its unique NDC code), and the constituent nodes of a hyperedge represent the class labels (for NDC-classes) or the ingredients (for NDC-substances) of the drug.

  • Contact-high/primary. In these two contact hypergraphs, each hyperedge represents a group of interacting individuals (the constituent nodes) within a predetermined time period.

  • Email-Enron/Eu. In these two email hypergraphs, each hyperedge represents an email (possibly sent to multiple people individually at the same time), which contains the sender and all the receivers as its constituent nodes.

  • Tags-ubuntu/math/SO. In these three tags hypergraphs from https://stackoverflow.com/, each node represents a tag, and each hyperedge represents a question, where each constituent node represents a tag applied to the question.

  • Threads-ubuntu/math/SO. In these three threads hypergraphs also from https://stackoverflow.com/, each hyperedge represents a thread, where each constituent node represents a person that participates in it.

We have used the preprocessed version of the datasets where each hyperedge consists of at most 25 nodes. In Table 5, we report the number of hyperedges of different cardinality in each dataset. Notably, on https://www.cs.cornell.edu/~arb/data/, the full version of the datasets, in which the cardinality of the hyperedges is not limited, is also available.

Table 5 The number of hyperedges of different cardinality in each dataset. For each dataset, we list the number of hyperedges of each specific size. Specifically, in most datasets, a large number of hyperedges are of cardinality 2. We use \(E_s\) to denote the set of hyperedges of cardinality s, for each s
Table 6 For each dataset, we report the information gain over degree, for each of the considered quantities: \(\ell\)-hypercoreness with \(\ell \in \{3, 4, 5\}\), neighbor-hypercoreness, and neighbor-degree-hypercoreness

Additional experimental results

In this section, we provide additional experimental results supplementing the main text. In Fig. 15, we report the results regarding the statistical difference between t-hypercoreness and other centrality measures, as well as among t-hypercoreness with different t, on the datasets not covered in the main text. In Fig. 16, we report the results regarding the information gain over degree, on the datasets not covered in the main text. In Table 6, for each dataset, we report the information gain over degree for the following quantities: \(\ell\)-hypercoreness with \(\ell \in \{3, 4, 5\}\),Footnote 9 neighbor-hypercoreness, and neighbor-degree-hypercoreness (Definitions. 9, 12, and 14). Notably, we do not claim that higher information gain is always better, since degree is still a reason measure by cohesiveness, and being too different from degrees can be negative as a cohesiveness measure. In Fig. 17, we report the results on influential-node identification, on the datasets not covered in the main text. In Table 7, we report the full results of the collapsed (kt)-hypercore problem.

Fig. 15
figure 15

Supplementary results for Fig. 6

Fig. 16
figure 16

Supplementary results for Fig. 7

Fig. 17
figure 17

Supplementary results for Fig. 8

Table 7 Full results of the collapsed (kt)-hypercore problem (\(b = 100\)). Time: running time (in seconds). Red.: reduction in the hypercore size

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bu, F., Lee, G. & Shin, K. Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications. Data Min Knowl Disc 37, 2389–2437 (2023). https://doi.org/10.1007/s10618-023-00956-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-023-00956-2

Keywords

Navigation