Abstract
In this paper we study a problem of determining when entities are active based on their interactions with each other. We consider a set of entities V and a sequence of time-stamped edges E among the entities. Each edge \((u,v,t)\in E\) denotes an interaction between entities u and v at time t. We assume an activity model where each entity is active during at most k time intervals. An interaction (u, v, t) can be explained if at least one of u or v are active at time t. Our goal is to reconstruct the activity intervals for all entities in the network, so as to explain the observed interactions. This problem, the network-untangling problem, can be applied to discover event timelines from complex entity interactions. We provide two formulations of the network-untangling problem: (i) minimizing the total interval length over all entities (sum version), and (ii) minimizing the maximum interval length (max version). We study separately the two problems for \(k=1\) and \(k>1\) activity intervals per entity. For the case \(k=1\), we show that the sum problem is NP-hard, while the max problem can be solved optimally in linear time. For the sum problem we provide efficient algorithms motivated by realistic assumptions. For the case of \(k>1\), we show that both formulations are inapproximable. However, we propose efficient algorithms based on alternative optimization. We complement our study with an evaluation on synthetic and real-world datasets, which demonstrates the validity of our concepts and the good performance of our algorithms.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Availability of data and materials
The scripts used to generate the Synthetic and \(\textit{Synthetic} _k\) datasets are available at https://github.com/polinapolina/the-network-untangling-problem. The Twitter dataset used for the case study is not publicly available.
Notes
Due to the property of implication graph, either all or none variables will be set in the component.
The implementation of all algorithms and sample scripts used for the experimental evaluation is available at https://github.com/polinapolina/the-network-untangling-problem.
References
Aspvall B, Plass MF, Tarjan RE (1979) A linear-time algorithm for testing the truth of certain quantified Boolean formulas. Inf Process Lett 8(3):121–123
Becker H, Naaman M, Gravano L (2011) Beyond trending topics: real-world event identification on twitter, vol 11. Association for the Advancement of Artificial Intelligence, Menlo Park
Bellman R (1961) On the approximation of curves by line segments using dynamic programming. Commun ACM 4(6):284
Berlingerio M, Bonchi F, Bringmann B, Gionis A (2009) Mining graph evolution rules. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 115–130
Bogdanov P, Mongiovì M, Singh AK (2011) Mining heavy subgraphs in time-evolving networks. In: 2011 IEEE 11th international conference on data mining. IEEE, pp 81–90
Cataldi M, Di Caro L, Schifanella C (2010) Emerging topic detection on twitter based on temporal and social terms evaluation. In: Proceedings of the tenth international workshop on multimedia data mining, pp 1–10
Chvatal V (1979) A greedy heuristic for the set-covering problem. Math Oper Res 4(3):233–235
Dakiche N, Tayeb FBS, Slimani Y, Benatchba K (2019) Tracking community evolution in social networks: a survey. Inf Process Manag 56(3):1084–1102
Fung GPC, Yu JX, Yu PS, Lu H (2005) Parameter free bursty events detection in text streams. In: Proceedings of the 31st international conference on very large data bases, pp 181–192
Gauvin L, Panisson A, Cattuto C (2014) Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach. PLoS ONE 9(1):e86028
Guha S, Koudas N, Shim K (2006) Approximation and streaming algorithms for histogram construction problems. ACM Trans Database Syst (TODS) 31(1):396–438
Hartmanis J (1982) Computers and intractability: a guide to the theory of np-completeness (Michael R. Garey and David S. Johnson). SIAM Rev 24(1):90
Hartmann T, Kappes A, Wagner D (2016) Clustering evolving networks. In: Kliemann L (ed) Algorithm engineering. Springer, Berlin, pp 280–329
He J, Chen D (2015) A fast algorithm for community detection in temporal network. Physica A Stat Mech Appl 429:87–94
He D, Parker DS (2010) Topic dynamics: an alternative model of bursts in streams of topics. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452
Holme P, Saramäki J (2012) Temporal networks. Phys Rep 519(3):97–125
Hopcroft JE, Ullman JD (1983) Data structures and algorithms, vol 175. Addison-Wesley, Boston
Hulovatyy Y, Chen H, Milenković T (2015) Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics 31(12):i171–i180
Ihler A, Hutchins J, Smyth P (2006) Adaptive event detection with time-varying Poisson processes. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 207–216
Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW, Bohlinger JD (eds) Complexity of computer computations. Springer, Berlin, pp 85–103
Khot S, Regev O (2008) Vertex cover might be hard to approximate to within 2-\(\varepsilon \). J Comput Syst Sci 74(3):335–349
Kleinberg J (2003) Bursty and hierarchical structure in streams. Data Min Knowl Discov 7(4):373–397
Kovanen L, Karsai M, Kaski K, Kertész J, Saramäki J (2013) Temporal motifs. In: Holme P, Saramaki J (eds) Temporal networks. Springer, Berlin, pp 119–133
Lahiri M, Berger-Wolf TY (2008) Mining periodic behavior in dynamic social networks. In: 2008 eighth IEEE international conference on data mining. IEEE, pp 373–382
Lappas T, Arai B, Platakis M, Kotsakos D, Gunopulos D (2009) On burstiness-aware search for document sequences. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 477–486
Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp 497–506
Liu Y, Safavi T, Dighe A, Koutra D (2018) Graph summarization methods and applications: a survey. ACM Comput Surv (CSUR) 51(3):1–34
Masuda N, Holme P (2019) Detecting sequences of system states in temporal networks. Sci Rep 9(1):1–11
Mathioudakis M, Koudas N (2010) Twittermonitor: trend detection over the twitter stream. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 1155–1158
McGregor A (2014) Graph stream algorithms: a survey. ACM SIGMOD Rec 43(1):9–20
Meladianos P, Nikolentzos G, Rousseau F, Stavrakas Y, Vazirgiannis M (2015) Degeneracy-based real-time sub-event detection in twitter stream. In: The international AAAI conference on web and social media (ICWSM), vol 15, pp 248–257
Michail O (2016) An introduction to temporal graphs: an algorithmic perspective. Internet Math 12(4):239–280
Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Paranjape A, Benson AR, Leskovec J (2017a) Motifs in temporal networks. In: Proceedings of the tenth ACM international conference on web search and data mining, pp 601–610
Paranjape A, Benson AR, Leskovec J (2017b) Motifs in temporal networks. In: Proceedings of the tenth ACM international conference on web search and data mining, pp 601–610
Pietilänen AK, Diot C (2012) Dissemination in opportunistic social networks: the role of temporal communities. In: Proceedings of the thirteenth ACM international symposium on mobile ad hoc networking and computing, pp 165–174
Rossetti G, Cazabet R (2018) Community discovery in dynamic networks: a survey. ACM Comput Surv (CSUR) 51(2):1–37
Rozenshtein P, Gionis A (2019) Mining temporal networks. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3225–3226
Rozenshtein P, Tatti N, Gionis A (2017) The network-untangling problem: from interactions to activity timelines. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 701–716
Shahaf D, Guestrin C, Horvitz E (2012a) Metro maps of science. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1122–1130
Shahaf D, Guestrin C, Horvitz E (2012b) Trains of thought: generating information maps. In: Proceedings of the 21st international conference on world wide web, pp 899–908
Shahaf D, Yang J, Suen C, Jacobs J, Wang H, Leskovec J (2013) Information cartography: creating zoomable, large-scale maps of information. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1097–1105
Shah N, Koutra D, Zou T, Gallagher B, Faloutsos C (2015) Timecrunch: interpretable dynamic graph summarization. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1055–1064
Tatti N (2019) Strongly polynomial efficient approximation scheme for segmentation. Inf Process Lett 142:1–8
Vlachos M, Meek C, Vagena Z, Gunopulos D (2004) Identifying similarities, periodicities and bursts for online search queries. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data, pp 131–142
Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM (2010) Frequent subgraph discovery in dynamic networks. In: Proceedings of the eighth workshop on mining and learning with graphs, pp 155–162
Weng J, Lee BS (2011) Event detection in twitter. In: The international AAAI conference on web and social media (ICWSM)
Yang J, Leskovec J (2011) Patterns of temporal variation in online media. In: Proceedings of the fourth ACM international conference on web search and data mining, pp 177–186
Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 336–345
Funding
This work was supported by three Academy of Finland Projects (286211, 313927, 317085), the EC H2020 RIA project “SoBigData++” (871042), and the Wallenberg AI, AutonomousSystems and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
All authors declare that they have no conflicting/competing interests.
Code availability
The implementation of all algorithms and sample scripts used for the experimental evaluation is available at https://github.com/polinapolina/the-network-untangling-problem.
Additional information
Responsible editor: Tina Eliassi-Rad, Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Proofs regarding computational complexity
Proof of Proposition 2
To prove the result we provide a reduction from VertexCover. Assume that we are given a graph H with n vertices and an integer k. We construct a temporal network G as follows: We place H at time stamp 0. We then add k fully-connected graphs \(C_1, \ldots , C_k\) with n vertices at time stamps \(1, \ldots , k\). Figure 14 shows an example.
We claim that G has k-interval cover with zero cost if and only if H can be covered with k vertices. This shows that solving whether there is a zero-cost solution for either \(k\)-MinTimeline \(_{+}\) or \(k\)-MinTimeline \(_\infty \) is NP-complete, which also automatically implies that these problems are inapproximable.
To prove the claim, first assume that H can be covered with k vertices, say \(w_1, \ldots , w_k\). Construct a k-activity timeline by first covering \(w_1, \ldots , w_k\) at time stamp 0 with zero-length intervals. Similarly, cover every vertex v at every time stamp \(t = 1, \ldots , k\) with a zero-length intervals, except if \(v = w_t\). Figure 14 shows an example. By definition H is covered, and \(n - 1\) vertices in each \(C_i\) are also covered. Consequently, the timeline covers G. Since each vertex uses exactly k intervals, we have proven the first direction.
To prove the other direction, assume that there is a zero-solution for G. Since each \(C_i\) is fully-connected, we must have at least \(n - 1\) vertices covered in each \(C_i\). This means that we have at most k spare intervals to cover H. The vertices that these intervals cover form a k-vertex cover, proving the claim. \(\square \)
Proof of Proposition 3
We will prove the hardness by reducing VertexCover to MinTimeline \(_{+}\). Assume that we are given a (static) network \(H = (W, A)\) with n vertices \(W = \{ w_1, \ldots , w_n \}\) and a budget \(\ell \). In the VertexCover problem we are asked to decide whether there exists a subset \(U \subseteq W\) of at most \(\ell \) vertices (\(|U|\le \ell \)) covering all edges in A.
We map an instance of VertexCover to an instance of MinTimeline \(_{+}\) by creating a temporal network \(G = (V, E)\), as follows. The vertices V consist of 2n vertices: for each \(w_i \in W\), we add vertices \(v_i\) and \(u_i\). The edges are as follows: For each edge \((w_i, w_j) \in A\), we add a temporal edge \((v_i, v_j, 0)\) to E. For each vertex \(w_i \in W\), we add two temporal edges \((v_i, u_i, 1)\) and \((v_i, u_i, n + 2)\) to E. Figure 15 shows an example.
Let \(\mathcal {T}^*\) be an optimal timeline covering G. We claim that \( S \mathopen {}\left( \mathcal {T}^*\right) \le \ell \) if and only if there is a vertex cover of H with \(\ell \) vertices.
To prove the if direction, consider a vertex cover of H, say U, with \(\ell \) vertices. Consider the following coverage: cover each \(u_i\) at \(n + 2\), and each \(v_i\) at 1. For each \(w_i \in U\), cover \(v_i\) at 0. Figure 15 shows an example. Now every vertex \(u_i\) has a 0-length activity interval \([n+2, n+2]\). Vertices \(v_i\), which correspond to \(w_i \not \in U\), also have 0-length activity intervals [1, 1]. Only vertices \(v_i\), which correspond to \(w_i \in U\), have 1-length activity intervals [0, 1]. All vertices in V have activity intervals of total length \(\ell \). Every interaction in G belongs to some activity interval: by construction, interactions at \(t=n+2\) are spanned by activity intervals of \(\{u_i\}\), interactions at \(t=1\) are spanned by activity intervals of \(\{v_i\}\), and interactions at \(t=0\) are spanned by activity intervals \(\ell \) of vertices from \(\{v_i\}\). The resulting intervals are indeed forming a timeline with a total span of \(\ell \).
To prove the other direction, first note that if we cover each \(v_i\) by an interval [0, 1] and each \(u_i\) by an interval \([n + 2, n + 2]\), then this yields a timeline \(\mathcal {T}\) covering G. The total span of intervals in \(\mathcal {T}\) is n. Thus, \( S \mathopen {}\left( \mathcal {T}^*\right) \le S \mathopen {}\left( \mathcal {T}\right) = n\). This guarantees that if \(0 \in I_{v_i}\), then \(n + 2 \notin I_{v_i}\), so \(n + 2 \in I_{u_i}\). Otherwise \(\mathcal {T}^*\) contains an \((n+2)\)-length interval, this contradicts to \( S \mathopen {}\left( \mathcal {T}^*\right) \le n\). By the same argument, \(1 \notin I_{u_i}\) and so \(1 \in I_{v_i}\). In summary, if \(0 \in I_{v_i}\), then \( \sigma \mathopen {}\left( I_{v_i}\right) = 1\). This implies that if \( S \mathopen {}\left( \mathcal {T}^*\right) \le \ell \), then we have at most \(\ell \) active vertices at 0. Let U be the set of those vertices. Since \(\mathcal {T}^*\) is a timeline covering G, then U is a vertex cover for H. \(\square \)
Proofs regarding Maximal
Proof of Proposition 4
We first show that a maximal dual solution is a feasible timeline. Let \(e_i = (u, v, t)\) be a temporal edge. If \( p \mathopen {}\left( i; v\right) \cap X_v = \emptyset \) and \( p \mathopen {}\left( i; u\right) \cap X_u = \emptyset \), then we can increase the value of \(\alpha _i\) without violating the dual constraints, so the solution is not maximal. Thus \(t \in I_{v} \cup I_{u}\), making \(\mathcal {T}\) a feasible timeline.
Next we show that the resulting solution \(\mathcal {T}\) is a 2-approximation to Coalesce. Write \(x_v = \min (X_v)\) and \(y_v = \max (X_v)\). Let \(\mathcal {T}^*\) be the optimal solution. Then
where the second equality follows from the definition of \(X_v\), the first inequality follows from the fact that \(\alpha _e \ge 0\), and the last inequality follows from primal-dual theory.\(\square \)
Proof of Lemma 1
We will prove this result by showing that \(\alpha _j \le z(v)\) if and only if all dual constraints related to v are valid. Since the same holds also for u the lemma follows. We consider two cases.
First case: \( t \mathopen {}\left( j\right) < m_v\). In this case we have
before increasing \(\alpha _j\). This guarantees that if \(\alpha _j \le z(v)\), then \( h \mathopen {}\left( v; i\right) \le | t \mathopen {}\left( i\right) - m_v|\), for every \(i \le j\). Moreover, when \(\alpha _j = z(v)\) one of these constraints becomes tight. Since these are the only constraints containing \(\alpha _j\), we have proven the first case.
Second case: \( t \mathopen {}\left( j\right) \ge m_v\). If \(i < j\), the sum \( h \mathopen {}\left( v; i\right) \) does not contain \(\alpha _j\), so the corresponding constraint remains valid. If \(i \ge j\), then the corresponding constraint is valid if and only if \( h \mathopen {}\left( v; j\right) \le | t \mathopen {}\left( j\right) - m_v|\). This is because \(\alpha _\ell = 0\) for all \(\ell > j\). But \(z(v) = t - m_v - a[v]\) corresponds exactly to the amount we can increase \(\alpha _j\) so that \( h \mathopen {}\left( v; j\right) = |t - m_v|\).\(\square \)
Proofs regarding k-Maximal
Proof of Proposition 5
Let us first prove the feasibility of \(\mathcal {T}\), that is, show that every edge is covered. Let \(e_i = (u, v, t)\) be an edge, and let \(\ell \) be an index such that \(i \in S_{v\ell }\).
At least, one of the four cases of left-maximality must hold. Assume Case (i), that is, \( rh \mathopen {}\left( v; j\right) = \eta _{vj}\) for some \(j \in lp \mathopen {}\left( i; v\right) \). Then \( t \mathopen {}\left( j\right) \in X_{v\ell }\) and \(x_{v\ell } \le t \mathopen {}\left( i\right) \le m_{v\ell }\), so \(e_i\) is covered. Case (ii) is similar.
Assume that Cases (i) and (ii) do not hold. Then Case (iii) or Case (iv) holds. Assume Case (iii). If \(x_{v\ell } \le t \mathopen {}\left( i\right) \), then \(e_i\) is covered. Assume that \( t \mathopen {}\left( i\right) < x_{v\ell }\). Then, by definition, \( t \mathopen {}\left( i\right) \in Y_{v\ell }\). Thus \(m_{v(\ell - 1)} < t \mathopen {}\left( i\right) \le y_{v\ell }\), so \(e_i\) is covered. Case (iv) is similar.
We have shown that \(\mathcal {T}\) is feasible. Next we show that the resulting solution \(\mathcal {T}\) is a 2-approximation to k-Coalesce. Let \(\mathcal {T}^*\) be the optimal solution. Then
where the second equality follows from the definition of \(X_{vi}\) and \(Y_{vi}\), the first inequality follows from the fact that \(\alpha _e \ge 0\) and the intervals in \(\mathcal {T}\) do not intersect, and the last inequality follows from primal-dual theory.
\(\square \)
We will prove Proposition 6 in a sequence of lemmas.
Let us write \(a_i[v, \ell ]\) and \(b_i[v, \ell ]\) to be the values of these counters at the beginning of the ith iteration. We maintain the following invariants. The first counter, \(a_{i + 1}[v, \ell ]\) matches \( lh \mathopen {}\left( v; i\right) \). The second counter, \(b_i[v, \ell ]\) is how much we can afford to increase \(\alpha _i\) without violating \( rh \mathopen {}\left( v; s\right) \le \eta _{vs}\), where \(s \le i\). This is formalized in the next lemma.
Lemma 2
Let v be a vertex and \(\ell = 1, \ldots , k + 1\) be an integer. Shorten \(S = S_{v\ell }\) and write \(f(j, i) = \sum _{o \in S, j \le o \le i} \alpha _o\).
Then for any \(i \in S\),
and
Proof
We prove this claim by induction. Let i be the first index in S. Then \(a_i[v, \ell ] = 0\) and \(a_{i + 1}[v, \ell ] = \alpha _i\), and \(b_i[v, \ell ] = \infty \) and \(b_{i + 1}[v, \ell ] = \eta _{vi} - \alpha _{vi}\).
Assume that i is not the first index in S, and let \(j \in S\) be the previous index. Since \(a_i[v, \ell ] = a_{j + 1}[v, \ell ]\), we have
Also, \(b_i[v, \ell ] = b_{j + 1}[v, \ell ]\), and
proving the lemma.\(\square \)
Our next step is to prove the feasibility of the output of k-Maximal. In order to do that, we first bound the counters.
Lemma 3
For each vertex v, index \(\ell = 1, \ldots , k\), and \(i \in S_{v\ell }\),
and
Proof
Since, \(\alpha _i \le \theta _{vi} - a_i[v, \ell ]\) we have
Also since, \(\alpha _i \le \min (b_i[v, \ell ], \eta _{vi})\) we have
This proves the claim.\(\square \)
To prove the feasibility, we first show that \(\alpha _i \ge 0\).
Lemma 4
\(\alpha _i \ge 0\), for all i.
Proof
Let \((u, v, t) = e_i\), and let \(\ell \) such that \(i \in S_{v\ell }\). Let \(z_1\), and \(z_2\) be as defined by the algorithm in the ith round.
If i is the first index in \(S_{v\ell }\), then \(a_i[v, \ell ] = 0\) and \(b_i[v, \ell ] = \infty \). Then \(z_1 \ge 0\), and similarly \(z_2 \ge 0\). Consequently, \(\alpha _i \ge 0\).
Assume that i is not the first index in \(S_{v\ell }\), and let j be the previous edge index. Then Eq. 4 implies
In addition, Eq. 5 implies \(b_i[v, \ell ] \ge 0\), so \(z_1 \ge 0\). Similarly, \(z_2 \ge 0\). Consequently, \(\alpha _i \ge 0\).\(\square \)
Next lemma shows that \(\left\{ \alpha _i\right\} \) satisfies the constraints, making the dual solution feasible.
Lemma 5
Let \(v \in V\), \(\ell = 1, \ldots , k + 1\), and \(i \in S_{v\ell }\). Then \( rh \mathopen {}\left( v; i\right) \le \eta _{vi}\) and \( lh \mathopen {}\left( v; i\right) \le \theta _{vi}\).
Proof
Moreover, \( lh \mathopen {}\left( v; i\right) \) remains constant in the later rounds.
Let j be the last index in \(S_{v\ell }\). Then Eqs. 3 and 5 state that
Moreover, the sum \( rh \mathopen {}\left( v; i\right) \) remains constant in the later rounds. Thus, \(\left\{ \alpha _i\right\} \) is a feasible dual solution.\(\square \)
Proof
Let \(e_i = (u, v, t)\) be an edge, and let \(\ell \) and r be the indices such that \(i \in S_{v\ell }\) and \(i \in S_{ur}\). We need to show that \(e_i\) is left-maximal.
If \(b_{i + 1}[v, \ell ] = 0\), then Eq. 3 states that there is \(j \in X_{v\ell }\) with \(j \le i\) such that \(\eta _{vj} = rh \mathopen {}\left( v; j\right) \). This is the definition of Case (i) of left-maximality. Similarly, \(b_{i + 1}[u, r] = 0\) leads to Case (ii).
Assume \(b_{i + 1}[v, \ell ] > 0\) and \(b_{i + 1}[u, r] > 0\). This can only happen if
Consequently, \(\alpha _i = \theta _{vi} - a_i[v, \ell ]\) or \(\alpha _i = \theta _{ui} - a_i[u, r]\). If the former, then Eq. 2 states \( lh \mathopen {}\left( v; i\right) = a_{i + 1}[v, \ell ] = \theta _{vi}\), leading to Case (iii). Similarly, the latter case leads to Case (iv). Thus, \(e_i\) is left-maximal.\(\square \)
Rights and permissions
About this article
Cite this article
Rozenshtein, P., Tatti, N. & Gionis, A. The network-untangling problem: from interactions to activity timelines. Data Min Knowl Disc 35, 213–247 (2021). https://doi.org/10.1007/s10618-020-00717-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-020-00717-5