Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Graphs are ubiquitous in many different domains such as information technology, social analysis or biology. Graphs are routinely visualized, but their large size is often a barrier. The difficulty comes not from the layout which can be calculated very fast. (For example, by using Brandes and Pich’s algorithm [9] a graph with several thousand nodes and links can be laid out in a few seconds on a regular personal computer.) Rather, viewing and browsing these large graphs is problematic. Firstly, rendering thousands of graphical elements on a computer might take a considerable time and may result in a cluttered image if the graph is dense. Secondly, navigating thousands of elements rendered naively disorients the user.

Fig. 1.
figure 1

A graph (https://github.com/ekoontz/graphviz/blob/master/rtest/graphs/b100.dot.) with 1436 nodes and 5806 edges. (a) The full view with a standard method which draws all nodes and edges regardless of the zoom. (b) The full view rendered by GraphMaps. (c) A view with the zoom close to 9.13 with the standard viewing. (d) A view with zoom 9.26 with GraphMaps.

Our intention is to provide a graph browsing experience similar to that of online geographic maps, for example, Bing or Google Maps. We propose a set of requirements for such a visualization and introduce a method, GraphMaps, fulfilling these requirements. GraphMaps renders a graph as an interactive map by displaying only the most essential elements for the current view. We allow fast interactions using standard pan and zoom operations. The drawing is visually stable, in the sense that during these operations, nodes do not change their relative positions, and edges do not change their geometry. To the best of our knowledge, GraphMaps is the first method having these properties. Figure 1 illustrates the method.

Related Work. The problem of visualizing large graphs has been extensively addressed in the literature, but here we discuss only the approaches most relevant to ours. Most research efforts have concentrated on reducing the number of visual elements to make node-link diagrams readable. We mention three different approaches.

Aggregation techniques group vertices and edges of the graph together to obtain a smaller graph [10]. Most techniques compute a hierarchical partitioning and offer interaction to explore different branches of the tree. Early work by Eades and Feng [11] proposes 3-dimension visualization to navigate in this tree. Abello et al. [2] use treemaps and fisheye view to show a combination of the hierarchy levels. Later research [1] demonstrated that such hierarchy-based techniques can scale up to very large graphs (16 million edges and 200, 000 nodes). Similar approaches attempt to give more clues about the content of the aggregates. Balzer and Deussen [6] represent aggregates by 3-dimensional shapes, whose sizes convey the number of vertices, with bundled edges whose thickness indicates the density of the connection. Zinmaier et al. [26] utilize the GPU to create an aggregated image of a large graph, using heatmaps to convey the number of vertices and edges in the aggregates. Van den Elzen and van Wijk [12] recently introduced a system for interactive exploration of large graphs via manual or automatic selections and aggregations.

While these techniques can scale up to very large graphs, they have several disadvantages. Aggregating nodes involves a loss of information concerning intra- and inter-connectivity. Spatial stability is another issue. The drawing may change dramatically when several entities collapse into one, potentially disorienting the user.

Multiscale techniques allow users to explore the partition hierarchy at different depths. These techniques aim at disambiguating the topology induced by aggregating vertices and edges together. Auber et al. [5] propose a clustered multiscale technique, for which the interiors of the aggregates are shown at a finer scale. However, aggregated edges are shown between clusters, risking misinterpretation. Henry et al. [18] propose a hybrid technique that can only represent one level of clustering. In a similar spirit van Ham and van Wijk [15] propose an aggregate method in which users can expand one aggregate at a time. Henry et al. [17] attempt to indicate inter-aggregate connectivity by duplicating elements, but their solution only works for a single level of clustering.

A different technique by Koren et al. [13] aims at smoothly integrating the level of detail, as opposed to discrete partitioning of the graph. The authors build a hierarchy of graphs and, for each viewpoint, construct a smaller graph by “borrowing” parts of the corresponding hierarchy levels and adjusting the layout of this smaller graph. The strength of this technique is that it avoids potentially misleading partitioning of the graph. However, there is a lack of stability: a small change in viewport may lead to a large change in the viewed graph. The fisheye technique of the paper may also add a spatial distortion, further disrupting the user’s mental map.

Filtering techniques approach the visualization of large graphs by filtering the elements rendered in the view. For example, SocialAction [22] provides a set of measures to rank vertices and edges, rendering node-link diagrams with manageable sizes. A related technique by Perer and van Ham [16] proposes to build a filtered node-link diagram based on the queries made by the user, via the concept of degree-of-interest. The principal disadvantage of these techniques is the lack of overview of the entire graph. The progressive rendering approach proposed by Auber et al. [3, 4] renders the node-link diagram entities in order of their importance. The rendering stops when the view changes. Given enough non-interaction time, all entities intersecting the viewport are rendered. In contrast to the previous filtering techniques, the benefit of this approach is to reveal the key features of the graph first. However, the user does not directly control the level of detail, which potentially disrupts the experience.

GMaps by Gansner et al. [14] also uses the map metaphor to draw graphs. Its main focus is on representing clusters of vertices as countries with map-like borders and coloring. The entire graph is drawn on top of the map as a node-link diagram with straight lines. When zooming in, labels of less important nodes appear gradually.

Design Rationale Motivated by Online Maps. Exploring online geographic maps is probably the most common scenario for browsing large graphs. Millions of people every day browse maps on their cellular phones or computers for finding a location or driving directions. We decided to search for key ideas used in interactive geographic maps that could be applied to browsing general graphs. One insight is that showing everything at all times is counterproductive. In a digital map on the top level we only see major cities and major roads connecting them. Objects on finer levels of detail, like smaller roads, are not shown explicitly. They may be hinted by using, for example, pre-rendered bitmap tiles. When we zoom in, other, less significant features appear and become labeled. Online maps can answer search queries such as finding a route from source to destination or showing a point of interest close to the mouse position.

Design goals identified are as follows:

  1. 1.

    The method should be able to reveal most details of the graph by using only the zoom in, zoom out, and pan operations. As we zoom in, more vertices and edges should appear according to their importance. Interactions such as node or edge highlighting or search by label should help discover further details.

  2. 2.

    During these operations, the user’s mental map must be preserved. In particular, vertex positions and edge trajectories should not change between zoom levels.

  3. 3.

    In order to limit visual clutter, the number of rendered visual elements at each view should not exceed some predefined bound.

2 Method Description

The input to the algorithm is a graph with given node positions; the edge routes are not part of the input. The output is a set of layers containing nodes and edge routes. Let \(G=(V,E)\) be the input graph, where V is the set of nodes and E the set of edges. The input also includes an ordering of V. This ordering should reflect the relative importance of the vertices. If such an ordering is not provided then we can sort the nodes, for example, by using PageRank [21], by node degree, or by shortest-path betweenness [8]. Finding a good order reflecting the node importance is a separate problem which is outside the scope of this research. Here we look at the node order as input and consider \(V =[v_1,\dots ,v_N]\) to be an array. Before giving a detailed description of the algorithm we describe its high level steps.

We build the layer 0, denoted by \(L_0\), as follows. For some number \(k_0>0\) we assign nodes \(v_1,\dots ,v_{k_0}\) to \(L_0\) and route all edges \((v_m,v_n) \in E\) with \(m,n \le k_0\). Suppose we have already built \(L_{i-1}\) containing vertices \(v_j\), for \(j \le k_{i-1}\). Then, if \(k_{i-1} < N\), that is we have vertices that are not assigned to a layer yet, for a number \(k_i \ge k_{i-1}\) we assign nodes \(v_1,\dots ,v_{k_i}\) to \(L_i\) and route all edges \((v_m,v_n) \in E\) with \(m,n \le k_i\). Otherwise we are done. Note that a node can be assigned to several consecutive layers. To achieve the assignment we define a function z from V to the set \(\{2^0,2^1,2^2,\dots \}\). The value z(v) we call the zoom level of the node. For \(n \in \mathbb N_0\), the layer \(L_n\) contains node v if and only if \(z(v) \le 2^{n}\). For each layer an edge is represented by a set of straight line segments called rails. We define function z on rails too, but the layer assignment rule is different for rails; a rail r belongs to \(L_n\) iff z(r) is equal to \(2^n\).

Fig. 2.
figure 2

Graph abstract.dot, \(Q_N=80\), \(Q_R=180\). (a) The mesh containing node boundaries of layer 0 (thick). (b) Nodes of layer 0 (green) and rails of edges routed using the mesh in (a) (black). Adding node 20 would exceed the node quota. (c) The mesh containing rails and node boundaries of layer 0 and boundaries of candidate nodes for layer 1 (thick). (d) Node 37 has inserted nodes 6 and 28 as neighbors. The edges incident to Node 37 are routed through red rails, which are new maximal rails. After adding the red rails the upper left tile would intersect more than \(Q_R/4=45\) maximal rails. (e) Mesh containing rails and node boundaries of layer 1 and candidates for layer 2. (f) All nodes and edges are added to layer 2 without exceeding the quotas (Color figure online).

figure a

Calculation of Layers. Algorithm 1 computes the function z on the nodes and extends it to the rails. The flow of the algorithm is illustrated in Fig. 2.

Let B be the bounding box of G with width w and height h. For i,j,\(n \in \mathbb N_0\) we define \(T_{ij}^n\) as the rectangle with width \(w_n = w/2^n\), height \(h_n = h/2^n\) and the bottom left corner with coordinates \(x=u+i \cdot w_n\) and \(y=v+j \cdot h_n\), where (uv) is the left bottom corner of B. We call \(T_{ij}^n\) a tile. The algorithm is driven by positive integers \(Q_N\) and \(Q_R\), which we call node and rail quota, respectively. We say that tile \(T_{ij}^n\) exceeds node quota \(Q_N\) if it intersects more than \(Q_N/4\) nodes of layer n.

To work with the rail quota \(Q_R\) we need the following definition. For a set of rails R and a rail \(r \in R\) we call r maximal in R if r is not a sub-segment of any other rail in R. During the algorithm we maintain the set of maximal rails among the set of rails already assigned to layers and count intersections between the tiles and the maximal rails only. The union of all maximal rails will always form the same set of points as the union of all rails created so far. Tile \(T_{ij}^n\) exceeds rail quota if it intersects more than \(Q_R/4\) rails which are maximal among all rails of layer n and below. Assume both \(Q_N\) and \(Q_R\) are divisible by 4.

The outer loop of Algorithm 1 in line 3 works as follows. Starting with \(n=0\), each call to ProcessLayer in line 4 tries to greedily assign the nodes to the current layer. Each such attempt starts with the first unassigned node in V. Procedure ProcessLayer terminates if adding the next node in V and its edges incident to already assigned nodes would exceed the node or rail quota of some tile.

After calling ProcessLayer tile dimensions and the node size become twice smaller, and a new attempt starts for \(n+1\) in line 4. The algorithm stops when all nodes are assigned to a layer. Figure 2 illustrates Algorithm 1 for graph abstract.dot Footnote 1 with 47 nodes, labeled from 0 to 46 according to their order in V.

In line 6 tileMap is a map from \(\mathbb {N}_0^2\) to \(\mathbb {N}_0^2\). If for some n we have tileMap (ij) \(=(r,k)\), then r nodes in \(L_n\) and k maximal rails intersect \(T_{ij}^n\).

Consider ProcessLayer for \(n=0\). For this case the domain of tileMap is \(\{(0,0)\}\). The sets assignedNodes and maximalRails are empty, and there is only one tile \(T_{0,0}^0\), which has the size of B (blue in Figs. 2a and 2b). After executing line 7 the set candidateNodes contains the first \(Q_N/4\) nodes of V (green in Fig. 2b). The boundaries of these nodes are represented by regular polygons (thick in Fig. 2a) and used to generate a triangular mesh M. The mesh is a constrained triangulation in a sense that any straight line segment of the input can still be traced in M although it can be split into several segments. The edges with both endpoints in candidateNodes are routed on M.

In the \(i+1\)-th iteration of the loop in line 12 the algorithm tries to add node i, while nodes \(0, \dots , i-1\) have already been added to \(L_0\), and tileMap (0,0)=(ik), where \(k \le Q_R/4\) is the number of rails used by edges routed so far. All these rails are maximal rails by construction.

In line 13 the routes of edges from node i to nodes \(0, \dots , i-1\) are computed as shortest paths on M, and the set rails(v) is the set of all rails of these routes. In line 14 we find maximalRailsOfV, the rails from rails(v) which are maximal with respect to the set maximalRails \(\cup \) rails(v). In the case of \(n=0\) they are all the rails of rails \((v)\setminus \) maximalRails. For \(n \ge 1\), these are the rails from rails(v) covered by no rail from maximalRails. In Fig. 2d, such maximal rails for node 37 are drawn red.

If \(T_{0,0}^0\) still contains no more than \(Q_R/4\) rails after adding maximalRailsOfV, then node i is added to \(L_0\). Otherwise, ProcessLayer terminates. In Fig. 2b, all \(Q_N/4=20\) candidate nodes and the rails on the corresponding edges could be added to \(L_0\).

The procedure works similarly for \(n \ge 1\). One notable difference is that rails from \(L_{n-1}\) are passed as input to the mesh generator in addition to the boundaries of the appropriate nodes in line 9. For more details we refer to the in the full version [19]. Figures 2c, ..., 2f show ProcessLayer for \(n=1,2\).

Using the Layers During the Visualization. Let H be a rectangle. We denote by w(H) the width of H and by h(p) the height of H. Recall that B is the bounding box of G. Then the zoom level of H to B is the value \(l(H)=\min \{ \frac{w(B)}{w(H)}, \frac{h(B)}{h(H)}\}\).

Let  K be the transformation matrix from the graph to the user window W. Then the rectangle \(P=K^{-1}(W)\), where \(K^{-1}\) is the inverse of K, is the current viewport.

To decide which elements of G are displayed to the user, we find the zoom level \(Z = l(P)\) and set the layer index \(n = \max (0, { \lfloor \log _2 Z \rfloor })\). Finally, the elements displayed to the user are all the nodes and rails of layer \(L_n\) intersecting P. We show in the full version [19] that, by following this strategy, we render at most \(Q_N\) nodes, and the rendered rails can be exactly covered by at most \(Q_R\) maximal rails.

Edge Routing and Overlap Removal. Consider the nodes of \(L_0\). To construct a graph on which the edges are routed, we first create a regular polygon for each vertex. Then, we generate a triangular mesh using the Triangle mesh generator by Shewchuk [23]. By inserting additional vertices Triangle creates meshes with a lower-bounded minimum angle, which implies the upper-bounded vertex degree. Each edge between a pair of \(L_0\) nodes is then assigned the corresponding Euclidean shortest path in the mesh, which is computed using the \(A^*\) algorithm. Mesh segments lying on such paths become rails of \(L_0\) and the remaining mesh segments are discarded.

We now proceed with edge routing for \(L_n\) for \(n \ge 1\). Consider Procedure ProcessLayer. Since the initial node placement did not take edge trajectories into account, at the beginning of the procedure some unassigned nodes might overlap the entities of \(L_{n-1}\). We move these nodes away from their initial positions to resolve these overlaps, but this might create overlaps with the nodes that are not assigned to a layer yet.

The overlap removal process happens before line 7. We follow the metro map labeling method of Wu et al. [25]. All line segments and bounding boxes of fixed nodes are drawn on a monochromatic bitmap and the image is dilated by the diameter of a node on \(L_n\). To define a position for a candidate node v at which it does not overlap already placed nodes or rails, we find a free pixel p in the image, ideally close to the initial location of v. We draw a dilated v at p and proceed with the next candidate, etc.

To generate a graph for edge routing on \(L_n\), we use the bounding polygons of nodes from candidateNodes and the nodes of \(L_{n-1}\), and the rails of \(L_{n-1}\), as the input segments for Triangle. Already routed edges maintain their trajectories, while edges incident to a node not belonging to \(L_{n-1}\) are routed over the triangulation created by Triangle in line 13. To create the bundling effect by reusing existing rails, we slightly reduce their weights during routing.

Pre-rendered Tiles. To help users gain spatial orientation, we hint the nodes which are not yet visible at the current zoom level, but will appear if we zoom in further. We create and store on the disk the images of some graph nodes and use them as the background. The images are generated very fast and are loaded and unloaded dynamically by a background thread to keep the visualization responsive. See [19] for more details.

Interaction. We define several interactions in addition to the zoom and pan. Clicking on a node (even if it is hinted, but not visible yet) highlights all edges incident to it and unhides all adjacent nodes. The highlighted elements are always shown regardless of the zoom level. Clicking on a rail highlights the most important edge passing through it, and unhides the edge endpoints. Additionally, nodes can be searched by substrings of their labels.

Fig. 3.
figure 3

Caltech graph on four different zoom levels visualized using our approach.

3 Experiments

Visualizing and Evaluating Clusterings. The test graph of our first experiment is the Caltech graph used by Nocaj et al. [20]. It is the graph of Facebook friendships at California Institute of Technology from September 2005 and contains 769 nodes and 16k edges [24]. The nodes are labeled by class year and residence, or house, of the corresponding student, and colored by the house. Label 0 denotes missing data. A computer science researcher, with focus on clustering algorithms, used GraphMaps to browse this graph. He was interested in discovering the connectivity structure of the graph, e.g., which houses or years have strong ties. The researcher’s tool of choice for visualizing clusterings was Gephi [7]. The node layout was computed by a force-directed layout algorithm applied to the Simmelian backbone of the graph, as proposed by Nocaj et al. [20]. The result is shown in Fig. 4a. The same initial layout was used as input for GraphMaps. The resulting drawing on four different zoom levels is shown in Fig. 3.

Fig. 4.
figure 4

(a) Caltech graph visualized using Gephi. (b), (c): showing neighbors of node “165 2007” (leftmost orange node surrounded by green nodes) using Gephi and our tool (Color figure online).

The user noted that the view in Fig. 4a was too dense and gave no insight into the graph connectivity. On the other hand, he found our result in Fig. 3 less cluttered. The user mentioned that when looking at the drawing created by GraphMaps, for some pairs of nodes one may think that an edge between them exists, when, in fact, it does not. However, by using additional interactions besides zoom and pan, e.g., edge highlighting, the connectivity can be understood.

One interaction mode that the user tested for both tools was selecting all neighbors of a node. In Gephi, when hovering the mouse over a node, all non-incident edges and non-adjacent nodes are grayed out; see Fig. 4b. In our method, when clicking on a node, routes of all its incident edges are highlighted and, additionally, all adjacent nodes are shown, regardless of the zoom level; see Fig. 4c. According to the user, both methods provided satisfactory results. He noted that GraphMaps, by using edge bundling, provides a tidier picture than Gephi.

For dense graphs, like Caltech, the user would prefer to view the neighbors of a node in GraphMaps. The user commented that, contrary to Gephi, GraphMaps exposes the most important nodes and their labels in a readable fashion.

Experiments with Other Graphs. In the video at http://1drv.ms/1IsBEVh we demonstrate browsing the graph “composers”Footnote 2 with GraphMaps. The nodes of the graph represent the articles on Wikipedia on composers, and the edges represent Internet links between the articles. We show the user interactions that help us to explore the graph.

When browsing the graph of InfoVis coauthors, created from ACM data, another user was able to notice two groups of coauthors, one connected to Peter Eades, and another one to Ulrik Brandes and Michael Goodrich. By selecting all direct neighbors of Peter Eades, the user was able to see that only one member of the second group, Roberto Tamassia, has a paper with Peter Eades; see the corresponding figure in the full version [19]. Further analysis showed that, according to the data set, Roberto Tamassia, is the only author with coauthors from both groups. GraphMaps enabled the user to gain insights on the graph structure.

Running Time. GraphMaps processes a graph with 1463 nodes and 5806 edges for 1 min, a graph with 3405 nodes and 13832 edges for 130 seconds, and a graph with 38395 nodes and 85763 for less than 6 hours. The experiments were done on an HP-Z820 with Intel Xeon CPU E5-2690 under Windows 8.1. The required memory was 16 GB. The current bottleneck in performance is the edge routing. We hope to speed up the edge routing by using parallel processing.

The Sources of GraphMaps. GraphMaps is implemented in MSAGL, which is available as Open Source at github.com/Microsoft/automatic-graph-layout.

4 Discussion

The users of GraphMaps appreciate its aesthetics and the similarity to browsing online maps. GraphMaps helps in gaining the first impression of the graph structure and, in spite of the fact that precise knowledge of the connectivity cannot be obtained with GraphMaps by zooming and panning alone, additional interactions allow answering the queries as, for example, finding if two nodes are direct neighbors. A current shortcoming of GraphMaps is that the direction of the edges is lost. It happens for other methods as well, when edges are bundled. Solving this issue is a possible future work item. The labeling algorithm needs improvement, since it does not always respect the node ranking and does not always utilize free space well enough.

Future Work. Currently we cannot guarantee that our layer generation algorithm always reaches the end, although, in all our experiments it did. Creating a version of the algorithm which provably stops, or, even better, guarantees that the number of generated layers is within predefined bounds, is a very interesting problem.

Another problem is finding a node placement that works nicely with the node ranking to improve the distribution of nodes among levels. Ideally, such a layout algorithm is aware of the edge routing too, and avoids the overlap removal step.