Performance and scalability

With this article and some that follow, I'll focus on the application design and explain the technical points I think are relevant to understanding our approach.

Today's subject is performance and scalability in visualization. Although other modules also need high performance, visualizing thousands of nodes and edges remains the major challenge. For visualization-centered software like Gephi, it is a key feature to which we attach great importance.

What you find in other network visualization software is either a poor visualization module or stunning output that is not efficient. For instance, Pajek has a very efficient core and you can achieve a lot with it, but problems start when you want to visualize your network. With GUESS you can produce nice maps, but the render engine suffers seriously beyond 2,000 nodes. Gephi tries to combine an efficient render engine with good-looking results.

In 2007, when we started designing the current version of Gephi, we had in mind to create a new generation of network visualization software, and hence we made some choices I will try to explain here.

Use multi-core
Already in 2007, and even more so now, multi-core processors impose new rules on software development. They bring appealing features but also some risks. However, the technology is starting to mature: all of today's top video games have been designed for multi-threading from the start. Multi-core brings performance, but does it bring scalability as well? I would say YES for Gephi, because no matter how many processors you have, what can be parallelized will be parallelized. Graphics cards are not yet able to parallelize, but we expect this will be the case in the future.
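
To make the idea concrete, here is a minimal sketch of chunked parallelism in Java, with a thread pool sized to the machine's core count so the same code scales with the number of processors. The Node class and the per-node pass are illustrative placeholders, not Gephi's actual data model or code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelPass {
    // Hypothetical node: Gephi's real model is richer than this.
    static class Node { float x, y, z; boolean visible; }

    public static void main(String[] args) throws Exception {
        Node[] nodes = new Node[50_000];
        for (int i = 0; i < nodes.length; i++) nodes[i] = new Node();

        // One worker per available core: more cores, more parallel chunks.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        int chunk = (nodes.length + cores - 1) / cores;
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int start = 0; start < nodes.length; start += chunk) {
            final int from = start, to = Math.min(start + chunk, nodes.length);
            tasks.add(() -> {
                for (int i = from; i < to; i++) {
                    // Placeholder for a real per-node pass
                    // (layout step, visibility flag, color update...).
                    nodes[i].visible = nodes[i].z > -1f;
                }
                return null;
            });
        }
        pool.invokeAll(tasks); // blocks until every chunk is done
        pool.shutdown();
    }
}
```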

Use GPU
You may notice we took some inspiration from video game development. By using the graphics card's features, Gephi's render engine leaves the processor free for other computations and uses GPU acceleration to speed up rendering. Apart from allowing 3D graphs, many drawing operations in Gephi are sped up by the GPU. I would say the only problem is compatibility, due to the high number of different graphics cards on the market.
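
As an illustration of the principle rather than Gephi's actual engine, here is a minimal JOGL sketch that compiles a point cloud of nodes into an OpenGL display list once, so each frame is replayed with a single call while the CPU stays free. The class name and random geometry are hypothetical.

```java
import com.jogamp.opengl.*;
import com.jogamp.opengl.awt.GLCanvas;
import java.awt.Frame;

public class GpuPointCloud implements GLEventListener {
    private final float[][] positions = new float[10_000][3];
    private int listId;

    public GpuPointCloud() {
        // Random node positions in clip space, for demonstration only.
        for (float[] p : positions)
            for (int i = 0; i < 3; i++)
                p[i] = (float) (Math.random() * 2 - 1);
    }

    public void init(GLAutoDrawable drawable) {
        GL2 gl = drawable.getGL().getGL2();
        // Compile geometry into a display list once: later frames replay
        // it with one call instead of resending every vertex.
        listId = gl.glGenLists(1);
        gl.glNewList(listId, GL2.GL_COMPILE);
        gl.glBegin(GL.GL_POINTS);
        for (float[] p : positions) gl.glVertex3f(p[0], p[1], p[2]);
        gl.glEnd();
        gl.glEndList();
    }

    public void display(GLAutoDrawable drawable) {
        GL2 gl = drawable.getGL().getGL2();
        gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);
        gl.glCallList(listId); // one call, the driver and GPU do the rest
    }

    public void reshape(GLAutoDrawable d, int x, int y, int w, int h) {}
    public void dispose(GLAutoDrawable d) {}

    public static void main(String[] args) {
        GLCanvas canvas = new GLCanvas(new GLCapabilities(GLProfile.getDefault()));
        canvas.addGLEventListener(new GpuPointCloud());
        Frame frame = new Frame("GPU point cloud");
        frame.add(canvas);
        frame.setSize(800, 600);
        frame.setVisible(true);
    }
}
```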

Architecture
The visualization package's architecture is a compromise between flexibility and performance. In 3D engine design it is almost impossible to have both at the same time. Hence our engine has flexibility where it doesn't harm efficiency.
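
One way to picture this compromise (a hypothetical sketch, not Gephi's API): keep the per-frame hot loop a tight iteration over a packed array, and confine flexibility to a pluggable modeler chosen outside the loop.

```java
public class FlexibleEngine {
    // Swappable drawing strategy: the flexible part, picked once.
    interface NodeModeler { void draw(float x, float y, float z, float size); }

    static final class RenderLoop {
        private final NodeModeler modeler;
        private final float[] data; // packed x,y,z,size: cache-friendly

        RenderLoop(NodeModeler m, float[] packed) { modeler = m; data = packed; }

        void drawFrame() {
            // Hot loop: no allocation, sequential access; the only
            // indirection left is one interface call per node.
            for (int i = 0; i < data.length; i += 4)
                modeler.draw(data[i], data[i + 1], data[i + 2], data[i + 3]);
        }
    }

    public static void main(String[] args) {
        float[] nodes = {0f, 0f, 0f, 1f, 0.5f, 0.5f, 0f, 2f};
        RenderLoop loop = new RenderLoop(
            (x, y, z, s) -> System.out.printf(
                "node at (%.1f, %.1f, %.1f) size %.1f%n", x, y, z, s),
            nodes);
        loop.drawFrame();
    }
}
```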

These choices allow good performance for visualization, and I would say it is only the beginning. Currently up to 50,000 nodes can be visualized, and even more, but this depends on the number of edges and on how your graph is spatialized. Indeed, we use techniques to avoid computing the parts of the graph that are out of the screen:

[Figures: octree cube partition; octree cubes drawn on a 3D graph]

The graph is cut into fixed volumes in a structure called an octree. It is easy for the render engine to determine which cubes are hidden and which are visible; only 3D objects in visible cubes are computed. As a consequence, performance doesn't depend on how many nodes your network has but on how many you are currently visualizing. So even with huge graphs, zooming in and exploring parts of them remains fast.
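
Here is a minimal sketch of this culling idea: a fixed two-cube partition stands in for a real octree, and a one-axis visibility test stands in for a full view-frustum check. None of it is Gephi's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class OctreeCulling {
    static class Cube {
        final float cx, cy, cz, half;             // center and half-size
        final List<float[]> nodes = new ArrayList<>();
        Cube(float cx, float cy, float cz, float half) {
            this.cx = cx; this.cy = cy; this.cz = cz; this.half = half;
        }
        // Stand-in test: a real engine checks the cube against all six
        // planes of the camera's view frustum.
        boolean visible(float viewMinX, float viewMaxX) {
            return cx + half >= viewMinX && cx - half <= viewMaxX;
        }
    }

    public static void main(String[] args) {
        // Fixed 2x1x1 partition for brevity; a real octree splits 2x2x2
        // recursively down to a chosen depth.
        Cube left = new Cube(-0.5f, 0, 0, 0.5f);
        Cube right = new Cube(0.5f, 0, 0, 0.5f);
        left.nodes.add(new float[]{-0.7f, 0.1f, 0});
        right.nodes.add(new float[]{0.6f, -0.2f, 0});

        int drawn = 0;
        for (Cube c : new Cube[]{left, right}) {
            if (!c.visible(0f, 1f)) continue; // skip hidden cubes entirely
            drawn += c.nodes.size();          // only visible cubes cost anything
        }
        System.out.println("Nodes drawn: " + drawn); // 1, not 2
    }
}
```

This is why the cost tracks what is on screen: nodes in hidden cubes are never even iterated.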

Besides the current 3D engine, which is intended to work on all configurations, a new one will be developed in 2009. Using the latest graphics card features, a network size limit of around 200,000 nodes may be reached.
