"Everything looks like a graph, but almost nothing should ever be drawn as one."

seb

I was irked by this statement, made by Ben Fry in the book ‘Visualizing Data‘ (2008). Although I have great respect for Ben Fry’s work, and his position may have evolved since then, I want to temper this statement so that data explorers like danbri can form their own opinion.

Ben Fry in ‘Visualizing Data‘:

Graphs can be a powerful way to represent relationships between data, but they are also a very abstract concept, which means that they run the danger of meaning something only to the creator of the graph. Often, simply showing the structure of the data says very little about what it actually means, even though it’s a perfectly accurate means of representing the data. Everything looks like a graph, but almost nothing should ever be drawn as one.

There is a tendency when using graphs to become smitten with one’s own data. Even though a graph of a few hundred nodes quickly becomes unreadable, it is often satisfying for the creator because the resulting figure is elegant and complex and may be subjectively beautiful, and the notion that the creator’s data is “complex” fits just fine with the creator’s own interpretation of it. Graphs have a tendency of making a data set look sophisticated and important, without having solved the problem of enlightening the viewer.

I totally disagree. Look at this simple plot:

[Figure: pareto_convergence_r050a01]

Can anyone tell me how one is enlightened simply by being shown this plot, if I don’t explain how it was made and what is interesting to look at? Yet it appears very simple: only one curve, something you have been used to seeing since you discovered this kind of drawing in primary school. And even if I give some insight into how I made it and the context of the work, I am still, as the creator, the only one able to deeply understand the information that can be extracted from it, because I know the process that built the underlying data. To criticize my conclusions, you would need to learn as much as I did, get the same data, and apply the same manipulations. Curation, reformatting, filtering, or whatever algorithms were used to capture, extract and use the data: each action has an impact on the meaning carried by the data. Graph visualization is no exception. It is like any other plot, except that you can’t hide the structural complexity without explicit filtering.

Let’s enumerate the dimensions used in a graph visualization: x and y coordinates, size of nodes, color of nodes, thickness of edges. Admittedly, it is not easy to read five dimensions at once. But is the “simple” plot a better deal? It has x and y coordinates, so only two dimensions (we could also have used color and dot size, for a total of four). So you might think that you and your readers can interpret it easily and reliably. You would be wrong, because of the hidden dimension: scaling.

Here you see a plot in log-lin scale, which means that the y-axis is logarithmic while the x-axis is linear. I found this visual pattern interesting in these data because of my research question, because I understand the process that produced them, and because I found it at this particular scale. Plotted in lin-lin scale, the data give me less information. Or should I perhaps plot a cumulative function of my data? An inverse cumulative one? And so on. An exploration of both the data and the projection techniques is required.
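To make the point about scaling concrete, here is a minimal sketch in Python. The data are hypothetical (an exponential decay invented for illustration, not the data behind my plot), and the `pearson` helper is just one crude way, chosen here as an assumption, to measure how "straight" a curve looks under a given projection. It also builds the cumulative and inverse cumulative versions of the same series.

```python
import math
from itertools import accumulate

def pearson(xs, ys):
    """Pearson correlation: close to +/-1 when the points lie on a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data, invented for illustration: an exponential decay.
xs = list(range(50))
ys = [100.0 * math.exp(-0.2 * x) for x in xs]

# Same data, two projections: only the scale of the y-axis changes.
r_linlin = pearson(xs, ys)                           # y on a linear axis
r_loglin = pearson(xs, [math.log10(y) for y in ys])  # y on a logarithmic axis

# Two more projections of the same series.
cumulative = list(accumulate(ys))                      # sum of ys[0..i]
inverse_cumulative = list(accumulate(ys[::-1]))[::-1]  # sum of ys[i..end]

print(f"lin-lin straightness: {r_linlin:.3f}")
print(f"log-lin straightness: {r_loglin:.3f}")
```

With data like these, the log-lin projection turns the curve into a nearly perfect straight line, while the lin-lin projection shows a steep drop followed by a long flat tail. Nothing in the data changed, only the way I chose to look at them.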

By choosing one projection, I focus on something very particular in the data, and I still need other plots and statistical tests to decide a) whether it supports a hypothesis I have in mind, or b) whether I can find something new, something unexpected. The distortion of vision is therefore both an issue and a tool for digging deeper into the data. I could also draw very wrong conclusions, even from this simple drawing, so why should external readers be any better protected? There is a balance to find between a drawing that looks so simple to read that conclusions appear obvious (even if they are not, and you might be wrong), and the opposite, a drawing that looks so complex that few conclusions will be drawn, if any. Hence it is a fallacy to argue that graphs are meaningful only to their creator: the same is true of any plot taken in isolation, and it is hard work to enter into somebody else’s work anyway.

So graph visualization is not inherently worse than any other data drawing: we just don’t teach how to read graphs in primary school. Do you remember the first time you saw a plot? I guess you found it really abstract. Most people don’t really know what to look at in a graph, and produce visualizations that don’t show anything in particular. I personally think that this is a good thing because, put in context, graph visualization is very young compared to other data drawings, and a language of networks that combines layout algorithms and visual variables is still in the making. Moreover, after meeting and talking with people who publish such visuals, it seems to me that they already use them in a pragmatic way: by showing their complexity, graphs communicate to the reader that a) the data might contain interesting information (“so please, read until the end!”), and b) the authors did some work and propose some findings, but it was hard and many other things could be done (“hey, try it yourself!”). It is useless to discover the secrets of the universe if nobody listens to you. Before enlightening the viewer, one should attract the viewer enough that he/she will take the time to read, and graphs are useful for that purpose.

But drawing graphs as graphs is not only useful for communication. Their primary use for researchers is exploratory analysis, when the study is not focused solely on the structure of the data but the elements in context matter, because you have prior knowledge of them and your questions come from another perspective (say, sociology). Take the example of our work at Sciences-Po, where we teach the mapping of controversies to students who will become the future decision makers of companies and public policy. Part of the controversies in the public space are expressed on the Web. The dynamics of the discussions and the hyperlink structure of the Web make this field particularly hard to investigate. We successfully use graph visualization of websites to help the students orient themselves in this space, to assist and justify the classification of websites, and to assess the position of the actors in a controversy. This is just one case among others where there is currently no viable alternative to graph drawing and its synoptic property (seeing the whole without reducing the data).

Finally, the uses of graph drawing are multiplying as it becomes mainstream and more people become familiar with it. I trust people to innovate and to progressively learn how to read graphs and extract information from them. Just practice.