Avg. clustering coefficient calculated incorrectly on graph · Issue #625 · gephi/gephi · GitHub
Skip to content

Avg. clustering coefficient calculated incorrectly on graph #625

Closed
@brycethomas

Description

Hi,

Consider the following graph:

http://i.imgur.com/7hMZq.png

Gephi (0.8.1) calculates the following local clustering coefficients:

http://i.imgur.com/ilT1f.png

Gephi averages these: (1.0 + 0.33333 + 1.0 + 0.0) / 4 = 0.583, and reports this number (0.583) as the Avg. Clustering Coefficient. This result is inconsistent with the algorithm from Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs that Gephi claims to implement. The author of the aforementioned paper (Latapy) has his own C implementation of the algorithm, which produces 0.7777.

I surmise that the mistake Gephi is making is that it follows Latapy's advice of not calculating a local clustering coefficient for nodes with degree <= 1, but then includes this node in the count of total nodes anyway. In the example I gave above, I am referring to node Four, whose clustering coefficient is calculated as 0.0. Note that if we were to do (1.0 + 0.33333 + 1.0 + 0.0) / 3 instead of (1.0 + 0.33333 + 1.0 + 0.0)/4 we would get 0.7777, which is consistent with the result of Latapy's implementation.

To throw another spanner in the works, there is a second potential problem, but I am waiting for this to be confirmed by Latapy. This second problem is that it appears to me that Latapy himself implements the clustering coefficient algorithm differently to how it was first described in Collective dynamics of small world networks by Watts & Strogatz, which he references. I didn't see any mention in Watts & Strogatz paper to suggest nodes with degree <= 1 should be excluded from the calculations at all.

To summarise, I believe clustering coefficient is implemented incorrectly in Gephi. I suggest as a first step at least ensuring it is consistent with Latapy's implementation, and then later on figure out whether Latapy's implementation is itself inconsistent with the original definition of Avg. clustering coefficient. I haven't worked on the Gephi code base before, but if someone can point me to the right files, I'm willing to have a go at fixing it.

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions