Introducing Data Laboratory

Eduardo Ramos

My name is Eduardo Ramos, and this summer I have been working on a Gephi project called Data Laboratory, which started as an idea for a GSoC project. The first specifications can be found at this wiki page. The new Data Laboratory features will be included in version 0.8 of Gephi, to be released later this year.

Presentation

The general purpose of this project is to improve the basic, common features offered in the “Data Laboratory” section of Gephi.
This section contains two tables (nodes and edges) that show the attributes of every node or edge as table columns. Before this project started, the features available for modifying the table structure and values were insufficient, so it was hard to edit, refine or even create a graph with attributes in a tabular view.

New key features coming:

  • Graph editing in many ways
  • Columns add/remove
  • Search & Replace
  • Import & Export CSV files
  • Charts and statistics reports
  • Sparklines
  • Merge columns
  • And the possibility of extending all of this with plugins!

Data Laboratory also needed an API/SPI so that the new options can be used and extended through plugins. A complete API is being designed so that these general features can be used from any module or plugin. Because this API is independent of the user interface, these actions will also be included in the toolkit.

Types of manipulators (i.e. actions) and how they look in the UI


General actions

These are not related to a specific element such as a node, edge or column, and they normally provide a UI. They appear as buttons in the toolbar at the top of the data table, and can be grouped in a drop-down button called “Plugins” if necessary.

Some of the new basic features of this type are Add node/edge, Clear graph and Clear edges. The remaining general actions in the picture are more specialized. Search/Replace shows an advanced UI to search and replace values in the table cells. It can do a normal search or a regular-expression-based search, among other useful options. It is implemented in a separate controller that is part of the Data Laboratory API.
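To make the difference between a normal search and a regular expression search concrete, here is a minimal sketch using only `java.util.regex`. The class `CellSearchReplace` and its method are hypothetical names chosen for illustration; this is not the Data Laboratory controller, whose API is not shown in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the Data Laboratory API. */
class CellSearchReplace {

    /**
     * Replaces every match of the query in each cell value.
     * When useRegex is false, the query is matched as plain text.
     */
    static List<String> replaceAll(List<String> cells, String query,
                                   String replacement, boolean useRegex) {
        // Pattern.LITERAL makes regex metacharacters in the query match literally.
        Pattern pattern = useRegex
                ? Pattern.compile(query)
                : Pattern.compile(query, Pattern.LITERAL);
        // In normal mode, '$' and '\' in the replacement must not be interpreted either.
        String safeReplacement = useRegex
                ? replacement
                : Matcher.quoteReplacement(replacement);

        List<String> result = new ArrayList<>();
        for (String cell : cells) {
            // Empty (null) cells are left untouched.
            result.add(cell == null
                    ? null
                    : pattern.matcher(cell).replaceAll(safeReplacement));
        }
        return result;
    }
}
```

With `useRegex` set to `true`, a query such as `\d+` matches any run of digits; with `false`, a query such as `.` only matches an actual dot.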

Export CSV table allows the user to export the data in the current table, selecting the desired columns, separator and charset.
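As a rough illustration of what such an export involves, the sketch below writes a chosen subset of columns with a chosen separator and charset. `CsvTableExport` and its methods are hypothetical names, and the quoting rule (double quotes around values that contain the separator, a quote or a line break) is an assumption, not necessarily what Gephi does.

```java
import java.nio.charset.Charset;
import java.util.List;

/** Hypothetical helper for illustration; not the Data Laboratory API. */
class CsvTableExport {

    /** Quotes a value if it contains the separator, a double quote or a line break. */
    static String escape(String value, char separator) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Appends the chosen columns of one row as a single CSV line. */
    private static void appendRow(StringBuilder sb, List<String> row,
                                  int[] columns, char separator) {
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(escape(row.get(columns[i]), separator));
        }
        sb.append('\n');
    }

    /** Returns the header and rows, restricted to the given column indexes, encoded as bytes. */
    static byte[] export(List<String> header, List<List<String>> rows,
                         int[] columns, char separator, Charset charset) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, header, columns, separator);
        for (List<String> row : rows) {
            appendRow(sb, row, columns, separator);
        }
        return sb.toString().getBytes(charset);
    }
}
```

Returning the encoded bytes keeps the sketch free of file handling; a caller would write them to the destination file.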

Import CSV does the opposite, showing a two-step wizard to configure the import settings.


Nodes/Edges actions

These actions are shown in a context menu when right-clicking one or more table rows that represent a node or an edge.
The currently implemented manipulators for nodes are:

  • Edit node properties – Shows an edit window for the clicked node
  • Select on graph view – Centers graph view on the clicked node
  • Select neighbor nodes on table – Modifies the same table rows selection to highlight neighbor nodes of the clicked node
  • Delete – Deletes the selected node(s)
  • Clear node data… – Shows a UI to choose which columns to clear for the selected node(s)
  • Copy node data to the other selected nodes – Is only enabled when more than one node is selected, and copies the chosen columns of the clicked node to the other nodes
  • Group – Groups the selected nodes
  • Ungroup – Ungroups the selected groups
  • Ungroup recursively – Ungroups the selected groups, and all their descendant groups
  • Move to group… – Shows a UI to choose an available group to move the selected nodes into
  • Remove from group – Removes the selected nodes from their group, moving them one level up in the hierarchy
  • Settle and Free – Locks/unlocks the position of the node(s)
  • Set node size – Sets the given size in the UI to the selected node(s)
  • Link nodes – Is only enabled when more than one node is selected and shows a UI to select a source node which will be linked to all the other selected nodes
  • Copy node – Makes the desired number of copies of the selected node(s) with their attributes and properties
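The “Link nodes” behaviour described above (enabled only with more than one selected node, one source linked to all the others) can be sketched in a few lines. `LinkNodesSketch` is a hypothetical class working on plain node ids; it is not Gephi's manipulator SPI and does not touch a real graph model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration; not Gephi's manipulator SPI. */
class LinkNodesSketch {

    /** The action is only enabled when more than one node is selected. */
    static boolean canExecute(List<String> selectedNodeIds) {
        return selectedNodeIds.size() > 1;
    }

    /** Returns one {source, target} pair for every selected node other than the source. */
    static List<String[]> link(String sourceId, List<String> selectedNodeIds) {
        List<String[]> edges = new ArrayList<>();
        for (String nodeId : selectedNodeIds) {
            if (!nodeId.equals(sourceId)) {
                edges.add(new String[] {sourceId, nodeId});
            }
        }
        return edges;
    }
}
```

Keeping the enablement check separate from the execution mirrors how the list above describes actions that are “only enabled” under some condition.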

And for edges:

  • Select source node on graph view – Centers graph view on the source node of the clicked edge
  • Select target node on graph view – Centers graph view on the target node of the clicked edge
  • Select source and target on nodes table – Modifies the nodes table row selection to highlight the source and target nodes of the clicked edge
  • Delete – Deletes the selected edge(s)
  • Delete with nodes – Deletes the selected edge(s) and the nodes that the user chooses in the UI (source and/or target)
  • Clear edge data… – Shows a UI to choose which columns to clear for the selected edge(s)
  • Copy edge data to the other selected edges – Is only enabled when more than one edge is selected, and copies the chosen columns of the clicked edge to the other edges

Attribute columns actions

Like node or edge manipulators, these are designed to operate on a specific type of data, in this case attribute columns.
They appear in the Data Laboratory UI as independent (or grouped by type) drop-down buttons. Each one represents an action that can be performed on a single attribute column, so when the button is clicked, a list of the columns available for that operation (according to the conditions specified by the manipulator) is shown, and one is selected for execution.

Some of the implemented attribute column manipulators are basic operations: adding columns, deleting columns, clearing and copying column data, filling a column with a value, and duplicating a column into another one with the data type the user needs, converting the data when possible.
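Duplicating a column with a type conversion “when possible” could look like the following sketch, which copies a text column into an integer column. `ColumnConversion` is a hypothetical name, and leaving unparsable values empty (`null`) is an assumption about how failed conversions are handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration; not the Data Laboratory API. */
class ColumnConversion {

    /**
     * Copies a text column into a new Integer column.
     * Values that are empty or cannot be parsed become null (assumed behaviour).
     */
    static List<Integer> duplicateAsInteger(List<String> source) {
        List<Integer> target = new ArrayList<>();
        for (String value : source) {
            Integer converted = null;
            if (value != null) {
                try {
                    converted = Integer.valueOf(value.trim());
                } catch (NumberFormatException e) {
                    // Conversion not possible for this cell: keep it empty.
                }
            }
            target.add(converted);
        }
        return target;
    }
}
```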

Other column manipulators operate on specific types of column data, such as boolean or numeric, or use regular expressions to derive a new column from another column.
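Two plausible ways of deriving a new column from a regular expression are sketched below: a boolean column that records whether each value matches, and a text column holding the first captured group. `RegexColumns` and both method names are hypothetical; the actual manipulators may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch for illustration; not the Data Laboratory API. */
class RegexColumns {

    /** New boolean column: true where the whole cell value matches the regex. */
    static List<Boolean> matchColumn(List<String> column, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Boolean> result = new ArrayList<>();
        for (String value : column) {
            result.add(value != null && pattern.matcher(value).matches());
        }
        return result;
    }

    /**
     * New text column: the first capturing group of the first match,
     * or the whole match if the regex has no group; null when nothing matches.
     */
    static List<String> extractColumn(List<String> column, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String value : column) {
            String extracted = null;
            if (value != null) {
                Matcher matcher = pattern.matcher(value);
                if (matcher.find()) {
                    extracted = matcher.groupCount() >= 1 ? matcher.group(1) : matcher.group();
                }
            }
            result.add(extracted);
        }
        return result;
    }
}
```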


Columns merge strategies

Finally, another part of the project's SPI allows defining different strategies for merging several columns.
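To give an idea of what a pluggable merge strategy might look like, here is a minimal sketch. The `MergeStrategy` interface, `JoinWithSeparator` and `ColumnMerger` are all hypothetical names invented for this example; the real SPI is documented on the wiki and may be shaped quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strategy interface: combines the values of one row into a single value. */
interface MergeStrategy {
    String merge(List<String> rowValues);
}

/** Example strategy: joins the values with a separator, treating empty cells as "". */
class JoinWithSeparator implements MergeStrategy {
    private final String separator;

    JoinWithSeparator(String separator) {
        this.separator = separator;
    }

    @Override
    public String merge(List<String> rowValues) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rowValues.size(); i++) {
            if (i > 0) {
                sb.append(separator);
            }
            String value = rowValues.get(i);
            sb.append(value == null ? "" : value);
        }
        return sb.toString();
    }
}

/** Applies a strategy row by row to columns of equal length. */
class ColumnMerger {
    static List<String> mergeColumns(List<List<String>> columns, MergeStrategy strategy) {
        List<String> merged = new ArrayList<>();
        int rowCount = columns.isEmpty() ? 0 : columns.get(0).size();
        for (int row = 0; row < rowCount; row++) {
            List<String> rowValues = new ArrayList<>();
            for (List<String> column : columns) {
                rowValues.add(column.get(row));
            }
            merged.add(strategy.merge(rowValues));
        }
        return merged;
    }
}
```

Other strategies (for example a numeric average or a boolean AND) would only need another implementation of the same interface.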


Conclusion

This project has been a great opportunity for me to experience working with an open source community and to learn about many aspects of programming, such as API/SPI design, building better user interfaces and creating modular applications. I will be happy to participate in future Gephi projects.

More information can be found at this wiki page, which will soon be updated with more documentation and help on how to extend Data Laboratory using the new SPI.

Your opinions and needs are also very important for improving Gephi, so you can make suggestions and ask anything about the Data Laboratory project in this forum post.

Eduardo Ramos

Mozilla Drumbeat – Map the web

The Mozilla Drumbeat initiative is an open project to build a better web. It gathers communities around various projects to discuss technology and the way we will use the web in the future. It is also possible to submit your own project ideas.

One project already interests us in particular, Map the Web:

Map the Web uses art, design and data to map the internet — to help all of us understand the web, how it works and what it means.

The objectives of the project so far, quoted from the project page:

  1. Transform the big, abstract internet into something simple and emotional that busy people can understand by …
  2. Building a community of artists, designers and data nerds passionate about mapping the internet who …
  3. Create tools and maps that help people understand how the internet works and where they fit in.
  4. Over time: use the insights from these maps to generate other kinds of tools and projects that add to users’ experiences of the web.

At Gephi we do believe in the power of maps to tell stories and help users understand complex, unsorted data. For instance, why not use maps for bookmarks? Represented as a network of associated URLs and tags, bookmarks would naturally appear in thematic clusters. Gary Flake recently showed interesting ideas about the use of data visualization in the browser.

Gephi is a desktop Java application, but in the coming months our aim is to launch a web canvas project as well. The idea is to lead or participate in building a standard network visualization library for the web. It should be simple enough to be used in various applications, and we propose to build it using WebGL. A data visualization library must be efficient, and OpenGL is clearly suitable for this task. We had considered starting a Google Summer of Code project about this for this year, but we finally decided to wait a bit longer. WebGL is getting lots of support and development and promises to be the standard, as Google recently dropped O3D. We think this canvas project has many interests in common with the ‘Map the Web’ Drumbeat project, and we therefore naturally propose to help.

Let’s start the discussion and contribute to this project! Who’s joining?

New GraphViz DOT, CSV and UCINET formats

Gephi now supports the GraphViz DOT file format. This new feature ships with two others: the UCINET DL and CSV formats. A broader set of input file formats reinforces interoperability between tools and lets Gephi be applied to a wider range of problems. Tabular data and other delimited text files are now supported through the CSV (comma-separated values) importer. Two columns that represent relationships between elements can now easily be loaded into Gephi.
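To show what “two columns that represent relationships” means in practice, the sketch below reads delimited text where each line holds a source and a target, and turns it into a list of edges. `EdgeListCsv` is a hypothetical class, not Gephi's importer; it ignores quoting and any extra columns, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch for illustration; not Gephi's CSV importer. */
class EdgeListCsv {

    /**
     * Reads one edge per line from the first two fields (source, target).
     * Blank lines and lines with fewer than two non-empty fields are skipped.
     */
    static List<String[]> parse(String csv, char separator) {
        List<String[]> edges = new ArrayList<>();
        // Quote the separator so characters such as '|' or '.' are not treated as regex syntax.
        String splitter = Pattern.quote(String.valueOf(separator));
        for (String line : csv.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(splitter, -1);
            if (fields.length < 2) {
                continue;
            }
            String source = fields[0].trim();
            String target = fields[1].trim();
            if (source.isEmpty() || target.isEmpty()) {
                continue;
            }
            edges.add(new String[] {source, target});
        }
        return edges;
    }
}
```

For example, the two lines `a,b` and `b,c` would yield the edges a → b and b → c.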

Note that DOT support is still incomplete. Subgraphs, shapes and some attributes are not supported for the moment. Please report any issue you find while using these new features on the forum or the bug tracker.

To get these features, just update your Gephi application. In Gephi, go to Help > Check for Updates.

Consult the datasets page to find sample networks.

Documentation has been completed for these three new formats: