Semantic plugin: AlchemyAPI

A new plug-in is available for Gephi that uses natural language processing (NLP) software to analyze text documents and visualize their contents. The plug-in was created by AlchemyAPI (alchemyapi.com) and uses the AlchemyAPI REST service to semantically process a web page or text file and show the subjects of the text (people, places and things, known collectively as named entities) as nodes in Gephi.

 

Graph of the American Revolution Wikipedia entry.

The plug-in is a powerful tool for distilling dense, unstructured text into easy-to-understand graphs. Each extracted entity has a relevance attribute, which measures how pertinent the subject is to the source text, and a count attribute, which gives the number of times the subject is named in the source text. Both attributes can be used to shape the visualization.
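To make the idea of "using an attribute to shape the visualization" concrete, here is a minimal sketch that linearly maps either relevance or count onto a node-size range, which is roughly what a word-cloud-style graph needs. The `Entity` class, the `node_sizes` helper, the 10 to 50 size range and the assumption that relevance lies between 0 and 1 are all hypothetical choices for illustration; this is not the plug-in's code, and in Gephi itself you would do the equivalent through the GUI.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for one extracted named entity."""
    text: str
    relevance: float  # assumed to lie between 0.0 and 1.0
    count: int        # number of times the entity is named in the source text


def node_sizes(entities, min_size=10.0, max_size=50.0, attr="relevance"):
    """Linearly map one attribute ("relevance" or "count") onto [min_size, max_size]."""
    values = [getattr(e, attr) for e in entities]
    lo, hi = min(values), max(values)
    span = hi - lo
    sizes = {}
    for e, v in zip(entities, values):
        # If every entity has the same value, give them all the minimum size.
        t = (v - lo) / span if span else 0.0
        sizes[e.text] = min_size + t * (max_size - min_size)
    return sizes


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    entities = [
        Entity("George Washington", 0.75, 40),
        Entity("Boston", 0.5, 12),
        Entity("Thomas Paine", 0.25, 3),
    ]
    print(node_sizes(entities))                # sized by relevance
    print(node_sizes(entities, attr="count"))  # sized by mention count
```

A linear scale keeps the ordering of the attribute intact; a logarithmic scale may work better for count, whose values can span several orders of magnitude.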

Once installed, the plug-in can be accessed through the File->Generate->Semantic Analysis menu. As an example, we'll examine the Wikipedia entry for the American Revolution. To make a graph from this article, enter its URL into the Semantic Analysis dialog box. The plug-in will extract over 350 people, places, and things from the Wikipedia page. You can use this data to create a word-cloud-style visualization of the article, like the one above.

If subtype analysis is enabled, you can also visualize the types and subtypes of named entities. For example, the nodes in the image below were extracted from a recent news article. They represent Dmitry Medvedev and his ontological classifications. The edges from Medvedev’s node identify him as a Person, Politician, and President (classifications he shares with Mahmoud Ahmadinejad). A complete list of the subtypes AlchemyAPI returns can be found at http://www.alchemyapi.com/api/entity/types.html.

Detail of named entity subtypes
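The subtype graph described above amounts to one edge from each entity node to each of its type nodes, so two entities become connected whenever they share a type. The sketch below builds those edges and lists the shared types. The input shape (a dict from entity name to a list of classifications) is an assumption for illustration, not AlchemyAPI's actual response format, and `type_edges` and `shared_types` are hypothetical helpers.

```python
from collections import defaultdict


def type_edges(entities):
    """Return (entity, type) edges: one edge from each entity node to each of its types."""
    return [(name, t) for name, types in entities.items() for t in types]


def shared_types(edges):
    """Map each type node to the entities it links, keeping only types shared by 2+ entities."""
    by_type = defaultdict(set)
    for name, t in edges:
        by_type[t].add(name)
    return {t: names for t, names in by_type.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical input. The Medvedev/Ahmadinejad classifications follow the
    # example in the text; "Moscow"/"City" is made up to show an unshared type.
    entities = {
        "Dmitry Medvedev": ["Person", "Politician", "President"],
        "Mahmoud Ahmadinejad": ["Person", "Politician", "President"],
        "Moscow": ["City"],
    }
    edges = type_edges(entities)
    for t, names in sorted(shared_types(edges).items()):
        print(t, sorted(names))
```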

The plug-in can also be used to visualize the connections between multiple text documents. Each document is represented by a node, and edges are drawn from it to the entities it contains, so entities that several texts share end up linked to each of those document nodes. This makes it a powerful way to discover recurring themes within an archive. As an example, the picture below shows the connections shared between the Wikipedia pages for the American Revolution and the French Revolution. Common entities such as 'France', 'Britain', and 'Thomas Paine' are linked to both articles.

Graph of connections between the American and French Revolution Wikipedia entries.

As more documents are added to the graph, a web of entities forms. The relevance and count of connected entities increase with the number of documents that mention them.
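A minimal sketch of the multi-document merge: every document contributes edges from its node to its entities, and an entity's relevance and count grow as more documents mention it. Summing the per-document scores is an assumption made here to illustrate "increase with the number of documents"; the plug-in's exact aggregation rule is not described in this post. `merge_documents` and `shared_entities` are hypothetical helpers.

```python
from collections import defaultdict


def merge_documents(docs):
    """Merge per-document entity lists into one document -> entity graph.

    docs maps a document label to a list of (entity, relevance, count) tuples.
    Returns the edge list plus the accumulated relevance and count per entity.
    """
    relevance = defaultdict(float)
    count = defaultdict(int)
    edges = []
    for doc, entities in docs.items():
        for name, rel, cnt in entities:
            edges.append((doc, name))  # edge from the document node to the entity node
            relevance[name] += rel     # assumption: scores accumulate by summation
            count[name] += cnt
    return edges, dict(relevance), dict(count)


def shared_entities(edges):
    """Entities linked to more than one document node, i.e. the recurring themes."""
    docs_per_entity = defaultdict(set)
    for doc, name in edges:
        docs_per_entity[name].add(doc)
    return sorted(name for name, docs in docs_per_entity.items() if len(docs) > 1)


if __name__ == "__main__":
    # France, Britain and Thomas Paine come from the text; the other entities
    # and all of the scores are made up for illustration.
    docs = {
        "American Revolution": [("France", 0.5, 20), ("Britain", 0.75, 35),
                                ("Thomas Paine", 0.25, 6), ("George Washington", 0.75, 40)],
        "French Revolution": [("France", 0.75, 60), ("Britain", 0.25, 10),
                              ("Thomas Paine", 0.25, 4), ("Maximilien Robespierre", 0.5, 30)],
    }
    edges, rel, cnt = merge_documents(docs)
    print(shared_entities(edges))
    print(rel["France"], cnt["France"])
```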

We hope you use this plug-in to make the data in your text more accessible. If you have any questions or suggestions for the makers of this plug-in, please leave them in the comments section.

Our thanks to the Gephi team for their remarkable visualization program, and all the documentation and help that made this plug-in possible.
Graph of espn.com front page and linked articles.

Shaun Roach

Download the Gephi plugin for AlchemyAPI here, or find it in your Gephi plug-in center.

Label Adjust

The Label Adjust functionality is a special type of algorithm. It is available through the Spatialization menu, but instead of working with node positions, it works with labels. Its aim is to automatically prevent labels from overlapping.

Gephi is built to produce readable maps that can be published or printed. However, a network with more than 1,000 nodes typically becomes hard to read, and even more so when labels are displayed. The Label Adjust algorithm removes the tedious work of manually moving each node of the network.

While running, the algorithm slightly moves nodes whose labels overlap. With long labels such as URLs, this functionality is a real time-saver, and it is easy to use: display labels as you want (font, size, color, …) and start the algorithm. It stops automatically when it detects no more overlapping labels, but you can also stop it by hand.
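The behaviour described above (nudge nodes whose labels overlap, stop when none do) can be sketched as follows. This is a simplified illustration, not Gephi's actual Label Adjust implementation: it assumes each label is an axis-aligned box of known width and height centred on its node, pushes each overlapping pair apart by a small fixed step, and uses `max_iter` as a stand-in for stopping the algorithm by hand. `Node`, `overlaps` and `label_adjust` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    x: float
    y: float
    label_w: float  # label width in layout units (assumed known, e.g. from font metrics)
    label_h: float  # label height in layout units


def overlaps(a, b):
    """True if the label boxes of a and b, each centred on its node, intersect."""
    return (abs(a.x - b.x) < (a.label_w + b.label_w) / 2
            and abs(a.y - b.y) < (a.label_h + b.label_h) / 2)


def label_adjust(nodes, step=1.0, max_iter=1000):
    """Nudge nodes apart until no labels overlap, or until max_iter passes.

    Returns the number of passes performed before the layout became stable.
    """
    for it in range(max_iter):
        moved = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                a, b = nodes[i], nodes[j]
                if not overlaps(a, b):
                    continue
                dx, dy = b.x - a.x, b.y - a.y
                dist = math.hypot(dx, dy)
                if dist == 0:
                    # Coincident nodes: pick an arbitrary direction to separate them.
                    dx, dy, dist = 1.0, 0.0, 1.0
                ux, uy = dx / dist, dy / dist
                # Move each node half a step away from the other: a "slight" move.
                a.x -= ux * step / 2
                a.y -= uy * step / 2
                b.x += ux * step / 2
                b.y += uy * step / 2
                moved = True
        if not moved:
            return it  # no overlapping labels left: stop automatically
    return max_iter


if __name__ == "__main__":
    # Two nodes with long, URL-like labels that start out overlapping.
    a, b = Node(0.0, 0.0, 20.0, 4.0), Node(5.0, 0.0, 20.0, 4.0)
    passes = label_adjust([a, b])
    print(passes, a.x, b.x, overlaps(a, b))
```

Only nodes whose labels actually overlap are moved, and only by a small step per pass, which keeps the overall shape produced by the earlier spatialization largely intact.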

Here is a short demo video of the feature running. Needless to say, the algorithm is designed for larger networks.

http://vimeo.com/2242916

This functionality is important when exporting map results from Gephi. The standard process for publishing network maps in Gephi would be something like this:
1. Spatialize the network, using, for instance, the Force Atlas algorithm.
2. Use filters to set node color and size depending on the network data.
3. Display labels and set text settings.
4. Use Label Adjust to make all labels readable.
5. Export.