I was inspired by the cool visualization from the Network of Thrones analysis and tried to recreate it. Thanks to Michael Hunger and William Lyon, I achieved it using Neo4j and Gephi together with the APOC library. Note that I am not a graph theory expert, so I will focus only on the visualization aspect.
Download the latest APOC release.
We will use the standard movies dataset, which you can load with the :play movies command in Neo4j Browser. First, we will create a social network by adding weighted relationships between persons. The assumption is that the more movies two people worked on together, the better they know each other. We do this by finding their common movies and adding one point to the relationship weight for each one, using the following query.
MATCH (p1:Person)-->(:Movie)<--(p2:Person) WHERE id(p1) < id(p2)
MERGE (p1)-[r:KNOWS]-(p2)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = r.weight + 1
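To sanity-check the weights before moving on, a query along these lines lists the strongest pairs. This is only a sketch: it assumes the relationship type is KNOWS (the type the export query further down matches on) and that Person nodes have a name property, as they do in the standard movies dataset. The limit of 10 is arbitrary.

```cypher
// List the ten pairs of people with the most shared movies.
// id(p1) < id(p2) keeps each pair from showing up twice.
MATCH (p1:Person)-[r:KNOWS]-(p2:Person)
WHERE id(p1) < id(p2)
RETURN p1.name AS person1, p2.name AS person2, r.weight AS weight
ORDER BY weight DESC
LIMIT 10
```

If every weight comes back as 1, or no rows come back at all, the relationships were probably not created as intended.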
After installing Gephi, we need to install the Graph Streaming plugin, which is easily done from the Tools --> Plugins --> Available Plugins tab in Gephi. Start a new project and turn on the streaming server as shown below.
Export to Gephi:
We use an APOC procedure, which works like this:
apoc.gephi.add(ip,'workspace1',path,'weightproperty') where 'weightproperty' is the relationship property that holds the weight value. I named it weight in the Cypher statement above, but it can be any relationship property you want. If ip is set to null, it defaults to localhost. Our specific Cypher query looks like this:
MATCH path = (:Person)-[:KNOWS]->(:Person)
CALL apoc.gephi.add(null,'workspace1',path,'weight') YIELD nodes
RETURN *
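If the streamed graph turns out too dense to read, one option is to send only the stronger ties to Gephi. The variant below is a sketch under assumptions: the threshold of 1 is arbitrary, and the alias pathsSent is just an illustrative name for the number of paths passed to the procedure.

```cypher
// Stream only pairs who share more than one movie.
MATCH path = (:Person)-[r:KNOWS]->(:Person)
WHERE r.weight > 1
CALL apoc.gephi.add(null,'workspace1',path,'weight') YIELD nodes
// Return a single summary row instead of every path.
RETURN count(*) AS pathsSent
```

As with the full export, the Gephi streaming server has to be running for the call to succeed.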
Gephi offers lots of cool options, but it has so many features that it is easy to get lost at first. I will show you the steps I took to get a visualization like the ones in the Network of Thrones analysis. This is a map of the features I have learned to use so far.
- Choose a layout and play around with its options to see what best fits your needs. I chose the Force Atlas 2 layout with the dissuade hubs and prevent overlap settings. Picture below:
- The second step is to run graph algorithms and get a bunch of graph metrics such as PageRank and various centralities.
- The last step is to design the visualization. Gephi offers lots of custom details in how you design your graph visualizations and allows manually dragging nodes as well. I recommend this article to get some basic understanding and get a feeling of what Gephi is capable of.
You can set color, size, label color and label size based on the calculated metrics from graph algorithms as shown below.
After 10 minutes of playing around with all the options, I came up with this graph visualization of actors, connected by how many movies they worked on together.
If you are still here, thank you for reading through, and please share some feedback 🙂