I was inspired by this cool visualization from the Network of Thrones analysis and wanted to recreate it. Thanks to Michael Hunger and William Lyon, I managed to do it using Neo4j and Gephi together with the APOC library. Note that I am not a graph theory expert, so I will focus only on the visualization aspect.
Requirements:
- Neo4j
- APOC plugin
- Gephi
Download the latest APOC release.
Dataset:
We will use the standard movies dataset, which you can load by running the :play movies command in the Neo4j Browser.
First, we will create a social network by creating weighted relationships between persons. The assumption is that the more movies two people have worked on together, the better they know each other. We find the movies each pair has in common and add one point per shared movie with the following query.
MATCH (p1:Person)-->(:Movie)<--(p2:Person) where id(p1) < id(p2)
MERGE (p1)-[r:KNOWS]-(p2)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = r.weight + 1
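To make the counting logic concrete, here is a minimal Python sketch of what the query computes, run on a tiny hand-made cast list rather than the real movies dataset. The sample data and the function name co_work_weights are made up for illustration. It counts distinct shared movies, so it matches the Cypher query only as long as each person has a single relationship to each movie (someone who both acted in and directed the same movie would be counted more than once by the query).

```python
from collections import Counter
from itertools import combinations


def co_work_weights(credits):
    """Count shared movies for every unordered pair of people.

    `credits` maps a movie title to the people who worked on it.
    """
    weights = Counter()
    for people in credits.values():
        # sorted() gives each pair one canonical order, playing the role
        # of the id(p1) < id(p2) filter in the Cypher query.
        for p1, p2 in combinations(sorted(set(people)), 2):
            weights[(p1, p2)] += 1
    return weights


# Hypothetical sample data, not taken from the movies dataset.
sample_credits = {
    "Movie A": ["Ann", "Bob", "Cid"],
    "Movie B": ["Ann", "Bob"],
    "Movie C": ["Bob", "Dee"],
}

weights = co_work_weights(sample_credits)
for pair, weight in sorted(weights.items()):
    print(pair, weight)
```

Each pair appears once, and its value plays the role of r.weight on the KNOWS relationship.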
Setup:
After installing Gephi, we need to install the Graph Streaming plugin, which is easily done from the Tools --> Plugins --> Available Plugins tab in Gephi. Then start a new project and turn on the streaming server: in the Streaming tab, right-click Master Server and choose Start.
Export to Gephi:
We use an APOC procedure, which is called like this:
apoc.gephi.add(ip,'workspace1',path,'weightproperty')
where 'weightproperty' is the relationship property that holds the weight value. I named it weight in the Cypher statement above, but it can be any relationship property key you want. If ip is set to null, the procedure uses the default localhost. Our specific Cypher query looks like this:
MATCH path = (:Person)-[:KNOWS]->(:Person)
CALL apoc.gephi.add(null,'workspace1',path,'weight') yield nodes
return *
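For a sense of what travels over the wire, the sketch below builds the kind of JSON events the Gephi Graph Streaming plugin accepts: "an" to add a node and "ae" to add an edge. It only approximates what the APOC procedure sends and is not its actual implementation. The helper names, the node and edge ids, the use of a "weight" attribute on the edge, and the localhost:8080 address are all assumptions for illustration. The requests are prepared but never sent.

```python
import json
import urllib.request


def node_event(node_id, label):
    """Build an 'add node' (an) event."""
    return {"an": {node_id: {"label": label}}}


def edge_event(edge_id, source, target, weight):
    """Build an 'add edge' (ae) event that carries the weight (assumed key)."""
    return {"ae": {edge_id: {"source": source, "target": target,
                             "directed": True, "weight": weight}}}


def build_request(event, host="localhost:8080", workspace="workspace1"):
    """Prepare (but do not send) a POST request for one streaming event."""
    url = f"http://{host}/{workspace}?operation=updateGraph"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical ids and labels standing in for two Person nodes and one KNOWS edge.
events = [
    node_event("1", "Ann"),
    node_event("2", "Bob"),
    edge_event("1-2", "1", "2", 2),
]
prepared = [build_request(event) for event in events]
for req in prepared:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing a prepared request to urllib.request.urlopen would deliver it to a running streaming server; that step is left out here so the sketch runs without Gephi.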
Visualization:
Gephi offers lots of cool options, but it has a bit of a learning curve: there are so many features that it is easy to get lost at first. I will walk through the steps I used to produce visualizations like the ones in the Network of Thrones analysis. This is a map of the features I have learned to use so far.
Steps:
- Choose a layout and play around with its options to see what best fits your needs. I chose the Force Atlas 2 layout with the dissuade hubs and prevent overlap settings.
- The second step is to run graph algorithms to compute graph metrics such as PageRank, centralities, etc.
- The last step is to design the visualization. Gephi offers a lot of control over how you style the graph and also lets you drag nodes manually. I recommend this article to get a basic understanding and a feel for what Gephi is capable of.
You can set node color, node size, label color, and label size based on the metrics calculated by the graph algorithms.
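Gephi does this mapping in its GUI, but the idea is easy to sketch in code: rescale a metric linearly into a chosen size range. The example below uses weighted degree (the sum of KNOWS weights per person) as the metric. The sample weights, the size range of 10 to 50, and the function names are illustrative assumptions, not Gephi's internals.

```python
from collections import defaultdict


def weighted_degree(edge_weights):
    """Sum the weights of all edges touching each node."""
    degree = defaultdict(int)
    for (a, b), weight in edge_weights.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


def scale_sizes(metric, min_size=10.0, max_size=50.0):
    """Map metric values linearly onto [min_size, max_size]."""
    lo, hi = min(metric.values()), max(metric.values())
    if hi == lo:
        # All nodes score the same, so give them all the middle size.
        return {node: (min_size + max_size) / 2 for node in metric}
    span = hi - lo
    return {node: min_size + (value - lo) / span * (max_size - min_size)
            for node, value in metric.items()}


# Hypothetical KNOWS weights between four people.
edge_weights = {
    ("Ann", "Bob"): 2,
    ("Ann", "Cid"): 1,
    ("Bob", "Cid"): 1,
    ("Bob", "Dee"): 1,
}

degrees = weighted_degree(edge_weights)
sizes = scale_sizes(degrees)
for node in sorted(sizes):
    print(node, degrees[node], round(sizes[node], 1))
```

The same rescaling works for any numeric metric, such as PageRank, and for any visual property that takes a number, such as label size.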
After 10 minutes of playing around with all the options, I came up with this graph visualization of actors, connected by how many movies they worked on together.
If you are still here, thank you for reading through, and please share some feedback 🙂
Hi,
Thanks for the great article. It helped me a lot. One thing I couldn't figure out is how to pass all the properties to Gephi for node labels other than the default ones. I've posted the same question here: https://stackoverflow.com/questions/46282272/pass-graph-property-names-to-gephi-using-apoc-gephi-add
Hi,
I was trying to work with the procedure, but I was not able to display the relationship properties in Gephi. Do you have any suggestions?
I have just added this functionality to APOC; you can now export optional properties.
You need to make a call like this:
```
curl "http://localhost:8080/workspace1?operation=updateGraph" -d "{\"an\":{\"B\":{\"label\":\"Streaming Node B\", \"foo\":\"bar\",\"centrality\":\"2.0\"}}}"
```
I am confused as to how to connect to an external Neo4j DB. Let's say I have a Neo4j database at 123.12.1234:7474/browser. Where do I put this information into Gephi on my local computer to access this graph the same way my browser can? Can you add more detail on the IP argument of the call? Do I use my laptop's IP?