Neo4j GoT Graph Analysis

Winter has come!

Thanks to Joakim Skoog, who built this cool API of ice and fire. It offers data about GoT universe, that we will examine and do some further analysis.

I will show you how to use just released neo4j graph algorithms plugin on the data, that we got from the above API. As the name suggests, they are user defined procedures of graph algorithms, that you can call with cypher. For most algorithms there are two procedures, one that writes results back to the graph as node-properties and another (named that returns a stream of data, e.g. node-ids and computed values.



Our cool community caretaker Michael Hunger spotted this data and posted a blog post, so that we can easily import the data into our Neo4j using cypher shell. I followed his script, imported the data, and am now sharing the graph.db database, that you can simply copy to your neo4j/data/databases folder and you will have the GoT universe in your Neo4j

Graph Model:

We have imported the data, that describes a network of persons and houses, with some additional meta-data. There are a couple of relationships between persons and houses like SWORN_TO and LED_BY. We can also find family tree hiearchies in PARENT_OF relationship. I will use the SWORN_TO relationship between houses, that describes house hiearchy of allegiances. It is a directed, unweighted network.


Sworn to network


At first glance we can observe, that we have some bridges between different part of communities and that they all link towards the center where the power resides. As you will see this network is from the peaceful times, when the 7 kingdoms were at peace.

Connected Components:

As a part of preprocessing we will check how many disconnected components exist within our network with algo.unionFind.

Connected Components or UnionFind basically finds sets of connected nodes where each node is reachable from any other node in the same set. In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the graph.


CALL algo.unionFind('House', 'SWORN_TO', {write:true, partitionProperty:"partition"})

We see that there is one big component with 393 members and a small one with 5 members. All the rest are disconnected with only one member. That means that either they are a loner house or we are missing some data. We will use centrality algorithms only on the biggest component with 393 members as graph algorithms work better on connected graphs.


In graph theory and network analysis, indicators of centrality identify the most important nodes within a graph. Applications include identifying the most influential person(s) in a social network, key infrastructure nodes in the Internet or urban networks, and super-spreaders of disease. Centrality concepts were first developed in social network analysis, and many of the terms used to measure centrality reflect their sociological origin.

Outgoing degree is always one as each house is sworn to only one house.

Incoming degree:

We simply count how many incoming relationships each house has. In this example it directly translates to how many houses are sworn to each house.

MATCH (h:House)<-[:SWORN_TO]-(:House)
RETURN as house,count(*) as in_degree order by in_degree desc limit 10


House Tyrell leads with a decent margin in front of Lannisters. The main force is house Baratheon, that consists of two branches and has combined number of sworn houses at 77. Others slowly follow, its funny ( sad ? ) to see, that even house Baelish of Harrenhal, which is ruled by Petyr Baelish has more support than house Stark of Winterfell. Books and show divergence a lot for Littlefinger’s character as he is not a ruler of Harrenhal in the show.

Pagerank centrality:

Pagerank is a version of weighted degree centrality with a feedback loop. You only get your “fair share” of your neighbor’s importance. That is, your neighbor’s importance is split between their neighbors, proportional to the number of interactions with that neighbor. Intuitively, PageRank captures how effectively you are taking advantage of your network contacts. In our context, PageRank centrality nicely captures where the power resides. Houses that have the support of other big houses will come on top.

CALL algo.pageRank(
'MATCH (h:House) where h.partition = 151 return id(h) as id',
'MATCH (p1:House)-[:SWORN_TO]->(p2:House) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', iterations:30, write: true});


I was surprised with the results to say the least, but after investigating a bit I understood why. Remember in the picture of sworn to network, where I told you that all the different communities point at the center of the graph where the power resides. And at that center are the two Baratheon houses, that are sworn to each other and hence the number is so big. Where incoming degree takes into account only the number of your neighbours, pagerank also takes into account their importance. You can notice that house Stark has a better position than house Baelish, because Stark’s supporters are more important ( have more sworn to houses ).

Betweenness centrality:

Betweenness centrality identifies nodes that are strategically positioned in the network, meaning that information will often travel through that person. Such an intermediary position gives that person power and influence. Betweenness centrality is a raw count of the number of short paths that go through a given node. For example, if a node is located on a bottleneck between two large communities, then it will have high betweenness.

CALL algo.betweenness(
'MATCH (p:House) WHERE p.partition = 151 RETURN id(p) as id',
'MATCH (p1:House)-[:SWORN_TO]->(p2:House) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', write: true, writeProperty: 'betweenness'});

House Baratheon is uniquely positioned at the center of the graph, so it will be the first in all of the centrality measures. Strong second candidate are the Tyrells, which is a bit surprising as they beat the Lannisters at both pagerank and betweenness centrality. In our case betweenness centrality measures which are the bridging houses with big support of other houses at their back.

Closeness centrality:

The most central nodes according to closeness centrality can quickly interact to all others because they are close to all others. This measure is preferable to degree centrality, because it does not take into account only direct connections among nodes, but also indirect connections.

CALL algo.closeness(
'MATCH (p:House) WHERE p.partition = 151 RETURN id(p) as id',
'MATCH (p1:House)-[:SWORN_TO]->(p2:House) RETURN id(p1) as source, id(p2) as target',
{graph:'cypher', write: true, writeProperty:'closeness'});

Closeness centrality is different from others in that the lower the closeness weight is, the better. As always house Baratheon leads with a decent margin, Tyrells are standing firmly on the second place. Starks are better than Lannisters at closeness, but are losing in all other measures, so they get fourth place and Lannisters third.


This is a network from peaceful times and is therefore less interesting than having two strong disconnected components, that fight each other. What would be cool is to get a network based on the current show or even better that G.R.R.M publishes the Words of Winter, where alliances will change and people will die, to see how the graph evolves through time.

Check also, where you can see how to expose this data as a GraphQL endpoint.

Thank for reading!


One thought on “Neo4j GoT Graph Analysis

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s