Lately I have been showing how to project a bi-partite graph to mono-partite and run algorithms on it. To mix it up a little I will demonstrate how to project a network between nodes using cosine similarity of certain features that the node possess. We will be using the Network of thrones dataset based on A Storm of Swords book. Check out also their analysis on the same network.

To find communities of people, that are in a similar position of power in the network, we will first calculate pagerank, harmonic centrality, triangles count and cluster coefficient for each node and use those values as features from which we will infer a similarity network using cosine similarity. This will allow us to find communities of people who are in a similar position of power.

### Requirements:

- Neo4j — Neo4j Site
- Graph algorithms — Graph algorithms plugin

### Graph model:

William Lyon has also wrote a blog post how to import and analyse this network. I have stolen both the graph model picture aswell as the cypher query for importing the data into Neo4j.

LOAD CSV WITH HEADERS FROM "https://www.macalester.edu/~abeverid/data/stormofswords.csv" AS row

MERGE (src:Character {name: row.Source})

MERGE (tgt:Character {name: row.Target})

MERGE (src)-[r:INTERACTS]->(tgt)

ON CREATE SET r.weight = toInt(row.Weight)

### Centrality

In graph theory and network analysis, indicators of **centrality** identify the most important nodes within a graph.

Given this assumption we will use centrality indicators as features for the ** k**-nearest neighbors algorithm (

*k*-NN) that will infer a similarity network. As the centralities are not the focus of this blog post I will skip the theory and just run the write back versions of algorithms.

Calculate and write back triangles count and clustering coefficient.

CALL algo.triangleCount('Character', 'INTERACTS', {write:true, writeProperty:'triangles',clusteringCoefficientProperty:'coefficient'}) YIELD nodeCount, triangleCount, averageClusteringCoefficient;

Calculate and write back pageRank score

CALL algo.pageRank( 'MATCH (c:Character) RETURN id(c) as id', 'MATCH (p1:Character)-[:INTERACTS]-(p2:Character) RETURN id(p1) as source, id(p2) as target', {graph:'cypher'});

Calculate and write back harmonic centrality

CALL algo.harmonic( 'MATCH (c:Character) RETURN id(c) as id', 'MATCH (c1:Character)-[:INTERACTS]-(c2:Character) RETURN id(c1) as source, id(c2) as target', {graph:'cypher', writeProperty: 'harmonic'});

### k-NearestNeighbour

First we will normalize our features using the min-max method. Cluster coefficient is already normalized so there is no need to do it again. If you want to do something similar on bigger graphs I would suggest you use apoc.periodic.iterate for batching.

WITH ["pagerank","harmonic","triangles"] as keys UNWIND keys as key MATCH (c:Character) WITH max(c[key]) as max,min(c[key]) as min,key MATCH (c1:Character) WITH c1, key + "normalized" AS newKey, (toFloat(c1[key]) - min) / (max - min) as normalized_value CALL apoc.create.setProperty(c1, newKey, normalized_value) YIELD node RETURN COUNT(*)

First we match all pairs of nodes and compare their cosine similarity. To avoid a complete graph where all nodes are connected between each other, we will set a similarity threshold, meaning that all relationships with cosine similarity less than 0.9 will be ignored and not stored. I am slightly cheating as with the kNN algorithm you define to how many nearest neighbours should the node be connected to and here we are defining how similar or close should the pair of nodes be to store a relationship.

Again for bigger graphs you should use APOC for batching. I wrote a blog post with a very similar example.

MATCH (c1:Character),(c2:Character) where id(c1) < id(c2) WITH c1,c2,apoc.algo.cosineSimilarity([c1.pageranknormalized, c1.coefficient, c1.harmonicnormalized, c1.trianglesnormalized], [c2.pageranknormalized, c2.coefficient, c2.harmonicnormalized, c2.trianglesnormalized]) as cosine_similarity WHERE cosine_similarity > 0.9 MERGE (c1)-[s:SIMILAR_POSITION]-(c2) SET s.cosine = cosine_similarity

### Community Detection

A network is said to have **community structure** if the nodes of the network can be easily grouped into sets of nodes such that each set of nodes is densely connected internally. In the particular case of *non-overlapping* community finding, this implies that the network divides naturally into communities of nodes with dense connections internally and sparser connections between communities.

Let’s check what community structure will Louvain algorithm find in our network.

CALL algo.louvain.stream('Character', 'SIMILAR_POSITION', {}) YIELD nodeId, community MATCH (c:Character) where id(c) = nodeId RETURN community, count(*) as communitySize, collect(c.name) as members ORDER BY communitySize ASC LIMIT 20;

Illyrio Mopatis is all alone in Pentos and probably has no network power at all. The most interesting group is community 8, where all the cream of the book is collected ranging from Starks, Targaryens and Lannisters to interestingly also Stannis, Mance and Davos.

Community 106 looks like a community of captains and maesters and differs from largest community in that the largest community has a higher average cluster coefficient.

Let’s try the Label Propagation algorithm and check what it finds.

CALL algo.labelPropagation( 'MATCH (c:Character) RETURN id(c) as id, 1 as weight, id(c) as value', 'MATCH (c1:Character)-[f:INTERACTS]->(c2:Character) RETURN id(c1) as source, id(c2) as target,f.weight as weight', "OUT",{graph:'cypher',partitionProperty:'lpa',iterations:10});

We can immediately observe that LPA returns much more granular communities than Louvain in this network. The biggest community consists of Starks and Lannisters with an addition of Varys. It’s safe to say he deserves this spot. On the other hand Jon is in the community with all members from Night Watch. Daenerys is also left out of the “strongest” community and shares community with Jorah and ser Barristan. She just wasn’t such a badass in season 3 as she has became in season 7 🙂