As I said in the previous blog post, APOC plugin offers more graph algorithms, which you can easily use by simply calling APOC procedures in cypher. Today I will show you how to use centrality and community algorithms. There are also some path-finding algorithms in APOC, like dijkstra or A*, but that may be covered next time. I also found this cool blog post from William Lyon, where he does something similar as I am doing, but uses picture and describes graph algorithms nicely.

## Dataset:

I will just use the standard actors/movies dataset, which you can get by running ` play :movies`

in your Neo4j Browser console.

I will use same dummy features as in previous part.

## Algorithms:

As we noticed in part one of the series, our actors are very similar, given the two features, we used to describe them(age and number of movies). In my case I will use hard similarity, meaning I will set a threshold value of similarity, which will determine if a relation between two persons is created or not. If we do not do this, we end up with an all connected graph, meaning every Person node is connected to all the other Person nodes. As an example community detection algorithms does not work well with that.

### Hard similarity:

MATCH (p1:Person),(p2:Person) where id(p1) < id(p2) AND exists(p1.age) and exists(p2.age) WITH p1,p2,apoc.algo.euclideanSimilarity([p1.count,p2.age_nor],[p2.count,p2.age_nor]) as value //Hard similarity with a threshold of 0.6 WITH p1,p2,value where value > 0.6 MERGE (p1)-[s:SIMILARITY]-(p2) SET s.e_similarity = value

*We have created 3300 relationships between 123 people. We can also see that kNN algorithm results are 6 different isolated clusters.*

### Community detection:

Apoc uses simple label propagation kernel for community detection. More info about algorithm in documentation.

CALL apoc.algo.community(25,null,'community','SIMILARITY','BOTH','e_similarity' ,10000)

Lets check results:

MATCH (n:Person) RETURN distinct(n.community) as community,count(n) as Members, avg(n.age) as avg_age,avg(n.count) as avg_count,stdev(n.age) as age_stdev,stdev(n.count) as count_stdev

We can easily see, that the biggest cluster with 77 members has an average age of 58.5 years and worked at one movie. We can also see that our clusters differ from each other based on the number of movies members worked at and are of similar ages.

Lets try some visualization with APOC virtual nodes and relationships.

MATCH (p1:Person) with p1.community as community,collect(p1) as members CALL apoc.create.vNodes(['Community'], [{name:community}]) YIELD node as comm UNWIND members as member CALL apoc.create.vRelationship(member,'MEMBER_OF_COMMUNITY',{}, comm) YIELD rel RETURN member,rel,comm

I also want to give you some examples of other algorithms. Theory is from Wikipedia.

### Closeness centrality:

In a connected graph, the **closeness centrality** (or **closeness**) of a node is a measure of centrality in a network, calculated as the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus the more central a node is, the closer it is to all other nodes.

MATCH (node:Person) WITH collect(node) AS nodes CALL apoc.algo.closeness(['SIMILARITY'],nodes,'BOTH') YIELD node, score RETURN node, score ORDER BY score DESC

### Betweenness centrality:

In graph theory, **betweenness centrality** is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.

MATCH (node:Person) WITH collect(node) AS nodes CALL apoc.algo.betweenness(['SIMILARITY'],nodes,'BOTH') YIELD node, score RETURN node, score ORDER BY score DESC

### Pagerank:

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

MATCH (node:Person) WITH collect(node) AS nodes // perform 10 page rank iterations, computing only over relationships of type SIMILARITY CALL apoc.algo.pageRankWithConfig(nodes,{iterations:10,types:'SIMILARITY'}) YIELD node, score RETURN node, score ORDER BY score DESC

## Awesome algorithms:

Apoc also supports Pagerank and Cypher, which does not need direct network, but can use the power of cypher to run algorithms on indirect relationships in graph. That is pretty awesome.

### Pagerank with cypher:

CALL apoc.algo.pageRankWithCypher( {iterations:5, write:true, node_cypher:'MATCH (p:Person) return id(p) as id', rel_cypher: 'MATCH (p:Person)-->()<--(o:Person) return id(p) as source,id(o) as target, 1 as weight' });

### Betwenness with cypher:

CALL apoc.algo.betweennessCypher( {write:true, node_cypher:'MATCH (p:Person) return id(p) as id', rel_cypher: 'MATCH (p:Person)-->()<--(o:Person) return id(p) as source,id(o) as target, 1 as weight' })

## Conclusion:

I like it just how easy it is to run graph algorithm procedures with cypher. You can analyse direct network or even indirect networks using algorithms with cypher without any external tools, which is pretty impressive.

Thanks for reading.

Interesting Post! Do you have any idea of how to use these algorithms on a selected subgraph of a large temporal graph aggregated based on time. That’s to say, select a subgraph from the large graph in the database and apply pagerank or betweennesss centrality on the subgraph not on the large graph. With this one could look at the results of these algorithms incrementally and see how these properties change temporally in a temporal graph. If Neo4j can implement this, it will really be nice to study dynamic networks.

LikeLike

Hi, at this point I’m start to thinking APOC is some kind of magic :). But since there is no docs about `apoc.algo.betweennessCypher` I’m not clear how to use it with this virtual graph:

“`

MATCH (p1:Individuo) -[:ImputatoDi]->(m:Reato)<-[:ImputatoDi]-(p2:Individuo)

WHERE p1 p2

CALL apoc.create.vRelationship(p1,’compagnoDireato’,{name:m.name},p2) YIELD rel as e

RETURN [p1,p2] as n, e

“`

LikeLike

in the node cypher you return ids of nodes you want to take into consideration and for relationship query you put it the pattern of the relationships between those nodes. You do not need to create your own virtual relationships…

something like this

MATCH (p1:Individuo) -[:ImputatoDi]->(m:Reato)(m:Reato)<-[:ImputatoDi]-(p2:Individuo)

WHERE p1 p2 return id(p1) as source,id(p2) as target, 1 as weight'

});

LikeLike

Obiouvsly you’re right about the virtual relationships, but I’m still confused 🙂

First of all, I want to get the betweenness of each node labeled as “Individuo”, in my query p1 and p2 BOTH.

Second, I want to know each node betweenness score, in your article the query “CALL apoc.algo.betweennessCypher…” returns some general infos about the graph and the execution time. I’m new to cypher, there is any docs about the usage of “apoc.algo.betweennessCypher”?

LikeLike

wait… since I do not need that virtual relationship I simply can do that:

MATCH (p1:Individuo) -[:ImputatoDi]->(m:Reato)<-[:ImputatoDi]-(p2:Individuo)

WHERE p1p2

WITH collect(distinct p1) AS nodes

CALL apoc.algo.betweenness([‘ImputatoDi’],nodes,’BOTH’) YIELD node, score

RETURN node, score

ORDER BY score DESC

Do you agree?

Let me explain the purpose: basically all I want is to compute “betweenness” in this projected graph. For the visualization I need that vRelationship, fot the algorithm I just need the pattern 🙂

LikeLike