Neo4j APOC graph algorithms part 2

As I said in the previous blog post, APOC plugin offers more graph algorithms, which you can easily use by simply calling APOC procedures in cypher. Today I will show you how to use centrality and community algorithms. There are also some path-finding algorithms in APOC, like dijkstra or A*, but that may be covered next time. I also found this cool blog post from William Lyon, where he does something similar as I am doing, but uses picture and describes graph algorithms nicely.

Dataset:

I will just use the standard actors/movies dataset, which you can get by running play :movies in your Neo4j Browser console.

I will use same dummy features as in previous part.

Algorithms:

As we noticed in part one of the series, our actors are very similar, given the two features, we used to describe them(age and number of movies). In my case I will use hard similarity, meaning I will set a threshold value of similarity, which will determine if a relation between two persons is created or not. If we do not do this, we end up with an all connected graph, meaning every Person node is connected to all the other Person nodes. As an example community detection algorithms does not work well with that.

Hard similarity:

MATCH (p1:Person),(p2:Person) where id(p1) < id(p2)
AND exists(p1.age) and exists(p2.age)
WITH p1,p2,apoc.algo.euclideanSimilarity([p1.count,p2.age_nor],[p2.count,p2.age_nor]) as value 
//Hard similarity with a threshold of 0.6 
WITH p1,p2,value where value > 0.6
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.e_similarity = value

Screen Shot 2017-04-27 at 23.01.57

We have created 3300 relationships between 123 people. We can also see that kNN algorithm results are 6 different isolated clusters.

Community detection:

Apoc uses simple label propagation kernel for community detection. More info about algorithm in documentation.

CALL apoc.algo.community(25,null,'community','SIMILARITY','BOTH','e_similarity' ,10000)

Lets check results:

MATCH (n:Person) 
RETURN distinct(n.community) as community,count(n) as Members,
avg(n.age) as avg_age,avg(n.count) as avg_count,stdev(n.age) as age_stdev,stdev(n.count) as count_stdev

community_results

We can easily see, that the biggest cluster with 77 members has an average age of 58.5 years and worked at one movie. We can also see that our clusters differ from each other based on the number of movies members worked at and are of similar ages.

Lets try some visualization with APOC virtual nodes and relationships.

MATCH (p1:Person) with p1.community as community,collect(p1) as members 
CALL apoc.create.vNodes(['Community'], [{name:community}]) YIELD node as comm 
UNWIND members as member 
CALL apoc.create.vRelationship(member,'MEMBER_OF_COMMUNITY',{}, comm) YIELD rel 
RETURN member,rel,comm

virtual_.png

I also want to give you some examples of other algorithms. Theory is from Wikipedia.

Closeness centrality:

In a connected graph, the closeness centrality (or closeness) of a node is a measure of centrality in a network, calculated as the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus the more central a node is, the closer it is to all other nodes.

MATCH (node:Person)
WITH collect(node) AS nodes
CALL apoc.algo.closeness(['SIMILARITY'],nodes,'BOTH') YIELD node, score
RETURN node, score
ORDER BY score DESC

Betweenness centrality:

In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.

MATCH (node:Person)
WITH collect(node) AS nodes
CALL apoc.algo.betweenness(['SIMILARITY'],nodes,'BOTH') YIELD node, score
RETURN node, score
ORDER BY score DESC

Pagerank:

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

MATCH (node:Person)
WITH collect(node) AS nodes
// perform 10 page rank iterations, computing only over relationships of type SIMILARITY
CALL apoc.algo.pageRankWithConfig(nodes,{iterations:10,types:'SIMILARITY'}) YIELD node, score
RETURN node, score
ORDER BY score DESC

Awesome algorithms:

Apoc also supports Pagerank and Cypher, which does not need direct network, but can use the power of cypher to run algorithms on indirect relationships in graph. That is pretty awesome.

Pagerank with cypher:

CALL apoc.algo.pageRankWithCypher(
{iterations:5, write:true, 
node_cypher:'MATCH (p:Person) return id(p) as id',
rel_cypher: 'MATCH (p:Person)-->()<--(o:Person) return id(p) as source,id(o) as target, 1 as weight'
});

Betwenness with cypher:

CALL apoc.algo.betweennessCypher(
{write:true, 
node_cypher:'MATCH (p:Person) return id(p) as id',
rel_cypher: 'MATCH (p:Person)-->()<--(o:Person) return id(p) as source,id(o) as target, 1 as weight'
})

Conclusion:

I like it just how easy it is to run graph algorithm procedures with cypher. You can analyse direct network or even indirect networks using algorithms with cypher without any external tools, which is pretty impressive.

Thanks for reading.

Advertisements

5 thoughts on “Neo4j APOC graph algorithms part 2

  1. Interesting Post! Do you have any idea of how to use these algorithms on a selected subgraph of a large temporal graph aggregated based on time. That’s to say, select a subgraph from the large graph in the database and apply pagerank or betweennesss centrality on the subgraph not on the large graph. With this one could look at the results of these algorithms incrementally and see how these properties change temporally in a temporal graph. If Neo4j can implement this, it will really be nice to study dynamic networks.

    Like

  2. Hi, at this point I’m start to thinking APOC is some kind of magic :). But since there is no docs about `apoc.algo.betweennessCypher` I’m not clear how to use it with this virtual graph:

    “`
    MATCH (p1:Individuo) -[:ImputatoDi]->(m:Reato)<-[:ImputatoDi]-(p2:Individuo)
    WHERE p1 p2
    CALL apoc.create.vRelationship(p1,’compagnoDireato’,{name:m.name},p2) YIELD rel as e
    RETURN [p1,p2] as n, e
    “`

    Like

  3. in the node cypher you return ids of nodes you want to take into consideration and for relationship query you put it the pattern of the relationships between those nodes. You do not need to create your own virtual relationships…

    something like this

    MATCH (p1:Individuo) -[:ImputatoDi]->(m:Reato)(m:Reato)<-[:ImputatoDi]-(p2:Individuo)
    WHERE p1 p2 return id(p1) as source,id(p2) as target, 1 as weight'
    });

    Like

  4. Obiouvsly you’re right about the virtual relationships, but I’m still confused 🙂
    First of all, I want to get the betweenness of each node labeled as “Individuo”, in my query p1 and p2 BOTH.
    Second, I want to know each node betweenness score, in your article the query “CALL apoc.algo.betweennessCypher…” returns some general infos about the graph and the execution time. I’m new to cypher, there is any docs about the usage of “apoc.algo.betweennessCypher”?

    Like

  5. wait… since I do not need that virtual relationship I simply can do that:

    MATCH (p1:Individuo) -[:ImputatoDi]->(m:Reato)<-[:ImputatoDi]-(p2:Individuo)
    WHERE p1p2
    WITH collect(distinct p1) AS nodes
    CALL apoc.algo.betweenness([‘ImputatoDi’],nodes,’BOTH’) YIELD node, score
    RETURN node, score
    ORDER BY score DESC

    Do you agree?
    Let me explain the purpose: basically all I want is to compute “betweenness” in this projected graph. For the visualization I need that vRelationship, fot the algorithm I just need the pattern 🙂

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s