Graph algorithms plugin for Neo4j is getting bigger and bigger by the day, so here are some new algorithms, that have been added to the collection. I will use Facebook’s ego network of friends to show triangle count algorithms and betweenness centrality algorithms on an *undirected* graph in Neo4j.

### Requirements:

- Neo4j — Neo4j Site
- Graph algorithms — Graph algorithms plugin

## Graph model:

We will import a friends social network of Facebook’s users. There are 4039 nodes labeled User and 88234 unweighted relationships indicating friendship between users as shown below. You can always get the schema of your graph using `CALL dbms.schema()`

.

We create a constraint on User labels, so that our queries will run faster.

CREATE CONSTRAINT ON (u:User) ASSERT u.id is unique

### Import:

Direct link to data is here. We will model this network as *undirected*. There is a twist here, that Neo4j supports storing only *directed* relationships. That is no problem as we can still treat it as *undirected* in our queries and algorithms.

USING PERIODIC COMMIT 5000 LOAD CSV FROM "file:///facebook_combined.txt" as row fieldterminator ' ' MERGE (u:User{id:row[0]}) MERGE (u1:User{id:row[1]}) MERGE (u)-[:FRIEND]-(u1)

We are using `MERGE`

without specifying direction of the relationship as it does not matter in an *undirected* graph. Using `MERGE`

like this will check for any existing relationships in both direction and if none exists it will choose a random direction to store the new relationship.

### Graph Analysis:

#### Find (weakly) connected components in the graph ?

I always start with checking how many connected components are there in the graph. Most graph algorithms run best on an connected component, so I usually choose the largest component to run further algorithms on. Check documentation for more.

CALL algo.unionFind.stream('User', 'FRIEND', {}) YIELD nodeId,setId return setId as component, count(*) as size

As we can see there exists only one connected component in this graph. This is very good as there are no disconnected components. Graph algorithms library also support finding strongly connected components with `algo.scc`

. In *undirected* graph every weakly connected component is also strongly connected so we can skip it.

#### Get average number of friends and standard deviation.

MATCH (u:User) return avg(size((u)--())) as average_degree, stdev(size((u)--())) as stdev

A user has on average 43 friends. Standard deviation is quite high so the number of friends is spread out over a wider range of values. We can check now for users with most friends to start looking for influencers in our network.

#### Degree:

MATCH (u:User) RETURN u.id as id,size((u)--()) as degree order by degree desc limit 10

Most basic metric is degree, that we can get using plain cypher. This directly translates to the number of friends a user has. It can be used as a good starting ground to finding influencers in our network. Top 4 users have more than 10x the average friends with user 107 being really high at 1045 friends (26% of all users).

#### Calculate triangle count and clustering coefficient for each node and return total triangle count and average clustering coefficient of the network.

CALL algo.triangleCount('User', 'FRIEND', {concurrency:4, write:true, writeProperty:'triangles', clusteringCoefficientProperty:'coefficient'}) YIELD nodeCount, triangleCount, averageClusteringCoefficient, loadMillis, computeMillis, writeMillis

There are 4039 node with 1612010 triangles in our network. Average clustering is quite high at 0.6055. This is expected as evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterized by a relatively high density of relationships. [1]

## Check user local triangle count and cluster coefficient.

The **local clustering coefficient** of a node in a graph quantifies how close its neighbours are to being a complete graph, where **complete graph** is a simple undirected graph in which every pair of distinct nodes is connected by a unique relationship.[1]

If put simple the clustering coeffient would be 1 when all of my friends knew eachother.

Above algorithms also saved values of individual nodes as properties of nodes, so we can now query the results.

MATCH (u:User) return u.id,u.triangles as triangles, u.coefficient as cluster_coefficient order triangles desc limit 10

User 1912 is a member of more triangles then user 107 , who has the most friends. They both have a small clustering coefficient, which indicates, that their friends don’t really know eachother and they have a great potencial for introducing people to one another. User 1912,107 were third and first by the number of friends, so combined with this results they are solidifying their importance in the network. There is quite a drop in triangle count to third place, while on the other hand their neighbour communities are more tightly knit.

#### Betweenness centrality:

Betweenness centrality is useful in finding nodes that serve as a bridge from one group of users to another in a graph. Consequently, betweenness is a rudimentary measure of the control, that a specific node exerts over the information flow throughout the full graph.

We use `direction:'BOTH'`

for algorithms on an *undirected* graph.

Check documentation for more.

CALL algo.betweenness.stream('User', 'FRIEND', {direction:'BOTH'}) YIELD nodeId,centrality MATCH (u:User) where id(u) = nodeId RETURN u.id, centrality order by centrality desc limit 15

User 107 tops the betweenness centrality results, which is not surprising as he has the most amount of friends with a very low clustering coeffient. So he in the position to act as a broker of informations between different groups of users. He sure does appear as the main influencer in this network based on the metrics we checked. Based on the combined view of centrality metrics we can create a list with users, who are likely to be important in our network and do some further research on them.

## Conclusion:

As you can see graph algorithms plugin greatly improves our ability to analyse the network in Neo4j. You should check them out! If you think some algorithms are still missing or find some bugs please let us know on github or the Neo4j slack community.

[1] https://en.wikipedia.org/wiki/Clustering_coefficient