Exact betweenness centrality is computationally intensive and isn’t practical for (near) real-time calculation on large graphs. Because of this approximation algorithms of betweenness centrality were developed to allow for a faster calculation.

We will use Twitter dataset made available by Stanford University to demonstrate how to approximate betweenness centrality with Neo4j graph algorithms library.

## Import

First define the graph schema.

`CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;`

There are a total of 1,8 million relationships in the dataset, so we will import only a single egonet to keep the graph small for the example.

LOAD CSV FROM "file:///61086747.edges" as row fieldterminator " " MERGE (u1:User{id:row[0]}) MERGE (u2:User{id:row[1]}) MERGE (u1)-[:FOLLOW]->(u2);

## Betweenness Centrality

Betweenness centrality is a way of detecting the amount of influence a node has over the flow of information in a graph. It is often used to find nodes that serve as a bridge from one part of a graph to another.

*Taken from documentation.*

Run the original Brandes betweenness centrality for future reference.

CALL algo.betweenness.stream('User','FOLLOW') YIELD nodeId, centrality RETURN nodeId,centrality ORDER BY centrality desc limit 10;

## The RA-Brandes algorithm

RA-Brandes algorithm is different from original Brandes’ algorithm in only one main respect: Brandes’ algorithm considers all nodes within the graph, whereas RA-Brandes considers only a subset of nodes from the graph.

*Taken from documentation.*

There are two strategies for selecting the subset of nodes.

Default strategy is the “random” strategy where nodes are selected at random with defined probability of selection. If we set probability to one it behaves as exact betweenness centrality where all of the nodes are selected and considered in calculation of the centrality values.

Second strategy is the “dense” strategy where first the mean degree of nodes is calculated and then only nodes with degree higher than the mean are selected.

Run the RA-Brandes algorithm with parameter `probability : 1`

to validate above statement that the two algorithms are fundamentally the same and deliver the same results if all nodes are selected.

CALL algo.betweenness.sampled.stream('User','FOLLOW', {probability:1}) YIELD nodeId, centrality RETURN nodeId,centrality order by centrality desc limit 10;

### Approximation of Betweenness Centrality

If we set the `probability : 0.1`

we will consider only one tenth of nodes at random in the calculation of betweenness centrality. You could experiment with lower values of probability and compare results.

CALL algo.betweenness.sampled.stream('User','FOLLOW', {probability:0.1}) YIELD nodeId, centrality RETURN nodeId,centrality ORDER BY centrality DESC LIMIT 10;

### Ego networks

Ego network has one “ego” node that is in the center of the network. Neighbours are connected to it with a relationship and are called “altar” nodes.

For more details on ego networks check out the following YT video.

First step in calculating betweenness centrality is running the shortest path algorithm. Parameter `maxDepth`

limits the depth of traversal used by the shortest path algorithm. This effectively allows us to run betweenness centrality on ego networks where for each node we only load its neighbours or neighbours x hops away when calculating betweenness centrality.

CALL algo.betweenness.sampled.stream('User','FOLLOW', {probability:1,maxDepth:1}) YIELD nodeId, centrality RETURN nodeId,centrality ORDER BY centrality DESC LIMIT 10;

## Conclusion

Now that we learned how to approximate betweenness centrality with Neo4j graph algorithms we can import the whole 1.8 million relationships into our graph and play around with different strategies and traversal limits to optimize the approximation of betweenness centrality.