Depth-first search analysis on London Tube network with Neo4j graph algorithms

The Depth-first search algorithm is an algorithm used to search or traverse through the graph. It is a basis for many other graph algorithms but was not directly exposed in Neo4j graph algorithms until recently.

Here is a visual representation of the order in which the Depth-first search algorithm traverses a graph.

390px-depth-first-tree.svg_

Image source: Wikipedia

Graph schema

The graph has a very simple schema. It consists of nodes labeled Station that have relationships with a type CONNECTION to other stations.

We will treat the relationships in our graph as undirected or bidirectional.

schema

Define the constraint and index to optimize all further queries.

CREATE CONSTRAINT ON (s:Station) ASSERT s.id is unique;

CREATE INDEX ON :Station(name);

Import

We will use the London tube dataset made available by Nicola Greco.

Import all nodes (stations with belonging metadata)

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/nicola/tubemaps/master/datasets/london.stations.csv" as row 
MERGE (s:Station{id:row.id}) 
SET s += apoc.map.clean(row,['id'],[])

Import all relationships (lines between stations)

LOAD CSV WITH HEADERS FROM 
"https://raw.githubusercontent.com/nicola/tubemaps/master/datasets/london.connections.csv" as row 
MATCH (s1:Station{id:row.station1}),(s2:Station{id:row.station2}) 
MERGE (s1)-[:CONNECTION{time:toInteger(row.time),line:row.line}]->(s2)

Analysis

We will focus on the Picadilly Circus station in this demonstration. It will be used as the starting node for the Depth-first search algorithm.

Let’s start with visualizing the Picadilly Circus neighbor stations in Neo4j Browser.

MATCH p=(s:Station{name:"Picadilly Circus"})-[*..3]-()
RETURN p

Results

picadilly

Depth-first search algorithm

The Depth-first search algorithm in Neo4j supports three different exit strategies that stop the algorithm from further traversal through the graph. This is to avoid unnecessary traversal of the whole graph.

To treat relationships as undirected we use value ‘BOTH’ for parameter direction.

Max depth strategy

As the name suggests the algorithm stops traversing the graph once it reaches the max depth specified.

MATCH (s:Station{name:'Picadilly Circus'})
WITH id(s) as start_id 
CALL algo.dfs.stream('Station', 'CONNECTION', 'BOTH', start_id, 
  {maxDepth:2})
YIELD nodeIds
UNWIND nodeIds as nodeId
RETURN algo.asNode(nodeId).name as station

Results

station
“Picadilly Circus”
“Charing Cross”
“Embankment”
“Oxford Circus”
“Regent’s Park”
“Tottenham Court Road”
“Warren Street”
“Bond Street”
“Leicester Square”
“Covent Garden”
“Green Park”
“Hyde Park Corner”
“Victoria”

 

Target nodes strategy

The algorithm keeps on traversing the graph until it reaches any of the target nodes specified.

The targetNodes parameter expects Neo4j internal ids of the nodes to be provided.

MATCH (start:Station{name:'Picadilly Circus'}),
      (target:Station{name:'Canada Water'}) 
CALL algo.dfs.stream('Station', 'CONNECTION', 'BOTH', id(start), 
  {targetNodes:[id(target)]}) YIELD nodeIds 
UNWIND nodeIds as nodeId 
RETURN algo.asNode(nodeId).name as station

Results

station
“Picadilly Circus”
“Charing Cross”
“Embankment”
“Waterloo”
“Lambeth North”
“Elephant & Castle”
“Borough”
“London Bridge”
“Bermondsey”
“Canada Water”

Max cost strategy

The max cost strategy expects relationships to be weighted. It finds all the nodes in the graph that are less than or equal to maxCost parameter distanced from the starting node.

In this case, we use the time attribute of relationships as weights.

MATCH (start:Station{name:'Picadilly Circus'})
CALL algo.dfs.stream('Station', 'CONNECTION', 'BOTH', id(start), 
  {maxCost:3, weightProperty:'time'})
YIELD nodeIds
UNWIND nodeIds as nodeId
RETURN algo.asNode(nodeId).name as station

Results

station
“Picadilly Circus”
“Charing Cross”
“Embankment”
“Oxford Circus”
“Bond Street”
“Leicester Square”
“Covent Garden”
“Green Park”

Conclusion

The Depth-first search algorithm is a nice addition to the Neo4j graph algorithms library and can be used in many scenarios. We can also use more than one exit strategy at once if we would want to.

Register now for your copy of the O’Reilly book, Graph Algorithms: Practical Examples in Apache Spark and Neo4j by Mark Needham and Amy E. Hodler.

Social-AlgoTwitter2 .png

2 thoughts on “Depth-first search analysis on London Tube network with Neo4j graph algorithms

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s