Neo4j APOC graph algorithms part 1

The latest release of the APOC plugin ships some new graph algorithms, and one of them is a kNN (k-nearest neighbours) algorithm, which is cool and easy to use. A few months ago I built my own kNN algorithm based on Euclidean distance in plain Cypher, and yes, it worked, but it was slow, because you are basically computing a cartesian product of all the nodes. I was pleasantly surprised by how fast the APOC version is.
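To see why a hand-rolled kNN gets slow, here is a minimal brute-force sketch in Python. It is not the APOC implementation; the function names and sample points are hypothetical. It compares every unordered pair of points once, n(n-1)/2 comparisons, which is the same quadratic growth a cartesian product over all :Person nodes pays.

```python
import math
from typing import Dict, List, Tuple

Neighbours = Dict[str, List[Tuple[str, float]]]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(points: Dict[str, List[float]], k: int) -> Tuple[Neighbours, int]:
    """Return the k nearest neighbours of every point plus the number of comparisons.

    Each unordered pair is visited once (like `id(p1) < id(p2)` in Cypher),
    so the work grows as n * (n - 1) / 2.
    """
    names = sorted(points)
    candidates: Neighbours = {name: [] for name in names}
    comparisons = 0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = euclidean(points[a], points[b])
            # distance is symmetric, so record it for both endpoints
            candidates[a].append((b, d))
            candidates[b].append((a, d))
            comparisons += 1
    nearest = {name: sorted(c, key=lambda t: t[1])[:k] for name, c in candidates.items()}
    return nearest, comparisons


if __name__ == "__main__":
    people = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0], "d": [0.0, 2.0]}
    nearest, comparisons = brute_force_knn(people, k=2)
    print(comparisons)   # 6 pairs for 4 points
    print(nearest["a"])  # [('b', 1.0), ('d', 2.0)]
```

Doubling the number of people roughly quadruples the number of comparisons, which is where the slowdown of the pure Cypher approach comes from.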

Requirements:

Download the latest APOC release.

Data:

I will use the standard movies dataset, which I find is good benchmarking data, to show what Cypher combined with APOC is capable of. It is available as a guide in the Neo4j Browser:

:play movies

Features:

We will use age and the number of movies a person worked on as our two dummy features, so we can run some of the new apoc.algo functions, which provide us with lots of cool algorithms.

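//age in 2017 (people without a born value get no age property)
MATCH (p:Person)
SET p.age = 2017 - p.born;

//number of outgoing relationships from the person to movies
MATCH (p:Person)
WITH p, size((p)-->(:Movie)) AS s
SET p.count = s;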

Let's draw the distributions with spoonJS, which you can easily attach to the Neo4j Browser. It augments the browser with chart visualization capabilities, which are very useful and easy to use.

Age distribution:

We run this query and visualize the results.

//a few :Person nodes have no born value, so their age is null and we filter them out
MATCH (p:Person) WHERE exists(p.age)
RETURN p.age AS age, count(*) AS count ORDER BY count DESC

[Screenshot: age distribution chart]
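The distribution queries rely on Cypher's implicit grouping: every non-aggregated column in RETURN becomes a grouping key, and count(*) counts the rows in each group. A rough Python equivalent, with hypothetical sample ages, is:

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def value_distribution(values: Iterable[Optional[int]]) -> List[Tuple[int, int]]:
    """Count how often each value occurs, most frequent first.

    Missing values (None) are skipped, like filtering with exists(p.age);
    Counter.most_common() orders the groups by count, descending.
    """
    counts = Counter(v for v in values if v is not None)
    return counts.most_common()


if __name__ == "__main__":
    ages = [51, 60, 51, None, 45, 60, 51]
    print(value_distribution(ages))  # [(51, 3), (60, 2), (45, 1)]
```

Because grouping already makes each age appear exactly once, wrapping the grouping column in distinct() adds nothing.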

Number of movies distribution:

MATCH (p:Person)
RETURN p.count AS number_of_movies, count(*) AS count ORDER BY count DESC

[Screenshot: number of movies distribution chart]

As we can easily tell, most people are between 40 and 60 years old and have worked on one or two movies. The two features live on very different scales, so if we want to use both together, we should apply some sort of normalization.

Normalization:

Version 1:

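I came up with a simple rescaling that approximates what I understand min-max scaling to do: age is divided by the ratio of the age span to the count span, so both features end up with spans of the same width.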

//filter out outliers (only used to compute the spans)
MATCH (p:Person) WHERE p.age > 25 AND p.count < 10
//get the span of each feature
WITH max(p.age) - min(p.age) AS age_span, max(p.count) - min(p.count) AS count_span
WITH toFloat(age_span) / toFloat(count_span) AS coefficient
//rescale age for every person, so its span matches the count span
MATCH (p1:Person)
SET p1.age_nor = p1.age / coefficient
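Here is the Version 1 idea as a small Python sketch, assuming people are given as hypothetical (age, count) tuples with age possibly missing. The spans come from the non-outlier rows only, but every person is rescaled, just like in the query.

```python
from typing import List, Optional, Tuple

Person = Tuple[Optional[int], int]  # (age, count); age may be missing


def span_rescale(people: List[Person]) -> List[Optional[float]]:
    """Divide every age by age_span / count_span (the 'Version 1' rescaling).

    Spans are computed only on non-outlier rows (age > 25 and count < 10),
    but the rescaling is applied to everybody; missing ages stay missing.
    """
    kept = [(a, c) for a, c in people if a is not None and a > 25 and c < 10]
    age_span = max(a for a, _ in kept) - min(a for a, _ in kept)
    count_span = max(c for _, c in kept) - min(c for _, c in kept)
    coefficient = age_span / count_span  # true division, like toFloat() / toFloat()
    return [None if a is None else a / coefficient for a, _ in people]


if __name__ == "__main__":
    sample = [(30, 1), (50, 5), (70, 9), (20, 12), (None, 3)]
    print(span_rescale(sample))  # [6.0, 10.0, 14.0, 4.0, None]
```

The rescaled ages of the kept rows span 6 to 14, a width of 8, which matches the count span of 1 to 9. The values are not squeezed into [0, 1]; only the widths of the two ranges are made equal.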

Version 2:

You can also simply normalize each feature to the range between 0 and 1. Here is an example for one feature:

//filter out outliers
MATCH (p:Person) WHERE p.age > 25 AND p.count < 10
//get the max and min value
WITH max(p.age) AS max_age, min(p.age) AS min_age
MATCH (p1:Person)
//normalize (multiplying by 1.0 forces floating-point division)
SET p1.age_nor = (1.0 * p1.age - min_age) / (max_age - min_age)
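The same min-max normalization as a Python sketch, again with hypothetical (age, count) tuples. It computes min and max on the non-outlier rows and applies (x - min) / (max - min) to everyone.

```python
from typing import List, Optional, Tuple

Person = Tuple[Optional[int], int]  # (age, count); age may be missing


def min_max(people: List[Person]) -> List[Optional[float]]:
    """Min-max normalize age using bounds taken from non-outlier rows only."""
    kept = [a for a, c in people if a is not None and a > 25 and c < 10]
    lo, hi = min(kept), max(kept)
    return [None if a is None else (a - lo) / (hi - lo) for a, _ in people]


if __name__ == "__main__":
    sample = [(30, 1), (50, 5), (70, 9), (20, 12), (None, 3)]
    print(min_max(sample))  # [0.0, 0.5, 1.0, -0.25, None]
```

Note that the outlier (age 20) lands at -0.25: because the bounds ignore outliers while the update touches every node, outliers can fall outside [0, 1]. Clamp the result if that matters for your use case.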

I used the first version in all the visualizations below.

kNN queries:

Cosine similarity:

MATCH (p1:Person),(p2:Person) where id(p1) < id(p2) and exists(p1.age) and exists(p2.age)
WITH p1,p2,apoc.algo.cosineSimilarity([p1.count,p1.age_nor],[p2.count,p2.age_nor]) as value
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.cosine = value
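Cosine similarity is the dot product of the two vectors divided by the product of their lengths, cos(a, b) = (a · b) / (|a| |b|). A minimal Python sketch follows; how APOC handles edge cases such as zero vectors is not covered here, and this sketch simply raises an error in that case (an assumption of the sketch).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # hypothetical [count, age_nor] vectors; the second is the first scaled by 2
    print(cosine_similarity([1.0, 10.0], [2.0, 20.0]))
```

Cosine only looks at the angle between the vectors, so two people whose [count, age_nor] vectors are proportional score (approximately) 1.0 even if their raw values differ a lot. That is one reason to look at the Euclidean measures as well.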

Query for distribution:

We need to bucketize a bit for a better visualization.

//round to two decimals to bucketize, then count every relationship per bucket
MATCH ()-[s:SIMILARITY]->()
RETURN round(s.cosine * 100) AS bucket, count(*) AS count ORDER BY bucket

[Screenshot: cosine similarity distribution chart]
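Bucketizing maps each score to round(score * 100) and counts how many scores fall into each bucket. The Python sketch below assumes Cypher's round() rounds halves up, as Java's Math.round does, so it uses floor(x + 0.5) instead of Python's built-in round(), which rounds halves to even. It counts every score, duplicates included, which is what a histogram over relationships needs; deduplicating the scores first would make each bucket count distinct values instead.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def bucketize(scores: Iterable[float], buckets_per_unit: int = 100) -> List[Tuple[int, int]]:
    """Map each score to round(score * buckets_per_unit) and count per bucket.

    Halves are rounded up via floor(x + 0.5); every score is counted,
    duplicates included. Buckets are returned in ascending order.
    """
    buckets = Counter(math.floor(s * buckets_per_unit + 0.5) for s in scores)
    return sorted(buckets.items())


if __name__ == "__main__":
    print(bucketize([0.991, 0.994, 0.996, 0.5, 0.5]))  # [(50, 2), (99, 2), (100, 1)]
```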

Euclidean distance:

MATCH (p1:Person),(p2:Person) where id(p1) < id(p2) and exists(p1.age) and exists(p2.age)
WITH p1,p2,apoc.algo.euclideanDistance([p1.count,p1.age_nor],[p2.count,p2.age_nor]) as value
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.e_distance = value

[Screenshot: Euclidean distance distribution chart]
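Euclidean distance is the square root of the summed squared differences between the two vectors. Each vector must take both of its features from the same person; mixing one person's count with another person's age_nor would compare the wrong values. A small Python sketch, with hypothetical names and data, that also computes each unordered pair only once:

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """sqrt(sum((a_i - b_i)^2)) for two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(features: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair, each pair computed once.

    combinations() yields (a, b) with a before b, playing the role of
    `id(p1) < id(p2)`; each vector holds one person's [count, age_nor].
    """
    return {
        (a, b): euclidean_distance(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
    }


if __name__ == "__main__":
    people = {"x": [0.0, 0.0], "y": [3.0, 4.0], "z": [0.0, 1.0]}
    print(pairwise_distances(people)[("x", "y")])  # 5.0
```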

Euclidean similarity:

MATCH (p1:Person),(p2:Person) where id(p1) < id(p2) 
and exists(p1.age) and exists(p2.age)
WITH p1,p2,apoc.algo.euclideanSimilarity([p1.count,p1.age_nor],[p2.count,p2.age_nor]) as value
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.e_similarity = value

[Screenshot: Euclidean similarity distribution chart]
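Euclidean similarity turns a distance into a score where higher means more alike. A common definition, which I assume APOC follows here (check the APOC source for your version), is 1 / (1 + d): identical vectors score 1.0 and the score decays toward 0 as the distance grows. A minimal Python sketch under that assumption:

```python
import math
from typing import Sequence


def euclidean_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed definition: 1 / (1 + euclidean_distance(a, b))."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    print(euclidean_similarity([1.0, 8.0], [1.0, 8.0]))  # 1.0
    print(euclidean_similarity([0.0, 0.0], [3.0, 0.0]))  # 0.25
```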

Conclusion:

With the help of APOC and spoonJS we can easily run graph algorithms and quickly visualize the results to get a feel for what the data looks like. You do not need any external tools for simple chart visualizations, which is pretty amazing. APOC holds more graph algorithms, so the next blog post will probably come soon. Stay tuned!
