GraphDB

LinkedIn Graph Search using Neo4j - Work in Progress

Overview

Facebook Graph Search is nice. It still lacks many desirable features, but it's a good start. While it's good for finding friends who went to the same school, lived in the same city, or share the same interests, the real power of Graph Search would show when it is applied to a site like LinkedIn.

At the time of writing this post, I will be honest: I have not researched whether similar tools already exist for LinkedIn. I am more interested in understanding graph databases and in how to create my own version of Facebook Graph Search.

Special Note *

This is a work in progress, and this post was written to capture my thinking at this point in time. So far I have written a Node script to pull data from LinkedIn, and I am currently working on loading the data into Neo4j.

Step 1 - Pull Data from LinkedIn

If we pull basic data about a person or his/her connections, it looks something like the following.

{
    firstName: 'Rohit',
    headline: 'Director of Engineering - New Technologies',
    id: 'ovX6AJPGnk',
    industry: 'Information Technology and Services',
    lastName: 'Ghatol',
    location: {
        country: {
            code: 'us'
        },
        name: 'Houston, Texas Area'
    },
    numConnections: 500,
    numConnectionsCapped: true,
    positions: {
        _total: 16,
        values: [{
            company: {
                id: 65662,
                industry: 'Computer Software',
                name: 'Synerzip Softech',
                size: '201-500 employees',
                type: 'Privately Held'
            },
            id: 27449797,
            isCurrent: true,
            startDate: {
                month: 7,
                year: 2008
            },
            summary: 'Define Technical Direction of the Company. Develop key skills for bleeding edge Technology and lead the clients/prospects on choices on Technology and Architecture Front.\n\nCreate more leaders in the Organization to expand the Company',
            title: 'Director of Engineering'
        },..,..,..]
    },
    siteStandardProfileRequest: {
        url: 'http://www.linkedin.com/profile/view?id=18377511&authType=name&authToken=**&trk=******'
    }
}
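
To get a feel for the shape of this data, here is a minimal sketch that reads a few of the nested fields. The helper name `summarizeProfile` is made up for this post, and the sample object is a trimmed copy of the profile above. I am assuming, based on the field name, that `numConnectionsCapped: true` means the real count is at least the reported number.

```javascript
// Hypothetical helper: reduce a profile object to a one-line summary.
function summarizeProfile(profile) {
  const positions = (profile.positions && profile.positions.values) || [];
  const current = positions.find((position) => position.isCurrent);
  // Assumption: a capped count means "at least this many", so show it as "500+".
  const connections = profile.numConnections + (profile.numConnectionsCapped ? '+' : '');
  const job = current ? current.title + ' at ' + current.company.name : 'no current position';
  return profile.firstName + ' ' + profile.lastName + ', ' + job + ' (' + connections + ' connections)';
}

// Trimmed copy of the profile shown above.
const profile = {
  firstName: 'Rohit',
  lastName: 'Ghatol',
  numConnections: 500,
  numConnectionsCapped: true,
  positions: {
    values: [{
      company: { name: 'Synerzip Softech' },
      isCurrent: true,
      title: 'Director of Engineering'
    }]
  }
};

console.log(summarizeProfile(profile));
```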

 

Step 2 - Present Data in Graph Database

The above data is very rich, because it has relationships within itself. Now think about adding all your connections to it as well.

In a graph database like Neo4j, you have nodes, and nodes are connected by relationships. Both nodes and relationships can have properties.
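
To make the node/relationship split concrete, here is a rough sketch in plain JavaScript (no Neo4j involved) that breaks a profile like the one in Step 1 into nodes and relationships. The function name `profileToGraph`, the `workedAt` relationship name and the `key`/`type` fields are my own assumptions for illustration; only `knowledge` matches a name used later in this post.

```javascript
// Hypothetical helper: split one LinkedIn-style profile into graph nodes and relationships.
function profileToGraph(profile) {
  const nodes = [];
  const relationships = [];

  // The person becomes a node; scalar fields become its properties.
  const personKey = 'person:' + profile.id;
  nodes.push({
    key: personKey,
    type: 'person',
    properties: {
      id: profile.id,
      firstName: profile.firstName,
      lastName: profile.lastName,
      numConnections: profile.numConnections
    }
  });

  // An industry is shared by many people, so it gets its own node.
  const industryKey = 'industry:' + profile.industry;
  nodes.push({ key: industryKey, type: 'industry', properties: { name: profile.industry } });
  relationships.push({ from: personKey, type: 'knowledge', to: industryKey, properties: {} });

  // Each position links the person to a company; job details sit on the relationship.
  const positions = (profile.positions && profile.positions.values) || [];
  for (const position of positions) {
    const companyKey = 'company:' + position.company.id;
    if (!nodes.some((node) => node.key === companyKey)) {
      nodes.push({ key: companyKey, type: 'company', properties: { name: position.company.name } });
    }
    relationships.push({
      from: personKey,
      type: 'workedAt', // assumed relationship name
      to: companyKey,
      properties: { title: position.title, isCurrent: position.isCurrent }
    });
  }

  return { nodes, relationships };
}

// Trimmed copy of the profile from Step 1.
const sample = {
  id: 'ovX6AJPGnk',
  firstName: 'Rohit',
  lastName: 'Ghatol',
  industry: 'Information Technology and Services',
  numConnections: 500,
  positions: {
    values: [{
      company: { id: 65662, name: 'Synerzip Softech' },
      title: 'Director of Engineering',
      isCurrent: true
    }]
  }
};

const graph = profileToGraph(sample);
console.log(graph.nodes.length + ' nodes, ' + graph.relationships.length + ' relationships');
```

Note how the job title ends up as a property on the relationship rather than on either node, which is exactly the part an entity-style diagram has trouble showing.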

Here is a rough representation of People, Industry, Company, 500+ Group, etc. in an entity-like diagram. (Note: entity-like diagrams cannot fully represent the Neo4j structure, as relationships also have properties.)

[singlepic id=75 w=800 h=600 float=]

Current Progress

While the above model is the ideal, so far I have been able to load some 30-40 connections with the following relationship.

[singlepic id=78 w=400 h=300 float=]

Neo4j Dashboard

[singlepic id=76 w=800 h=600 float=]

Neo4j Data Browser

[singlepic id=77 w=800 h=600 float=]

Indexes

Please note that while loading the information into Neo4j, we used some indexes to make it easier to fetch the data back.

There are two indexes for the two different types of nodes:

  1. industry - As the name suggests, it indexes an industry by its name
  2. person - As the name suggests, it indexes a person, but by id

There is another index for the relationship; we call it

  1. knowledge

In short, think of it as follows in simple English (although it would have been better if we had called it belongsTo :) )

[person]-[knowledge]->[industry]

[singlepic id=79 w=800 h=600 float=]
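
Conceptually, the two node indexes behave like lookup tables from a key to a node. The following is only an in-memory analogy using JavaScript `Map`s, not the Neo4j index API. The names `industryIndex`, `personIndex` and `addPerson`, and the second person in the sample data, are made up for this sketch.

```javascript
// In-memory analogy of the two node indexes (this is not the Neo4j API).
const industryIndex = new Map(); // industry name -> industry node
const personIndex = new Map();   // person id -> person node
const knowledge = [];            // [person]-[knowledge]->[industry] relationships

// Hypothetical loader: reuse an industry node if its name is already indexed.
function addPerson(person) {
  let industry = industryIndex.get(person.industry);
  if (!industry) {
    industry = { name: person.industry };
    industryIndex.set(industry.name, industry);
  }
  const node = {
    id: person.id,
    firstName: person.firstName,
    numConnections: person.numConnections
  };
  personIndex.set(node.id, node);
  knowledge.push({ from: node, to: industry });
  return node;
}

addPerson({
  id: 'ovX6AJPGnk',
  firstName: 'Rohit',
  industry: 'Information Technology and Services',
  numConnections: 500
});
// Made-up second person, only here to show two people sharing one industry node.
addPerson({
  id: 'hypothetical-1',
  firstName: 'Asha',
  industry: 'Information Technology and Services',
  numConnections: 120
});

// Both people point at the same industry node, which is what makes traversal possible.
const itIndustry = industryIndex.get('Information Technology and Services');
const members = knowledge
  .filter((rel) => rel.to === itIndustry)
  .map((rel) => rel.from.firstName);
console.log(members);
```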

Step 3 - Run Queries on Neo4J Using Cypher

Now that partial data is loaded, we can run some basic Cypher queries on Neo4j. Since we don't have all the relationships yet, we can only do so much.

Basic Queries

Search for the Industry

[singlepic id=80 w=800 h=600 float=]
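
Since the query text is only visible in the screenshot, here is a rough text version of this kind of lookup. It is a sketch written with a Cypher `MATCH` pattern and assumes an `Industry` label and a `$name` parameter; the data above was loaded with named indexes rather than labels, so the query in the screenshot may look different. The helper `industryByName` is hypothetical and only builds the query text and parameters; it does not talk to Neo4j.

```javascript
// Hypothetical helper: build a Cypher lookup for one industry node by name.
// Assumes industry nodes carry an `Industry` label and a `name` property.
function industryByName(name) {
  return {
    text: 'MATCH (i:Industry {name: $name}) RETURN i',
    params: { name: name }
  };
}

const basicQuery = industryByName('Information Technology and Services');
console.log(basicQuery.text);
console.log(basicQuery.params);
```

Passing the industry name as a parameter instead of pasting it into the query string avoids quoting problems with names that contain special characters.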

Advanced Queries

Search for all persons belonging to the "Information Technology and Services" industry

[singlepic id=81 w=800 h=600 float=]

Search for all persons belonging to the "Information Technology and Services" industry who have fewer than 200 connections

[singlepic id=82 w=800 h=600 float=]
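
The two industry searches above can be sketched in text form as follows. Again, this is only an approximation: it assumes `Person` and `Industry` labels and uses `knowledge` as the relationship type, matching the [person]-[knowledge]->[industry] picture from the Indexes section. The helper `peopleInIndustry` and its parameter names are hypothetical, and the queries in the screenshots may be written differently.

```javascript
// Hypothetical helper: people linked to an industry, optionally limited by connection count.
// Assumes `Person`/`Industry` labels and a `knowledge` relationship between them.
function peopleInIndustry(industry, maxConnections) {
  const lines = ['MATCH (p:Person)-[:knowledge]->(i:Industry {name: $industry})'];
  const params = { industry: industry };
  if (maxConnections !== undefined) {
    // Only add the filter when a limit was actually requested.
    lines.push('WHERE p.numConnections < $maxConnections');
    params.maxConnections = maxConnections;
  }
  lines.push('RETURN p.firstName, p.lastName, p.numConnections');
  return { text: lines.join('\n'), params: params };
}

// Everyone in the industry.
const everyone = peopleInIndustry('Information Technology and Services');
// Only those with fewer than 200 connections.
const under200 = peopleInIndustry('Information Technology and Services', 200);

console.log(everyone.text);
console.log(under200.text);
```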

Search for all persons who belong to the same industry as "Rohit"

[singlepic id=83 w=800 h=600 float=]

Search for all persons who belong to the same industry as "Rohit" and have 500+ connections

[singlepic id=84 w=800 h=600 float=]
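
The two "same industry as Rohit" searches are the first real traversals: start at one person, hop to the industry, then hop back out to everyone else attached to it. Below is a rough text sketch under the same assumptions as before (`Person` and `Industry` labels, a `knowledge` relationship); the helper `sameIndustryAs` is hypothetical and the screenshots may show a different query. Because the profile in Step 1 reports `numConnections: 500` together with `numConnectionsCapped: true`, I express "500+" here as `>= 500`.

```javascript
// Hypothetical helper: find people who share an industry with the named person.
// Assumes `Person`/`Industry` labels and a `knowledge` relationship between them.
function sameIndustryAs(firstName, minConnections) {
  const lines = [
    'MATCH (me:Person {firstName: $firstName})-[:knowledge]->(i:Industry)<-[:knowledge]-(other:Person)'
  ];
  const params = { firstName: firstName };
  if (minConnections !== undefined) {
    // "500+" is treated as "at least 500", since the reported count appears to be capped.
    lines.push('WHERE other.numConnections >= $minConnections');
    params.minConnections = minConnections;
  }
  lines.push('RETURN other.firstName, other.lastName, i.name');
  return { text: lines.join('\n'), params: params };
}

const sameIndustry = sameIndustryAs('Rohit');
const sameIndustry500Plus = sameIndustryAs('Rohit', 500);

console.log(sameIndustry.text);
console.log(sameIndustry500Plus.text);
```

Matching on `firstName` is only good enough for a small data set like mine; with more data the person's id would be the safer starting point.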

Future Possibilities

Now, if we are able to get the following graph loaded into Neo4j, even for one person, numerous possibilities for traversing the graph open up. On top of that, if we are able to capture information in Neo4j not just for one person but from an organization's perspective, the power of Graph Search will open new frontiers.

[singlepic id=75 w=800 h=600 float=]

Remember, this is only partial data loaded from LinkedIn. Imagine if we were able to load a detailed profile (breaking it down into further relationships) and augment this graph with other information, e.g.

  1. Blog Feeds
  2. Twitter Feeds
  3. Facebook Feeds
  4. Yammer Feeds
  5. etc

somehow managing the relationships between them.

 

Disclaimer * - While I talk above about doing this at an organization level, this is simply a what-if scenario to see what can be done. The intention of this blog post is not to call for an all-out war on privacy and morality :).

Anyone whose name appears in this blog is there as a result of my LinkedIn connections. If anyone has any objections, please feel free to contact me and we will work things out.