Cloud

LinkedIn Graph Search using Neo4j - Work in Progress

Overview

Facebook Graph Search is nice. It still lacks many desirable features, but it's a good start. While it's good for finding friends who went to the same school, lived in the same city, or share the same interests, the real power of Graph Search would show when it is applied to a site like LinkedIn.

At the time of writing this blog, I will be honest, I have not researched whether similar tools already exist for LinkedIn. I am more interested in understanding graph databases and how to create my own version of Facebook Graph Search.

Special Note *

This is a work in progress, and this blog was written to capture my thinking at this point in time. So far I have written a Node script to pull data from LinkedIn, and I am currently working on loading the data into Neo4j.

Step 1 - Pull Data from LinkedIn

If we pull basic data about a person or one of his/her connections, it looks somewhat like the following.

{
    firstName: 'Rohit',
    headline: 'Director of Engineering - New Technologies',
    id: 'ovX6AJPGnk',
    industry: 'Information Technology and Services',
    lastName: 'Ghatol',
    location: {
        country: {
            code: 'us'
        },
        name: 'Houston, Texas Area'
    },
    numConnections: 500,
    numConnectionsCapped: true,
    positions: {
        _total: 16,
        values: [{
            company: {
                id: 65662,
                industry: 'Computer Software',
                name: 'Synerzip Softech',
                size: '201-500 employees',
                type: 'Privately Held'
            },
            id: 27449797,
            isCurrent: true,
            startDate: {
                month: 7,
                year: 2008
            },
            summary: 'Define Technical Direction of the Company. Develop key skills for bleeding edge Technology and lead the clients/prospects on choices on Technology and Architecture Front.\n\nCreate more leaders in the Organization to expand the Company',
            title: 'Director of Engineering'
        },..,..,..]
    },
    siteStandardProfileRequest: {
        url: 'http://www.linkedin.com/profile/view?id=18377511&authType=name&authToken=**&trk=******'
    }
}

 

Step 2 - Present Data in Graph Database

The above data is very rich, because it has relationships within itself. Now think about adding all your connections to it as well.

In a graph database like Neo4j, you have nodes, and nodes are connected by relationships. Both nodes and relationships can have properties.
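As a rough illustration of that idea, the profile from Step 1 can be broken down into node and relationship records before anything is sent to Neo4j. This is only a sketch under assumptions: the function name `profileToGraph`, the labels `Person`, `Industry`, `Company`, and the relationship names `BELONGS_TO` and `WORKED_AT` are made up for this example and are not what the loading script uses.

```javascript
// Hypothetical helper: turn one LinkedIn-style profile object into plain
// node and relationship records. Labels and relationship names are assumptions.
function profileToGraph(profile) {
  const nodes = [];
  const relationships = [];

  // The person becomes a node with a few simple properties.
  const personKey = 'person:' + profile.id;
  nodes.push({
    key: personKey,
    label: 'Person',
    properties: {
      id: profile.id,
      firstName: profile.firstName,
      lastName: profile.lastName,
      numConnections: profile.numConnections
    }
  });

  // The industry becomes its own node, linked to the person.
  if (profile.industry) {
    const industryKey = 'industry:' + profile.industry;
    nodes.push({ key: industryKey, label: 'Industry', properties: { name: profile.industry } });
    relationships.push({ from: personKey, to: industryKey, type: 'BELONGS_TO', properties: {} });
  }

  // Each position links the person to a company; the relationship itself
  // carries properties such as title and start year.
  const positions = (profile.positions && profile.positions.values) || [];
  positions.forEach(function (position) {
    const companyKey = 'company:' + position.company.id;
    const alreadyAdded = nodes.some(function (n) { return n.key === companyKey; });
    if (!alreadyAdded) {
      nodes.push({ key: companyKey, label: 'Company', properties: { name: position.company.name } });
    }
    relationships.push({
      from: personKey,
      to: companyKey,
      type: 'WORKED_AT',
      properties: {
        title: position.title,
        isCurrent: position.isCurrent === true,
        startYear: position.startDate ? position.startDate.year : null
      }
    });
  });

  return { nodes: nodes, relationships: relationships };
}

// A trimmed copy of the profile shown in Step 1.
const sample = {
  id: 'ovX6AJPGnk',
  firstName: 'Rohit',
  lastName: 'Ghatol',
  industry: 'Information Technology and Services',
  numConnections: 500,
  positions: {
    values: [{
      company: { id: 65662, name: 'Synerzip Softech' },
      isCurrent: true,
      startDate: { month: 7, year: 2008 },
      title: 'Director of Engineering'
    }]
  }
};

const graph = profileToGraph(sample);
console.log(graph.nodes.length + ' nodes, ' + graph.relationships.length + ' relationships');
```

Note how the title and start year end up on the relationship rather than on the person or the company, which is exactly the part an entity-style diagram cannot show.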

Here is a rough representation of People, Industry, Company, 500+ Group etc. in an entity-like diagram. (Note: entity-like diagrams cannot fully represent the Neo4j structure, as relationships also have properties.)

[singlepic id=75 w=800 h=600 float=]

Current Progress

While the above relationship is ideal, so far I am able to load up some 30-40 connections with following relationship

[singlepic id=78 w=400 h=300 float=]

Neo4j Dashboard

[singlepic id=76 w=800 h=600 float=]

Neo4j Data Browser

[singlepic id=77 w=800 h=600 float=]

Indexes

Please note that while loading the information into Neo4j we used some indexes to make it easier to fetch the data back.

There are two indexes for two different types of nodes

  1. industry - As the name suggests it indexes industry by its name
  2. person - As the name suggests it indexes a person but by its id

There is another index for the relationship, which we call

  1. knowledge

In short, think of this as follows in plain English (although it would have been better if we had called it belongsTo :)

[person]-[knowledge]->[industry]
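To make the role of the indexes concrete, here is a small in-memory stand-in. It is not the Neo4j API: plain `Map` objects play the part of the `person` and `industry` indexes, an array plays the part of the `knowledge` relationship index, and every person except the first is hypothetical sample data.

```javascript
// In-memory stand-ins for the two node indexes and the relationship index.
// This is not the Neo4j API; Map objects are used only for illustration.
const personIndex = new Map();    // person id -> person node
const industryIndex = new Map();  // industry name -> industry node
const knowledgeIndex = [];        // relationship records: person -> industry

function addPerson(profile) {
  const person = {
    id: profile.id,
    firstName: profile.firstName,
    numConnections: profile.numConnections
  };
  personIndex.set(person.id, person);

  // Reuse the industry node if one with this name is already indexed,
  // so that many people can point at the same industry.
  let industry = industryIndex.get(profile.industry);
  if (!industry) {
    industry = { name: profile.industry };
    industryIndex.set(industry.name, industry);
  }

  knowledgeIndex.push({ type: 'knowledge', from: person, to: industry });
  return person;
}

addPerson({ id: 'ovX6AJPGnk', firstName: 'Rohit', industry: 'Information Technology and Services', numConnections: 500 });
// The next two entries are hypothetical.
addPerson({ id: 'hypothetical-1', firstName: 'Asha', industry: 'Information Technology and Services', numConnections: 120 });
addPerson({ id: 'hypothetical-2', firstName: 'Sam', industry: 'Computer Software', numConnections: 350 });

console.log('industries indexed: ' + industryIndex.size);
console.log('person by id: ' + personIndex.get('ovX6AJPGnk').firstName);
```

The point of indexing industry by name is visible in `addPerson`: the second person in the same industry finds the existing node instead of creating a duplicate.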

[singlepic id=79 w=800 h=600 float=]

Step 3 - Run Queries on Neo4J Using Cypher

Now that partial data is loaded, we can run some basic Cypher queries on Neo4j. Since we don't have all the relationships yet, we can only do so much.

Basic Queries

Search for the Industry

[singlepic id=80 w=800 h=600 float=]

Advanced Queries

Search for all people belonging to the "Information Technology and Services" industry

[singlepic id=81 w=800 h=600 float=]

Search for all people belonging to the "Information Technology and Services" industry who have fewer than 200 connections

[singlepic id=82 w=800 h=600 float=]

Search for all people who belong to the same industry as "Rohit"

[singlepic id=83 w=800 h=600 float=]

Search for all people who belong to the same industry as "Rohit" and have 500+ connections

[singlepic id=84 w=800 h=600 float=]
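The logic of the four advanced queries can also be read as plain filters. The sketch below is not Cypher and does not talk to Neo4j; it runs the same questions against a small in-memory list. All entries except the first are hypothetical, the helper names are made up, and "500+" is taken to mean `numConnections >= 500` because the profile data caps the count at 500.

```javascript
// Hypothetical sample data; only the first entry comes from the profile shown in Step 1.
const people = [
  { id: 'ovX6AJPGnk', firstName: 'Rohit', industry: 'Information Technology and Services', numConnections: 500 },
  { id: 'hypothetical-1', firstName: 'Asha', industry: 'Information Technology and Services', numConnections: 120 },
  { id: 'hypothetical-2', firstName: 'Sam', industry: 'Computer Software', numConnections: 350 },
  { id: 'hypothetical-3', firstName: 'Lee', industry: 'Information Technology and Services', numConnections: 500 }
];

// Everyone linked to the given industry.
function inIndustry(list, industryName) {
  return list.filter(function (p) { return p.industry === industryName; });
}

// Everyone in the same industry as the named person, excluding that person.
function sameIndustryAs(list, firstName) {
  const anchor = list.find(function (p) { return p.firstName === firstName; });
  if (!anchor) return [];
  return list.filter(function (p) {
    return p.industry === anchor.industry && p.id !== anchor.id;
  });
}

function names(list) {
  return list.map(function (p) { return p.firstName; });
}

const itPeople = inIndustry(people, 'Information Technology and Services');
const itUnder200 = itPeople.filter(function (p) { return p.numConnections < 200; });
const likeRohit = sameIndustryAs(people, 'Rohit');
const likeRohit500 = likeRohit.filter(function (p) { return p.numConnections >= 500; });

console.log(names(itPeople));
console.log(names(itUnder200));
console.log(names(likeRohit));
console.log(names(likeRohit500));
```

In the graph version, the "same industry as Rohit" question is a two-hop traversal: from the person to the industry node, and back out to every other person attached to it.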

Future Possibilities

Now if we are able to get the following graph loaded into Neo4j, even for one person, numerous possibilities for traversing the graph open up. On top of that, if we are able to capture information in Neo4j not from one person's perspective but from an organization's perspective, the power of Graph Search will open new frontiers.

[singlepic id=75 w=800 h=600 float=]

Remember, this is only partial data loaded from LinkedIn. Imagine if we were able to load a detailed profile (breaking it down into further relationships) and augment it by adding other information to this graph, e.g.

  1. Blog Feeds
  2. Twitter Feeds
  3. Facebook Feeds
  4. Yammer Feeds
  5. etc

and somehow managing the relationships between them.

 

Disclaimer * - While I talk above about doing this at an organization level, this is simply a what-if scenario to see what can be done. The intention of this blog post is not to call an all-out war on privacy and morality :).

Anyone whose name appears in this blog is there as a result of LinkedIn connections. If anyone has any objections about it, please feel free to contact me and we will work things out.

Getting started with deploying SPA, Node and Mongo Apps on Heroku - Developer Focused

Overview

First of all, this blog is developer focused, hence I avoid going into details like using Heroku for production. The purpose is to help developers quickly deploy apps on Heroku for testing.

What is Heroku?

Heroku is a Cloud Application Platform. It enables you to quickly and painlessly deploy your Applications on the cloud.

[singlepic id=59 w=800 h=600 float=]

Heroku supports following languages

[singlepic id=63 w=600 h=400 float=]

Why use Heroku?

In today's world a developer's resume is really the following

LinkedIn profile pointing to the following

    1. Your Opinions - e.g. Blog
    2. Your Ideas - e.g. Twitter
    3. Your Skills and Accomplishments as a Polyglot programmer - Series of Hosted Applications on the cloud
    4. Online Source Code e.g. github.com
    5. Online Forum Activity and Scores e.g. StackOverflow
    6. Online Presentations e.g. SlideShare
    7. Online Videos e.g. YouTube

Now let's focus on item 3 - Series of Hosted Applications on the cloud

For a developer, writing, testing and showcasing applications on the cloud has to be as easy as doing it on his/her laptop. And developers love

  1. Free quotas on servers, database and hosting
  2. Command line tools to work on the cloud as if they were working locally
  3. Popular add-ons, e.g. for Node, ready-made add-ons like Mongo, Redis, etc. make a lot of sense
  4. Using known tools to deploy the app

Well, Heroku is built with all these things in mind.

Getting Started Guide Available for Popular Languages

[singlepic id=62 w=600 h=400 float=]

Command Line Tool Available to deploy apps and check logs and status

[singlepic id=60 w=600 h=400 float=]

Easy to Add Free Quota based Addons

[singlepic id=61 w=600 h=400 float=]

How to use Heroku?

Enough of Bluff, now let's move to Stuff

Steps

  1. Create Heroku Account & Install Command Line Tool
  2. Clone existing Node, SPA (Single Page Application - HTML/JS), MongoDB locally
  3. Run the existing Node, SPA, MongoDB Application locally
  4. Run the existing Node, SPA, MongoDB Application locally using Heroku
  5. Run the existing Node, SPA, MongoDB Application on cloud using Heroku

Let's get started

Create Heroku Account & Install Command Line Tool

Step 1

Go to https://id.heroku.com/signup/devcenter and sign up

Step 2

Install Heroku Toolbelt from https://toolbelt.heroku.com/

Step 3

Ensure Heroku is properly installed by trying the heroku tool on the command prompt

$> heroku

Clone existing Node, SPA, MongoDB locally

SPA stands for Single Page Application typically built using HTML5, JavaScript and CSS3

We will refer to an already existing Todo MVC from this source - https://github.com/Stackato-Apps/node-backbone-mongo

* Todo MVC - An App which demonstrates how to create a simple Todo App, in various technology stacks. This is the new Hello World

Prerequisites

  1. git command line client is installed
  2. node and npm are installed
  3. mongodb 2.4.x downloaded, unzipped and in path
  4. configure git and npm to work around any firewall restrictions you have

Step 1

Clone the example from github

$>git clone https://github.com/Stackato-Apps/node-backbone-mongo.git

Run the existing Node, SPA, MongoDB Application locally

Step 1

Start your mongodb database

$>mongod

Verify whether mongodb is running by trying the following command in a different command prompt

$>mongo

* mongod is the mongo daemon/server and mongo is the client that connects to the server to try a few mongo commands. Do not close the command prompt running mongod

Step 2

Fetch all the dependencies

$>cd node-backbone-mongo

Download all dependencies

$>npm install

Step 3

Start the application

$>node app.js

Open Browser - http://localhost:3000

[singlepic id=64 w=800 h=600 float=]

 

Run the existing Node, SPA, MongoDB Application locally using Heroku

Step 1

Log in to Heroku using the command prompt tool

$>cd node-backbone-mongo

$>heroku login

Step 2

Create Application on Heroku Cloud using command prompt Tool

$>cd node-backbone-mongo

$>heroku apps:create todomvc-trial

* The app name must be unique across Heroku, so try some other unique name

Step 3

Create a Procfile

$>cd node-backbone-mongo

$>echo "web: node app.js" > Procfile

* Alternatively, use any text editor to create a Procfile with the following text

web: node app.js
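Each Procfile line has the form "process type: command", so the single line above declares a process type named web that is started with node app.js. A minimal sketch of reading that format follows; the function `parseProcfile` is hypothetical and is not how foreman itself is implemented.

```javascript
// Minimal Procfile reader: each matching line is "<process type>: <command>".
// A sketch only; it is not how foreman itself is implemented.
function parseProcfile(text) {
  const processes = {};
  text.split(/\r?\n/).forEach(function (line) {
    // A process type made of letters, digits, "_" or "-", then a colon,
    // then the command to run. Lines that do not match are ignored.
    const match = /^([A-Za-z0-9_-]+):\s*(.+)$/.exec(line.trim());
    if (match) {
      processes[match[1]] = match[2].trim();
    }
  });
  return processes;
}

const procfile = parseProcfile('web: node app.js\n');
console.log(procfile.web);
```

If the Procfile was created with echo on a shell that keeps the quotes, the quotes end up in the file and the line no longer matches this format, which is one reason to prefer a text editor.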

Step 4

Run App locally using Heroku

The tool Heroku uses to run the app locally is called foreman, and it reads the Procfile to find out what it needs to run

$>cd node-backbone-mongo

$>foreman start

Start browser - http://localhost:5000/

[singlepic id=65 w=800 h=600 float=]

Run the existing Node, SPA, MongoDB Application on cloud using Heroku

Let's first of all foresee what problems we will face if we run this application as it is on the cloud

  1. It assumes there is a MongoDB server running on localhost at a predefined port
  2. It assumes it will run on port 5000 - not a good port to run a web app on

From here onwards there are 3 steps

  1. Add a MongoDB add-on to your Heroku app
  2. Make code changes to get the correct MongoDB URL and port while running the app on the cloud
  3. Push the app to Heroku (we are running it locally so far)

Add a Mongo DB add on to your Heroku App

Heroku has a concept of add-ons, which you can add to your existing apps. We will see how to add these from the command prompt

Step 1

Find the Addon and Command for it

[singlepic id=72 w=800 h=600 float=]

[singlepic id=71 w=800 h=600 float=]

[singlepic id=70 w=800 h=600 float=]

Step 2

Run the Command

$> heroku addons:add mongolab

[singlepic id=69 w=800 h=600 float=]

Step 3

Verify the effects of the Command

[singlepic id=68 w=800 h=600 float=]

Optional

You can proceed and look at the database if you want

[singlepic id=67 w=800 h=600 float=]

 

Make Code Changes

We need to make code changes so that our application uses the correct information for the following

  1. Which MongoDB to connect when on cloud and when running locally
  2. Which Port to run the Web Server on when on cloud and when running locally

Let's open app.js and see what code we need to change

[singlepic id=73 w=800 h=600 float=]

Original Code

var port = process.env.VCAP_APP_PORT || 3000;

if(process.env.VCAP_SERVICES){
  var services = JSON.parse(process.env.VCAP_SERVICES);
  var dbcreds = services['mongodb'][0].credentials;
}

if(dbcreds){
  console.log(dbcreds);
  mongoose.connect(dbcreds.host, dbcreds.db, dbcreds.port, {user: dbcreds.username, pass: dbcreds.password});
}else{
  mongoose.connect("127.0.0.1", "todomvc", 27017);
}
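The original code only looks at VCAP_APP_PORT and VCAP_SERVICES, which appear to be set by the platform this sample was written for. The sketch below restates the database lookup as a function that takes the environment as an argument, to show what happens when those variables are absent; the function name `mongoTargetFromVcap` and the sample values are hypothetical.

```javascript
// Sketch of the original lookup, rewritten as a pure function that takes the
// environment as an argument (the function name is hypothetical).
function mongoTargetFromVcap(env) {
  if (env.VCAP_SERVICES) {
    const services = JSON.parse(env.VCAP_SERVICES);
    const entries = services['mongodb'] || [];
    if (entries.length > 0) {
      const creds = entries[0].credentials;
      return { host: creds.host, db: creds.db, port: creds.port };
    }
  }
  // No VCAP_SERVICES (for example on Heroku): fall back to the local database.
  return { host: '127.0.0.1', db: 'todomvc', port: 27017 };
}

// An environment that only has Heroku-style variables (values are made up).
const herokuLikeEnv = { PORT: '41234', MONGOLAB_URI: 'mongodb://example.invalid/app' };
console.log(mongoTargetFromVcap(herokuLikeEnv).host);

// An environment that has VCAP_SERVICES (values are made up).
const vcapEnv = {
  VCAP_SERVICES: JSON.stringify({
    mongodb: [{ credentials: { host: 'db.example.invalid', db: 'todos', port: 25001 } }]
  })
};
console.log(mongoTargetFromVcap(vcapEnv).host);
```

With only Heroku-style variables present the lookup falls through to 127.0.0.1, where no MongoDB is running on the cloud machine, which is why the code has to change.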

Modified Code

Heroku has a concept of environment variables. You need to see things as follows

  1. When you are running the app locally using foreman, your code has access to locally declared environment variables, e.g. the port and MongoDB URL can come from environment variables or can be hard coded (as we have done)
  2. When you are running the app in the cloud using Heroku, it is running on a server which has its own set of environment variables. When we added the MongoDB add-on, Heroku added an environment variable named "MONGOLAB_URI"

JavaScript has a very easy way of doing if/else (for fallback scenarios)

var port = process.env.PORT || 5000; really means: if process.env.PORT has a value (in short, read the PORT environment variable) then use it; if you don't find it, fall back to 5000

So on the cloud the port is whatever Heroku assigns through the PORT environment variable (the app is still reached publicly on port 80), and on your local machine it is 5000

The same thing applies to uristring: locally it falls back to a locally running mongodb, and on the cloud it uses one of the MongoDB add-ons (provided you have added one)

var port = process.env.PORT || 5000;

var uristring =
    process.env.MONGOLAB_URI ||
        process.env.MONGOHQ_URL ||
        'mongodb://localhost/todomvc';

mongoose.connect(uristring, function (err, res) {
    if (err) {
        console.log ('ERROR connecting to: ' + uristring + '. ' + err);
    } else {
        console.log ('Succeeded connected to: ' + uristring);
    }
});

* MONGOLAB_URI and MONGOHQ_URL - Heroku provides two add-ons for MongoDB; depending on which one you choose, Heroku will declare one of the above environment variables on the server/cloud machine where it runs your code
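The fallback chain can be tried without deploying anything by passing a plain object in place of process.env. This is a minimal sketch; the function name `resolveConfig`, the port number and the connection string used for the "cloud" case are made up for illustration.

```javascript
// Same fallback chain as the modified code, but reading from an object passed
// in, so it can be tried with different "environments" (function name is hypothetical).
function resolveConfig(env) {
  return {
    port: env.PORT || 5000,
    uristring: env.MONGOLAB_URI || env.MONGOHQ_URL || 'mongodb://localhost/todomvc'
  };
}

// Nothing set: both values fall back to the local defaults.
const local = resolveConfig({});

// Heroku-style environment (values are made up).
const cloud = resolveConfig({
  PORT: '41234',
  MONGOLAB_URI: 'mongodb://user:secret@example.invalid:27017/app'
});

console.log(local.port, local.uristring);
console.log(cloud.port, cloud.uristring);
```

Two details are worth keeping in mind: environment variable values are always strings, so the cloud port arrives as '41234' rather than a number, and `||` also falls back when a variable is set to an empty string.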

 

Push the App to Heroku

Now the final step: make this app run on the cloud. This part cannot get any simpler

Heroku pushes code from the local machine to its cloud using the git mechanism. It's like checking in your code, and the server checks it out and runs it (like continuous build cycles)

The good thing about source code versioning tools like git is that it is a 2 step process

  1. Local Commit for tracking
  2. Push Changes to Remote Repository

Now, we said Heroku uses git for pushing code to the cloud; this means your code has to be in a git repository

  1. Either a locally created git repository
  2. Or a cloned git repository, as in our case

 

As a first step, carefully read the following part

 

Recall that we did a git clone from github

Unlike SVN or CVS, in git we clone the entire repository onto the client machine, so all the code and history are available. A cloned repository knows where it can push changes, in this case the original github repository

Further recall that we did $>heroku apps:create

When we did $>heroku apps:create, it added a remote repository entry to our local git repository (to be precise, with the name heroku)

Now when you commit changes locally and the local git repository goes out of sync (or advances), we have the option of pushing those changes not only to the

  1. Original github repository where we cloned from
  2. But also to Heroku cloud

Actual Steps to push code to Heroku

We have made code changes in app.js

Step 1

Commit changes to local repository

$>git commit -a -m "Code changes to make code run on Heroku"

Step 2

Push changes to Heroku

$>git push heroku master

You will see a long log here, which will show you what Heroku is doing on the server when it receives the latest code

Step 3

Access the App running on the cloud

$>heroku open

[singlepic id=74 w=800 h=600 float=]

 

Quick Recap

Following are the commands we used

$>git clone https://github.com/Stackato-Apps/node-backbone-mongo.git

$>heroku login

$>heroku apps:create todomvc-trial

$> heroku addons:add mongolab

$>git commit -a -m "Code changes to make code run on Heroku"

$>git push heroku master

Special Notes

In case you want to link existing code (which is part of some git repo) with an already existing Heroku app, you can do the following

 

$ heroku git:remote -a todomvc-trial