Hadoop

Setting up Hue on Amazon AWS

Amazon AWS now supports Hue. This blog post explains the steps required to set up and access Hue on Amazon EMR.

Step 1 - Create an Amazon EMR cluster with the Hue application selected

Screenshots: Hue-1, Hue-2, Hue-3
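The cluster can also be created from the command line. The sketch below wraps the AWS CLI `aws emr create-cluster` command in a shell function. It assumes the AWS CLI is installed and configured and that the default EMR roles already exist (`aws emr create-default-roles`). The cluster name, release label, instance type and instance count are hypothetical defaults that can be overridden through the CLUSTER_NAME, RELEASE_LABEL, INSTANCE_TYPE and INSTANCE_COUNT environment variables. Hive is included because Step 4 runs a Hive query. Setting DRY_RUN=1 prints the command instead of running it.

```bash
# Print commands instead of running them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Usage: create_hue_cluster <ec2-key-pair-name>
# All defaults below are hypothetical placeholders; override them via environment variables.
create_hue_cluster() {
  local key_name="${1:?usage: create_hue_cluster <ec2-key-pair-name>}"
  run aws emr create-cluster \
    --name "${CLUSTER_NAME:-hue-demo}" \
    --release-label "${RELEASE_LABEL:-emr-5.36.0}" \
    --applications Name=Hive Name=Hue \
    --instance-type "${INSTANCE_TYPE:-m5.xlarge}" \
    --instance-count "${INSTANCE_COUNT:-3}" \
    --ec2-attributes KeyName="$key_name" \
    --use-default-roles
}
```

For example, after sourcing the function, `DRY_RUN=1 create_hue_cluster my-key-pair` shows the exact request before anything is launched.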

Step 2 - Once the cluster has started, click "Enable Web Connection" and set up an SSH tunnel and a web proxy to access Hue

Screenshots: Hue-4, Hue-5, Hue-11, Hue-6, Hue-7, Hue-8
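The SSH tunnel uses dynamic port forwarding, which turns the SSH connection into a local SOCKS proxy that the browser (through FoxyProxy) uses to reach the web interfaces on the master node. A minimal sketch, assuming you have the EC2 key pair file used for the cluster and the master node's public DNS name, both passed as arguments. Local port 8157 follows the example in the AWS EMR documentation and can be changed with the SOCKS_PORT variable; the function name is hypothetical.

```bash
# Print commands instead of running them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Usage: open_emr_tunnel <key-pair-file> <master-public-dns>
open_emr_tunnel() {
  local key_file="${1:?usage: open_emr_tunnel <key-pair-file> <master-public-dns>}"
  local master_dns="${2:?usage: open_emr_tunnel <key-pair-file> <master-public-dns>}"
  # -N: run no remote command; -D: open a SOCKS proxy on the given local port
  run ssh -i "$key_file" -N -D "${SOCKS_PORT:-8157}" "hadoop@$master_dns"
}
```

Leave the command running while you use Hue. The AWS CLI also offers `aws emr socks --cluster-id <id> --key-pair-file <key>` for the same purpose.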

Step 3 - After installing and configuring FoxyProxy, open the Hue web interface and set your login and password (the first login creates the Hue administrator account). You can now use Hue on EMR.

Screenshots: Hue-9, Hue-10
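Before opening the browser, you can check that Hue answers through the tunnel. This sketch assumes the tunnel from Step 2 is listening on local port 8157 and that Hue listens on port 8888 of the master node (the usual Hue port on EMR; override with HUE_PORT). curl's --socks5-hostname option sends both the request and the DNS lookup through the SOCKS proxy, so the master node's internal name resolves correctly. The function name is hypothetical.

```bash
# Print commands instead of running them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Usage: check_hue <master-public-dns>
# Prints only the HTTP status code that Hue returns through the SOCKS tunnel.
check_hue() {
  local master_dns="${1:?usage: check_hue <master-public-dns>}"
  run curl -s -o /dev/null -w '%{http_code}\n' \
    --socks5-hostname "localhost:${SOCKS_PORT:-8157}" \
    "http://$master_dns:${HUE_PORT:-8888}/"
}
```

A 200, or a redirect such as 302 to the login page, suggests Hue is reachable; 000 means curl could not connect at all.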

Step 4 - Run a Hive query

Choose the "AWS Sample: ELB access logs" project, then execute its Hive query.

Screenshots: Hive Query, Hive Query Result
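Hue's query editor submits HiveQL to Hive on the cluster, so the same kind of statement can also be run from the master node's Hive command line with `hive -e` as a cross-check. A sketch, assuming you are logged in to the master node; the default statement SHOW TABLES; is only a placeholder, since the table names created by the ELB sample are not listed here. The function name is hypothetical.

```bash
# Print commands instead of running them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Usage (on the master node): run_hive_query ["HiveQL statement"]
# "SHOW TABLES;" is a placeholder default, not the ELB sample's query.
run_hive_query() {
  run hive -e "${1:-SHOW TABLES;}"
}
```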

Install Sqoop on Amazon EMR

Overview

This is a detailed tutorial on how to install Sqoop on an Amazon EMR cluster and import data from a MySQL database into an Amazon S3 bucket.

Tutorial Video

https://www.youtube.com/watch?v=3YJwDJOyDE0

Prerequisites

Download the following files and upload them to your S3 bucket:

  1. Sqoop binary (1.4.4 for Hadoop 2.0.4-alpha) - http://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
  2. MySQL JDBC Connector (the scripts below use version 5.1.33) - http://dev.mysql.com/downloads/connector/j/5.1.html
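The download and upload can be scripted. A minimal sketch, assuming the AWS CLI is configured and that the connector tarball mysql-connector-java-5.1.33.tar.gz has already been downloaded into the current directory, since the MySQL link above leads to a download page rather than the file itself. Install-Sqoop.sh below reads the tarballs from s3://synerzip-sqoop-scripts/; if you upload to a different bucket, change those paths too. The function name is hypothetical.

```bash
# Print commands instead of running them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

SQOOP_TGZ=sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
MYSQL_TGZ=mysql-connector-java-5.1.33.tar.gz

# Usage: stage_sqoop_files <s3-bucket-uri>, e.g. s3://my-bucket (hypothetical)
stage_sqoop_files() {
  local bucket="${1:?usage: stage_sqoop_files <s3-bucket-uri>}"
  bucket="${bucket%/}"   # drop a trailing slash, if any
  # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
  run curl -fLO "http://archive.apache.org/dist/sqoop/1.4.4/$SQOOP_TGZ" || return
  # The connector tarball must already be in the current directory
  run aws s3 cp "$SQOOP_TGZ" "$bucket/$SQOOP_TGZ" || return
  run aws s3 cp "$MYSQL_TGZ" "$bucket/$MYSQL_TGZ"
}
```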

Install-Sqoop.sh

#!/bin/bash
# Stop at the first failing command
set -e
cd /home/hadoop

# Copy the Sqoop 1.4.4 binary from S3 and unpack it
hadoop fs -copyToLocal s3://synerzip-sqoop-scripts/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
tar -xzf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz

# Copy the MySQL JDBC connector from S3 and unpack it
hadoop fs -copyToLocal s3://synerzip-sqoop-scripts/mysql-connector-java-5.1.33.tar.gz mysql-connector-java-5.1.33.tar.gz
tar -xzf mysql-connector-java-5.1.33.tar.gz

# Put the JDBC driver jar in Sqoop's lib directory so Sqoop can load it
cp mysql-connector-java-5.1.33/mysql-connector-java-5.1.33-bin.jar sqoop-1.4.4.bin__hadoop-2.0.4-alpha/lib/

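Make sure the file uses Unix (LF) line endings. If it contains Windows CRLF line endings, bash treats the trailing carriage returns as part of each command and the script fails.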
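One way to check for and remove carriage returns, for example after editing the script on Windows, is sketched below with grep and tr. The function names are hypothetical; the dos2unix tool does the same job if it is installed.

```bash
# Succeeds (exit status 0) if the file contains at least one carriage return
has_crlf() {
  grep -q "$(printf '\r')" "${1:?usage: has_crlf <file>}"
}

# Removes all carriage returns in place, keeping the original as <file>.bak
strip_crlf() {
  local file="${1:?usage: strip_crlf <file>}"
  cp -- "$file" "$file.bak" && tr -d '\r' < "$file.bak" > "$file"
}
```

For example: `has_crlf Install-Sqoop.sh && strip_crlf Install-Sqoop.sh`.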

Sqoop-Import-all.sh


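#!/bin/bash
# Stop at the first failing command
set -e

cd /home/hadoop/sqoop-1.4.4.bin__hadoop-2.0.4-alpha/bin

# Import the User_Profile table from the MySQL database "test" into a new,
# timestamped S3 directory. Replace root/password with your own credentials.
# The trailing backslash continues the command onto the next line.
./sqoop import --connect jdbc:mysql://db.c5zzejm1gdnx.us-west-1.rds.amazonaws.com/test --username root --password password \
  --table User_Profile --target-dir s3://synerzip-imported-data/User_Profile-$(date +"%m-%d-%y_%T")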

Again, make sure the file contains no CRLF line endings.
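The two scripts have to run on the master node, as the hadoop user, in order: install first, then import. The sketch below shows one way to do that over SSH; it is not necessarily the method used in the tutorial video or the slides. It assumes both script files are in the current directory, the master node accepts SSH with your EC2 key pair, and the MySQL database is reachable from the cluster. The function name is hypothetical.

```bash
# Print commands instead of running them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Usage: run_sqoop_scripts <key-pair-file> <master-public-dns>
run_sqoop_scripts() {
  local key_file="${1:?usage: run_sqoop_scripts <key-pair-file> <master-public-dns>}"
  local target="hadoop@${2:?usage: run_sqoop_scripts <key-pair-file> <master-public-dns>}"
  # Copy both scripts to the hadoop user's home directory on the master node
  run scp -i "$key_file" Install-Sqoop.sh Sqoop-Import-all.sh "$target:/home/hadoop/" || return
  # Install Sqoop first; run the import only if the install succeeded
  run ssh -i "$key_file" "$target" "bash /home/hadoop/Install-Sqoop.sh && bash /home/hadoop/Sqoop-Import-all.sh"
}
```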


Steps

Steps 1 to 14 are shown in screenshots Slide01 to Slide14.
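After the import finishes, the output can be inspected from the master node. A sketch, assuming the target bucket s3://synerzip-imported-data/ used in Sqoop-Import-all.sh. The import directory name depends on the timestamp the script generated, and part-m-00000 is the usual name of the first file written by a map-only Sqoop import. The function names are hypothetical.

```bash
# Print commands instead of running them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Usage (on the master node): list_imports [s3-prefix]
list_imports() {
  run hadoop fs -ls "${1:-s3://synerzip-imported-data/}"
}

# Usage (on the master node): preview_import <import-dir>
# Assumes the first output file of the import is named part-m-00000.
preview_import() {
  local dir="${1:?usage: preview_import <import-dir>}"
  run hadoop fs -cat "${dir%/}/part-m-00000"
}
```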