Thursday, February 7, 2013

Hadoop 20S VM

1 Install VMware Player
- www.vmware.com

2 Copy Hadoop 0.20.S
- Copy the Hadoop 0.20.S VM to a location on the hard drive

3 VM User Account
id: hadoop-***
psw: ****

4 Get authentication token
kinit
psw: hadoop******

5 Shut down the VM
sudo poweroff

6 Basic Hadoop commands
-- View the available hadoop commands
hadoop
-- View the version of hadoop
hadoop version

Tuesday, February 5, 2013

R with Hadoop

1 R + Streaming
With this approach, you use MapReduce to execute R scripts in the map and reduce phases.
R must be installed on each DataNode, but packages are available in publicly accessible Yum repositories for easy installation.
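
The streaming contract can be tried locally before touching a cluster: the mapper reads lines on stdin and writes tab-separated key/value pairs, the framework sorts by key, and the reducer reads the sorted pairs on stdin. The sketch below is only an illustration of that contract. It uses awk as a stand-in for the R scripts, and the function names mapper and reducer and the sample input are assumptions, not part of any Hadoop or R package.

```shell
#!/bin/sh
# Local stand-in for a streaming word count (no Hadoop needed).
# On a cluster, mapper and reducer would be executable R scripts instead.

# mapper: read lines on stdin, emit "word<TAB>1" for every word.
mapper() {
  awk '{ for (i = 1; i <= NF; i++) print $i "\t" 1 }'
}

# reducer: input arrives sorted by key; sum the counts for each key.
reducer() {
  awk -F '\t' '
    NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
    { key = $1; sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

# Simulate map -> shuffle/sort -> reduce with an ordinary pipeline.
result=$(printf 'a b a\nb a\n' | mapper | LC_ALL=C sort | reducer)
printf '%s\n' "$result"
```

On a real cluster, the same two roles are passed to the Hadoop streaming jar through its -mapper and -reducer options, with R scripts in place of the awk functions.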

2 Rhipe
Rhipe is an open source project that closely integrates MapReduce with R on the client side. It is an R package that connects the R environment with Hadoop, the open source implementation of Google’s MapReduce.
Using Rhipe, it is possible to write MapReduce algorithms in R, launch and monitor MapReduce jobs from R, and interact with HDFS.
R must be installed on each DataNode, along with Protocol Buffers and Rhipe itself.

3 RHadoop
RHadoop, like Rhipe, provides an R wrapper around MapReduce so that the two can be integrated seamlessly on the client side.
R must be installed on each DataNode, and RHadoop depends on other R packages. These packages can be installed from CRAN, and the RHadoop installation itself, while not available via CRAN, is straightforward.

4 RHive
RHive is an R extension that facilitates distributed computing via Hive queries. It offers easy-to-use, SQL-like HQL and lets R objects and functions be used in HQL. It requires Hadoop core and Hive.

5 Segue
An R language segue into parallel processing on Amazon Web Services (in the cloud). It is not a full map/reduce framework for R. It currently runs on Mac or Linux.