How to Install Hadoop in Ubuntu


Hadoop is a Java-based programming framework that supports the storage and processing of large data sets on a cluster of machines. This article shows how to install Hadoop on Ubuntu and run it in stand-alone mode.


Installation of Hadoop

First, update the package index with the following command.

root@linuxhelp1:~# apt-get update
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Hit:2 http://in.archive.ubuntu.com/ubuntu xenial InRelease
Get:3 http://in.archive.ubuntu.com/ubuntu xenial-updates InRelease [95.7 kB]        
Get:4 http://in.archive.ubuntu.com/ubuntu xenial-backports InRelease [92.2 kB]       
Get:5 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [163 kB]
Get:6 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [426 kB]
Get:7 http://security.ubuntu.com/ubuntu xenial-security/main i386 Packages [159 kB]
.
.
.
Get:20 http://in.archive.ubuntu.com/ubuntu xenial-updates/universe i386 Packages [359 kB]                                           
Get:21 http://in.archive.ubuntu.com/ubuntu xenial-updates/universe Translation-en [131 kB]                                          
Get:22 http://in.archive.ubuntu.com/ubuntu xenial-backports/main amd64 DEP-11 Metadata [195 B]                                      
Get:23 http://in.archive.ubuntu.com/ubuntu xenial-backports/universe amd64 DEP-11 Metadata [200 B]                                  
Fetched 3,371 kB in 18s (181 kB/s)                                                                                                  
Reading package lists... Done

Then install OpenJDK, which is the default Java Development Kit on Ubuntu.

root@linuxhelp1:~# apt-get install default-jdk -y
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  ca-certificates-java default-jdk-headless default-jre default-jre-headless fonts-dejavu-extra java-common libbonobo2-0
  libbonobo2-common libgif7 libgnome-2-0 libgnome2-common libgnomevfs2-0 libgnomevfs2-common libice-dev liborbit-2-0
  libpthread-stubs0-dev libsm-dev libx11-dev libx11-doc libxau-dev libxcb1-dev libxdmcp-dev libxt-dev openjdk-8-jdk
  openjdk-8-jdk-headless openjdk-8-jre openjdk-8-jre-headless x11proto-core-dev x11proto-input-dev x11proto-kb-dev
  xorg-sgml-doctools xtrans-dev
Suggested packages:
  default-java-plugin libbonobo2-bin desktop-base libgnomevfs2-bin libgnomevfs2-extra gamin | fam gnome-mime-data libice-doc
  libsm-doc libxcb-doc libxt-doc openjdk-8-demo openjdk-8-source visualvm icedtea-8-plugin openjdk-8-jre-jamvm fonts-ipafont-gothic
  fonts-ipafont-mincho ttf-wqy-microhei | ttf-wqy-zenhei fonts-indic
The following NEW packages will be installed:
  ca-certificates-java default-jdk default-jdk-headless default-jre default-jre-headless fonts-dejavu-extra java-common
  libbonobo2-0 libbonobo2-common libgif7 libgnome-2-0 libgnome2-common libgnomevfs2-0 libgnomevfs2-common libice-dev liborbit-2-0
  libpthread-stubs0-dev libsm-dev libx11-dev libx11-doc libxau-dev libxcb1-dev libxdmcp-dev libxt-dev openjdk-8-jdk
  openjdk-8-jdk-headless openjdk-8-jre openjdk-8-jre-headless x11proto-core-dev x11proto-input-dev x11proto-kb-dev
  xorg-sgml-doctools xtrans-dev
0 upgraded, 33 newly installed, 0 to remove and 429 not upgraded.
Need to get 41.4 MB of archives.
After this operation, 169 MB of additional disk space will be used.
Get:1 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 libbonobo2-common all 2.32.1-3 [34.7 kB]
Get:2 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 liborbit-2-0 amd64 1:2.14.19-1build1 [140 kB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 libbonobo2-0 amd64 2.32.1-3 [211 kB]
Get:4 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 java-common all 0.56ubuntu2 [7,742 B]
Get:5 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 default-jre-headless amd64 2:1.8-56ubuntu2 [4,380 B]
Get:6 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 ca-certificates-java all 20160321 [12.9 kB]
Get:7 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openjdk-8-jre-headless amd64 8u111-b14-2ubuntu0.16.04.2 [26.9 MB]
Get:8 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libgif7 amd64 5.1.4-0.3~16.04 [30.5 kB]                         
Get:9 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openjdk-8-jre amd64 8u111-b14-2ubuntu0.16.04.2 [69.4 kB]        
Get:10 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 default-jre amd64 2:1.8-56ubuntu2 [980 B]                           
.
.
.
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jdb to provide /usr/bin/jdb (jdb) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/serialver to provide /usr/bin/serialver (serialver) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/wsgen to provide /usr/bin/wsgen (wsgen) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jcmd to provide /usr/bin/jcmd (jcmd) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jarsigner to provide /usr/bin/jarsigner (jarsigner) in auto mode
Setting up default-jdk-headless (2:1.8-56ubuntu2) ...
Setting up openjdk-8-jdk:amd64 (8u111-b14-2ubuntu0.16.04.2) ...
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/appletviewer to provide /usr/bin/appletviewer (appletviewer) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jconsole to provide /usr/bin/jconsole (jconsole) in auto mode
Setting up default-jdk (2:1.8-56ubuntu2) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...

Once the installation is completed, use the following command to check the version.

root@linuxhelp1:~# java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
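
If you want to check the Java version from a script rather than by eye, the version string can be reduced to its major number. The following is a minimal sketch, not part of the original steps. The function name java_major is hypothetical, and it assumes the two common version formats: 1.8.0_111 for Java 8 and earlier, and 11.0.2-style strings for newer releases.

```shell
#!/bin/sh
# Hypothetical helper: reduce a Java version string to its major version.
# "1.8.0_111" -> 8 (legacy 1.x scheme), "11.0.2" -> 11 (newer scheme).
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;
  esac
}

# Read the version from the installed JVM, if there is one.
# java -version writes to stderr, hence the 2>&1.
if command -v java >/dev/null 2>&1; then
  installed=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  echo "Java major version: $(java_major "$installed")"
fi
```

With the output shown above, the script would report major version 8.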

Next, download the Hadoop package.

root@linuxhelp1:~# wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
--2016-11-22 03:07:07--  http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
Resolving apache.mirrors.tds.net (apache.mirrors.tds.net)... 216.165.129.134
Connecting to apache.mirrors.tds.net (apache.mirrors.tds.net)|216.165.129.134|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 214092195 (204M) [application/x-gzip]
Saving to: ‘hadoop-2.7.3.tar.gz’

hadoop-2.7.3.tar.gz               100%[==========================================================>] 204.17M  87.4KB/s    in 31m 30s

2016-11-22 03:38:39 (111 KB/s) - ‘hadoop-2.7.3.tar.gz’ saved [214092195/214092195]

To verify the integrity of the downloaded file with SHA-256, first download the matching checksum (.mds) file.

root@linuxhelp1:~# wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds
--2016-11-22 03:39:20--  https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds
Resolving dist.apache.org (dist.apache.org)... 209.188.14.144
Connecting to dist.apache.org (dist.apache.org)|209.188.14.144|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 958 [application/x-gzip]
Saving to: ‘hadoop-2.7.3.tar.gz.mds’

hadoop-2.7.3.tar.gz.mds           100%[==========================================================>]     958  --.-KB/s    in 0s

2016-11-22 03:39:21 (42.7 MB/s) - ‘hadoop-2.7.3.tar.gz.mds’ saved [958/958]

Compute the SHA-256 checksum of the downloaded archive with the following command.

root@linuxhelp1:~# shasum -a 256 hadoop-2.7.3.tar.gz
d489df3808244b906eb38f4d081ba49e50c4603db03efd5e594a1e98b09259c2  hadoop-2.7.3.tar.gz

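Compare this value with the SHA256 entry in the .mds file. The .mds file prints the hash in uppercase, grouped into blocks and wrapped over two lines, so ignore case and whitespace when comparing.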

root@linuxhelp1:~# cat hadoop-2.7.3.tar.gz.mds
hadoop-2.7.3.tar.gz:    MD5 = 34 55 BB 57 E4 B4 90 6B  BE A6 7B 58 CC A7 8F A8
hadoop-2.7.3.tar.gz:   SHA1 = B84B 8989 3426 9C68 753E  4E03 6D21 395E 5A4A B5B1
hadoop-2.7.3.tar.gz: RMD160 = 8FE4 A91E 8C67 2A33 C4E9  61FB 607A DBBD 1AE5 E03A
hadoop-2.7.3.tar.gz: SHA224 = 23AB1EAB B7648921 7101671C DCF9D774 7B84AD50
                              6A74E300 AE6617FA
hadoop-2.7.3.tar.gz: SHA256 = D489DF38 08244B90 6EB38F4D 081BA49E 50C4603D
                              B03EFD5E 594A1E98 B09259C2
hadoop-2.7.3.tar.gz: SHA384 = EFB42E60 3AF4FFB2 BA9F4CF4 1B56F71B D3F3BD8F
                              23331C25 27267762 FDEB67F0 F2B6F56D 797842DB
                              BB8C9F75 9DBA195D
hadoop-2.7.3.tar.gz: SHA512 = 52452D2F 7D0B308F 8BB53ADD B81D98D6 D71F3A7C
                              F5A0C5D8 311C17DD 902E052C 3F4ADD3F EE3C5EA2
                              E6C749D3 476E452F ED50818D 11001D87 CFEC039D
                              9A8BADE5
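
Comparing the two values by eye is error-prone because of the different case and spacing. The following is a minimal sketch of how the comparison could be automated. The function name mds_sha256 is hypothetical, and it assumes the .mds layout shown above, where an entry continues on indented lines.

```shell
#!/bin/sh
# Hypothetical helper: print the SHA256 value from an .mds file as one
# lowercase hex string, joining the indented continuation line.
mds_sha256() {
  awk '
    /SHA256 =/ { grab = 1; sub(/.*SHA256 = */, ""); print; next }
    grab && /^[ \t]/ { print; next }   # indented continuation line
    { grab = 0 }                       # any other line ends the entry
  ' "$1" | tr -d '[:space:]' | tr 'A-F' 'a-f'
}

archive=hadoop-2.7.3.tar.gz
if [ -f "$archive" ] && [ -f "$archive.mds" ]; then
  expected=$(mds_sha256 "$archive.mds")
  actual=$(shasum -a 256 "$archive" | awk '{ print $1 }')
  if [ "$expected" = "$actual" ]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH" >&2
  fi
fi
```

Run it in the directory that holds both downloaded files. If the files are absent, the script does nothing.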

Extract the downloaded Hadoop package.

root@linuxhelp1:~# tar -xvzf hadoop-2.7.3.tar.gz
hadoop-2.7.3/
hadoop-2.7.3/bin/
hadoop-2.7.3/bin/hadoop
hadoop-2.7.3/bin/hadoop.cmd
hadoop-2.7.3/bin/rcc
hadoop-2.7.3/bin/hdfs
hadoop-2.7.3/bin/hdfs.cmd
hadoop-2.7.3/bin/container-executor
hadoop-2.7.3/bin/test-container-executor
hadoop-2.7.3/bin/yarn
hadoop-2.7.3/bin/yarn.cmd
hadoop-2.7.3/bin/mapred
hadoop-2.7.3/bin/mapred.cmd
hadoop-2.7.3/etc/
hadoop-2.7.3/etc/hadoop/
hadoop-2.7.3/etc/hadoop/core-site.xml
hadoop-2.7.3/etc/hadoop/hadoop-env.cmd
hadoop-2.7.3/etc/hadoop/hadoop-metrics2.properties
hadoop-2.7.3/etc/hadoop/hadoop-policy.xml
.
.
.
hadoop-2.7.3/share/hadoop/tools/lib/asm-3.2.jar
hadoop-2.7.3/include/
hadoop-2.7.3/include/hdfs.h
hadoop-2.7.3/include/Pipes.hh
hadoop-2.7.3/include/TemplateFactory.hh
hadoop-2.7.3/include/StringUtils.hh
hadoop-2.7.3/include/SerialUtils.hh
hadoop-2.7.3/LICENSE.txt
hadoop-2.7.3/NOTICE.txt
hadoop-2.7.3/README.txt

Then move the extracted files into /usr/local.

root@linuxhelp1:~# mv hadoop-2.7.3 /usr/local/hadoop
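
The commands in this article are run as root. As a regular user, the move into /usr/local fails with a "permission denied" error. The following is a minimal sketch that checks for write access before moving. The function name needs_root is hypothetical, and the use of sudo is an assumption about how your account gains root privileges.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the current user cannot
# write to the given directory and so needs root to move files into it.
needs_root() {
  [ ! -w "$1" ]
}

src=hadoop-2.7.3
dest_parent=/usr/local

if [ -d "$src" ]; then
  if needs_root "$dest_parent"; then
    echo "No write access to $dest_parent; rerun as root, for example:"
    echo "  sudo mv $src $dest_parent/hadoop"
  else
    mv "$src" "$dest_parent/hadoop"
  fi
fi
```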

Now configure JAVA_HOME. Hadoop requires the path to Java to be set, either as an environment variable or in the Hadoop configuration file. The following command prints the path of the default Java installation.

root@linuxhelp1:~# readlink -f /usr/bin/java | sed "s:bin/java::"
/usr/lib/jvm/java-8-openjdk-amd64/jre/
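
The pipeline has two parts: readlink -f follows the /usr/bin/java symlink chain to the real binary, and sed deletes the bin/java part to leave the installation directory. The sed step can be tried on its own with the sketch below. The function name java_home_from is hypothetical, and the input path is the one implied by the output above.

```shell
#!/bin/sh
# Hypothetical helper: derive a JAVA_HOME candidate from the resolved
# path of the java binary by deleting "bin/java".
# The colons are sed delimiters, used so the slash in the pattern
# needs no escaping.
java_home_from() {
  printf '%s\n' "$1" | sed "s:bin/java::"
}

# Example with the path from this article:
java_home_from /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
# prints /usr/lib/jvm/java-8-openjdk-amd64/jre/
```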

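Open the hadoop-env.sh file and set JAVA_HOME with either a static value or a dynamic value. Use only one of the two export lines; if both are active, the second overrides the first.

root@linuxhelp1:~# nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

#export JAVA_HOME=${JAVA_HOME}
# Option 1: static value (must be updated by hand if the Java path changes)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
# Option 2: dynamic value (uncomment to use instead of Option 1)
#export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")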

Now run Hadoop. Invoked without arguments, it prints its usage message.

root@linuxhelp1:~# /usr/local/hadoop/bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
  credential           interact with credential providers
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
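
Typing the full /usr/local/hadoop/bin/hadoop path each time is tedious. One option is to add the bin directory to PATH, as in the sketch below. This is not part of the original steps: HADOOP_HOME is a conventional variable name used here as an assumption, and the lines would normally go in a shell startup file such as ~/.bashrc.

```shell
#!/bin/sh
# Assumption: Hadoop was moved to /usr/local/hadoop as in this article.
# HADOOP_HOME is a conventional name, not something the steps above set.
HADOOP_HOME=/usr/local/hadoop
export HADOOP_HOME

# Append bin/ only if it is not already on PATH.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) ;;
  *) PATH="$PATH:$HADOOP_HOME/bin" ;;
esac
export PATH

# With PATH set, the short command name resolves if Hadoop is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop version
fi
```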

Once Hadoop is configured, you can run it in stand-alone mode. Create a new directory called input in your home directory, then copy Hadoop's configuration files into it to use as sample data.

root@linuxhelp1:~# mkdir ~/input
root@linuxhelp1:~# cp /usr/local/hadoop/etc/hadoop/*.xml ~/input

Finally, run the grep program from the bundled hadoop-mapreduce-examples JAR. It searches the files in ~/input for matches of the given regular expression and writes the results to ~/grep_example.

root@linuxhelp1:~# /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep ~/input ~/grep_example 'principle[.]*'
16/11/22 03:47:50 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/11/22 03:47:50 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/11/22 03:47:50 INFO input.FileInputFormat: Total input paths to process : 8
16/11/22 03:47:51 INFO mapreduce.JobSubmitter: number of splits:8
16/11/22 03:47:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local586866770_0001
16/11/22 03:47:51 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/22 03:47:51 INFO mapreduce.Job: Running job: job_local586866770_0001
16/11/22 03:47:51 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/22 03:47:51 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/22 03:47:51 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/11/22 03:47:52 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/22 03:47:52 INFO mapred.LocalJobRunner: Starting task: attempt_local586866770_0001_m_000000_0
16/11/22 03:47:52 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/22 03:47:52 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
16/11/22 03:47:52 INFO mapred.MapTask: Processing split: file:/home/user1/input/hadoop-policy.xml:0+9683
16/11/22 03:47:52 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
.
.
.
File System Counters
        FILE: Number of bytes read=1247198
        FILE: Number of bytes written=2317778
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=0
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=6
        Input split bytes=115
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=42
        Total committed heap usage (bytes)=264036352
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=98
    File Output Format Counters
        Bytes Written=8
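
The counters above only summarize the run; the actual matches are written to files in the output directory. The following is a minimal sketch for viewing them. The function name show_results is hypothetical, and it assumes the usual MapReduce output layout: a _SUCCESS marker file next to one or more part-r-* result files.

```shell
#!/bin/sh
# Hypothetical helper: print the results of a finished job.
# A successful job leaves a _SUCCESS marker next to its part-r-* files.
show_results() {
  if [ -f "$1/_SUCCESS" ]; then
    cat "$1"/part-r-*
  else
    echo "no _SUCCESS marker in $1" >&2
    return 1
  fi
}

out="$HOME/grep_example"
if [ -d "$out" ]; then
  show_results "$out" || echo "the job in $out did not complete"
  # Hadoop refuses to write into an existing output directory, so
  # remove it before running the job again:
  # rm -r "$out"
fi
```

The rm line is left commented out so that the sketch never deletes anything by itself.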
Comment
akshayagashe
Oct 02 2018
Is this the same installation for multi-node Hadoop system?
krynomore007
Sep 06 2017
Good one mate
jagannatharumugam
Mar 05 2017
Try running the command as the root user; since it writes to a /usr location, it needs root access.
soundaryakeerty
Mar 05 2017
When I try to move the extracted files into /usr/local, I get an error saying "permission denied, cannot move". What can I do now? Please reply ASAP.
FAQ
Q
What is Spark in Hadoop?
A
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.
Q
Does Hadoop require SSH?
A
The scripts provided with Hadoop use SSH to start and stop the various daemons and to run some other utilities. The Hadoop framework itself does not require SSH; daemons can also be started manually on each node without the scripts.
Q
Where can I get the latest source packages for installing Hadoop?
A
The latest Hadoop packages are available at http://apache.mirrors.tds.net/hadoop/common/.
Q
What is Hadoop?
A
Hadoop is a Java-based programming framework that supports the storage and processing of large data sets on a cluster of machines.
Q
Is there any way to access Hadoop web UI in Linux?
A
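The web UI is served by the HDFS daemons, so they must be running first. From the Hadoop installation directory (/usr/local/hadoop in this article), format the NameNode:

$ bin/hdfs namenode -format

Then start the NameNode and DataNode daemons:

$ sbin/start-dfs.sh

Once the NameNode is running, its web UI is available on port 50070 by default in Hadoop 2.x, for example at http://localhost:50070/.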