Setting Up a Standalone Hadoop Environment on CentOS 6.8

Environment Choices

Server Selection

  • Cloud provider: Alibaba Cloud (pay-as-you-go entry tier)
  • Operating system: Linux CentOS 6.8
  • CPU: 1 core
  • Memory: 1 GB
  • Disk: 40 GB
  • Public IP: 39.108.77.250

Software Versions

  • JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)
  • Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)

Initial Server Configuration

Before installing Hadoop, perform the following steps.

1. Change the Hostname

A descriptive hostname makes the server easier to administer. First check the current one:

hostname

Edit /etc/sysconfig/network and modify the HOSTNAME value:

vim /etc/sysconfig/network

Set it to your desired name (e.g., test1). A reboot is required for the change to take effect.

Add a hostname-to-IP mapping in /etc/hosts:

vim /etc/hosts

Append a line like:

39.108.77.250   test1

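This mapping is required whenever the configuration files refer to the server by hostname. If the public IP is not assigned to a local network interface (as in NAT-based cloud networks), map the hostname to the instance's private IP instead; otherwise the Hadoop daemons cannot bind to the address.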
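
The append step can be made repeatable, so that rerunning the setup does not duplicate the entry. The following is a minimal sketch; add_host_entry is a hypothetical helper name, not part of any tool, and it takes the hosts file path as a parameter so it can be tried on a scratch copy first.

```shell
# Append "IP<TAB>hostname" to a hosts file unless that hostname is already mapped.
# add_host_entry is a hypothetical helper name used only for illustration.
add_host_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    # Look for the hostname as a whole field on any non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"; then
        echo "$name is already mapped in $hosts_file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}
```

For the server in this article the call would be add_host_entry /etc/hosts 39.108.77.250 test1, run as root.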

2. Disable the Firewall

Stop the firewall so that the Hadoop ports and web interfaces can be reached from outside the server. For CentOS 6.x:

service iptables stop

For CentOS 7+:

systemctl stop firewalld.service
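
Stopping the service lasts only until the next reboot. A sketch of the commands that also keep the firewall from starting at boot is given below. The function name firewall_off_commands is hypothetical, and it only prints the commands for review rather than running them.

```shell
# Print the commands that stop the firewall and keep it off after a reboot.
# firewall_off_commands is a hypothetical name; the argument is the CentOS major version.
firewall_off_commands() {
    case "$1" in
        6)
            echo "service iptables stop"
            echo "chkconfig iptables off"
            ;;
        7|8)
            echo "systemctl stop firewalld.service"
            echo "systemctl disable firewalld.service"
            ;;
        *)
            echo "unsupported CentOS major version: $1" >&2
            return 1
            ;;
    esac
}
```

Calling firewall_off_commands 6 prints the two CentOS 6 commands; piping that output to sh as root executes them.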

3. Verify System Time

Check the server time:

date

If it's incorrect, set it as root, replacing the placeholder with the actual date and time:

date -s 'YYYY-MM-DD hh:mm:ss'
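
As a concrete illustration, GNU date can parse a candidate timestamp with -d without touching the clock, which makes it possible to check the string before passing it to -s. The timestamp below is an arbitrary example value, not one taken from the server.

```shell
# Arbitrary example timestamp; replace it with the correct current time.
new_time='2017-11-20 10:30:00'

# -d only parses and prints the value, so this line is safe to run as any user.
date -d "$new_time" '+%Y-%m-%d %H:%M:%S'

# Setting the clock requires root, so the command is left commented out here.
# date -s "$new_time"
```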

Hadoop Environment Installation

1. Prepare Directories and Extract Archives

Move the downloaded JDK and Hadoop tarballs to /home, then create target directories:

mkdir /home/java
mkdir /home/hadoop

Extract both archives:

tar -xvf jdk-8u144-linux-x64.tar.gz
tar -xvf hadoop-2.8.2.tar.gz

Move the extracted folders into the target directories and rename them:

mv jdk1.8.0_144 /home/java/jdk1.8
mv hadoop-2.8.2 /home/hadoop/hadoop2.8
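
The extract-then-move sequence can also be done in one step per archive, since GNU tar can unpack directly into a target directory and drop the archive's top-level folder. A minimal sketch, where extract_into is a hypothetical helper name:

```shell
# Unpack a tarball into a target directory, discarding the top-level folder
# stored in the archive. extract_into is a hypothetical helper name.
extract_into() {
    archive="$1"; target="$2"
    mkdir -p "$target" &&
    tar -xf "$archive" -C "$target" --strip-components=1
}
```

With the tarballs in /home, extract_into /home/jdk-8u144-linux-x64.tar.gz /home/java/jdk1.8 and extract_into /home/hadoop-2.8.2.tar.gz /home/hadoop/hadoop2.8 should produce the same layout as the commands above.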

2. Configure JDK

Check if Java is already installed:

java -version

If an incompatible version exists, remove it before proceeding.

Edit /etc/profile:

vim /etc/profile

Add the following lines:

export JAVA_HOME=/home/java/jdk1.8
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=.:${JAVA_HOME}/bin:$PATH

Apply the changes to the current shell:

source /etc/profile

Verify the installation:

java -version

3. Configure Hadoop

3.1 Update profile

Edit /etc/profile again and append Hadoop variables:

export HADOOP_HOME=/home/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

Reload the profile:

source /etc/profile

3.2 Create Required Directories

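Create the working directories under /root/hadoop, outside the Hadoop installation directory, so that they are not deleted accidentally when the installation is replaced: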

mkdir -p /root/hadoop/tmp
mkdir -p /root/hadoop/var
mkdir -p /root/hadoop/dfs/name
mkdir -p /root/hadoop/dfs/data
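
The four commands can be collapsed into a loop over the subdirectory names. This is only a sketch; make_hadoop_dirs is a hypothetical helper name, and the optional argument exists so the layout can be tried under a different base directory.

```shell
# Create the tmp, var, dfs/name and dfs/data directories under a base directory.
# make_hadoop_dirs is a hypothetical helper; the base defaults to /root/hadoop.
make_hadoop_dirs() {
    base="${1:-/root/hadoop}"
    for sub in tmp var dfs/name dfs/data; do
        mkdir -p "$base/$sub" || return 1
    done
}
```

Running make_hadoop_dirs with no argument as root creates the same directories as the mkdir commands above.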

3.3 Edit core-site.xml

Navigate to the Hadoop configuration directory:

cd /home/hadoop/hadoop2.8/etc/hadoop

Edit core-site.xml:

vim core-site.xml

Inside the <configuration> element, add:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://test1:9000</value>
</property>

Replace test1 with your hostname or IP address. In Hadoop 2.x, fs.default.name is a deprecated alias of fs.defaultFS; both names are accepted.
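
To show where the two properties sit in the finished file, the sketch below writes a complete core-site.xml from a here-document. write_core_site is a hypothetical helper name, and it overwrites any existing core-site.xml in the given directory, so keep a copy of the stock file if its header comments matter.

```shell
# Write a complete core-site.xml into a configuration directory.
# write_core_site is a hypothetical helper; arguments: config dir, NameNode host.
write_core_site() {
    conf_dir="$1"; host="$2"
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://${host}:9000</value>
    </property>
</configuration>
EOF
}
```

For this article's layout the call would be write_core_site /home/hadoop/hadoop2.8/etc/hadoop test1.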

3.4 Edit hadoop-env.sh

Open hadoop-env.sh:

vim hadoop-env.sh

Locate the line export JAVA_HOME=${JAVA_HOME} and replace it with the explicit JDK path:

export JAVA_HOME=/home/java/jdk1.8
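
The same edit can be made without opening an editor. The sketch below uses GNU sed to rewrite the export line in place and keeps the original as a .bak file. set_java_home is a hypothetical helper name, and the JDK path is assumed not to contain the | or & characters, which sed would treat specially here.

```shell
# Rewrite the JAVA_HOME export in hadoop-env.sh in place, keeping a .bak copy.
# set_java_home is a hypothetical helper; arguments: file path, JDK directory.
set_java_home() {
    env_file="$1"; jdk_dir="$2"
    sed -i.bak "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
}
```

From the configuration directory, the call would be set_java_home hadoop-env.sh /home/java/jdk1.8.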

3.5 Edit hdfs-site.xml

Edit hdfs-site.xml:

vim hdfs-site.xml

Insert the following inside <configuration>:

<property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop/dfs/name</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop/dfs/data</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>If false, permission checking is disabled (not recommended for production).</description>
</property>

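Setting dfs.permissions to false avoids permission errors during setup, but it should be set to true (or the property removed) in a production environment. Note also that with a single DataNode a dfs.replication value greater than 1 cannot be satisfied, so HDFS will report blocks as under-replicated; a value of 1 is sufficient for a one-node setup.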

3.6 Edit mapred-site.xml

If mapred-site.xml does not exist, create it from the template:

cp mapred-site.xml.template mapred-site.xml

Edit the file:

vim mapred-site.xml

Add the following inside <configuration>:

<property>
    <name>mapred.job.tracker</name>
    <value>test1:9001</value>
</property>
<property>
    <name>mapred.local.dir</name>
    <value>/root/hadoop/var</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

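Replace test1 with your hostname or IP address. Note that mapred.job.tracker is a MapReduce 1 setting and is not used when mapreduce.framework.name is set to yarn.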

Starting Hadoop

Initialization

Before the first start, format the NameNode:

cd /home/hadoop/hadoop2.8/bin
./hdfs namenode -format

If successful, you will see a current directory and files under /root/hadoop/dfs/name.
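
Formatting an existing NameNode directory again discards its metadata, so it can help to test for a previous format first. The sketch below treats the presence of current/VERSION as the sign of a formatted directory; namenode_formatted is a hypothetical helper name.

```shell
# Succeed if the NameNode directory has been formatted, judged by the
# presence of current/VERSION. namenode_formatted is a hypothetical helper name.
namenode_formatted() {
    name_dir="${1:-/root/hadoop/dfs/name}"
    [ -f "$name_dir/current/VERSION" ]
}
```

A guarded format step could then read: namenode_formatted || ./hdfs namenode -format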

Start Services

Navigate to the sbin directory:

cd /home/hadoop/hadoop2.8/sbin

Start HDFS:

./start-dfs.sh

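start-dfs.sh launches the daemons over SSH, even on a single machine. When asked to confirm the host key, type yes, then enter the account password if passwordless SSH has not been configured.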
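
The password prompts can be avoided by authorizing a passphrase-less key for the local account, as the Hadoop single-node documentation suggests. The sketch below wraps those steps in a hypothetical helper, setup_local_ssh_key, whose optional argument allows a trial run in a directory other than ~/.ssh. It assumes ssh-keygen from OpenSSH is installed.

```shell
# Create a passphrase-less RSA key (if none exists) and authorize it locally.
# setup_local_ssh_key is a hypothetical helper; the key directory defaults to ~/.ssh.
setup_local_ssh_key() {
    key_dir="${1:-$HOME/.ssh}"
    mkdir -p "$key_dir" || return 1
    chmod 700 "$key_dir" || return 1
    # Generate the key pair only when it is missing.
    if [ ! -f "$key_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N '' -f "$key_dir/id_rsa" || return 1
    fi
    # Authorize the public key once; rerunning does not add a duplicate line.
    touch "$key_dir/authorized_keys"
    grep -qxFf "$key_dir/id_rsa.pub" "$key_dir/authorized_keys" ||
        cat "$key_dir/id_rsa.pub" >> "$key_dir/authorized_keys"
    chmod 600 "$key_dir/authorized_keys"
}
```

After running setup_local_ssh_key as the user that starts Hadoop, ssh localhost should log in without asking for a password.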

Start YARN:

./start-yarn.sh

Verify the processes with jps. NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager should all be listed:

jps
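
Reading the jps listing by eye is easy to get wrong, so the check can be scripted. The sketch below reads jps output on standard input and prints the name of every expected daemon that is absent; missing_daemons is a hypothetical helper name.

```shell
# Read jps output on stdin and print any expected daemon that is not listed.
# missing_daemons is a hypothetical helper; exit status is 0 when none are missing.
missing_daemons() {
    listing=$(cat)
    missing=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>", so compare the name with the whole second field.
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use is jps | missing_daemons. Any name it prints points to a daemon whose log under the Hadoop logs directory is worth checking.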

Access the Web Interfaces

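With the default ports of Hadoop 2.x, the NameNode web UI is served on port 50070 and the YARN ResourceManager UI on port 8088, for example http://39.108.77.250:50070 and http://39.108.77.250:8088 for the server used here.

The setup is complete.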

Tags: Hadoop centos single-node installation big-data

Posted on Thu, 02 Jul 2026 16:41:27 +0000 by troublemaker