Environment Choices
Server Selection
- Cloud provider: Alibaba Cloud (pay-as-you-go entry tier)
- Operating system: Linux CentOS 6.8
- CPU: 1 core
- Memory: 1 GB
- Disk: 40 GB
- Public IP: 39.108.77.250
Software Versions
- JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)
- Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)
Download Locations
- Official sources:
- Mirror backup (Baidu Cloud):
- Link: http://pan.baidu.com/s/1pLqS4kF Password: yb79
Initial Server Configuration
Before installing Hadoop, perform the following steps.
1. Change the Hostname
A descriptive hostname makes the server easier to administer. Check the current one with:
hostname
Edit /etc/sysconfig/network and modify the HOSTNAME value:
vim /etc/sysconfig/network
Set it to your desired name (e.g., test1). A reboot is required for the change to take effect.
Add a hostname-to-IP mapping in /etc/hosts:
vim /etc/hosts
Append a line like:
39.108.77.250 test1
This mapping is essential when using hostnames in configuration files.
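The manual edit above can also be scripted so that running it twice does not add a duplicate line. The following is a minimal sketch, not part of the original procedure; the function name add_host_entry is hypothetical, and it assumes a POSIX shell with awk available.

```shell
# add_host_entry IP NAME [FILE]
# Appends "IP NAME" to FILE (default /etc/hosts) unless NAME is already mapped.
add_host_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    # Look for NAME as a whole field (column 2 or later) on any non-comment line.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }' "$file"; then
        echo "$name is already mapped in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}
# Example (as root): add_host_entry 39.108.77.250 test1
```

Matching on whole fields rather than with a plain grep avoids treating test10 as an existing entry for test1.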
2. Disable the Firewall
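Stop the firewall so that the Hadoop ports are reachable from outside. The commands below last only until the next reboot; to make the change permanent, also run chkconfig iptables off (CentOS 6.x) or systemctl disable firewalld.service (CentOS 7+). For CentOS 6.x: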
service iptables stop
For CentOS 7+:
systemctl stop firewalld.service
3. Verify System Time
Check the server time:
date
If it's incorrect, set it with the following command, replacing the placeholder with the actual date and time:
date -s 'YYYY-MM-DD hh:mm:ss'
Hadoop Environment Installation
1. Prepare Directories and Extract Archives
Move the downloaded JDK and Hadoop tarballs to /home, then create target directories:
mkdir /home/java
mkdir /home/hadoop
Extract both archives:
tar -xvf jdk-8u144-linux-x64.tar.gz
tar -xvf hadoop-2.8.2.tar.gz
Move the extracted folders into the target directories and rename them:
mv jdk1.8.0_144 /home/java/jdk1.8
mv hadoop-2.8.2 /home/hadoop/hadoop2.8
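The extract-and-rename steps can be wrapped in one helper that refuses to overwrite an existing installation. This is a sketch under stated assumptions: the function name unpack_to is hypothetical, the archive is gzip-compressed, and it contains exactly one top-level directory (as both tarballs used here do).

```shell
# unpack_to ARCHIVE DEST
# Extracts ARCHIVE and moves its single top-level directory to DEST.
unpack_to() {
    archive=$1
    dest=$2
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")" || return 1
    # Extract into a scratch directory next to DEST so the final mv is a rename.
    scratch=$(mktemp -d "$(dirname "$dest")/unpack.XXXXXX") || return 1
    tar -xzf "$archive" -C "$scratch" || { rm -rf "$scratch"; return 1; }
    # Bail out if the archive did not contain exactly one top-level entry.
    count=$(ls -A "$scratch" | wc -l)
    if [ "$count" -ne 1 ]; then
        echo "expected one top-level directory in $archive" >&2
        rm -rf "$scratch"
        return 1
    fi
    mv "$scratch/$(ls -A "$scratch")" "$dest" && rmdir "$scratch"
}
# Example:
#   unpack_to /home/jdk-8u144-linux-x64.tar.gz /home/java/jdk1.8
#   unpack_to /home/hadoop-2.8.2.tar.gz /home/hadoop/hadoop2.8
```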
2. Configure JDK
Check if Java is already installed:
java -version
If an incompatible version exists, remove it before proceeding.
Edit /etc/profile:
vim /etc/profile
Add the following lines:
export JAVA_HOME=/home/java/jdk1.8
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=.:${JAVA_HOME}/bin:$PATH
Apply the changes to the current shell:
source /etc/profile
Verify the installation:
java -version
3. Configure Hadoop
3.1 Update profile
Edit /etc/profile again and append Hadoop variables:
export HADOOP_HOME=/home/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PATH=${HADOOP_HOME}/bin:$PATH
Reload the profile:
source /etc/profile
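As an alternative to editing /etc/profile by hand, the core variables can be kept in a separate drop-in file, since CentOS sources every *.sh file under /etc/profile.d at login. The sketch below is an assumption-laden illustration, not the original procedure: the function name write_hadoop_env and the file name hadoop.sh are hypothetical, and it covers only the path-related variables from the steps above.

```shell
# write_hadoop_env TARGET
# Writes the JDK and Hadoop path variables to TARGET as a sourceable script.
write_hadoop_env() {
    target=$1
    # The quoted 'EOF' keeps $JAVA_HOME etc. literal so they expand at login time.
    cat > "$target" <<'EOF'
export JAVA_HOME=/home/java/jdk1.8
export HADOOP_HOME=/home/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
EOF
}
# Example (as root): write_hadoop_env /etc/profile.d/hadoop.sh
```

Keeping the settings in one file makes them easy to remove or replace when the JDK or Hadoop version changes.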
3.2 Create Required Directories
By default Hadoop keeps its data under /tmp, which the system may clean up. Create dedicated directories under /root/hadoop instead:
mkdir -p /root/hadoop/tmp
mkdir -p /root/hadoop/var
mkdir -p /root/hadoop/dfs/name
mkdir -p /root/hadoop/dfs/data
3.3 Edit core-site.xml
Navigate to the Hadoop configuration directory:
cd /home/hadoop/hadoop2.8/etc/hadoop
Edit core-site.xml:
vim core-site.xml
Inside the <configuration> element, add:
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://test1:9000</value>
</property>
Replace test1 with your hostname or IP address.
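For reference, the complete file, including the enclosing <configuration> element, can be generated from the shell. This is a minimal sketch: the function name write_core_site is hypothetical, it overwrites any existing core-site.xml (including the license header shipped with Hadoop), and it uses fs.defaultFS, the current name of the property that older releases call fs.default.name.

```shell
# write_core_site CONF_DIR HOST
# Writes a minimal core-site.xml into CONF_DIR pointing HDFS at HOST:9000.
write_core_site() {
    conf_dir=$1
    host=$2
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${host}:9000</value>
  </property>
</configuration>
EOF
}
# Example: write_core_site /home/hadoop/hadoop2.8/etc/hadoop test1
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS should print the value that Hadoop actually reads, which is a quick way to catch a typo in the file.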
3.4 Edit hadoop-env.sh
Open hadoop-env.sh:
vim hadoop-env.sh
Locate the line export JAVA_HOME=${JAVA_HOME} and replace it with the explicit JDK path:
export JAVA_HOME=/home/java/jdk1.8
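The same replacement can be made non-interactively with sed. The helper below is a sketch rather than the original method; the name set_java_home is hypothetical, and it assumes the JDK path contains no | or & characters, which sed would treat specially here.

```shell
# set_java_home FILE JDK_DIR
# Rewrites the uncommented "export JAVA_HOME=..." line in FILE to use JDK_DIR.
set_java_home() {
    file=$1
    jdk=$2
    tmp=$(mktemp) || return 1
    # Only lines that start with "export JAVA_HOME=" change; the rest are copied as-is.
    if sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$file" > "$tmp"; then
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
# Example: set_java_home /home/hadoop/hadoop2.8/etc/hadoop/hadoop-env.sh /home/java/jdk1.8
```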
3.5 Edit hdfs-site.xml
Edit hdfs-site.xml:
vim hdfs-site.xml
Insert the following inside <configuration>:
<property>
<name>dfs.namenode.name.dir</name>
<value>/root/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/root/hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>If false, permission checking is disabled (not recommended for production).</description>
</property>
Setting dfs.permissions.enabled (called dfs.permissions in older releases) to false avoids permission errors during setup. In a production environment, set it to true or remove the property, since true is the default.
3.6 Edit mapred-site.xml
If mapred-site.xml does not exist, create it from the template:
cp mapred-site.xml.template mapred-site.xml
Edit the file:
vim mapred-site.xml
Add the following inside <configuration>:
<property>
<name>mapred.job.tracker</name>
<value>test1:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/root/hadoop/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Replace test1 with your hostname or IP.
Starting Hadoop
Initialization
Before the first start, format the NameNode:
cd /home/hadoop/hadoop2.8/bin
./hdfs namenode -format
If successful, a current directory containing a VERSION file and the initial fsimage appears under /root/hadoop/dfs/name.
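Formatting is meant to happen once: running it again on a NameNode that already holds data erases the existing filesystem metadata. A small guard can make the step safe to repeat. The sketch below is illustrative only; the function names are hypothetical, and it treats the presence of current/VERSION as the sign of an already formatted directory. It calls hdfs namenode -format, the current form of the format command.

```shell
# namenode_is_formatted NAME_DIR
# Succeeds if NAME_DIR already holds NameNode metadata (a current/VERSION file).
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# format_once [NAME_DIR]
# Formats the NameNode only when NAME_DIR has not been formatted before.
format_once() {
    name_dir=${1:-/root/hadoop/dfs/name}
    if namenode_is_formatted "$name_dir"; then
        echo "skip: $name_dir is already formatted"
    else
        /home/hadoop/hadoop2.8/bin/hdfs namenode -format
    fi
}
# Example: format_once
```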
Start Services
Navigate to the sbin directory:
cd /home/hadoop/hadoop2.8/sbin
Start HDFS:
./start-dfs.sh
The script connects to each node over SSH. On first use, type yes to accept the host key, and enter the SSH password if prompted.
Start YARN:
./start-yarn.sh
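Verify that the daemons are running. The output should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager: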
jps
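Checking the jps output by eye is easy to get wrong, so the comparison can be scripted. This is a minimal sketch with a hypothetical function name; it assumes the usual jps output format of a process ID followed by the class name, and the five daemons expected on a single-node setup with HDFS and YARN.

```shell
# missing_daemons
# Reads jps output on stdin and prints each expected daemon that is absent.
missing_daemons() {
    # Keep only the second column (the process name).
    running=$(awk '{ print $2 }')
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}
# Example: jps | missing_daemons
```

Empty output means all five daemons are up; any name printed points to the log file to inspect under the Hadoop logs directory.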
Access the Web Interfaces
- YARN ResourceManager: http://39.108.77.250:8088/cluster
- HDFS NameNode: http://39.108.77.250:50070
The setup is complete.