Setting Up Hadoop 2.10 Pseudo-Distributed Mode on CentOS 7

This guide walks through the steps to set up a Hadoop 2.10 pseudo-distributed cluster on a single CentOS 7 virtual machine.

1. Create a Hadoop User and Group

We will create a dedicated user hdfs and configure it with appropriate permissions.

As root user:

Create the hdfs user and set a password:

adduser hdfs
passwd hdfs

Add the user to the hdfs group (on CentOS, useradd usually creates a same-named group automatically, so groupadd may report that the group already exists; that is harmless):

groupadd hdfs
usermod -a -G hdfs hdfs

Verify the user and group:

grep hdfs /etc/group
groups hdfs

The group entry should look something like hdfs:x:1001:hdfs, and groups should print hdfs : hdfs.

Grant sudo privileges (optional but recommended):

Edit the sudoers file (use visudo rather than editing /etc/sudoers directly; visudo validates the syntax before saving, which protects you from locking yourself out):

visudo

Add the following line below root ALL=(ALL) ALL:

hdfs ALL=(ALL) ALL
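
To confirm the grant took effect, you can list the user's sudo privileges (run as root):

sudo -l -U hdfs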

Create a software installation directory:

mkdir /opt/soft
chown -R hdfs:hdfs /opt/soft

2. Install JDK and Configure Environment Variables

Download a JDK (e.g., jdk-8u231-linux-x64.tar.gz) and extract it:

tar -zxvf jdk-8u231-linux-x64.tar.gz -C /opt/soft/

Create a symbolic link for easier version management:

cd /opt/soft
ln -s jdk1.8.0_231 jdk

Set environment variables in /etc/profile:

vim /etc/profile

Add the following lines:

# jdk
export JAVA_HOME=/opt/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin

Reload the profile:

source /etc/profile

Verify installation:

java -version
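
You should see output similar to the following (the exact version and build strings depend on the JDK you downloaded):

java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)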

3. Install Hadoop 2.10.0

Download hadoop-2.10.0.tar.gz and extract it:

tar -zxvf hadoop-2.10.0.tar.gz -C /opt/soft/

Create a symbolic link:

cd /opt/soft
ln -s hadoop-2.10.0 hadoop

Add Hadoop environment variables to /etc/profile:

# hadoop
export HADOOP_HOME=/opt/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload and verify (the first line of the hadoop version output should read Hadoop 2.10.0):

source /etc/profile
hadoop version

4. Configure Pseudo-Distributed Mode

Edit the following Hadoop configuration files in /opt/soft/hadoop/etc/hadoop/.

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost/</value>
    </property>
</configuration>
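
No port is given in the URI above, so clients use the NameNode's default RPC port, 8020. If you prefer to be explicit, the equivalent property is:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020/</value>
</property>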

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml (Hadoop 2.x ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first, as shown below)
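
cd /opt/soft/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml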

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

5. Configure SSH Passwordless Login

Hadoop's start and stop scripts use SSH to launch daemons on each node, even in a single-node setup. Set up passwordless SSH for the hdfs user.

  1. Check if SSH packages are installed:

    yum list installed | grep ssh
    
  2. Ensure the SSH daemon is running:

    ps -Af | grep sshd
    
  3. Generate an SSH key pair (as the hdfs user):

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    
  4. Append the public key to the authorized keys file:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
  5. Set appropriate permissions (sshd ignores the key file if it, or ~/.ssh itself, is writable by group or others; 700 and 600 are safe choices):

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys
    
  6. Test the setup:

    ssh localhost
    

    You should not be prompted for a password (the very first connection may still ask you to confirm the host key fingerprint; answer yes).
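
To confirm that the key, and not a password prompt, is what let you in, you can forbid password authentication for a single test connection:

ssh -o PasswordAuthentication=no localhost echo ok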

6. Format and Start Hadoop

Switch to the hdfs user:

su - hdfs

Format the NameNode. In Hadoop 2.x the preferred form is hdfs namenode -format; the older hadoop namenode -format still works but prints a deprecation warning:

hdfs namenode -format

Look for a line ending in "has been successfully formatted" near the end of the output.

First Start Attempt and Troubleshooting

Start all Hadoop daemons (start-all.sh is deprecated in 2.x but still works; it simply delegates to the two scripts shown below):

start-all.sh
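
To avoid the deprecation warning, you can run the underlying scripts directly instead:

start-dfs.sh
start-yarn.sh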

If you get a "JAVA_HOME is not set and could not be found" error, it is because the daemons are launched over SSH in a fresh shell that does not source /etc/profile. Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set the JAVA_HOME variable explicitly:

export JAVA_HOME=/opt/soft/jdk

Then try again:

stop-all.sh
start-all.sh

Check if all processes are running:

jps

If the NameNode is missing, check the logs:

tail -200f $HADOOP_HOME/logs/hadoop-hdfs-namenode-*.log

A common error is:

Directory /tmp/hadoop-hdfs/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

This can happen because the NameNode stores its metadata under /tmp by default (hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}), and /tmp may be cleaned on reboot. The quick fix is to re-format the NameNode; be aware that re-formatting also erases any data already stored in HDFS.

stop-all.sh
hdfs namenode -format
start-all.sh
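
For a more durable fix, point hadoop.tmp.dir at a directory that survives reboots. The path below is only an example; create it, make sure the hdfs user owns it, and re-format the NameNode once after changing it. Add to core-site.xml:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/soft/hadoop/tmp</value>
</property>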

Verify all processes are running:

jps

Expected processes:

  • NameNode
  • DataNode
  • SecondaryNameNode
  • ResourceManager
  • NodeManager

7. Verify the Setup

Open a web browser and go to:

http://<your-server-ip>:50070

For example, if your VM IP is 192.168.30.141, visit http://192.168.30.141:50070.

You should see the Hadoop NameNode web UI. The YARN ResourceManager web UI is served on port 8088 (e.g., http://192.168.30.141:8088).
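
If the page does not load from your host machine, first check whether the UI responds locally on the VM; this separates network and firewall problems from Hadoop problems:

curl -I http://localhost:50070/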

Firewall Configuration (if needed)

If you cannot access the web UI, check if the firewall is running:

firewall-cmd --state

To stop the firewall temporarily (as root):

systemctl stop firewalld.service

To disable it permanently:

systemctl disable firewalld.service
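
Alternatively, rather than disabling the firewall entirely, open only the ports you need (50070 for the NameNode UI, 8088 for the ResourceManager UI):

firewall-cmd --permanent --add-port=50070/tcp
firewall-cmd --permanent --add-port=8088/tcp
firewall-cmd --reload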
