This guide walks through the steps to set up a Hadoop 2.10 pseudo-distributed cluster on a single CentOS 7 virtual machine.
1. Create a Hadoop User and Group
We will create a dedicated user hdfs and configure it with appropriate permissions.
As root user:
Create the hdfs user and set a password:
adduser hdfs
passwd hdfs
Add the user to the hdfs group (on CentOS, useradd normally creates this group automatically; run groupadd only if it doesn't already exist):
groupadd hdfs
usermod -a -G hdfs hdfs
Verify the user and group:
cat /etc/group
groups hdfs
You should see something like: hdfs:x:1001:hdfs
Grant sudo privileges (optional but recommended):
Edit the sudoers file:
visudo # preferred, since it checks the syntax before saving (or edit /etc/sudoers directly with vim)
Add the following line below root ALL=(ALL) ALL:
hdfs ALL=(ALL) ALL
Create a software installation directory:
mkdir /opt/soft
chown -R hdfs:hdfs /opt/soft
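To confirm the ownership change took effect, list the directory; the owner and group columns should both read hdfs (the other fields will differ on your machine):
ls -ld /opt/soft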
2. Install JDK and Configure Environment Variables
Download a JDK (e.g., jdk-8u231-linux-x64.tar.gz) and extract it:
tar -zxvf jdk-8u231-linux-x64.tar.gz -C /opt/soft/
Create a symbolic link for easier version management:
cd /opt/soft
ln -s jdk1.8.0_231 jdk
Set environment variables in /etc/profile:
vim /etc/profile
Add the following lines:
# jdk
export JAVA_HOME=/opt/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin
Reload the profile:
source /etc/profile
Verify installation:
java -version
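As an extra sanity check (assuming the profile has been reloaded in your current shell), verify that the variable and the binary resolve to the new JDK:
echo $JAVA_HOME
which java
The first command should print /opt/soft/jdk. If the second points somewhere else (for example, a pre-installed OpenJDK under /usr/bin), that Java comes earlier on the PATH and will be used instead.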
3. Install Hadoop 2.10.0
Download hadoop-2.10.0.tar.gz and extract it:
tar -zxvf hadoop-2.10.0.tar.gz -C /opt/soft/
Create a symbolic link:
cd /opt/soft
ln -s hadoop-2.10.0 hadoop
Add Hadoop environment variables to /etc/profile:
# hadoop
export HADOOP_HOME=/opt/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reload and verify:
source /etc/profile
hadoop version
Configure Pseudo-Distributed Mode
Edit the following Hadoop configuration files in /opt/soft/hadoop/etc/hadoop/.
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
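Optionally, you can also set hadoop.tmp.dir inside the same <configuration> element and point it at a persistent location. By default it lives under /tmp, which may be cleared on reboot (this is the cause of the NameNode error covered in the troubleshooting section below). The path /opt/soft/hadoop-data is just an example; any directory writable by the hdfs user works:
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/soft/hadoop-data</value>
</property>
Create the directory and hand it to the hdfs user before formatting the NameNode:
mkdir -p /opt/soft/hadoop-data
chown hdfs:hdfs /opt/soft/hadoop-data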
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml (if only mapred-site.xml.template exists, copy it first: cp mapred-site.xml.template mapred-site.xml)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
4. Configure SSH Passwordless Login
Hadoop uses SSH to manage daemons. Set up passwordless SSH for the hdfs user.
Check if SSH packages are installed:
yum list installed | grep ssh
Ensure the SSH daemon is running:
ps -Af | grep sshd
Generate an SSH key pair (as the hdfs user):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Append the public key to the authorized keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Set appropriate permissions:
chmod 644 ~/.ssh/authorized_keys
Test the setup:
ssh localhost
You should not be prompted for a password.
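If ssh localhost still asks for a password, overly open permissions are a common cause; sshd refuses keys when the .ssh directory or the private key is accessible to other users. Tightening them usually resolves it:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa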
5. Format and Start Hadoop
Switch to the hdfs user:
su - hdfs
Format the NameNode:
hadoop namenode -format
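In Hadoop 2.x this form prints a deprecation warning; the equivalent current command is:
hdfs namenode -format
Either one works for this guide.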
First Start Attempt and Troubleshooting
Start all Hadoop daemons:
start-all.sh
If the daemons fail to start with an error such as "Error: JAVA_HOME is not set and could not be found.", edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set JAVA_HOME explicitly (daemons launched over SSH do not pick it up from /etc/profile):
export JAVA_HOME=/opt/soft/jdk
Then try again:
stop-all.sh
start-all.sh
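start-all.sh and stop-all.sh are likewise deprecated in Hadoop 2.x; if you prefer, start HDFS and YARN separately with the equivalent scripts:
start-dfs.sh
start-yarn.sh
(and stop-dfs.sh / stop-yarn.sh to shut them down).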
Check if all processes are running:
jps
If the NameNode is missing, check the logs:
tail -200f $HADOOP_HOME/logs/hadoop-hdfs-namenode-*.log
A common error is:
Directory /tmp/hadoop-hdfs/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
This happens because, with no hadoop.tmp.dir configured, the NameNode keeps its metadata under /tmp (here /tmp/hadoop-hdfs/dfs/name), which can be cleared on reboot. The quick fix is to re-format the NameNode (note that this wipes any data already in HDFS); pointing hadoop.tmp.dir at a persistent directory, as mentioned in the core-site.xml section, prevents the problem from recurring.
hadoop namenode -format
stop-all.sh
start-all.sh
Verify all processes are running:
jps
Expected processes:
- NameNode
- DataNode
- SecondaryNameNode
- ResourceManager
- NodeManager
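A healthy run of jps looks roughly like the following; the process IDs will differ, and the Jps line is just the tool itself:
12345 NameNode
12456 DataNode
12567 SecondaryNameNode
12678 ResourceManager
12789 NodeManager
12890 Jps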
6. Verify the Setup
Open a web browser and go to:
http://<your-server-ip>:50070
For example, if your VM IP is 192.168.30.141, visit http://192.168.30.141:50070.
You should see the Hadoop NameNode Web UI.
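Two further quick checks, assuming the defaults used in this guide: the ResourceManager web UI is served on port 8088 (e.g., http://192.168.30.141:8088), and the HDFS report should list one live DataNode:
hdfs dfsadmin -report
You can also run a small example job to exercise MapReduce end to end (the jar name below matches the 2.10.0 release layout):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar pi 2 5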
Firewall Configuration (if needed)
If you cannot access the web UI, check if the firewall is running:
firewall-cmd --state
To stop the firewall temporarily (as root):
systemctl stop firewalld.service
To disable it permanently:
systemctl disable firewalld.service
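Disabling the firewall is the quickest route on a throwaway VM, but if you prefer to keep firewalld running you can open just the ports used above instead (50070 for the NameNode UI, 8088 for the ResourceManager UI):
firewall-cmd --permanent --add-port=50070/tcp
firewall-cmd --permanent --add-port=8088/tcp
firewall-cmd --reload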