Hadoop Cluster Deployment Guide

Hadoop Distributed Cluster Setup

This guide explains how to set up a fully distributed Hadoop cluster using three or more physical or virtual machines.

Cluster Architecture

  • Master Node (hadoop0): NameNode, JobTracker, SecondaryNameNode
  • Worker Nodes (hadoop1, hadoop2): DataNode, TaskTracker

Virtual Machine Setup

Create three virtual machines by cloning a base VM:

# Create base VM and then clone
# Use full cloning for independent instances
# Recommended memory: 400MB per node
# Node names: CentOS, CentOS1, CentOS2

Network Configuration

Assign a static IP address to each node:

# CentOS1: 192.168.80.101
# CentOS2: 192.168.80.102
# Restart network service
service network restart
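
On CentOS the static address is usually set in /etc/sysconfig/network-scripts/ifcfg-eth0 before the network service is restarted. The sketch below prints such a file for a given address. The interface name eth0, the 255.255.255.0 netmask, and the helper name render_ifcfg are assumptions for illustration, not part of the original setup.

```shell
# Hypothetical helper: print an ifcfg file body for a static address.
# Assumes interface eth0 and a /24 netmask; adjust both for your network.
render_ifcfg() {
    ip="$1"
    cat <<EOF
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=$ip
NETMASK=255.255.255.0
EOF
}
```

For example, on CentOS1: render_ifcfg 192.168.80.101 > /etc/sysconfig/network-scripts/ifcfg-eth0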

Hostname Configuration

Set hostnames for worker nodes:

# Edit network configuration and set HOSTNAME (hadoop1 or hadoop2)
vi /etc/sysconfig/network
# Reboot to apply changes
reboot
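
The manual edit can also be scripted. The sketch below rewrites the HOSTNAME line of a sysconfig-style file while leaving the other lines alone. The helper name set_hostname_entry is hypothetical, and it assumes the file uses the usual KEY=value layout.

```shell
# Hypothetical helper: set HOSTNAME=<name> in a sysconfig-style file.
set_hostname_entry() {
    file="$1"; name="$2"
    tmp="$file.tmp"
    # Keep every line except an existing HOSTNAME entry.
    grep -v '^HOSTNAME=' "$file" > "$tmp" || true
    echo "HOSTNAME=$name" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, on the first worker: set_hostname_entry /etc/sysconfig/network hadoop1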

Clean Previous Configurations

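On each cloned worker node, remove the SSH keys and local installations inherited from the base VM. Do not run these commands on the master, whose /usr/local installation is copied to the workers later: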

# Clean SSH directory
rm -rf /root/.ssh/*

# Remove local installations
rm -rf /usr/local/*

# Reset environment profile
vi /etc/profile

SSH Key Setup

Configure passwordless SSH access on each node:

# Generate SSH keys
ssh-keygen -t rsa

# Authorize keys
cd /root/.ssh/
cat id_rsa.pub >> authorized_keys

# Test local connection
ssh localhost
exit
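
Running the cat command twice leaves duplicate lines in authorized_keys, and with StrictModes enabled (the OpenSSH default) sshd rejects keys when ~/.ssh or authorized_keys is writable by other users. The sketch below appends a key only if it is missing and then tightens permissions. The helper name authorize_key is hypothetical.

```shell
# Hypothetical helper: append a public key to authorized_keys only if it is
# not already there, then restrict permissions on the directory and file.
authorize_key() {
    pub="$1"; ssh_dir="$2"
    auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir"
    touch "$auth"
    # -F fixed string, -x whole-line match, -q quiet
    grep -qxF -e "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 700 "$ssh_dir"
    chmod 600 "$auth"
}
```

For example: authorize_key /root/.ssh/id_rsa.pub /root/.ssh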

Hosts File Configuration

Add host entries to /etc/hosts on all nodes:

192.168.80.100 hadoop0
192.168.80.101 hadoop1
192.168.80.102 hadoop2
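
Since the same three entries go into /etc/hosts on every node, a small idempotent helper avoids duplicates when the step is repeated. This is a sketch; the name add_host_entry is hypothetical.

```shell
# Hypothetical helper: append "ip name" to a hosts file unless some
# non-comment line already maps that name.
add_host_entry() {
    file="$1"; ip="$2"; name="$3"
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    echo "$ip $name" >> "$file"
}
```

For example: add_host_entry /etc/hosts 192.168.80.100 hadoop0, and likewise for hadoop1 and hadoop2.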

Cross-Node SSH Setup

Enable mutual SSH access between all nodes:

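# Copy this node's public key to the other nodes (repeat on every node for its peers)
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop2

# Verify connectivity (leave each session before opening the next)
ssh hadoop1
exit
ssh hadoop2
exit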
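
Connectivity can also be checked without opening interactive sessions. The sketch below runs a no-op command on each node with BatchMode=yes, so ssh fails instead of prompting when key authentication is not working. The helper name check_ssh and the SSH_BIN override (useful for trying the function without a cluster) are hypothetical.

```shell
# Hypothetical helper: report whether passwordless SSH works for each node.
# SSH_BIN is an assumed override hook; it defaults to the real ssh client.
check_ssh() {
    failed=0
    for node in "$@"; do
        if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For example: check_ssh hadoop0 hadoop1 hadoop2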

Software Distribution

Copy the Hadoop and JDK installations to the worker nodes:

# Clean master node directories
rm -rf /usr/local/hadoop/logs/
rm -rf /usr/local/hadoop/tmp/

# Copy to worker nodes
scp -r /usr/local/jdk hadoop1:/usr/local/
scp -r /usr/local/jdk hadoop2:/usr/local/
scp -r /usr/local/hadoop hadoop1:/usr/local/
scp -r /usr/local/hadoop hadoop2:/usr/local/
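
With more than two workers the scp lines are easier to manage as a loop. The sketch below copies both directories to every node named on the command line. The helper name distribute and the DRY_RUN switch are hypothetical additions for previewing the commands.

```shell
# Hypothetical helper: copy the JDK and Hadoop trees to each worker.
# With DRY_RUN=1 the scp commands are printed instead of executed.
distribute() {
    for node in "$@"; do
        for dir in /usr/local/jdk /usr/local/hadoop; do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "scp -r $dir $node:/usr/local/"
            else
                scp -r "$dir" "$node:/usr/local/"
            fi
        done
    done
}
```

For example: DRY_RUN=1 distribute hadoop1 hadoop2 previews the four copies, and dropping DRY_RUN performs them.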

Environment Configuration

Distribute environment settings:

# Copy profile to workers
scp /etc/profile hadoop1:/etc/
scp /etc/profile hadoop2:/etc/

# Apply changes (run on each worker node)
source /etc/profile
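
The copied /etc/profile is expected to carry the JDK and Hadoop environment variables. The author's actual profile is not shown, so the lines below are only an assumed minimal version based on the install paths used in this guide.

```shell
# Assumed profile entries; the original file is not shown in this guide.
export JAVA_HOME=/usr/local/jdk
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin"
```

With these on the PATH, commands such as hadoop, start-all.sh and jps can be run without full paths.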

Cluster Configuration

List the worker nodes in Hadoop's slaves file on the master:

# Edit slaves file
vi /usr/local/hadoop/conf/slaves
# Remove localhost and add:
hadoop1
hadoop2
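
The slaves file is plain text with one worker hostname per line. As a sketch, the same edit can be done non-interactively; the helper name write_slaves is hypothetical.

```shell
# Hypothetical helper: overwrite a slaves file with one hostname per line.
write_slaves() {
    file="$1"; shift
    # Refuse to write an empty worker list.
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" > "$file"
}
```

For example: write_slaves /usr/local/hadoop/conf/slaves hadoop1 hadoop2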

Cluster Startup

Initialize and start the cluster:

# Format HDFS
hadoop namenode -format

# Start all services
start-all.sh
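
After startup, the JDK's jps tool lists the running Java processes as "PID ClassName" lines, which gives a quick way to confirm that each node runs the daemons from the Cluster Architecture section. The sketch below reads jps output on standard input and prints any expected daemon that is absent. The helper name missing_daemons is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon name that does not appear in it. Prints nothing if all are running.
missing_daemons() {
    running="$(awk '{ print $2 }')"
    for daemon in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        echo "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}
```

For example, on hadoop0: jps | missing_daemons NameNode SecondaryNameNode JobTracker. On a worker: jps | missing_daemons DataNode TaskTracker.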

Web Interface Access

Add host entries to Windows hosts file (C:\Windows\System32\drivers\etc\hosts):

192.168.80.100 hadoop0
192.168.80.101 hadoop1
192.168.80.102 hadoop2

Access the web interfaces at http://hadoop0:50070 (NameNode) and http://hadoop0:50030 (JobTracker).

Node Management

To add a new node:

# Configure new node environment
# Add to slaves file
# Start services on new node
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
# Refresh nodes
hadoop dfsadmin -refreshNodes

To remove a node:

# Stop specific daemon (look up its process ID with jps first)
kill -9 [process_id]
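
The process ID can be pulled out of jps output rather than read off by eye. The sketch below prints the PID of a named daemon from jps output supplied on standard input. The helper name daemon_pid is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and print the PID of the
# process whose class name matches the first argument exactly.
daemon_pid() {
    awk -v name="$1" '$2 == name { print $1 }'
}
```

For example, on the node being removed: kill "$(jps | daemon_pid DataNode)". A plain kill sends SIGTERM and lets the daemon shut down; kill -9 is the forceful fallback.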

Tags: Hadoop HDFS mapreduce cluster ssh

Posted on Thu, 04 Jun 2026 17:57:49 +0000 by Knifee