Hadoop Installation on Windows
This guide covers the complete process of deploying Hadoop 2.7.1 on a Windows system, configuring HDFS, and performing file operations.
Prerequisites
- Windows operating system
- JDK 8 or compatible Java version installed
- Hadoop 2.7.1 binary distribution from Apache archives
- Windows-specific Hadoop binaries (hadooponwindows-master)
Installation Steps
1. Download and Extract Hadoop
Obtain Hadoop 2.7.1 from the Apache archive:
https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/
Extract the archive to a desired location, such as D:\home\hadoop\hadoop-2.7.1.
2. Apply Windows Compatibility Files
The official Hadoop distribution lacks Windows support binaries. Download the Windows-compatible package and extract its contents:
- Replace the bin directory in the Hadoop installation with the Windows version
- Replace the etc directory similarly
3. Configure Environment Variables
Set the following system environment variables:
| Variable | Value |
|---|---|
| JAVA_HOME | JDK installation path (e.g., C:\Program Files\Java\jdk1.8.0_xxx) |
| HADOOP_HOME | D:\home\hadoop\hadoop-2.7.1 |
| PATH | Add %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin |
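From an elevated Command Prompt, the same variables can be persisted with setx (a sketch; substitute your actual JDK path):

```shell
:: Persist the variables for the current user.
:: Note: setx changes take effect only in newly opened consoles.
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_xxx"
setx HADOOP_HOME "D:\home\hadoop\hadoop-2.7.1"
:: Append the Hadoop bin and sbin directories to PATH.
:: Run this from a NEW console so %HADOOP_HOME% resolves.
setx PATH "%PATH%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin"
```

Be aware that setx truncates values longer than 1024 characters, so for a long PATH it is safer to edit the variable through the System Properties dialog instead.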
Configuration Files
All configuration files are located in HADOOP_HOME\etc\hadoop.
4. Configure core-site.xml
This file defines core Hadoop settings (fs.default.name is the older alias of fs.defaultFS; both work in Hadoop 2.7.1):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
5. Configure hdfs-site.xml
Before editing, create the necessary directory structure:
```
HADOOP_HOME\
└── data\
    ├── namenode\
    └── datanode\
```
Add the following configuration:
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/D:/home/hadoop/hadoop-2.7.1/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/D:/home/hadoop/hadoop-2.7.1/data/datanode</value>
  </property>
</configuration>
```
Important: Use forward slashes and a leading / in these path values, even on Windows; otherwise the NameNode and DataNode may fail to initialize their storage directories.
6. Configure hadoop-env.cmd
Modify the Java home setting:

```shell
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_xxx
```

Replace the path with your actual JDK installation directory. If the path contains spaces (as C:\Program Files does), use the 8.3 short form instead, for example C:\PROGRA~1\Java\jdk1.8.0_xxx, since hadoop-env.cmd does not handle spaces in JAVA_HOME reliably.
7. Install Native DLL
Copy hadoop.dll from HADOOP_HOME\bin to C:\Windows\System32. This DLL is required for MapReduce operations on Windows.
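This can be done from an Administrator Command Prompt (a sketch, assuming HADOOP_HOME is already set):

```shell
:: Requires Administrator privileges to write into System32.
copy %HADOOP_HOME%\bin\hadoop.dll C:\Windows\System32\
```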
Starting Hadoop Cluster
8. Format the NameNode
Open Command Prompt as Administrator and execute:
```shell
hdfs namenode -format
```
Look for confirmation messages indicating successful initialization.
9. Start Hadoop Services
Launch the complete Hadoop cluster:
```shell
start-all.cmd
```
This command opens four console windows for:
- NameNode
- DataNode
- ResourceManager
- NodeManager
Verify running Java processes:
```shell
jps
```
Expected output should include:
- NameNode
- DataNode
- ResourceManager
- NodeManager
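On a healthy startup, jps prints one line per JVM in the form "pid ClassName"; an illustrative transcript (the process IDs will differ on your machine):

```
> jps
4820 NameNode
5104 DataNode
5388 ResourceManager
5672 NodeManager
6012 Jps
```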
Web Interfaces
Access the Hadoop administration interfaces:
| Service | URL | Purpose |
|---|---|---|
| HDFS UI | http://localhost:50070 | File system browser, cluster status |
| YARN UI | http://localhost:8088 | Application management, resource monitoring |
Through the HDFS file browser (Utilities → Browse the file system), you can create directories, view files, and manage the file system visually.
File Operations on HDFS
Create Directories
```shell
hadoop fs -mkdir hdfs://localhost:9000/user
hadoop fs -mkdir hdfs://localhost:9000/user/project
```
Upload Files
```shell
hadoop fs -put D:\Data\sample.csv hdfs://localhost:9000/user/project/
```
List Contents
```shell
hadoop fs -ls hdfs://localhost:9000/user/project/
```
Remove Files and Directories
```shell
# Remove a file
hadoop fs -rm hdfs://localhost:9000/user/project/sample.csv

# Remove a directory recursively, bypassing the trash
hadoop fs -rm -r -skipTrash hdfs://localhost:9000/user/project
```
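Beyond these basics, a few other hadoop fs subcommands are commonly useful. A sketch, assuming sample.csv was uploaded as shown above:

```shell
# Print a file's contents to the console
hadoop fs -cat hdfs://localhost:9000/user/project/sample.csv

# Download a file from HDFS back to the local file system
hadoop fs -get hdfs://localhost:9000/user/project/sample.csv D:\Data\copy.csv

# Show the space used by each entry under a directory
hadoop fs -du hdfs://localhost:9000/user/project/
```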
Troubleshooting
Nodes Not Starting
If NameNode or DataNode fail to start:
1. Stop all Hadoop services:

   ```shell
   stop-all.cmd
   ```

2. Remove the log and data directories:

   ```shell
   rmdir /s /q %HADOOP_HOME%\logs
   rmdir /s /q %HADOOP_HOME%\data
   ```

3. Recreate the data directories with the structure shown in Step 5.

4. Reformat the NameNode:

   ```shell
   hdfs namenode -format
   ```

5. Restart the cluster:

   ```shell
   start-all.cmd
   ```
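The reset sequence above can be collected into a single batch file. A sketch (the script name is hypothetical); run it from an Administrator prompt, and only on a disposable development cluster, since it erases all HDFS data:

```shell
@echo off
:: reset-hdfs.cmd - wipe and re-initialize a local development HDFS.
call stop-all.cmd
rmdir /s /q %HADOOP_HOME%\logs
rmdir /s /q %HADOOP_HOME%\data
mkdir %HADOOP_HOME%\data\namenode
mkdir %HADOOP_HOME%\data\datanode
:: -force skips the interactive re-format confirmation prompt.
call hdfs namenode -format -force
call start-all.cmd
```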
Common issues include:
- Incorrect path formats in XML configuration (missing leading /)
- Incomplete file replacement during the Windows compatibility setup
- Java home path errors in hadoop-env.cmd
Shutting Down
When operations are complete, stop the cluster:
```shell
stop-all.cmd
```
Keep all console windows open during operation; closing them terminates the corresponding service.