Setting Up Hadoop 2.7.1 on Windows and Managing HDFS Storage

Hadoop Installation on Windows

This guide covers the complete process of deploying Hadoop 2.7.1 on a Windows system, configuring HDFS, and performing file operations.

Prerequisites

  • Windows operating system
  • JDK 8 or compatible Java version installed
  • Hadoop 2.7.1 binary distribution from Apache archives
  • Windows-specific Hadoop binaries (hadooponwindows-master)

Installation Steps

1. Download and Extract Hadoop

Obtain Hadoop 2.7.1 from the Apache archive:

https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/

Extract the archive to a desired location, such as D:\home\hadoop\hadoop-2.7.1.

2. Apply Windows Compatibility Files

The official Hadoop distribution lacks Windows support binaries. Download the Windows-compatible package and extract its contents:

  • Replace the bin directory of the Hadoop installation with the bin directory from the Windows package
  • Replace the etc directory with the one from the Windows package in the same way

3. Configure Environment Variables

Set the following system environment variables:

  • JAVA_HOME: JDK installation path (e.g., C:\Program Files\Java\jdk1.8.0_xxx)
  • HADOOP_HOME: D:\home\hadoop\hadoop-2.7.1
  • PATH: add %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin
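
The variables above can also be set from an elevated Command Prompt (a sketch; adjust the paths to your system, and note that new values only take effect in consoles opened afterwards):

```bat
setx /M JAVA_HOME "C:\Program Files\Java\jdk1.8.0_xxx"
setx /M HADOOP_HOME "D:\home\hadoop\hadoop-2.7.1"
setx /M PATH "%PATH%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin"
```

Alternatively, use the System Properties → Environment Variables dialog, which avoids setx's value-length limits.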

Configuration Files

All configuration files are located in %HADOOP_HOME%\etc\hadoop.

4. Configure core-site.xml

This file defines core Hadoop settings:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
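
Note: fs.default.name is the legacy property name; Hadoop 2.x still accepts it, but the preferred, non-deprecated equivalent is fs.defaultFS:

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```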

5. Configure hdfs-site.xml

Before editing, create the necessary directory structure:

HADOOP_HOME\
  └── data\
        ├── namenode\
        └── datanode\

Add the following configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/D:/home/hadoop/hadoop-2.7.1/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/D:/home/hadoop/hadoop-2.7.1/data/datanode</value>
    </property>
</configuration>

Important: Use forward slashes and a leading slash in these path values, as shown above, even on Windows; backslash-style paths can prevent the NameNode and DataNode from starting.
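
The path rule can be checked mechanically. A minimal sketch (a hypothetical helper, not part of Hadoop) that converts a native Windows path into the form hdfs-site.xml expects:

```python
def to_hdfs_site_path(win_path: str) -> str:
    """Convert a Windows path (e.g. D:\\home\\hadoop\\data) into the
    forward-slash, leading-slash form used in hdfs-site.xml values."""
    p = win_path.replace("\\", "/")   # backslashes -> forward slashes
    if not p.startswith("/"):
        p = "/" + p                   # ensure the leading slash
    return p

print(to_hdfs_site_path(r"D:\home\hadoop\hadoop-2.7.1\data\namenode"))
# /D:/home/hadoop/hadoop-2.7.1/data/namenode
```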

6. Configure hadoop-env.cmd

Modify the Java home configuration:

set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_xxx

Replace the path with your actual JDK installation directory.

7. Install Native DLL

Copy hadoop.dll from HADOOP_HOME\bin to C:\Windows\System32. This native library is required for MapReduce operations on Windows. The Windows package also ships winutils.exe in the bin directory, which Hadoop invokes for local filesystem operations; leave it in place.

Starting Hadoop Cluster

8. Format the NameNode

Open Command Prompt as Administrator and execute:

hdfs namenode -format

Look for confirmation messages indicating successful initialization.

9. Start Hadoop Services

Launch the complete Hadoop cluster:

start-all.cmd

This command opens four console windows for:

  • NameNode
  • DataNode
  • ResourceManager
  • NodeManager

Verify running Java processes:

jps

Expected output should include:

  • NameNode
  • DataNode
  • ResourceManager
  • NodeManager

Web Interfaces

Access the Hadoop administration interfaces:

  • HDFS UI: http://localhost:50070 (file system browser, cluster status)
  • YARN UI: http://localhost:8088 (application management, resource monitoring)

Through the HDFS file browser (Utilities → Browse the file system), you can create directories, view files, and manage the file system visually.

File Operations on HDFS

Create Directories

hadoop fs -mkdir hdfs://localhost:9000/user
hadoop fs -mkdir hdfs://localhost:9000/user/project
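
Since core-site.xml already points the default filesystem at hdfs://localhost:9000, the scheme and authority can be omitted, and the -p flag creates missing parent directories in one call (a sketch, equivalent to the two commands above):

```shell
hadoop fs -mkdir -p /user/project
```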

Upload Files

hadoop fs -put D:\Data\sample.csv hdfs://localhost:9000/user/project/

List Contents

hadoop fs -ls hdfs://localhost:9000/user/project/

Remove Files and Directories

# Remove file
hadoop fs -rm hdfs://localhost:9000/user/project/sample.csv

# Remove directory recursively
hadoop fs -rm -r -skipTrash hdfs://localhost:9000/user/project

Troubleshooting

Nodes Not Starting

If the NameNode or DataNode fails to start:

  1. Stop all Hadoop services:

    stop-all.cmd
    
  2. Remove temporary and data directories:

    rmdir /s /q %HADOOP_HOME%\logs
    rmdir /s /q %HADOOP_HOME%\data
    
  3. Recreate the data directories with the structure shown in Step 5.

  4. Reformat the NameNode:

    hdfs namenode -format
    
  5. Restart the cluster:

    start-all.cmd
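
The five steps above can be collected into one batch script (a sketch; it assumes the environment variables from Step 3 and recreates the directory layout from Step 5):

```bat
@echo off
rem Reset a broken local cluster: stop, wipe state, reformat, restart
call stop-all.cmd
rmdir /s /q "%HADOOP_HOME%\logs"
rmdir /s /q "%HADOOP_HOME%\data"
mkdir "%HADOOP_HOME%\data\namenode"
mkdir "%HADOOP_HOME%\data\datanode"
call hdfs namenode -format
call start-all.cmd
```

Note that this destroys all data stored in HDFS; it is only appropriate for a disposable development setup.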
    

Common issues include:

  • Incorrect path formats in XML configuration (missing leading /)
  • Incomplete file replacement during Windows compatibility setup
  • Java home path errors in hadoop-env.cmd

Shutting Down

When operations are complete, stop the cluster:

stop-all.cmd

Keep all console windows open during operation; closing one terminates the corresponding service.

Tags: Hadoop HDFS Windows Big Data Distributed Systems

Posted on Fri, 08 May 2026 18:54:35 +0000 by Eclesiastes