Hadoop Installation on Windows
This guide covers the complete process of deploying Hadoop 2.7.1 on a Windows system, configuring HDFS, and performing file operations.
Prerequisites
- Windows operating system
- JDK 8 or compatible Java version installed
- Hadoop 2.7.1 binary distribution from Apache archives
- Windows-specific Hadoop binaries (hadooponwindows-master)
Installation Steps
1. Download and Extract Hadoop
Obtain Hadoop 2.7.1 from the Apache archive:
https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/
Extract the archive to a desired location, such as D:\home\hadoop\hadoop-2.7.1.
2. Apply Windows Compatibility Files
The official Hadoop distribution lacks Windows support binaries. Download the Windows-compatible package and extract its contents:
- Replace the bin directory in the Hadoop installation with the Windows version
- Replace the etc directory similarly
3. Configure Environment Variables
Set the following system environment variables:
| Variable | Value |
|---|---|
| JAVA_HOME | JDK installation path (e.g., C:\Program Files\Java\jdk1.8.0_xxx) |
| HADOOP_HOME | D:\home\hadoop\hadoop-2.7.1 |
| PATH | Add %HADOOP_HOME%\bin and %HADOOP_HOME%\sbin |
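From an elevated Command Prompt, the same variables can be persisted with setx (a sketch; substitute your actual JDK path):

```shell
:: Persist the variables for the current user.
:: Note: setx changes take effect only in newly opened consoles.
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_xxx"
setx HADOOP_HOME "D:\home\hadoop\hadoop-2.7.1"
:: Append the Hadoop bin and sbin directories to PATH.
:: Run this from a NEW console so %HADOOP_HOME% resolves.
setx PATH "%PATH%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin"
```

Be aware that setx truncates values longer than 1024 characters, so for a long PATH it is safer to edit the variable through the System Properties dialog instead.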
Configuration Files
All configuration files are located in HADOOP_HOME\etc\hadoop.
4. Configure core-site.xml
This file defines core Hadoop settings (fs.default.name is the older alias of fs.defaultFS; both work in Hadoop 2.7.1):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
5. Configure hdfs-site.xml
Before editing, create the necessary directory structure:
```
HADOOP_HOME\
└── data\
    ├── namenode\
    └── datanode\
```
Add the following configuration:
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/D:/home/hadoop/hadoop-2.7.1/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/D:/home/hadoop/hadoop-2.7.1/data/datanode</value>
  </property>
</configuration>
```
Important: Use forward slashes and a leading / in these path values, even on Windows; otherwise the NameNode and DataNode may fail to initialize their storage directories.
6. Configure hadoop-env.cmd
Modify the Java home setting:

```shell
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_xxx
```

Replace the path with your actual JDK installation directory. If the path contains spaces (as C:\Program Files does), use the 8.3 short form instead, for example C:\PROGRA~1\Java\jdk1.8.0_xxx, since hadoop-env.cmd does not handle spaces in JAVA_HOME reliably.
7. Install Native DLL
Copy hadoop.dll from HADOOP_HOME\bin to C:\Windows\System32. This DLL is required for MapReduce operations on Windows.
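This can be done from an Administrator Command Prompt (a sketch, assuming HADOOP_HOME is already set):

```shell
:: Requires Administrator privileges to write into System32.
copy %HADOOP_HOME%\bin\hadoop.dll C:\Windows\System32\
```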
Starting Hadoop Cluster
8. Format the NameNode
Open Command Prompt as Administrator and execute:
```shell
hdfs namenode -format
```
Look for confirmation messages indicating successful initialization.
9. Start Hadoop Services
Launch the complete Hadoop cluster:
```shell
start-all.cmd
```
This command opens four console windows for:
- NameNode
- DataNode
- ResourceManager
- NodeManager
Verify running Java processes:
```shell
jps
```
Expected output should include:
- NameNode
- DataNode
- ResourceManager
- NodeManager
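On a healthy startup, jps prints one line per JVM in the form "pid ClassName"; an illustrative transcript (the process IDs will differ on your machine):

```
> jps
4820 NameNode
5104 DataNode
5388 ResourceManager
5672 NodeManager
6012 Jps
```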
Web Interfaces
Access the Hadoop administration interfaces:
| Service | URL | Purpose |
|---|---|---|
| HDFS UI | http://localhost:50070 | File system browser, cluster status |
| YARN UI | http://localhost:8088 | Application management, resource monitoring |
Through the HDFS file browser (Utilities → Browse the file system), you can create directories, view files, and manage the file system visually.
File Operations on HDFS
Create Directories
```shell
hadoop fs -mkdir hdfs://localhost:9000/user
hadoop fs -mkdir hdfs://localhost:9000/user/project
```
Upload Files
```shell
hadoop fs -put D:\Data\sample.csv hdfs://localhost:9000/user/project/
```
List Contents
```shell
hadoop fs -ls hdfs://localhost:9000/user/project/
```
Remove Files and Directories
```shell
# Remove a file
hadoop fs -rm hdfs://localhost:9000/user/project/sample.csv

# Remove a directory recursively, bypassing the trash
hadoop fs -rm -r -skipTrash hdfs://localhost:9000/user/project
```
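Beyond these basics, a few other hadoop fs subcommands are commonly useful. A sketch, assuming sample.csv was uploaded as shown above:

```shell
# Print a file's contents to the console
hadoop fs -cat hdfs://localhost:9000/user/project/sample.csv

# Download a file from HDFS back to the local file system
hadoop fs -get hdfs://localhost:9000/user/project/sample.csv D:\Data\copy.csv

# Show the space used by each entry under a directory
hadoop fs -du hdfs://localhost:9000/user/project/
```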
Troubleshooting
Nodes Not Starting
If NameNode or DataNode fail to start:
1. Stop all Hadoop services:

   ```shell
   stop-all.cmd
   ```

2. Remove the log and data directories:

   ```shell
   rmdir /s /q %HADOOP_HOME%\logs
   rmdir /s /q %HADOOP_HOME%\data
   ```

3. Recreate the data directories with the structure shown in Step 5.

4. Reformat the NameNode:

   ```shell
   hdfs namenode -format
   ```

5. Restart the cluster:

   ```shell
   start-all.cmd
   ```
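The reset sequence above can be collected into a single batch file. A sketch (the script name is hypothetical); run it from an Administrator prompt, and only on a disposable development cluster, since it erases all HDFS data:

```shell
@echo off
:: reset-hdfs.cmd - wipe and re-initialize a local development HDFS.
call stop-all.cmd
rmdir /s /q %HADOOP_HOME%\logs
rmdir /s /q %HADOOP_HOME%\data
mkdir %HADOOP_HOME%\data\namenode
mkdir %HADOOP_HOME%\data\datanode
:: -force skips the interactive re-format confirmation prompt.
call hdfs namenode -format -force
call start-all.cmd
```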
Common issues include:
- Incorrect path formats in XML configuration (missing leading /)
- Incomplete file replacement during the Windows compatibility setup
- Java home path errors in hadoop-env.cmd
Shutting Down
When operations are complete, stop the cluster:
```shell
stop-all.cmd
```
Keep all console windows open during operation; closing them terminates the corresponding service.