Core Concepts and Architecture of the Hadoop Distributed File System

HDFS Overview

HDFS (Hadoop Distributed File System) is a distributed storage system designed to handle massive datasets, typically in terabytes or petabytes. It forms the storage layer of the Hadoop ecosystem, enabling applications to work with large-scale data using a unified interface similar to a conventional file system. HDFS streams data during access, allowing command-line operations and MapReduce jobs to interact with it directly. Engineered for fault tolerance and high-throughput reads, its write-once, read-many model simplifies concurrency control and enhances throughput, a critical factor for big data workloads.

Design Objectives

  • Lower cost by distributing data and computation across clusters of commodity hardware.
  • Achieve reliability through automatic replication and fault recovery mechanisms.
  • Provide scalable storage and processing capacity for extremely large datasets.

Historical Background

The origins trace back to challenges faced when building web crawlers for Lucene, where large-scale data storage, cluster elasticity, and dynamic fault handling were problematic. Inspired by Google's GFS paper describing a scalable, fault-tolerant distributed file system, the HDFS architecture was developed to adress these needs.

Architecture

  • NameNode: Centralized master server managing namespace metadata and coordinating client access. It does not handle actual file content I/O, preventing it from becoming a bottleneck.
  • DataNode: Worker nodes responsible for storing and serving file blocks. All block-level data flows between clients and DataNodes; the NameNode only provides location hints.
  • Replica Placement: NameNode decides where to place replicas based on cluster topology. For reads, it directs clients to the nearest replica to minimize network latency.
  • Health Monitoring: NameNode receives periodic heartbeats and block reports from each DataNode to assess liveness and track stored blocks.
NameNode DataNode
Stores metadata Stores file block data
Metadata kept in memory Block data persisted on disk
Tracks file-block-DataNode mapping Maintains block ID to local file mapping

Block Storage and Replication

Files are split into fixed-size blocks. Default sizes evolved from 64 MB (early Hadoop) to 128 MB (later versions), configurable via hdfs-site.xml:

<property>
  <name>dfs.block.size</name>
  <value>{size_in_bytes}</value>
</property>

Benefits of Block Abstraction

  • Enables files larger than any single disk.
  • Simplifies storage subsystem design.
  • Facilitates replication for fault tolerance and availability.

Block Caching

Frequently accessed blocks may be cached in a DataNode's off-heap memory. By default, one replica per block is cached, but policies can increase this. Cached blocks improve performance for operations like map-side joins with small lookup tables. Users define caching directives specifying files and duration within a cache pool—an administrative unit controlling permissions and resource usage.

Example: A 130 MB file becomes two blocks stored separately, occupying exactly 130 MB on disk, not 256 MB.

Permistion Model

Mirrors Linux-style permissions (r=read, w=write, x=execute). Execute bit is ignored for files and applies to directory traversal. Ownership aligns with the OS user executing the Hadoop command. Permissions aim to prevent accidental misuse rather than enforce strict security.

Metadata Management with SecondaryNameNode

A single NameNode stores metadata in FsImage (checkpoint) and Edits (operation log). Locations are set in hdfs-site.xml:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///export/servers/hadoop-3.1.1/datas/namenode/namenodedatas</value>
</property>
<property>
  <name>dfs.namenode.edits.dir</name>
  <value>file:///export/servers/hadoop-3.1.1/datas/dfs/nn/edits</value>
</property>
  • Edits: Records recent write operations. Updated before clients see changes.
  • FsImage: Snapshot of full metadata, loaded into memory when needed. To avoid costly reloads, incremental updates go to Edits. Periodically, Edits must be merged into a new FsImage.

Inspection commands:

hdfs oiv -i fsimage_0000000000000000864 -p XML -o hello.xml
hdfs oev -i edits_0000000000000000865-0000000000000000866 -o myedit.xml -p XML

SecondaryNameNode Workflow

  1. Requests NameNode to switch to a new edit log (edits.new).
  2. Retrieves current FsImage and Edits via HTTP.
  3. Loads FsImage into memory, applies Edits to produce a new image.
  4. Sends merged FsImage back to NameNode via HTTP.
  5. NameNode replaces old FsImage with new one and renames edits.new to edits.

SecondaryNameNode should run on separate hardware due to comparable memory needs as NameNode.

Write Operation Flow

  1. Client contacts NameNode via RPC to create a file; NameNode validates path and permissions.
  2. Client requests block locations.
  3. NameNode replies with a pipeline of DataNodes (default replication factor 3) chosen by rack-aware policy.
  4. Client sends data to first DataNode; each node forwards to the next, forming a pipeline.
  5. Data sent in packets (~64 KB); each node queues acknowledgments.
  6. After block completion, client requests placement for the next block.

Read Operation Flow

  1. Client queries NameNode for block locations.
  2. NameNode returns sorted DataNode lists based on proximity and health.
  3. Client reads from the closest DataNode; supports short-circuit reads if client is co-located.
  4. Reads proceed via FSDataInputStream; checksums validate integrity.
  5. On failure, client fetches the block from next available replica.
  6. Parallel block reads are performed; results assembled into complete file.

Command-Line Interface

Syntax: hdfs dfs [options]

Common commands:

hdfs dfs -ls /                         # List root directory
hdfs dfs -mkdir -p /path/to/dir        # Recursively create directories
hdfs dfs -moveFromLocal src dst        # Move local file to HDFS
hdfs dfs -mv src dst                   # Rename/move within HDFS
hdfs dfs -put local dst                # Upload local file
hdfs dfs -appendToFile a.txt b.txt /out.txt  # Append to file
hdfs dfs -cat /path                    # Display file contents
hdfs dfs -cp src dst                   # Copy within HDFS
hdfs dfs -rm [-r] path                 # Delete file or directory
hdfs dfs -chmod -R 777 /path           # Change permissions
hdfs dfs -chown -R user:group /path    # Change ownership

Programmatic Access via Java API

Enviroment setup on Windows involves placing Hadoop binaries in a non-localized path, setting HADOOP_HOME, copying hadoop.dll to C:\Windows\System32, and rebooting.

Maven Dependencies

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.1.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>3.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs-client</artifactId>
    <version>3.1.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.1.1</version>
  </dependency>
</dependencies>

Core Classes

  • Configuration: Encapsulates client/server settings.
  • FileSystem: Entry point for file operations; obtain via FileSystem.get(conf). If fs.defaultFS is unset, defaults to local filesystem.

Obtaining FileSystem Instances

Configuration cfg = new Configuration();
cfg.set("fs.defaultFS", "hdfs://node01:8020");
FileSystem fs = FileSystem.get(cfg);

// Alternative using URI
FileSystem fs2 = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());

// Using newInstance
FileSystem fs3 = FileSystem.newInstance(cfg);
FileSystem fs4 = FileSystem.newInstance(new URI("hdfs://node01:8020"), new Configuration());

Common Operations

  • List Files Recursively:
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.52.250:8020"), new Configuration());
RemoteIterator<LocatedFileStatus> iter = fs.listFiles(new Path("/"), true);
while (iter.hasNext()) {
    System.out.println(iter.next().getPath());
}
fs.close();
  • Create Directory:
fs.mkdirs(new Path("/hello/mydir/test"));
  • Download File:
FSDataInputStream in = fs.open(new Path("/timer.txt"));
IOUtils.copy(in, new FileOutputStream("e:\\timer.txt"));
  • Upload File:
fs.copyFromLocalFile(new Path("file:///c:\\install.log"), new Path("/hello/mydir/test"));
  • Permission Enforcement: Enable via dfs.permissions.enabled=true in hdfs-site.xml and restart cluster.

  • Merge Small Files: Combine locally before upload or merge on download using shell:

hdfs dfs -getmerge /config/*.xml ./merged.xml

Programmatic merge example:

FileSystem hdfs = FileSystem.get(new URI("hdfs://192.168.52.250:8020"), conf, "root");
FSDataOutputStream out = hdfs.create(new Path("/bigfile.txt"));
LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
for (FileStatus st : localFs.listStatus(new Path("file:///E:\\input"))) {
    FSDataInputStream in = localFs.open(st.getPath());
    IOUtils.copy(in, out);
    IOUtils.closeQuietly(in);
}
IOUtils.closeQuietly(out);

Tags: HDFS Hadoop Distributed File System Big Data Storage Java API

Posted on Sun, 07 Jun 2026 16:15:38 +0000 by MFHJoe