HDFS Overview
HDFS (Hadoop Distributed File System) is a distributed storage system designed to handle massive datasets, typically in terabytes or petabytes. It forms the storage layer of the Hadoop ecosystem, enabling applications to work with large-scale data using a unified interface similar to a conventional file system. HDFS streams data during access, allowing command-line operations and MapReduce jobs to interact with it directly. Engineered for fault tolerance and high-throughput reads, its write-once, read-many model simplifies concurrency control and enhances throughput, a critical factor for big data workloads.
Design Objectives
- Lower cost by distributing data and computation across clusters of commodity hardware.
- Achieve reliability through automatic replication and fault recovery mechanisms.
- Provide scalable storage and processing capacity for extremely large datasets.
Historical Background
HDFS traces its origins to the challenges of building web crawlers for Lucene, where large-scale data storage, cluster elasticity, and dynamic fault handling were problematic. The HDFS architecture was developed to address these needs, inspired by Google's GFS paper describing a scalable, fault-tolerant distributed file system.
Architecture
- NameNode: Centralized master server managing namespace metadata and coordinating client access. It does not handle actual file content I/O, preventing it from becoming a bottleneck.
- DataNode: Worker nodes responsible for storing and serving file blocks. All block-level data flows between clients and DataNodes; the NameNode only provides location hints.
- Replica Placement: NameNode decides where to place replicas based on cluster topology. For reads, it directs clients to the nearest replica to minimize network latency.
- Health Monitoring: NameNode receives periodic heartbeats and block reports from each DataNode to assess liveness and track stored blocks.
| NameNode | DataNode |
|---|---|
| Stores metadata | Stores file block data |
| Metadata kept in memory | Block data persisted on disk |
| Tracks file-block-DataNode mapping | Maintains block ID to local file mapping |
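The bookkeeping described above (heartbeats for liveness, block reports for the block-to-DataNode mapping) can be illustrated with a small in-memory model. This is a toy sketch, not Hadoop code: the class name `NameNodeSketch`, its methods, and the 30-second timeout are assumptions made for illustration, and the real NameNode uses different intervals and far richer state.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of NameNode bookkeeping; all names here are hypothetical. */
final class NameNodeSketch {
    // Assumed timeout for this sketch only: no heartbeat for this long means "dead".
    static final long DEAD_AFTER_MS = 30_000;

    // DataNode name -> time of its last heartbeat
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Block ID -> DataNodes that reported holding that block
    private final Map<Long, Set<String>> blockLocations = new HashMap<>();

    void heartbeat(String dataNode, long nowMs) {
        lastHeartbeat.put(dataNode, nowMs);
    }

    /** A block report replaces whatever was previously recorded for this node. */
    void blockReport(String dataNode, Collection<Long> blockIds) {
        for (Set<String> holders : blockLocations.values()) {
            holders.remove(dataNode);
        }
        for (long id : blockIds) {
            blockLocations.computeIfAbsent(id, k -> new TreeSet<>()).add(dataNode);
        }
    }

    boolean isLive(String dataNode, long nowMs) {
        Long last = lastHeartbeat.get(dataNode);
        return last != null && nowMs - last <= DEAD_AFTER_MS;
    }

    /** Location hints for a block: only DataNodes that are currently live. */
    List<String> liveLocations(long blockId, long nowMs) {
        List<String> out = new ArrayList<>();
        for (String dn : blockLocations.getOrDefault(blockId, Collections.emptySet())) {
            if (isLive(dn, nowMs)) {
                out.add(dn);
            }
        }
        return out;
    }
}
```

Note that the model holds only metadata: no file content ever passes through it, which mirrors the division of labor in the table.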
Block Storage and Replication
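Files are split into fixed-size blocks. The default block size grew from 64 MB in Hadoop 1.x to 128 MB in Hadoop 2.x and later. It is configurable in hdfs-site.xml; current releases name the property dfs.blocksize, and dfs.block.size is the deprecated older name:
<property>
<name>dfs.blocksize</name>
<value>{size_in_bytes}</value>
</property>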
Benefits of Block Abstraction
- Enables files larger than any single disk.
- Simplifies storage subsystem design.
- Facilitates replication for fault tolerance and availability.
Block Caching
Frequently accessed blocks may be cached in a DataNode's off-heap memory. By default, one replica per block is cached, but policies can increase this. Cached blocks improve performance for operations like map-side joins with small lookup tables. Users add cache directives to a cache pool, stating which paths to cache and for how long; a cache pool is an administrative unit that controls permissions and resource usage.
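Example (block storage): With a 128 MB block size, a 130 MB file becomes two blocks of 128 MB and 2 MB, stored separately. The last block uses only the space it needs, so each replica of the file occupies 130 MB on disk, not 256 MB.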
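The arithmetic behind the 130 MB example can be checked with a few lines of Java. This is a minimal sketch that assumes a 128 MB block size; `BlockSplitSketch` and `blockLengths` are hypothetical names, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

/** Computes how a file of a given size is cut into fixed-size blocks. */
final class BlockSplitSketch {
    /** Returns the length of each block; only the last one may be shorter. */
    static List<Long> blockLengths(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Long> lengths = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            lengths.add(Math.min(blockSize, fileSize - offset));
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> blocks = blockLengths(130 * mb, 128 * mb);
        // Prints: 2 blocks, last = 2 MB
        System.out.println(blocks.size() + " blocks, last = " + blocks.get(1) / mb + " MB");
    }
}
```

The block lengths always sum to the file size, which is why a short final block does not waste a full block of disk space.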
Permission Model
The model mirrors Linux-style permissions (r=read, w=write, x=execute). The execute bit is ignored for files; on directories it controls traversal. Ownership follows the OS user who runs the Hadoop command. Permissions are meant to prevent accidental misuse rather than to enforce strict security.
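As an illustration of how an owner/group/other mode is evaluated, the sketch below checks a requested access against an octal mode. It assumes the usual POSIX ordering (owner class first, then group, then everyone else) and leaves out superuser handling and ACLs; `PermissionSketch` and `allowed` are hypothetical names.

```java
import java.util.Set;

/** Evaluates rwx bits the way a POSIX-style permission check would. */
final class PermissionSketch {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /**
     * @param mode       octal permission triple, e.g. 0750 for rwxr-x---
     * @param owner      owner of the file or directory
     * @param group      group of the file or directory
     * @param user       user requesting access
     * @param userGroups groups the requesting user belongs to
     * @param wanted     OR-combination of READ, WRITE, EXECUTE
     */
    static boolean allowed(int mode, String owner, String group,
                           String user, Set<String> userGroups, int wanted) {
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;   // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;   // group class
        } else {
            bits = mode & 7;          // everyone else
        }
        return (bits & wanted) == wanted;
    }
}
```

For a directory with mode 0750, the owner may list, modify, and traverse it, group members may list and traverse only, and all other users are denied.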
Metadata Management with SecondaryNameNode
A single NameNode stores metadata in FsImage (checkpoint) and Edits (operation log). Locations are set in hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///export/servers/hadoop-3.1.1/datas/namenode/namenodedatas</value>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>file:///export/servers/hadoop-3.1.1/datas/dfs/nn/edits</value>
</property>
- Edits: Records recent write operations. Updated before clients see changes.
- FsImage: Snapshot of the full metadata, loaded into memory when the NameNode starts. To avoid costly rewrites of the whole image, incremental updates go to Edits. Periodically, Edits must be merged into a new FsImage.
Inspection commands:
hdfs oiv -i fsimage_0000000000000000864 -p XML -o hello.xml
hdfs oev -i edits_0000000000000000865-0000000000000000866 -o myedit.xml -p XML
SecondaryNameNode Workflow
- Requests the NameNode to switch to a new edit log (edits.new).
- Retrieves the current FsImage and Edits via HTTP.
- Loads FsImage into memory and applies Edits to produce a new image.
- Sends the merged FsImage back to the NameNode via HTTP.
- The NameNode replaces the old FsImage with the new one and renames edits.new to edits.
The SecondaryNameNode should run on separate hardware, because the merge requires about as much memory as the NameNode itself.
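The merge step of the workflow above amounts to replaying a log on top of a snapshot. The following sketch models the image as a set of paths and each edit as a CREATE or DELETE record. That is a deliberate simplification: real FsImage and edit-log records carry far more detail, and `CheckpointSketch` and its members are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy checkpoint: new image = old image + replayed edits. */
final class CheckpointSketch {
    /** One logged namespace operation (simplified for illustration). */
    static final class Edit {
        final String op;
        final String path;

        Edit(String op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    /** Replays the edits in order on a copy of the image; the input is not modified. */
    static SortedSet<String> merge(Set<String> fsImage, List<Edit> edits) {
        SortedSet<String> merged = new TreeSet<>(fsImage);
        for (Edit e : edits) {
            switch (e.op) {
                case "CREATE":
                    merged.add(e.path);
                    break;
                case "DELETE":
                    merged.remove(e.path);
                    break;
                default:
                    throw new IllegalArgumentException("unknown op " + e.op);
            }
        }
        return merged;
    }
}
```

Because the merge works on a copy, the old image stays usable until the new one is complete, which matches the swap in the final step of the workflow.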
Write Operation Flow
- Client contacts NameNode via RPC to create a file; NameNode validates path and permissions.
- Client requests block locations.
- NameNode replies with a pipeline of DataNodes (default replication factor 3) chosen by rack-aware policy.
- Client sends data to first DataNode; each node forwards to the next, forming a pipeline.
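- Data is sent in packets (~64 KB); acknowledgments travel back through the pipeline in reverse order, and the client keeps each packet queued until every node has acknowledged it.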
- After block completion, client requests placement for the next block.
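The pipeline steps above can be mimicked with a chain of in-memory nodes. The sketch below is synchronous for brevity: each packet is stored, forwarded, and acknowledged before the next one is sent, whereas the real client streams packets asynchronously. `PipelineSketch` and its members are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Toy replication pipeline: client -> dn1 -> dn2 -> dn3. */
final class PipelineSketch {
    static final class DataNode {
        final String name;
        final DataNode next; // null for the last node in the pipeline
        final ByteArrayOutputStream stored = new ByteArrayOutputStream();

        DataNode(String name, DataNode next) {
            this.name = name;
            this.next = next;
        }

        /** Stores the packet, forwards it downstream, and acks only if all downstream nodes acked. */
        boolean receive(byte[] packet) {
            stored.write(packet, 0, packet.length);
            return next == null || next.receive(packet);
        }
    }

    /** Sends data in fixed-size packets through the pipeline; returns the number of acked packets. */
    static int write(byte[] data, int packetSize, DataNode first) {
        int acked = 0;
        for (int off = 0; off < data.length; off += packetSize) {
            int end = Math.min(off + packetSize, data.length);
            byte[] packet = Arrays.copyOfRange(data, off, end);
            if (!first.receive(packet)) {
                throw new IllegalStateException("pipeline failed at packet " + acked);
            }
            acked++;
        }
        return acked;
    }
}
```

With three nodes and 64 KB packets, a 150 KB write is sent as three packets, and every node ends up with an identical copy even though the client only ever talks to the first node.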
Read Operation Flow
- Client queries NameNode for block locations.
- NameNode returns sorted DataNode lists based on proximity and health.
- Client reads from the closest DataNode; supports short-circuit reads if client is co-located.
- Reads proceed via FSDataInputStream; checksums validate integrity.
- On failure, client fetches the block from next available replica.
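- A single stream reads blocks one after another and presents them to the client as one continuous file; parallelism comes from multiple readers (for example, MapReduce tasks) working on different blocks at the same time.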
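The checksum and failover steps above can be sketched as a loop over replicas ordered nearest-first. For brevity this sketch uses a single CRC32 per block from the JDK; HDFS itself checksums data in smaller chunks, so treat the granularity as an assumption. `ReplicaReadSketch` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.zip.CRC32;

/** Toy read path: try the closest replica first, fall back on checksum mismatch. */
final class ReplicaReadSketch {
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /**
     * Returns the first replica whose checksum matches.
     * A null entry models a DataNode that cannot be reached.
     */
    static byte[] readBlock(List<byte[]> replicasNearestFirst, long expectedCrc) {
        for (byte[] replica : replicasNearestFirst) {
            if (replica != null && crc(replica) == expectedCrc) {
                return replica;
            }
        }
        throw new IllegalStateException("no healthy replica available");
    }
}
```

The caller never sees a corrupt copy: a replica that is unreachable or fails verification is skipped, and the read only fails when every replica is bad.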
Command-Line Interface
Syntax: hdfs dfs -<command> [options] [arguments]
Common commands:
hdfs dfs -ls / # List root directory
hdfs dfs -mkdir -p /path/to/dir # Recursively create directories
hdfs dfs -moveFromLocal src dst # Move local file to HDFS
hdfs dfs -mv src dst # Rename/move within HDFS
hdfs dfs -put local dst # Upload local file
hdfs dfs -appendToFile a.txt b.txt /out.txt # Append local files a.txt and b.txt to /out.txt in HDFS
hdfs dfs -cat /path # Display file contents
hdfs dfs -cp src dst # Copy within HDFS
hdfs dfs -rm [-r] path # Delete file or directory
hdfs dfs -chmod -R 777 /path # Change permissions
hdfs dfs -chown -R user:group /path # Change ownership
Programmatic Access via Java API
Environment setup on Windows involves placing the Hadoop binaries in a path without spaces or non-ASCII characters, setting HADOOP_HOME, copying hadoop.dll to C:\Windows\System32, and rebooting.
Maven Dependencies
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs-client</artifactId>
<version>3.1.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.1.1</version>
</dependency>
</dependencies>
Core Classes
- Configuration: Encapsulates client/server settings.
- FileSystem: Entry point for file operations; obtain via FileSystem.get(conf). If fs.defaultFS is unset, defaults to local filesystem.
Obtaining FileSystem Instances
Configuration cfg = new Configuration();
cfg.set("fs.defaultFS", "hdfs://node01:8020");
FileSystem fs = FileSystem.get(cfg);
// Alternative using URI
FileSystem fs2 = FileSystem.get(new URI("hdfs://node01:8020"), new Configuration());
// Using newInstance: bypasses the FileSystem cache and returns a separate instance
FileSystem fs3 = FileSystem.newInstance(cfg);
FileSystem fs4 = FileSystem.newInstance(new URI("hdfs://node01:8020"), new Configuration());
Common Operations
- List Files Recursively:
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.52.250:8020"), new Configuration());
RemoteIterator<LocatedFileStatus> iter = fs.listFiles(new Path("/"), true);
while (iter.hasNext()) {
System.out.println(iter.next().getPath());
}
fs.close();
- Create Directory:
fs.mkdirs(new Path("/hello/mydir/test"));
- Download File:
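try (FSDataInputStream in = fs.open(new Path("/timer.txt"));
     FileOutputStream out = new FileOutputStream("e:\\timer.txt")) {
    IOUtils.copy(in, out); // org.apache.commons.io.IOUtils; both streams are closed on exit
}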
- Upload File:
fs.copyFromLocalFile(new Path("file:///c:\\install.log"), new Path("/hello/mydir/test"));
- Permission Enforcement: Enable via dfs.permissions.enabled=true in hdfs-site.xml and restart cluster.
- Merge Small Files: Combine locally before upload or merge on download using shell:
hdfs dfs -getmerge /config/*.xml ./merged.xml
Programmatic merge example:
Configuration conf = new Configuration();
// Connect as user "root"
FileSystem hdfs = FileSystem.get(new URI("hdfs://192.168.52.250:8020"), conf, "root");
FSDataOutputStream out = hdfs.create(new Path("/bigfile.txt"));
LocalFileSystem localFs = FileSystem.getLocal(conf);
// Append every file in the local input directory to the single HDFS file
for (FileStatus st : localFs.listStatus(new Path("file:///E:\\input"))) {
    FSDataInputStream in = localFs.open(st.getPath());
    IOUtils.copy(in, out); // org.apache.commons.io.IOUtils
    IOUtils.closeQuietly(in);
}
IOUtils.closeQuietly(out);
hdfs.close();
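The other option mentioned above, combining files locally before upload, needs nothing beyond the JDK. The sketch below concatenates the regular files of a directory in name order; `LocalMergeSketch` is a hypothetical helper, and it assumes the target file lives outside the input directory so that it is not read as one of its own inputs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Concatenates small local files into one file that can then be uploaded. */
final class LocalMergeSketch {
    /** Merges every regular file in dir, sorted by name, into target. */
    static void merge(Path dir, Path target) throws IOException {
        List<Path> inputs;
        try (Stream<Path> entries = Files.list(dir)) {
            inputs = entries.filter(Files::isRegularFile)
                            .sorted()
                            .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path input : inputs) {
                Files.copy(input, out); // appends the file's bytes to the open stream
            }
        }
    }
}
```

The merged file can then be uploaded with a single hdfs dfs -put, so the NameNode tracks one file instead of many small ones.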