Big Data Fundamentals and Core Technologies Overview
HDFS File System Commands
Disk Usage Information
Display the capacity, used space, and free space of the filesystem that contains a given path:
hadoop fs -df /home/myfile
Merge Files
Combine multiple files from HDFS into a single local file:
hadoop fs -getmerge /user/hduser0011/test /home/myfile/dir
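To make the behavior of -getmerge concrete, the following is a local-filesystem analogue in plain Python. It concatenates every regular file in a source directory into one destination file. This is a minimal sketch for illustration only. The function name local_getmerge is hypothetical, and processing the files in name order is an assumption of this sketch. The add_newline flag mirrors the purpose of getmerge's -nl option, which appends a newline after each file.

```python
import os
import tempfile


def local_getmerge(src_dir: str, dst_file: str, add_newline: bool = False) -> None:
    """Concatenate the regular files in src_dir, in name order, into dst_file.

    A local analogue of `hadoop fs -getmerge`, for illustration only;
    the name ordering is an assumption of this sketch.
    """
    names = sorted(
        n for n in os.listdir(src_dir) if os.path.isfile(os.path.join(src_dir, n))
    )
    with open(dst_file, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as part:
                out.write(part.read())
            if add_newline:
                out.write(b"\n")  # same purpose as getmerge's -nl option


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "test")
        os.mkdir(src)
        # Typical MapReduce output parts, created out of order on purpose.
        for name, text in [("part-00001", b"b\n"), ("part-00000", b"a\n")]:
            with open(os.path.join(src, name), "wb") as f:
                f.write(text)
        merged = os.path.join(tmp, "merged.txt")
        local_getmerge(src, merged)
        with open(merged, "rb") as f:
            print(f.read())  # b'a\nb\n'
```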
Write Output to HDFS
Direct console output to an HDFS file:
echo abc | hadoop fs -put ...
Posted on Thu, 11 Jun 2026 18:46:45 +0000 by brucensal
Hadoop Cluster Deployment Guide
Hadoop Distributed Cluster Setup
This guide explains how to set up a fully distributed Hadoop cluster using three or more physical or virtual machines.
Cluster Architecture
Master Node (hadoop0): NameNode, JobTracker, SecondaryNameNode
Worker Nodes (hadoop1, hadoop2): DataNode, TaskTracker
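The role layout above can be written as a host-to-daemon map. From that map you can derive the host lists that the start scripts read. This is a minimal sketch that assumes Hadoop 1.x conventions, given the JobTracker and TaskTracker roles. In those conventions, conf/slaves lists the worker hosts and conf/masters names the SecondaryNameNode host, despite the file's name. The hostnames come from the text. The helper function names are hypothetical.

```python
# Host-to-daemon map for the three-node layout described above.
CLUSTER = {
    "hadoop0": ["NameNode", "JobTracker", "SecondaryNameNode"],
    "hadoop1": ["DataNode", "TaskTracker"],
    "hadoop2": ["DataNode", "TaskTracker"],
}


def hosts_running(daemon: str, cluster: dict = CLUSTER) -> list:
    """Return the sorted hostnames that run the given daemon."""
    return sorted(host for host, daemons in cluster.items() if daemon in daemons)


def render_host_file(daemon: str, cluster: dict = CLUSTER) -> str:
    """One hostname per line, the format used by conf/slaves and conf/masters."""
    return "".join(f"{host}\n" for host in hosts_running(daemon, cluster))


# Sanity check: this layout has exactly one NameNode and one JobTracker.
assert len(hosts_running("NameNode")) == 1
assert len(hosts_running("JobTracker")) == 1

if __name__ == "__main__":
    print(render_host_file("DataNode"), end="")           # contents for conf/slaves
    print(render_host_file("SecondaryNameNode"), end="")  # contents for conf/masters
```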
Virtual Machine Setup
Create three virtual machines u ...
Posted on Thu, 04 Jun 2026 17:57:49 +0000 by Knifee
Hadoop Distributed System Fundamentals and Cluster Setup
Big Data Processing Overview
Big data involves analyzing massive datasets to extract valuable insights for organizational decision-making. Core processing stages include:
Data acquisition
Data processing
Result visualization
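The three stages can be read as a simple pipeline in which each stage's output feeds the next. This is a toy, single-machine sketch. The log lines are hypothetical sample data, and a real deployment would distribute the processing stage across a cluster.

```python
from collections import Counter
from typing import List


def acquire() -> List[str]:
    """Data acquisition: a hypothetical hard-coded sample of page-event log lines."""
    return ["click home", "click cart", "view home", "click home"]


def process(records: List[str]) -> Counter:
    """Data processing: count events per page (the second field of each line)."""
    return Counter(record.split()[1] for record in records)


def visualize(counts: Counter) -> str:
    """Result visualization: a text bar chart, most frequent page first."""
    return "\n".join(f"{page:<6}{'#' * n}" for page, n in counts.most_common())


if __name__ == "__main__":
    print(visualize(process(acquire())))
```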
Hadoop Framework
Hadoop provides distributed processing capabilities for large datasets across computer clusters. Its a ...
Posted on Tue, 26 May 2026 01:24:57 +0000 by SidewinderX
Building a Music Ranking System with HBase and MapReduce
Environment: Windows 10, CentOS 7.9, Hadoop 3.2, HBase 2.5.3, and ZooKeeper 3.8, all running in fully distributed mode.
Environment setup procedures can be found in these articles:
CentOS7 Hadoop3.X Fully Distributed Environment Setup
Hadoop3.x Fully Distributed Environment Setup with Zookeeper and Hbase
1. Integrating MapReduce and HBase
Copy hbase-site.x ...
Posted on Wed, 13 May 2026 00:15:56 +0000 by LawsLoop
MapReduce Average Computation Example
This example demonstrates how to compute the average value of a numeric field grouped by a key using the MapReduce framework. The process follows a standard pattern: the mapper extracts key-value pairs, the shuffle phase groups values by key, and the reducer computes the sum and count to produce the average.
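The map, shuffle, and reduce flow described above can be simulated in plain Python. This minimal sketch runs without Hadoop and assumes hypothetical comma-separated "key,value" input lines. The function names are illustrative and are not Hadoop's Java Mapper and Reducer API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, Tuple[float, int]]]:
    """Emit (key, (value, 1)) for one "key,value" input line."""
    key, value = line.split(",", 1)
    yield key.strip(), (float(value), 1)


def shuffle(pairs) -> Dict[str, List[Tuple[float, int]]]:
    """Group map output by key, as the framework's shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[Tuple[float, int]]) -> Tuple[str, float]:
    """Add up the partial sums and counts, then divide once at the end."""
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return key, total / count


def run_job(lines: Iterable[str]) -> Dict[str, float]:
    pairs = (kv for line in lines for kv in mapper(line))
    return dict(reducer(k, vs) for k, vs in sorted(shuffle(pairs).items()))


if __name__ == "__main__":
    print(run_job(["math,90", "math,70", "art,85"]))  # {'art': 85.0, 'math': 80.0}
```

Carrying (sum, count) pairs instead of bare values matters once a combiner is added. Averaging partial averages gives the wrong answer when groups differ in size: the average of avg(1, 2) = 1.5 and 3 is 2.25, but the true average of 1, 2, 3 is 2. Partial sums and counts, by contrast, can be merged safely.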
Setup and Environment
Ensure Hadoop ...
Posted on Mon, 11 May 2026 07:19:01 +0000 by DarkPrince2005
Hive Data Warehouse Integration
Overview of Hive
1.1 Hive is a data warehouse tool in the Hadoop ecosystem for managing and querying data stored in Hadoop. At its core, Hive is an SQL parsing engine: it translates SQL queries into MapReduce jobs.
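The following conceptual sketch shows how a query such as SELECT song, COUNT(*) FROM plays GROUP BY song breaks down into map and reduce steps. The table plays, its rows, and the function names are hypothetical. Hive's real query plans also add optimizations, such as map-side aggregation, that this sketch omits.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical rows of a table plays(song, user).
ROWS = [("s1", "u1"), ("s2", "u1"), ("s1", "u2")]


def map_phase(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """The GROUP BY column becomes the map output key; COUNT(*) contributes 1 per row."""
    for song, _user in rows:
        yield song, 1


def shuffle_and_reduce(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sort by key (the shuffle), then sum each key's ones (the reduce)."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {song: sum(one for _, one in group)
            for song, group in groupby(ordered, key=itemgetter(0))}


if __name__ == "__main__":
    print(shuffle_and_reduce(map_phase(ROWS)))  # {'s1': 2, 's2': 1}
```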
Hive includes a mapping tool that translates SQL tables and columns into files and directories on HDFS. This mapping ...
Posted on Sun, 10 May 2026 17:23:35 +0000 by adnan1983
Implementing Custom InputFormat in Hadoop MapReduce
Experimental Principle
1. InputFormat Concept
The InputFormat class in Hadoop defines how input files are split and read. It provides the following functionality:
Selects files or objects to process as input
Defines InputSplits that partition files into tasks
Provides a factory method for RecordReader to read files
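The three responsibilities listed above can be illustrated with a toy, line-oriented analogue over an in-memory byte buffer. This is a minimal sketch, not Hadoop's Java API, and the class SimpleLineInputFormat and its methods are hypothetical. Splits are cut at fixed byte offsets without regard to line breaks. The record reader then repairs this by assigning each line to the split that contains its first byte. That approach is similar in spirit to how Hadoop's TextInputFormat handles lines that straddle a split boundary, and records are keyed by byte offset, as in TextInputFormat.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Split:
    start: int   # byte offset where the split begins
    length: int  # number of bytes in the split


class SimpleLineInputFormat:
    """Toy InputFormat analogue over an in-memory byte buffer (hypothetical API)."""

    def __init__(self, data: bytes, split_size: int):
        self.data = data              # the "input file" selected for processing
        self.split_size = split_size  # bytes per split

    def get_splits(self) -> List[Split]:
        """Cut the input into fixed-size byte ranges, ignoring line boundaries."""
        size = len(self.data)
        return [Split(s, min(self.split_size, size - s))
                for s in range(0, size, self.split_size)]

    def create_record_reader(self, split: Split) -> Iterator[Tuple[int, bytes]]:
        """Yield (byte offset, line) for every line that starts inside the split."""
        data = self.data
        pos, end = split.start, split.start + split.length
        # If the split begins mid-line, that line belongs to the previous split.
        if pos > 0 and data[pos - 1:pos] != b"\n":
            nl = data.find(b"\n", pos)
            pos = len(data) if nl == -1 else nl + 1
        # A line that starts before `end` is read to completion, even past the boundary.
        while pos < end:
            nl = data.find(b"\n", pos)
            stop = len(data) if nl == -1 else nl
            yield pos, data[pos:stop]
            pos = stop + 1


if __name__ == "__main__":
    fmt = SimpleLineInputFormat(b"alpha\nbeta\ngamma\n", split_size=8)
    for split in fmt.get_splits():
        print(split, list(fmt.create_record_reader(split)))
```

Because every line is owned by exactly one split, the union of all readers' output equals the file's lines, with no line lost or duplicated at a boundary.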
Hadoop includes several bui ...
Posted on Sat, 09 May 2026 20:06:23 +0000 by pdmiller