Big Data Fundamentals and Core Technologies Overview
HDFS File System Commands
Disk Usage Information
Show the capacity, used space, and free space of the file system that contains a given path:
hadoop fs -df /home/myfile
Merge Files
Combine multiple files from HDFS into a single local file:
hadoop fs -getmerge /user/hduser0011/test /home/myfile/dir
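To make the effect of the merge concrete, here is a minimal local sketch in Python. It does not call Hadoop at all: it only imitates the idea of concatenating every file in a source directory into one destination file. The function name merge_directory and the sample file names are hypothetical, and the files are read in sorted name order purely so the result is deterministic; that ordering is a choice of this sketch, not a statement about how Hadoop orders its inputs.

```python
import tempfile
from pathlib import Path


def merge_directory(src_dir, dst_file):
    """Concatenate every regular file in src_dir into dst_file.

    Local illustration only; merge_directory is a hypothetical helper,
    not part of any Hadoop API.
    """
    # Sorted by name so the output is deterministic (an assumption of this sketch).
    parts = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    with open(dst_file, "wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    return len(parts)


# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "test"
    src.mkdir()
    (src / "part-00000").write_text("first\n")
    (src / "part-00001").write_text("second\n")
    merged = Path(tmp) / "merged.txt"
    merged_count = merge_directory(src, merged)
    merged_text = merged.read_text()
    print(merged_count, repr(merged_text))
```

As with the command above, the second argument is a single destination file rather than a directory.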
Write Output to HDFS
Direct console output to an HDFS file:
echo abc | hadoop fs -put ...
Posted on Thu, 11 Jun 2026 18:46:45 +0000 by brucensal
Building a Hadoop Cluster on CentOS 7
Building a Single-Node Hadoop Installation
For instructions on setting up a single-node Hadoop installation, please refer to: https://example.com/single-node-hadoop-setup
Creating a Hadoop Cluster
Cloning Virtual Machines
Right-click on hadoop1 → Manage → Clone
Click Next
Select the current state of the virtual machine → Click Next
Choose Crea ...
Posted on Fri, 15 May 2026 14:12:50 +0000 by ondi
Hive Fundamentals for Data Warehousing
Introduction to Hive
Hive is an open-source data warehouse system built on top of Hadoop. It maps structured and semi-structured data files stored in HDFS onto database tables and provides a SQL-like language called HiveQL (HQL) for querying and analyzing large datasets. Hive's core functionality is to translate HiveQL queries ...
Posted on Thu, 14 May 2026 14:36:14 +0000 by willpower
MapReduce Average Computation Example
This example demonstrates how to compute the average value of a numeric field grouped by a key using the MapReduce framework. The process follows a standard pattern: the mapper extracts key-value pairs, the shuffle phase groups values by key, and the reducer computes the sum and count to produce the average.
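The three phases described above can be sketched in plain Python without a Hadoop installation. This is a minimal in-memory simulation, not the post's actual job: the "key,value" record format, the function names (mapper, shuffle, reducer, average_by_key), and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Parse one 'key,value' record (assumed format) and emit a (key, value) pair."""
    key, value = line.split(",")
    yield key.strip(), float(value)


def shuffle(pairs):
    """Group values by key, standing in for the shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Accumulate the sum and count for one key, then return its average."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return key, total / count


def average_by_key(lines):
    """Run map, shuffle, and reduce over an iterable of input lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample records: subject,score
records = ["math,80", "math,90", "physics,70"]
print(average_by_key(records))  # {'math': 85.0, 'physics': 70.0}
```

The reducer keeps a running sum and count instead of averaging partial averages. That distinction matters in a real job: if a combiner is used, it should pass along (sum, count) pairs, because an average of averages is not generally the overall average.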
Setup and Environment
Ensure Hadoop ...
Posted on Mon, 11 May 2026 07:19:01 +0000 by DarkPrince2005