Big Data Fundamentals and Core Technologies Overview

HDFS File System Commands

Disk Usage Information
Show the capacity, used space, and free space of the file system that contains a given path:
hadoop fs -df /home/myfile

Merge Files
Combine multiple files from an HDFS directory into a single local file:
hadoop fs -getmerge /user/hduser0011/test /home/myfile/dir

Write Output to HDFS
Direct console output to an HDFS file:
echo abc | hadoop fs -put ...

Posted on Thu, 11 Jun 2026 18:46:45 +0000 by brucensal

Building a Hadoop Cluster on CentOS 7

Building a Single-Node Hadoop Installation
For instructions on setting up a single-node Hadoop installation, please refer to: https://example.com/single-node-hadoop-setup

Creating a Hadoop Cluster
Cloning Virtual Machines
1. Right-click on hadoop1 → Manage → Clone
2. Click Next
3. Select the current state of the virtual machine → Click Next
4. Choose Crea ...

Posted on Fri, 15 May 2026 14:12:50 +0000 by ondi

Hive Fundamentals for Data Warehousing

Introduction to Hive
Hive is an open-source data warehouse system built on top of Hadoop. It maps structured and semi-structured data files stored in HDFS to database tables and provides a SQL-like language, HiveQL (HQL), for querying and analyzing large datasets. Hive's core functionality is to translate HiveQL queries ...

Posted on Thu, 14 May 2026 14:36:14 +0000 by willpower

MapReduce Average Computation Example

This example demonstrates how to compute the average value of a numeric field grouped by a key using the MapReduce framework. The process follows a standard pattern: the mapper extracts key-value pairs, the shuffle phase groups values by key, and the reducer computes the sum and count to produce the average.

Setup and Environment
Ensure Hadoop ...
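
The mapper, shuffle, and reducer pattern described in this excerpt can be sketched in plain Python without a cluster. This is only an in-memory illustration, not the post's actual job: the "key,value" input format, the sample records, and the function names (mapper, shuffle, reducer, average_by_key) are all assumptions made for the sketch.

```python
from collections import defaultdict


def mapper(line):
    """Emit one (key, value) pair from a 'key,value' record (assumed format)."""
    key, value = line.split(",")
    yield key.strip(), float(value)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Accumulate the sum and count for one key, then return its average."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return key, total / count


def average_by_key(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample records, used only to exercise the sketch.
records = ["math,90", "math,70", "physics,85"]
averages = average_by_key(records)
print(averages)
```

In a real Hadoop job the shuffle is performed by the framework between the map and reduce tasks; the explicit shuffle function here only makes that grouping step visible.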

Posted on Mon, 11 May 2026 07:19:01 +0000 by DarkPrince2005