Big Data Fundamentals and Core Technologies Overview

HDFS File System Commands.
Disk Usage Information: report the capacity, used space, and free space of the file system that holds a path (hadoop fs -du reports the space consumed by the files under a path instead):
hadoop fs -df /home/myfile
Merge Files: combine the files in an HDFS directory into a single local file:
hadoop fs -getmerge /user/hduser0011/test /home/myfile/dir
Write Output to HDFS: pipe console output into an HDFS file:
echo abc | hadoop fs -put ...

Posted on Thu, 11 Jun 2026 18:46:45 +0000 by brucensal
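To make the getmerge behavior in the excerpt above concrete, here is a minimal local-filesystem sketch in Python. It does not touch HDFS. The function name local_getmerge, the ordering of parts by file name, and the add_newline flag (modeled on getmerge's -nl option) are assumptions of this sketch.

```python
import os
import tempfile


def local_getmerge(src_dir: str, dst_file: str, add_newline: bool = False) -> None:
    """Concatenate every regular file in src_dir into dst_file.

    Hypothetical local analogue of `hadoop fs -getmerge`; sorting the
    parts by file name is an assumption of this sketch.
    """
    with open(dst_file, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-regular entries
            with open(path, "rb") as part:
                out.write(part.read())
            if add_newline:
                out.write(b"\n")  # mirrors getmerge's -nl option


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "test")
        os.mkdir(src)
        # Two hypothetical reducer output files
        for name, body in [("part-00000", "a\n"), ("part-00001", "b\n")]:
            with open(os.path.join(src, name), "w") as f:
                f.write(body)
        dst = os.path.join(tmp, "merged.txt")
        local_getmerge(src, dst)
        with open(dst) as f:
            print(f.read(), end="")
```

A typical use of the real command is collecting the part-* files written by a MapReduce job into one local file for inspection.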

Hadoop Cluster Deployment Guide

Hadoop Distributed Cluster Setup. This guide explains how to set up a fully distributed Hadoop cluster using three or more physical or virtual machines. Cluster Architecture: the master node (hadoop0) runs the NameNode, JobTracker, and SecondaryNameNode; the worker nodes (hadoop1, hadoop2) run the DataNode and TaskTracker. Virtual Machine Setup: Create three virtual machines u ...

Posted on Thu, 04 Jun 2026 17:57:49 +0000 by Knifee

Hadoop Distributed System Fundamentals and Cluster Setup

Big Data Processing Overview. Big data involves analyzing massive datasets to extract insights that support organizational decision-making. The core processing stages are data acquisition, data processing, and result visualization. Hadoop Framework: Hadoop provides distributed processing of large datasets across clusters of computers. Its a ...

Posted on Tue, 26 May 2026 01:24:57 +0000 by SidewinderX

Building a Music Ranking System with HBase and MapReduce

Environment: Windows 10, CentOS 7.9, Hadoop 3.2, HBase 2.5.3, and ZooKeeper 3.8 in fully distributed mode. The environment setup procedures are covered in these articles: "CentOS7 Hadoop3.X Fully Distributed Environment Setup" and "Hadoop3.x Fully Distributed Environment Setup with Zookeeper and Hbase". 1. Integrating MapReduce and HBase: Copy hbase-site.x ...

Posted on Wed, 13 May 2026 00:15:56 +0000 by LawsLoop

MapReduce Average Computation Example

This example demonstrates how to compute the average of a numeric field, grouped by key, using the MapReduce framework. It follows the standard pattern: the mapper extracts key-value pairs, the shuffle phase groups values by key, and the reducer computes the sum and count for each key to produce the average. Setup and Environment: Ensure Hadoop ...

Posted on Mon, 11 May 2026 07:19:01 +0000 by DarkPrince2005
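The map, shuffle, and reduce steps described in the excerpt above can be simulated in plain Python to check the logic before writing a Hadoop job. This is a sketch under stated assumptions: input lines are hypothetical "key,value" records, malformed lines are skipped, and the functions only mimic the MapReduce phases in memory rather than using the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Partial = Tuple[float, int]  # (running sum, running count)


def mapper(line: str) -> Iterator[Tuple[str, Partial]]:
    """Parse a hypothetical 'key,value' line and emit (key, (value, 1))."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return  # skip malformed records
    key, raw = parts
    try:
        value = float(raw)
    except ValueError:
        return  # skip non-numeric values
    yield key, (value, 1)


def shuffle(pairs: Iterable[Tuple[str, Partial]]) -> Dict[str, List[Partial]]:
    """Group mapper output by key, as the framework does between map and reduce."""
    groups: Dict[str, List[Partial]] = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def combine(partials: List[Partial]) -> Partial:
    """Merge partial (sum, count) pairs; safe to apply to map-side output."""
    return sum(s for s, _ in partials), sum(c for _, c in partials)


def reducer(key: str, partials: List[Partial]) -> Tuple[str, float]:
    """Turn the merged sum and count into the average for one key."""
    total, count = combine(partials)
    return key, total / count


def run_job(lines: Iterable[str]) -> Dict[str, float]:
    """Run map -> shuffle -> reduce over all lines and collect the averages."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, partials) for key, partials in sorted(shuffle(pairs).items()))


if __name__ == "__main__":
    print(run_job(["math,90", "math,70", "art,85"]))
```

Emitting (value, 1) pairs instead of bare values lets a combiner pre-aggregate partial sums and counts safely. Averaging partial averages instead would give wrong results whenever the partial groups differ in size.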

Hive Data Warehouse Integration

Overview of Hive 1.1 Hive functions as a data warehouse within the Hadoop ecosystem. It manages and queries data stored in Hadoop. Essentially, Hive serves as an SQL parsing engine that converts SQL queries into MapReduce jobs. Hive includes a mapping tool that translates SQL tables and columns into files and directories on HDFS. This mapping ...

Posted on Sun, 10 May 2026 17:23:35 +0000 by adnan1983
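The excerpt above says Hive converts SQL into MapReduce jobs and maps tables onto files in HDFS. As a simplified illustration only, and not Hive's actual query plan, the sketch below evaluates the equivalent of SELECT dept, COUNT(*) FROM emp GROUP BY dept over rows stored as text lines. The table emp, its columns, and the column index are hypothetical; fields are split on \x01 (Ctrl-A), Hive's default field delimiter for text-format tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hive's default field delimiter for text-format tables is Ctrl-A (\x01).
FIELD_DELIM = "\x01"


def map_phase(lines: Iterable[str], key_col: int) -> Iterator[Tuple[str, int]]:
    """Emit (group_key, 1) per row, like a map task scanning the table's files."""
    for line in lines:
        fields = line.rstrip("\n").split(FIELD_DELIM)
        yield fields[key_col], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework's shuffle does."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the ones per key, which is what COUNT(*) computes for each group."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def group_by_count(lines: Iterable[str], key_col: int) -> Dict[str, int]:
    """Equivalent of: SELECT <key_col>, COUNT(*) FROM <table> GROUP BY <key_col>."""
    return reduce_phase(shuffle_phase(map_phase(lines, key_col)))


if __name__ == "__main__":
    # Hypothetical rows of an 'emp' table: id, name, dept
    rows = ["1\x01alice\x01sales\n", "2\x01bob\x01eng\n", "3\x01carol\x01sales\n"]
    print(group_by_count(rows, key_col=2))
```

The GROUP BY column becomes the shuffle key and the aggregate becomes the reduce function, which is the general shape of the SQL-to-MapReduce translation the excerpt describes.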

Implementing Custom InputFormat in Hadoop MapReduce

Experimental Principle. 1. InputFormat Concept: The InputFormat class in Hadoop defines how input files are split and read. It provides the following functionality: selecting the files or objects to use as input; defining the InputSplits that divide the input into tasks, with one map task per split; and providing a factory method for the RecordReader that reads records from each split. Hadoop includes several bui ...

Posted on Sat, 09 May 2026 20:06:23 +0000 by pdmiller
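The split-definition step in the excerpt above can be illustrated with the arithmetic Hadoop's FileInputFormat applies to splittable files: the split size is max(minSize, min(maxSize, blockSize)), and a full-size split is cut only while the remaining bytes exceed 1.1 times the split size. The Python below is a sketch that models only this calculation; it ignores block locations, files for which isSplitable() returns false, and RecordReader logic, and its function names are hypothetical.

```python
from typing import List, Tuple

# Slack factor used by FileInputFormat: the last split may be up to 10% larger
# than the nominal split size instead of leaving a tiny trailing split.
SPLIT_SLOP = 1.1


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Clamp the block size between the configured minimum and maximum."""
    return max(min_size, min(max_size, block_size))


def compute_splits(file_length: int, block_size: int,
                   min_size: int = 1,
                   max_size: int = 2**63 - 1) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for one splittable, non-empty file.

    Zero-length files are not modeled in this sketch; they return [].
    """
    if file_length <= 0:
        return []
    split_size = compute_split_size(block_size, min_size, max_size)
    splits: List[Tuple[int, int]] = []
    remaining = file_length
    # Cut full-size splits while the remainder is more than SPLIT_SLOP splits long.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    # Whatever is left (at most SPLIT_SLOP * split_size bytes) becomes the last split.
    if remaining != 0:
        splits.append((file_length - remaining, remaining))
    return splits


if __name__ == "__main__":
    MB = 1024 * 1024
    for size_mb in (300, 140):
        splits = compute_splits(size_mb * MB, 128 * MB)
        print(size_mb, [(off // MB, length // MB) for off, length in splits])
```

With 128 MB blocks, a 300 MB file yields three splits (128 MB, 128 MB, and 44 MB), while a 140 MB file stays a single split because 140/128 is below 1.1. A custom InputFormat that extends FileInputFormat inherits this behavior unless it overrides the split logic.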