Understanding Hive, HBase, and HDFS in the Hadoop Ecosystem
Hive, HBase, and HDFS serve distinct but complementary roles in the Hadoop architecture—each addressing different data access patterns and storage requirements.
Hive: SQL Abstraction Over Batch Processing
Hive is a data warehousing infrastructure built atop Hadoop that translates declarative SQL-like queries (HiveQL) into distributed batch jobs ...
Posted on Sun, 21 Jun 2026 17:35:21 +0000 by drbigfresh
Hive Fundamentals and Core Concepts
Hive Introduction
What is Hive?
Hive is an open-source data warehouse solution originally developed by Facebook that operates on Hadoop infrastructure
It provides SQL-like query capabilities (HQL) for structured data stored in HDFS
Core functionality involves translating HiveQL queries into MapReduce jobs (newer Hive releases can also execute on Tez or Spark)
Primary use case: batch data analytics wi ...
Posted on Mon, 15 Jun 2026 18:24:52 +0000 by bobbfwed
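To make the idea of translating a query into MapReduce concrete, here is a minimal pure-Python sketch. It is not Hive's actual query planner. It shows how a query such as SELECT word, COUNT(*) FROM docs GROUP BY word splits into three steps: a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that aggregates each group. The table name docs and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(rows: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (group key, 1) for every word in every input row."""
    for row in rows:
        for word in row.split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: collect all values sharing a key (Hadoop does this between phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: apply the aggregate (COUNT(*) here) to each key's values."""
    return {key: sum(values) for key, values in grouped.items()}


def group_by_count(rows: Iterable[str]) -> Dict[str, int]:
    """Rough equivalent of: SELECT word, COUNT(*) FROM docs GROUP BY word."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    docs = ["hive on hadoop", "hadoop stores data", "hive queries data"]
    print(group_by_count(docs))
```

In a real cluster the map and reduce functions run as parallel tasks on many nodes, and the framework performs the shuffle over the network. That parallel, disk-backed execution is why Hive suits batch analytics better than low-latency lookups.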
Troubleshooting HBase Snapshot Reads with LZO Compression
Issue 1: UnsatisfiedLinkError for gplcompression
When attempting to read HBase snapshot data, the following error occurs:
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:31)
at com.hadoop.compression.lzo.LzoCodec.<clinit> ...
Posted on Sun, 14 Jun 2026 16:47:10 +0000 by zMastaa
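The error "no gplcompression in java.library.path" means the JVM could not find the native file it maps the name gplcompression to. On Linux that file is libgplcompression.so. The following Python sketch is a diagnostic aid, not a tool from the original post. It checks which directories in a java.library.path-style string actually contain that file. It assumes Linux naming and a colon-separated path. The JVM's real value can be printed from Java with System.getProperty("java.library.path"). find_native_library is a hypothetical helper name.

```python
import os
import sys
from typing import List


def find_native_library(name: str, library_path: str) -> List[str]:
    """Return the directories in a java.library.path-style string that contain
    lib<name>.so, the file name the JVM loads for <name> on Linux (assumption:
    Linux host; other platforms use different file names)."""
    file_name = f"lib{name}.so"
    return [
        directory
        for directory in library_path.split(os.pathsep)
        # os.path.isfile follows symlinks, which matters because the .so is
        # often a symlink to a versioned file such as libgplcompression.so.0.0.0
        if directory and os.path.isfile(os.path.join(directory, file_name))
    ]


if __name__ == "__main__":
    # Pass the JVM's java.library.path as the first argument; LD_LIBRARY_PATH
    # is only a rough stand-in when no argument is given.
    search_path = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("LD_LIBRARY_PATH", "")
    hits = find_native_library("gplcompression", search_path)
    if hits:
        print("libgplcompression.so found in: " + ", ".join(hits))
    else:
        print("libgplcompression.so not found; expect UnsatisfiedLinkError")
```

An empty result means the built native library is not in any directory the JVM searches. In that case, either copy it into one of those directories or add its directory to the path.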
Big Data Fundamentals and Core Technologies Overview
HDFS File System Commands
Disk Usage Information
Show the capacity, used space, and free space of the file system that holds a path (for the size of the files under a path, use -du instead):
hadoop fs -df /home/myfile
Merge Files
Combine multiple files from HDFS into a single local file:
hadoop fs -getmerge /user/hduser0011/test /home/myfile/dir
Write Output to HDFS
Direct console output to an HDFS file:
echo abc | hadoop fs -put ...
Posted on Thu, 11 Jun 2026 18:46:45 +0000 by brucensal
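The -getmerge command shown above concatenates the files under an HDFS source directory into a single file on the local file system. The Python sketch below mimics that behavior on a local directory, so the result can be inspected without a cluster. Two behaviors are assumptions of this sketch: the lexicographic file ordering, and the optional newline after each file, which mirrors the -nl flag of hadoop fs -getmerge [-nl] <src> <localdst>. merge_directory is a hypothetical helper name.

```python
import os


def merge_directory(src_dir: str, dst_file: str, add_newline: bool = False) -> int:
    """Concatenate every regular file in src_dir, sorted by name, into dst_file.

    Local imitation of `hadoop fs -getmerge [-nl] <src> <localdst>`.
    Returns the number of files merged.
    """
    # List the inputs before opening the output, so dst_file is never read back in.
    names = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    with open(dst_file, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as part:
                out.write(part.read())
            if add_newline:
                out.write(b"\n")  # like -nl: newline after each file
    return len(names)


if __name__ == "__main__":
    import sys
    # Usage: python merge.py <source_dir> <merged_file>
    print(merge_directory(sys.argv[1], sys.argv[2]), "files merged")
```

This is useful for collecting the part-* outputs of a job into one file. For large outputs, the HDFS command itself avoids copying the files one by one.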
Core Concepts and Architecture of the Hadoop Distributed File System
HDFS Overview
HDFS (Hadoop Distributed File System) is a distributed storage system designed to handle massive datasets, typically in terabytes or petabytes. It forms the storage layer of the Hadoop ecosystem, enabling applications to work with large-scale data using a unified interface similar to a conventional file system. HDFS streams data d ...
Posted on Sun, 07 Jun 2026 16:15:38 +0000 by MFHJoe
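HDFS stores each file as a sequence of fixed-size blocks and replicates those blocks across DataNodes. As a worked example, this sketch computes how many blocks a file occupies and how much raw disk it consumes. It assumes the Hadoop 2.x/3.x defaults: a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). Both settings are configurable per cluster and per file. hdfs_footprint is a hypothetical helper name.

```python
from typing import Tuple

MB = 1024 * 1024


def hdfs_footprint(file_size: int, block_size: int = 128 * MB,
                   replication: int = 3) -> Tuple[int, int]:
    """Return (block count, raw bytes stored across DataNodes) for one file."""
    if file_size < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("sizes must be non-negative and block size/replication positive")
    blocks = (file_size + block_size - 1) // block_size  # ceiling division
    # A final partial block only occupies as many bytes as it holds, so raw
    # usage is simply the file size times the replication factor.
    return blocks, file_size * replication


if __name__ == "__main__":
    for size_mb in (300, 1024):
        blocks, raw = hdfs_footprint(size_mb * MB)
        print(f"{size_mb} MB file -> {blocks} blocks, {raw // MB} MB raw storage")
```

With these defaults, a 300 MB file needs 3 blocks (128 + 128 + 44 MB) and 900 MB of raw storage. A 1 TB file needs 8,192 blocks and about 3 TB of raw capacity.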
Hadoop Cluster Deployment Guide
Hadoop Distributed Cluster Setup
This guide explains how to set up a fully distributed Hadoop cluster using three or more physical or virtual machines.
Cluster Architecture
Master Node (hadoop0): NameNode, JobTracker, SecondaryNameNode
Worker Nodes (hadoop1, hadoop2): DataNode, TaskTracker
Virtual Machine Setup
Create three virtual machines u ...
Posted on Thu, 04 Jun 2026 17:57:49 +0000 by Knifee
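The roles listed for this cluster (JobTracker and TaskTracker) belong to Hadoop 1.x. In that version, the start scripts read conf/masters to find the SecondaryNameNode host and conf/slaves to find the worker hosts. The Python sketch below derives the contents of both files from the role layout above. It is an illustration, not part of the guide. It assumes the hostnames hadoop0 to hadoop2 resolve on every node. host_files is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Role layout from the guide: hadoop0 is the master, hadoop1/hadoop2 are workers.
CLUSTER: Dict[str, List[str]] = {
    "hadoop0": ["NameNode", "JobTracker", "SecondaryNameNode"],
    "hadoop1": ["DataNode", "TaskTracker"],
    "hadoop2": ["DataNode", "TaskTracker"],
}


def host_files(cluster: Dict[str, List[str]]) -> Tuple[str, str]:
    """Build the text of conf/masters and conf/slaves for a Hadoop 1.x cluster.

    masters lists the SecondaryNameNode host(s); slaves lists every host that
    runs a DataNode or TaskTracker.
    """
    masters = [host for host, roles in cluster.items() if "SecondaryNameNode" in roles]
    slaves = [host for host, roles in cluster.items()
              if {"DataNode", "TaskTracker"} & set(roles)]
    return "\n".join(masters) + "\n", "\n".join(slaves) + "\n"


if __name__ == "__main__":
    masters_text, slaves_text = host_files(CLUSTER)
    print("conf/masters:\n" + masters_text)
    print("conf/slaves:\n" + slaves_text)
```

Despite its name, conf/masters does not control where the NameNode and JobTracker run. Those daemons start on the machine where start-dfs.sh and start-mapred.sh are invoked.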
Hadoop Distributed System Fundamentals and Cluster Setup
Big Data Processing Overview
Big data involves analyzing massive datasets to extract valuable insights for organizational decision-making. Core processing stages include:
Data acquisition
Data processing
Result visualization
Hadoop Framework
Hadoop provides distributed processing capabilities for large datasets across computer clusters. Its a ...
Posted on Tue, 26 May 2026 01:24:57 +0000 by SidewinderX
Configuring LZO Compression for Hadoop 3.1.2 and HBase 2.2.0
To implement LZO compression within an HBase environment running on Hadoop, it is necessary to compile the native LZO libraries and the corresponding Hadoop-LZO Java bridge from source. Older guides often reference the deprecated hadoop-gpl-compression library, which is incompatible with modern Hadoop versions. The following procedure outlines t ...
Posted on Mon, 18 May 2026 18:24:19 +0000 by neron-fx
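After the native library and the hadoop-lzo JAR are built, the codec classes must still be registered with Hadoop. This is typically done through the io.compression.codecs property in core-site.xml, with io.compression.codec.lzo.class set to com.hadoop.compression.lzo.LzoCodec. The Python sketch below covers only the list-merging step, so an existing codec list is extended rather than overwritten. add_codecs is a hypothetical helper. Check the exact property set your Hadoop and HBase versions need against the hadoop-lzo documentation.

```python
from typing import Sequence

# Codec classes provided by the hadoop-lzo project.
LZO_CODECS = (
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
)


def add_codecs(current_value: str, extra: Sequence[str] = LZO_CODECS) -> str:
    """Append codec classes to a comma-separated io.compression.codecs value,
    dropping blank entries and duplicates while keeping the existing order."""
    codecs = [c.strip() for c in current_value.split(",") if c.strip()]
    for codec in extra:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)


if __name__ == "__main__":
    existing = ("org.apache.hadoop.io.compress.GzipCodec,"
                "org.apache.hadoop.io.compress.DefaultCodec")
    print(add_codecs(existing))
```

Keeping the existing entries matters. Replacing the list outright would unregister codecs such as GzipCodec that other jobs may rely on.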
Automating Hadoop and Hive Pseudo-Distributed Deployment with Bash Scripts
Project Structure Overview
The automation solution is organized into specific directories to separate concerns:
lib/: Contains external Java libraries required for the setup, including dom4j for XML parsing and the MySQL JDBC driver.
software/: Stores the binary packages for Hadoop and Hive (e.g., hadoop-2.6.0-cdh5.10.0.tar.gz).
scripts/: Houses th ...
Posted on Mon, 18 May 2026 03:08:58 +0000 by galayman
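The project uses dom4j to edit Hadoop's XML configuration files from its scripts. As a language-neutral illustration of that step, here is a minimal Python sketch using the standard library's ElementTree. It sets a property by name in a file such as core-site.xml or hdfs-site.xml and adds the property if it is absent. set_property is a hypothetical helper, not part of the project. The sketch does not preserve comments or original formatting.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Set (or add) <property><name>name</name><value>value</value></property>
    inside a Hadoop-style <configuration> document; return the new XML text."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing property with this name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Pseudo-distributed setups commonly use a replication factor of 1.
    print(set_property("<configuration/>", "dfs.replication", "1"))
```

Updating in place instead of blindly appending keeps the script idempotent. Re-running the setup does not produce duplicate property entries.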
Setting Up Hadoop 2.10 Pseudo-Distributed Mode on CentOS 7
This guide walks through the steps to set up a Hadoop 2.10 pseudo-distributed cluster on a single CentOS 7 virtual machine.
1. Create a Hadoop User and Group
We will create a dedicated user hdfs and configure it with appropriate permissions.
As root user:
Create the hdfs user and set a password:
adduser hdfs
passwd hdfs
Add the user to the hdf ...
Posted on Fri, 15 May 2026 14:47:48 +0000 by ron8000