Understanding Hive, HBase, and HDFS in the Hadoop Ecosystem

Hive, HBase, and HDFS serve distinct but complementary roles in the Hadoop architecture, each addressing different data access patterns and storage requirements.
Hive: SQL Abstraction Over Batch Processing
Hive is a data warehousing infrastructure built atop Hadoop that translates declarative SQL-like queries (HiveQL) into distributed batch jobs ...

Posted on Sun, 21 Jun 2026 17:35:21 +0000 by drbigfresh

Hive Fundamentals and Core Concepts

Hive Introduction
What is Hive? Hive is an open-source data warehouse solution, originally developed by Facebook, that operates on Hadoop infrastructure. It provides SQL-like query capabilities (HQL) for structured data stored in HDFS. Core functionality involves translating SQL queries into MapReduce jobs. Primary use case: batch data analytics wi ...

Posted on Mon, 15 Jun 2026 18:24:52 +0000 by bobbfwed

Troubleshooting HBase Snapshot Reads with LZO Compression

Issue 1: UnsatisfiedLinkError for gplcompression
When attempting to read HBase snapshot data, the following error occurs:
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:31)
at com.hadoop.compression.lzo.LzoCodec.<clinit> ...

Posted on Sun, 14 Jun 2026 16:47:10 +0000 by zMastaa

Big Data Fundamentals and Core Technologies Overview

HDFS File System Commands
Disk Usage Information. Retrieve disk usage statistics for a specific path: hadoop fs -df /home/myfile
Merge Files. Combine multiple files from HDFS into a single local file: hadoop fs -getmerge /user/hduser0011/test /home/myfile/dir
Write Output to HDFS. Direct console output to an HDFS file: echo abc | hadoop fs -put ...

Posted on Thu, 11 Jun 2026 18:46:45 +0000 by brucensal

Core Concepts and Architecture of the Hadoop Distributed File System

HDFS Overview HDFS (Hadoop Distributed File System) is a distributed storage system designed to handle massive datasets, typically in terabytes or petabytes. It forms the storage layer of the Hadoop ecosystem, enabling applications to work with large-scale data using a unified interface similar to a conventional file system. HDFS streams data d ...

Posted on Sun, 07 Jun 2026 16:15:38 +0000 by MFHJoe

Hadoop Cluster Deployment Guide

Hadoop Distributed Cluster Setup
This guide explains how to set up a fully distributed Hadoop cluster using three or more physical or virtual machines.
Cluster Architecture
Master Node (hadoop0): NameNode, JobTracker, SecondaryNameNode
Worker Nodes (hadoop1, hadoop2): DataNode, TaskTracker
Virtual Machine Setup
Create three virtual machines u ...

Posted on Thu, 04 Jun 2026 17:57:49 +0000 by Knifee

Hadoop Distributed System Fundamentals and Cluster Setup

Big Data Processing Overview
Big data involves analyzing massive datasets to extract valuable insights for organizational decision-making. Core processing stages include data acquisition, data processing, and result visualization.
Hadoop Framework
Hadoop provides distributed processing capabilities for large datasets across computer clusters. Its a ...

Posted on Tue, 26 May 2026 01:24:57 +0000 by SidewinderX

Configuring LZO Compression for Hadoop 3.1.2 and HBase 2.2.0

To implement LZO compression within an HBase environment running on Hadoop, it is necessary to compile the native LZO libraries and the corresponding Hadoop-LZO Java bridge from source. Older guides often reference the deprecated hadoop-gpl-compression library, which is incompatible with modern Hadoop versions. The following procedure outlines t ...

Posted on Mon, 18 May 2026 18:24:19 +0000 by neron-fx

Automating Hadoop and Hive Pseudo-Distributed Deployment with Bash Scripts

Project Structure Overview
The automation solution is organized into specific directories to separate concerns:
lib/: Contains external Java libraries required for the setup, including dom4j for XML parsing and the MySQL JDBC driver.
software/: Stores the binary packages for Hadoop and Hive (e.g., hadoop-2.6.0-cdh5.10.0.tar.gz).
scripts/: Houses th ...

Posted on Mon, 18 May 2026 03:08:58 +0000 by galayman

Setting Up Hadoop 2.10 Pseudo-Distributed Mode on CentOS 7

This guide walks through the steps to set up a Hadoop 2.10 pseudo-distributed cluster on a single CentOS 7 virtual machine.
1. Create a Hadoop User and Group
We will create a dedicated user hdfs and configure it with appropriate permissions. As the root user:
Create the hdfs user and set a password:
adduser hdfs
passwd hdfs
Add the user to the hdf ...

Posted on Fri, 15 May 2026 14:47:48 +0000 by ron8000