Understanding Hive, HBase, and HDFS in the Hadoop Ecosystem
Hive, HBase, and HDFS serve distinct but complementary roles in the Hadoop architecture, each addressing different data access patterns and storage requirements.
Hive: SQL Abstraction Over Batch Processing
Hive is a data warehousing infrastructure built atop Hadoop that translates declarative SQL-like queries (HiveQL) into distributed batch jobs ...
Posted on Sun, 21 Jun 2026 17:35:21 +0000 by drbigfresh
Setting Up an HBase Single-Node Environment for Big Data Processing
Prerequisites
Server Specifications
Cloud Instance: Basic tier (pay-as-you-go)
Operating System: Linux CentOS 6.8
CPU: 1 core
Memory: 1GB
Storage: 40GB
Software Stack
Java Development Kit: Version 1.8 (jdk-8u144-linux-x64.tar.gz)
Apache Hadoop: Version 2.8.2 (hadoop-2.8.2.tar.gz)
Apache HBase: Version 1.2.6 (hbase-1.2.6-bin.tar.gz)
Download ...
Posted on Tue, 16 Jun 2026 16:49:07 +0000 by Yesideez
Configuring a Standalone Apache Hadoop and Hive Data Processing Stack on Linux
System Prerequisites and Network Configuration
Before deploying the big data stack, establish a stable base environment. The following specifications apply to this deployment:
OS: CentOS 7 (x86_64)
Resources: 2 vCPUs, 4GB RAM, 50GB Disk
Software Versions: OpenJDK 11, Hadoop 3.3.6, Apache Hive 3.1.3, MySQL 8.0 Community Server
Host Identity an ...
Posted on Wed, 13 May 2026 09:03:51 +0000 by daedalus__
Apache Hudi Integration with Spark: Getting Started Guide
Integrating Apache Hudi with Spark
This guide covers the essential steps to integrate Apache Hudi with Apache Spark for building data lake solutions. The integration enables ACID transactions, time-travel queries, and efficient upserts on large datasets.
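As a rough illustration of the upsert capability mentioned above, the sketch below builds the writer options a Spark job would typically pass to the Hudi data source. The helper name `hudi_upsert_options`, the table name `trips`, and the column names `trip_id` and `updated_at` are hypothetical placeholders that do not come from the guide. The option keys are the commonly documented Hudi write configs, but they should be checked against the Hudi version actually installed.

```python
from typing import Dict


def hudi_upsert_options(table_name: str, record_key: str, precombine_field: str) -> Dict[str, str]:
    """Build Hudi writer options for an upsert.

    record_key identifies a row; precombine_field decides which version
    wins when two incoming rows share the same key (the larger value is kept).
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


# Hypothetical table and column names, used only for illustration.
options = hudi_upsert_options("trips", "trip_id", "updated_at")

# With a running SparkSession and the Hudi bundle on the classpath, the
# options would typically be handed to the DataFrame writer like this:
#   df.write.format("hudi").options(**options).mode("append").save(base_path)
```

Keeping the options in a plain dictionary makes them easy to inspect or reuse across jobs. The write call itself is left as a comment because it needs a live Spark session and a storage path.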
Environment Setup
Before starting, ensure you have Spark installed and Hadoop services runn ...
Posted on Tue, 12 May 2026 17:38:23 +0000 by DaveTomneyUK