Building a Hadoop Cluster on CentOS 7
Building a Single-Node Hadoop Installation
For instructions on setting up a single-node Hadoop installation, please refer to: https://example.com/single-node-hadoop-setup
Creating a Hadoop Cluster
Cloning Virtual Machines
Right-click on hadoop1 → Manage → Clone
Click Next
Select the current state of the virtual machine → Click Next
Choose Crea ...
Posted on Fri, 15 May 2026 14:12:50 +0000 by ondi
Deploying Apache Hive 2.3.6 on Hadoop 2.10.0
Binary Extraction and Setup
Acquire the Apache Hive 2.3.6 binary archive from the official distribution repository. Extract the contents to a standard application directory and establish a symbolic link for simplified version management.
sudo tar -xzf apache-hive-2.3.6-bin.tar.gz -C /usr/local/
cd /usr/local
sudo ln -s apache-hive-2.3.6-bin hive
E ...
Posted on Fri, 15 May 2026 00:46:07 +0000 by Hardwarez
Hive Fundamentals for Data Warehousing
Introduction to Hive
Hive is an open-source data warehouse system built on top of Hadoop. It maps structured and semi-structured data files stored in HDFS into database tables and provides a SQL-like language, HiveQL (HQL), for querying and analyzing large datasets. Hive's core functionality is to translate HiveQL queries ...
Posted on Thu, 14 May 2026 14:36:14 +0000 by willpower
Setting Up a Two-Node Hadoop HDFS Cluster
Cluster Planning
This guide covers the setup of a two-node Hadoop cluster for HDFS and YARN. The configuration uses one master node and one resource manager node.
IP Address | Deployed Services | Role
192.168.56.2 (master-node) | NameNode, DataNode, NodeManager, Hive, Presto, MySQL, Hive Metastore, Presto CLI | Master Node
192.168.56.3 (wor ...
Posted on Thu, 14 May 2026 04:41:39 +0000 by Brentley_11
Troubleshooting Common Errors in Big Data Environment Setup: Hadoop, Spark, HBase, Hive, and ZooKeeper
Hadoop Pseudo-Distributed Mode Issues
Configuration Parsing Failure in hdfs-site.xml
When you encounter FATAL conf.Configuration: error parsing conf hdfs-site.xml, the root cause is typically an encoding mismatch. Resolve it by opening the file and saving it with a uniform character encoding such as UTF-8.
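The re-encoding step can also be scripted. The following is a minimal sketch, not part of the original post: the helper names `decode_config` and `rewrite_as_utf8` are hypothetical, and the fallback encoding is an assumption you must supply yourself, since the post does not say which encoding the broken file was saved in.

```python
from pathlib import Path


def decode_config(raw: bytes, fallback: str) -> str:
    """Decode config bytes as UTF-8 (a leading BOM is dropped if present).

    If the bytes are not valid UTF-8, decode them with `fallback`, which is
    the caller's guess at the original encoding (an assumption).
    """
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def rewrite_as_utf8(path: str, fallback: str) -> None:
    """Rewrite the file at `path` in place as UTF-8 without a BOM."""
    config_path = Path(path)
    text = decode_config(config_path.read_bytes(), fallback)
    config_path.write_bytes(text.encode("utf-8"))


# Hypothetical sample: a config fragment that was saved as GBK.
gbk_bytes = "<value>数据</value>".encode("gbk")
recovered = decode_config(gbk_bytes, fallback="gbk")
```

Calling `rewrite_as_utf8("hdfs-site.xml", fallback="gbk")` would apply the same conversion to a file on disk. Back up the file first, and check that any `encoding` attribute in its XML declaration also says UTF-8.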
HDFS Command Deprecation Warning
The w ...
Posted on Wed, 13 May 2026 16:56:31 +0000 by ashutosh.titan
Setting Up a Standalone Hadoop and Spark Environment
System Requirements
Operating System: CentOS 7 (virtual machine)
CPU: 2 cores
Memory: 2 GB
Disk: 40 GB
Software Versions
JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)
Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)
Scala: 2.12.2 (scala-2.12.2.tgz)
Spark: 1.6.3 (spark-1.6.3-bin-hadoop2.4-without-hive.tgz)
Initial System Configuration
Set Hostname
hostnamectl set- ...
Posted on Wed, 13 May 2026 15:09:50 +0000 by tracy
Configuring a Standalone Apache Hadoop and Hive Data Processing Stack on Linux
System Prerequisites and Network Configuration
Before deploying the big data stack, establish a stable base environment. The following specifications apply to this deployment:
OS: CentOS 7 (x86_64)
Resources: 2 vCPUs, 4GB RAM, 50GB Disk
Software Versions: OpenJDK 11, Hadoop 3.3.6, Apache Hive 3.1.3, MySQL 8.0 Community Server
Host Identity an ...
Posted on Wed, 13 May 2026 09:03:51 +0000 by daedalus__
Setting Up Apache Spark 3.0.1 Cluster Modes and Configuring YARN Log Aggregation
Apache Spark serves as a quasi-real-time big data processing engine that requires resource scheduling and task management. While Spark includes its own standalone resource scheduler, it also supports deployment on external platforms such as YARN, Mesos, and Kubernetes.
This guide covers three deployment modes:
Local Mode: Ideal for local devel ...
Posted on Wed, 13 May 2026 04:54:52 +0000 by installer69
Apache Hudi Integration with Spark: Getting Started Guide
Integrating Apache Hudi with Spark
This guide covers the essential steps to integrate Apache Hudi with Apache Spark for building data lake solutions. The integration enables ACID transactions, time-travel queries, and efficient upserts on large datasets.
Environment Setup
Before starting, ensure you have Spark installed and Hadoop services runn ...
Posted on Tue, 12 May 2026 17:38:23 +0000 by DaveTomneyUK
MapReduce Average Computation Example
This example demonstrates how to compute the average value of a numeric field grouped by a key using the MapReduce framework. The process follows a standard pattern: the mapper extracts key-value pairs, the shuffle phase groups values by key, and the reducer computes the sum and count to produce the average.
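The mapper, shuffle, and reducer pattern described above can be sketched locally in plain Python before writing a real Hadoop job. This is an in-memory illustration, not the post's implementation and not the Hadoop API: the `key,value` input format, the function names, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict


def mapper(line):
    """Parse one 'key,value' record (assumed format) into a (key, value) pair."""
    key, raw_value = line.strip().split(",")
    return key, float(raw_value)


def shuffle(pairs):
    """Group values by key, imitating what the shuffle phase does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Compute the sum and count for one key and return its average."""
    total = sum(values)
    count = len(values)
    return key, total / count


def average_by_key(lines):
    """Run map, shuffle, and reduce over an iterable of text records."""
    grouped = shuffle(mapper(line) for line in lines if line.strip())
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical sample records, for illustration only.
records = ["math,80", "math,90", "physics,70", "physics,75", "physics,95"]
averages = average_by_key(records)
print(averages)  # {'math': 85.0, 'physics': 80.0}
```

The reducer works from the sum and the count rather than averaging partial averages. That matters in a real job, where a combiner may only safely pre-aggregate (sum, count) pairs.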
Setup and Environment
Ensure Hadoop ...
Posted on Mon, 11 May 2026 07:19:01 +0000 by DarkPrince2005