Hive Data Warehouse Integration
Overview of Hive
1.1 Hive is a data warehouse tool in the Hadoop ecosystem, used to manage and query data stored in Hadoop. In essence, Hive is a SQL parsing engine that converts SQL queries into MapReduce jobs.
Hive includes a mapping tool that translates SQL tables and columns into files and directories on HDFS. This mapping ...
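To make the SQL-to-MapReduce idea concrete, here is a minimal sketch in plain Python of how a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept could be expressed as map, shuffle, and reduce phases. The employees table, its columns, and all function names are hypothetical, and this is only a conceptual illustration, not how Hive actually compiles query plans.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rows of an employees(name, dept) table, stored as delimited text.
ROWS = ["ann,sales", "bob,ops", "cy,sales", "dee,ops", "eli,sales"]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (dept, 1) for every row, like a mapper keyed on the GROUP BY column."""
    for line in lines:
        _name, dept = line.split(",")
        yield dept, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle/sort step."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, which corresponds to COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(ROWS)))
```

With the sample rows above, result maps each department name to its row count.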
Posted on Sun, 10 May 2026 17:23:35 +0000 by adnan1983
Implementing Custom InputFormat in Hadoop MapReduce
Experimental Principle
1. InputFormat Concept
The InputFormat class in Hadoop defines how input files are split and read. It provides the following functionality:
Selects files or objects to process as input
Defines InputSplits that partition files into tasks
Provides a factory method for RecordReader to read files
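The split and record-reader responsibilities listed above can be sketched without any Hadoop dependency. The following is a plain Python simulation, not the Hadoop Java API: Split, get_splits, and read_records are hypothetical names, and the boundary rule (a reader whose split does not start at byte 0 skips its first line and reads past its end to finish the last line) is modeled on the convention commonly used for line-oriented input.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Split:
    """A byte range of the input, loosely analogous to a file-based InputSplit."""
    start: int
    length: int


def get_splits(total_size: int, split_size: int) -> List[Split]:
    """Partition the range [0, total_size) into consecutive byte ranges."""
    return [
        Split(start, min(split_size, total_size - start))
        for start in range(0, total_size, split_size)
    ]


def read_records(data: bytes, split: Split) -> Iterator[Tuple[int, bytes]]:
    """Yield (byte offset, line) pairs for the lines this split is responsible for."""
    pos = split.start
    end = split.start + split.length
    if pos != 0:
        # Skip ahead to just past the next newline; the previous split's
        # reader is responsible for the line that crosses this boundary.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        yield pos, data[pos:stop]
        pos = stop + 1


# Hypothetical input: four lines, 23 bytes in total.
DATA = b"alpha\nbeta\ngamma\ndelta\n"
SPLITS = get_splits(len(DATA), 8)
RECORDS = [list(read_records(DATA, split)) for split in SPLITS]
```

Each line is produced by exactly one split even though the split boundaries fall in the middle of lines, which is the property a custom record reader has to preserve.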
Hadoop includes several bui ...
Posted on Sat, 09 May 2026 20:06:23 +0000 by pdmiller
Setting Up Hadoop 2.7.1 on Windows and Managing HDFS Storage
Hadoop Installation on Windows
This guide covers the complete process of deploying Hadoop 2.7.1 on a Windows system, configuring HDFS, and performing file operations.
Prerequisites
Windows operating system
JDK 8 or compatible Java version installed
Hadoop 2.7.1 binary distribution from Apache archives
Windows-specific Hadoop binaries (hadoopon ...
Posted on Fri, 08 May 2026 18:54:35 +0000 by Eclesiastes
Hadoop Cluster Configuration and Data Pipeline Setup for Offline Data Warehouse
When configuring a Hadoop cluster for an offline data warehouse, proper host mapping and configuration file adjustments are essential.
In core-site.xml, proxy user settings should allow access from any host, group, or user:
<property>
  <name>hadoop.proxyuser.atguigu.hosts</name>
  <value>*</value>
</property> ...
Posted on Thu, 07 May 2026 07:42:31 +0000 by bruckerrlb