Deploying Apache Hive 2.3.6 on Hadoop 2.10.0

Binary Extraction and Setup

Acquire the Apache Hive 2.3.6 binary archive from the official distribution repository. Extract the contents to a standard application directory and establish a symbolic link for simplified version management.
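The release can be fetched directly from the Apache archive (the URL below assumes the standard archive layout; use a nearby mirror if preferred):

```shell
# Download the Hive 2.3.6 binary tarball from the Apache archive
wget https://archive.apache.org/dist/hive/hive-2.3.6/apache-hive-2.3.6-bin.tar.gz
```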

sudo tar -xzf apache-hive-2.3.6-bin.tar.gz -C /usr/local/
cd /usr/local
sudo ln -s apache-hive-2.3.6-bin hive

Environment Configuration

Configure the system environment variables to include the Hive installation path. Modify the profile configuration to ensure persistence across sessions.

echo 'export HIVE_HOME=/usr/local/hive' | sudo tee -a /etc/profile
echo 'export PATH=$PATH:$HIVE_HOME/bin' | sudo tee -a /etc/profile
source /etc/profile
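To confirm the environment is set correctly, check the variable and the Hive version (this assumes Hadoop is already installed and on the PATH, since the hive launcher depends on it):

```shell
echo $HIVE_HOME   # should print /usr/local/hive
hive --version    # should report Hive 2.3.6
```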

Hive Configuration and Metastore Setup

By default, Hive uses an embedded Derby database for the metastore, which supports only a single active connection and is therefore unsuitable for production environments involving multiple users. To resolve this, configure an external MySQL database. First, replicate the default template to create the custom configuration file.

cd $HIVE_HOME/conf
cp hive-default.xml.template hive-site.xml

Edit hive-site.xml to define the JDBC connection properties. Replace the placeholder values with your specific database credentials and endpoint. Note that with MySQL Connector/J 8.x (deployed below), the driver class is com.mysql.cj.jdbc.Driver; the legacy com.mysql.jdbc.Driver name is deprecated. Appending useSSL=false avoids SSL warnings on local, unencrypted connections (the ampersand must be escaped as &amp; inside XML).

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive_repo?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive_admin</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>secure_password</value>
</property>

Update local scratch and resource directories to ensure Hive has write access to the necessary temporary locations. The stock template references placeholders such as ${system:java.io.tmpdir} and ${system:user.name}, which Hive cannot resolve at runtime and which cause startup failures; replace them with absolute paths.

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/var/lib/hive/tmp</value>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/var/lib/hive/resources</value>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/var/log/hive/querylogs</value>
</property>
<property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/var/log/hive/operation_logs</value>
</property>
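The absolute paths above must exist and be writable by the account running Hive. A sketch, assuming a dedicated user named hive (adjust the ownership to match your setup):

```shell
# Create the scratch, resource, and log directories referenced in hive-site.xml
sudo mkdir -p /var/lib/hive/tmp /var/lib/hive/resources
sudo mkdir -p /var/log/hive/querylogs /var/log/hive/operation_logs
# Grant ownership to the user that will run Hive (hypothetical user "hive")
sudo chown -R hive:hive /var/lib/hive /var/log/hive
```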

Disable user impersonation to prevent permission errors during execution.

<property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
</property>

JDBC Driver Integration

Manually deploy the MySQL JDBC connector JAR file into the Hive library directory. Verify driver compatibility with your MySQL server version.
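If the connector JAR is not already present on the machine, it can be fetched from Maven Central (the path below assumes the mysql:mysql-connector-java:8.0.22 artifact coordinates):

```shell
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.22/mysql-connector-java-8.0.22.jar
```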

sudo cp mysql-connector-java-8.0.22.jar $HIVE_HOME/lib/

Schema Initialization

The createDatabaseIfNotExist=true flag in the connection URL will create the metastore database on first contact, but the MySQL account must already exist and hold privileges on it. With the account in place, execute the schema initialization tool to create the necessary metastore tables.
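To provision the database and account explicitly rather than rely on automatic creation, run the following as a MySQL administrator (names match the hive-site.xml values above; requires MySQL 5.7+ for IF NOT EXISTS on users):

```shell
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hive_repo;
CREATE USER IF NOT EXISTS 'hive_admin'@'localhost' IDENTIFIED BY 'secure_password';
GRANT ALL PRIVILEGES ON hive_repo.* TO 'hive_admin'@'localhost';
FLUSH PRIVILEGES;
SQL
```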

schematool -dbType mysql -initSchema
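To confirm the schema was created successfully, query the metastore version with schematool's info mode:

```shell
schematool -dbType mysql -info   # should report the Hive distribution and metastore schema versions
```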

Cluster Initialization and Verification

Start the Hadoop services to provide the underlying distributed file system and resource management.

start-dfs.sh
start-yarn.sh
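Before launching Hive, confirm the Hadoop daemons are running; on a single-node setup, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (the exact set varies with cluster layout):

```shell
jps
```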

Launch the Hive CLI to verify the installation.

hive

Execute basic DDL commands to test functionality. Create a new database and inspect the file system to confirm directory creation.

CREATE DATABASE demo_db;
SHOW DATABASES;
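To exercise the warehouse beyond database creation, a minimal table round-trip can be run non-interactively with hive -e (the table and column names here are illustrative):

```shell
hive -e "
USE demo_db;
CREATE TABLE IF NOT EXISTS test_tbl (id INT, name STRING);
INSERT INTO test_tbl VALUES (1, 'alpha');
SELECT * FROM test_tbl;
"
```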

Validate that the warehouse directory structure was generated correctly on HDFS.

hdfs dfs -ls -R /user/hive/warehouse

Tags: Hadoop Hive Big Data Data Warehouse System Administration

Posted on Fri, 15 May 2026 00:46:07 +0000 by Hardwarez