Binary Extraction and Setup
Acquire the Apache Hive 2.3.6 binary archive from the official distribution repository. Extract the contents to a standard application directory and establish a symbolic link for simplified version management.
sudo tar -xzf apache-hive-2.3.6-bin.tar.gz -C /usr/local/
cd /usr/local
sudo ln -s apache-hive-2.3.6-bin hive
Environment Configuration
Configure the system environment variables to include the Hive installation path. Modify the profile configuration to ensure persistence across sessions.
echo 'export HIVE_HOME=/usr/local/hive' | sudo tee -a /etc/profile
echo 'export PATH=$PATH:$HIVE_HOME/bin' | sudo tee -a /etc/profile
source /etc/profile
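As a quick sanity check, confirm the variables are visible in the current shell (paths assume the /usr/local/hive symlink created above):

```shell
# Re-create the exports from /etc/profile for the current shell, then verify
# that the Hive launcher directory is on PATH.
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "PATH OK: $HIVE_HOME/bin" ;;
  *)                    echo "PATH is missing $HIVE_HOME/bin" >&2 ;;
esac
```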
Hive Configuration and Metastore Setup
By default, Hive uses an embedded Derby database for the metastore, which permits only one active session at a time and is therefore unsuitable for multi-user or production deployments. To resolve this, configure an external MySQL database as the metastore backend. First, copy the default template to create the custom configuration file.
cd $HIVE_HOME/conf
cp hive-default.xml.template hive-site.xml
Edit hive-site.xml to define the JDBC connection properties. Replace the placeholder values with your specific database credentials and endpoint.
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<!-- Class name for Connector/J 8.x (matching the JAR deployed below); use com.mysql.jdbc.Driver with Connector/J 5.x -->
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!-- "&" must be escaped as "&amp;" inside an XML value; useSSL/serverTimezone avoid common Connector/J 8 connection errors on local setups -->
<value>jdbc:mysql://127.0.0.1:3306/hive_repo?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=UTC</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive_admin</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>secure_password</value>
</property>
Update the local scratch and resource directories so that Hive has write access to its temporary locations. The copied template sets these properties to placeholders such as ${system:java.io.tmpdir}/${system:user.name}, which Hive 2.x does not expand at runtime and which cause a startup failure (java.net.URISyntaxException: Relative path in absolute URI). Replace them with absolute paths.
<property>
<name>hive.exec.local.scratchdir</name>
<value>/var/lib/hive/tmp</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/var/lib/hive/resources</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/var/log/hive/querylogs</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/var/log/hive/operation_logs</value>
</property>
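The four paths above must exist and be writable before Hive starts. A small helper sketch creates them; the `hive` service account in the usage note is an assumption, so substitute whatever user runs Hive on your host:

```shell
# create_hive_dirs LOCAL_ROOT LOG_ROOT
# Creates the scratch/resource and log directories referenced in hive-site.xml.
create_hive_dirs() {
  mkdir -p "$1/tmp" "$1/resources" "$2/querylogs" "$2/operation_logs"
}

# On the Hive host, run as root and grant ownership to the service user:
#   create_hive_dirs /var/lib/hive /var/log/hive
#   chown -R hive:hive /var/lib/hive /var/log/hive
```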
Disable user impersonation so that HiveServer2 executes queries as the user who started the server process rather than the connecting user; this avoids HDFS permission errors in simple single-user setups.
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
JDBC Driver Integration
Manually deploy the MySQL JDBC connector JAR file into the Hive library directory. Verify driver compatibility with your MySQL server version.
sudo cp mysql-connector-java-8.0.22.jar $HIVE_HOME/lib/
Schema Initialization
Ensure the target MySQL database exists, then execute the schema initialization tool to create the necessary metastore tables.
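One way to prepare the database ahead of time is to write the DDL to a file and feed it to the MySQL root account. The sketch below mirrors the example credentials used in hive-site.xml above (hive_repo / hive_admin / secure_password); substitute your own values:

```shell
# Generate the metastore DDL (CREATE ... IF NOT EXISTS requires MySQL 5.7+).
cat > /tmp/hive_metastore.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS hive_repo;
CREATE USER IF NOT EXISTS 'hive_admin'@'localhost' IDENTIFIED BY 'secure_password';
GRANT ALL PRIVILEGES ON hive_repo.* TO 'hive_admin'@'localhost';
FLUSH PRIVILEGES;
SQL
# Apply it on the database host:
#   mysql -u root -p < /tmp/hive_metastore.sql
```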
schematool -dbType mysql -initSchema
Cluster Initialization and Verification
Start the Hadoop services to provide the underlying distributed file system and resource management.
start-dfs.sh
start-yarn.sh
Before the first launch, create the default warehouse locations on HDFS and make them group-writable, then start the Hive CLI to verify the installation.
hdfs dfs -mkdir -p /tmp /user/hive/warehouse
hdfs dfs -chmod g+w /tmp /user/hive/warehouse
hive
Execute basic DDL commands to test functionality. Create a new database and inspect the file system to confirm directory creation.
CREATE DATABASE demo_db;
SHOW DATABASES;
Validate that the warehouse directory structure was generated correctly on HDFS.
hdfs dfs -ls -R /user/hive/warehouse