Project Structure Overview
The automation solution is organized into specific directories to separate concerns:
- lib/: Contains external Java libraries required for the setup, including dom4j for XML parsing and the MySQL JDBC driver.
- software/: Stores the binary packages for Hadoop and Hive (e.g., hadoop-2.6.0-cdh5.10.0.tar.gz).
- scripts/: Houses the shell scripts responsible for the installation logic, environment configuration, and execution flow.
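Since the scripts depend on this layout, it can help to verify it before running anything. The following is a minimal sketch, assuming the three directories sit directly under one project root; check_layout is a hypothetical helper and is not part of the original scripts.

```shell
#!/bin/bash
# check_layout is a hypothetical helper, not part of the original scripts.
# It reports each expected directory that is missing under the given root.
check_layout() {
    local root=$1
    local missing=0
    local dir
    for dir in lib software scripts; do
        if [ ! -d "${root}/${dir}" ]; then
            echo "Missing directory: ${root}/${dir}" >&2
            missing=1
        fi
    done
    # Non-zero exit status if at least one directory was missing
    return "$missing"
}
```

A call such as check_layout /opt/hadoop-install || exit 1 would stop early when the project was copied incompletely.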
System Prerequisites
Prior to execution, ensure the target Linux environment meets the following requirements:
- Java Development Kit (JDK) is installed.
- MySQL database server is installed and running.
- The system firewall is disabled or configured to allow required ports.
- Network connectivity is established (ability to ping external hosts).
- The hostname is properly configured in /etc/hostname.
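Some of these prerequisites can be checked automatically. The sketch below is one possible preflight script, not something the original project ships. The function names, the PING_TARGET variable, and the default address 8.8.8.8 are assumptions made for illustration. Finding the mysql client on PATH is only a rough proxy for a running server, and the firewall check is left out.

```shell
#!/bin/bash
# Hypothetical preflight helpers; the original scripts do not include these checks.
# Succeeds if the named command is available on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}
# Succeeds if the hostname file (default /etc/hostname) exists and is non-empty.
hostname_configured() {
    local file=${1:-/etc/hostname}
    [ -s "$file" ]
}
# Runs every check and counts failures instead of stopping at the first one.
preflight() {
    local failures=0
    have_command java || { echo "JDK not found on PATH" >&2; failures=$((failures + 1)); }
    have_command mysql || { echo "mysql client not found on PATH" >&2; failures=$((failures + 1)); }
    hostname_configured || { echo "/etc/hostname is missing or empty" >&2; failures=$((failures + 1)); }
    # PING_TARGET is an assumed variable; -W is the reply timeout on Linux ping
    ping -c 1 -W 2 "${PING_TARGET:-8.8.8.8}" >/dev/null 2>&1 \
        || { echo "No network connectivity" >&2; failures=$((failures + 1)); }
    [ "$failures" -eq 0 ]
}
```

Running preflight || exit 1 at the top of the installer would report all unmet requirements in one pass.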
Preparing the Environment
Give the non-root user ownership of /opt (this requires root privileges; replace username with the actual account), then create a dedicated directory for the installation files:
chown username /opt
mkdir -p /opt/hadoop-install
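Copy the project into the created directory, keeping lib/, software/, and scripts/ side by side because the scripts refer to ../lib and ../software. Then, from inside scripts/, grant execution rights: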
chmod +x main.sh env-config.sh functions.sh
Configuration Variables
The env-config.sh file defines static parameters and dynamic inputs required for the setup. This includes installation paths, database credentials, and XML configuration values.
#!/bin/bash
# Primary Installation Directory
BASE_INSTALL_DIR="/opt/hadoop"
# Database Connection Parameters
DB_HOST="192.168.59.100"
DB_PORT="3306"
DB_NAME="hive_metadata"
DB_USER="root"
DB_PASSWORD="password"
MYSQL_JAR="mysql-connector-java-5.1.42-bin.jar"
# Java Environment
JAVA_HOME_PATH="/opt/software/jdk1.8.0_131"
# Internal Configuration Paths (Do not modify unless necessary)
# HADOOP_CONF_DIR is appended to the extracted Hadoop home; it is not the system-wide /etc/hadoop
HADOOP_CONF_DIR="/etc/hadoop"
TEMP_DIR_HADOOP="${BASE_INSTALL_DIR}/tmp/hadoop"
TEMP_DIR_HIVE="${BASE_INSTALL_DIR}/tmp/hive"
# Environment script targets
ENV_SCRIPTS=("hadoop-env.sh" "mapred-env.sh" "yarn-env.sh")
# Hadoop XML configuration definitions
CORE_SITE_PARAMS=("core-site.xml" "fs.defaultFS" "hdfs://$(hostname):9000" "hadoop.tmp.dir" "${TEMP_DIR_HADOOP}")
HDFS_SITE_PARAMS=("hdfs-site.xml" "dfs.replication" "1")
# Hive Configuration
HIVE_LOG_DIR="${BASE_INSTALL_DIR}/logs/hive"
HIVE_SITE_PARAMS=("hive-site.xml" "javax.jdo.option.ConnectionURL" "jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}?createDatabaseIfNotExist=true" "javax.jdo.option.ConnectionDriverName" "com.mysql.jdbc.Driver" "javax.jdo.option.ConnectionUserName" "${DB_USER}" "javax.jdo.option.ConnectionPassword" "${DB_PASSWORD}")
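The *_PARAMS arrays above share one layout: the first element is the target file name, and the remaining elements alternate between property name and value. The sketch below shows how that layout can be read back and sanity-checked. list_params and EXAMPLE_PARAMS are hypothetical names introduced only for this illustration.

```shell
#!/bin/bash
# list_params is a hypothetical helper for inspecting a *_PARAMS array.
# Element 0 is the target file; the remaining elements alternate key, value.
list_params() {
    local params=("$@")
    local count=${#params[@]}
    # A well-formed array has one file name plus an even number of key/value entries
    if [ "$count" -lt 3 ] || [ $((count % 2)) -eq 0 ]; then
        echo "Malformed parameter array (${count} elements)" >&2
        return 1
    fi
    local i
    for ((i = 1; i < count; i += 2)); do
        echo "${params[0]}: ${params[i]} = ${params[i + 1]}"
    done
}
# Example array using the same layout as HDFS_SITE_PARAMS
EXAMPLE_PARAMS=("hdfs-site.xml" "dfs.replication" "1")
list_params "${EXAMPLE_PARAMS[@]}"
```

A check like this catches a dangling key (a name without a value) before it reaches the XML files.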
Core Function Library
The functions.sh script contains the logic for directory preparation, file extraction, and configuration modification.
#!/bin/bash
source ./env-config.sh
# Directory preparation and cleanup
prepare_directory() {
    if [ -d "$1" ]; then
        echo "Directory $1 exists. Cleaning contents..."
        # ${1:?} aborts when the argument is empty, so this never expands to "rm -rf /*"
        rm -rf "${1:?}"/*
    else
        mkdir -p "$1"
    fi
}
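# Extract tar.gz archives
extract_package() {
    local pkg_name=$1
    local target_dir=$2
    local archive
    # Pick the first archive in ../software whose name starts with the package name
    archive=$(find ../software -name "${pkg_name}*" | head -n 1)
    if [ -z "$archive" ]; then
        echo "No archive found for $pkg_name in ../software." >&2
        exit 1
    fi
    if tar -xzf "$archive" -C "$target_dir"; then
        echo "$pkg_name extracted successfully."
    else
        echo "Failed to extract $archive." >&2
        exit 1
    fi
}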
# Modify Hadoop environment scripts (non-XML)
configure_env_scripts() {
    local install_dir=$1
    local hadoop_home_dir=$(ls "$install_dir" | grep hadoop)
    local conf_path="${install_dir}/${hadoop_home_dir}${HADOOP_CONF_DIR}"
    for script in "${ENV_SCRIPTS[@]}"; do
        # Drop any existing JAVA_HOME export, then insert the configured one after line 2
        sed -i '/export JAVA_HOME/d' "${conf_path}/${script}"
        sed -i "2a export JAVA_HOME=${JAVA_HOME_PATH}" "${conf_path}/${script}"
    done
    # Configure PID directory
    sed -i "s|export HADOOP_PID_DIR=.*|export HADOOP_PID_DIR=${TEMP_DIR_HADOOP}/pid|g" "${conf_path}/hadoop-env.sh"
}
# Update XML configuration files using Java helper
update_xml_config() {
    # Arguments: file name, then alternating property name and value
    local config_array=("$@")
    local file_name="${config_array[0]}"
    local hadoop_home_dir=$(ls "${BASE_INSTALL_DIR}" | grep hadoop)
    local file_path="${BASE_INSTALL_DIR}/${hadoop_home_dir}${HADOOP_CONF_DIR}/${file_name}"
    local i=1
    while [ $i -lt ${#config_array[@]} ]; do
        local key="${config_array[$i]}"
        local val="${config_array[$((i+1))]}"
        java -jar ../lib/XmlUpdater.jar "$file_path" add "$key" "$val"
        ((i+=2))
    done
}
# Format the NameNode
format_namenode() {
    local hadoop_home=$(ls "${BASE_INSTALL_DIR}" | grep hadoop)
    "${BASE_INSTALL_DIR}/${hadoop_home}/bin/hdfs" namenode -format
}
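Several functions above locate the extracted Hadoop directory with ls piped into grep, which matches the word anywhere in a name and can return more than one line. As a possible alternative, the sketch below resolves the directory with a prefix glob. find_home_dir is a hypothetical helper and is not part of the original library.

```shell
#!/bin/bash
# find_home_dir is a hypothetical alternative to `ls | grep`; it is not in the original scripts.
# Prints the name of the first directory under $1 whose name starts with $2.
find_home_dir() {
    local base=$1
    local prefix=$2
    local candidate
    for candidate in "${base}/${prefix}"*/; do
        # When nothing matches, the glob stays literal and this test fails
        if [ -d "$candidate" ]; then
            basename "$candidate"
            return 0
        fi
    done
    echo "No ${prefix}* directory under ${base}" >&2
    return 1
}
```

It could be used as hadoop_home=$(find_home_dir "${BASE_INSTALL_DIR}" hadoop) || exit 1, which also fails loudly when extraction did not produce the expected directory.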
Main Execution Script
The main.sh script orchestrates the installation sequence by sourcing the environment and function files.
#!/bin/bash
source ./env-config.sh
source ./functions.sh
# Setup directories
prepare_directory "${BASE_INSTALL_DIR}"
mkdir -p "${TEMP_DIR_HADOOP}"
# Install Packages
extract_package hadoop "${BASE_INSTALL_DIR}"
extract_package hive "${BASE_INSTALL_DIR}"
# Configure Hadoop
configure_env_scripts "${BASE_INSTALL_DIR}"
update_xml_config "${CORE_SITE_PARAMS[@]}"
update_xml_config "${HDFS_SITE_PARAMS[@]}"
# Initialize Filesystem
format_namenode
# Configure Hive (simplified example)
hive_home=$(ls "${BASE_INSTALL_DIR}" | grep hive)
mkdir -p "${HIVE_LOG_DIR}"
cp "../lib/${MYSQL_JAR}" "${BASE_INSTALL_DIR}/${hive_home}/lib/"
echo "Pseudo-distributed installation completed."
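As written, main.sh keeps going when a step such as format_namenode returns a non-zero status, so the completion message can appear after a failed run. One way to make failures visible is a small wrapper around each step. The sketch below assumes nothing beyond bash; run_step is a hypothetical name and is not part of the original script.

```shell
#!/bin/bash
# run_step is a hypothetical wrapper; the original main.sh calls each function directly.
# It logs the step name, runs the command, and reports a failure with its exit code.
run_step() {
    local description=$1
    shift
    local status=0
    echo "==> ${description}"
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "Step failed (exit ${status}): ${description}" >&2
    fi
    return "$status"
}
```

In main.sh a call could then read run_step "Format NameNode" format_namenode || exit 1, so the script stops at the first failed step instead of printing the final message.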
Java XML Configuration Utility
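The shell scripts rely on a Java utility, packaged as XmlUpdater.jar, to manipulate XML configuration files. It is called with four arguments: the file path, an action, a property name, and a property value. The following Java code uses dom4j to inject properties into Hadoop and Hive configuration files; the action argument is accepted but not evaluated, so every call adds a property.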
package com.deploy.utils;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
public class XmlConfigUpdater {
    public static void main(String[] args) {
        if (args.length < 4) {
            System.err.println("Usage: java -jar XmlUpdater.jar <file> <action> <key> <value>");
            System.exit(1);
        }
        String filePath = args[0];
        // args[1] is the action (the scripts pass "add"); only adding is implemented here
        String key = args[2];
        String value = args[3];
        try {
            manipulateXmlProperty(filePath, key, value);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
    // Appends a <property> element with the given name and value to the root element
    private static void manipulateXmlProperty(String filePath, String key, String value) throws DocumentException, IOException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(new File(filePath));
        Element root = document.getRootElement();
        // Create property element structure
        Element property = root.addElement("property");
        property.addElement("name").setText(key);
        property.addElement("value").setText(value);
        // Write back to file with pretty print format
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setEncoding("UTF-8");
        // Use an explicit UTF-8 writer so the bytes on disk match the declared encoding;
        // try-with-resources closes the underlying stream
        try (Writer out = new OutputStreamWriter(new FileOutputStream(filePath), StandardCharsets.UTF_8)) {
            XMLWriter writer = new XMLWriter(out, format);
            writer.write(document);
            writer.flush();
        }
    }
}
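Because manipulateXmlProperty always appends, calling the utility twice with the same key leaves two entries in the file. A shell-side guard is one way to avoid that. The sketch below is a plain text match rather than real XML parsing, and it assumes each name element is written on one line without internal whitespace. has_property is a hypothetical helper.

```shell
#!/bin/bash
# has_property is a hypothetical guard, not part of the original scripts.
# It succeeds if the file already contains <name>KEY</name> as a literal string.
has_property() {
    local file=$1
    local key=$2
    # -F treats the pattern as a fixed string, so dots in the key are not wildcards
    grep -qF "<name>${key}</name>" "$file"
}
```

Inside update_xml_config the call could become has_property "$file_path" "$key" || java -jar ../lib/XmlUpdater.jar "$file_path" add "$key" "$val", which skips keys that are already present.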