- Overview of Hive
1.1 Hive functions as a data warehouse within the Hadoop ecosystem. It manages and queries data stored in Hadoop. Essentially, Hive serves as an SQL parsing engine that converts SQL queries into MapReduce jobs.
Hive maintains a mapping between SQL tables/columns and files/directories on HDFS. This mapping metadata is kept in the metastore, typically backed by Derby or MySQL.
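To illustrate the mapping (assuming a default configuration; the table name is hypothetical):

```sql
-- Creating a table produces a directory under the warehouse path.
CREATE TABLE demo(id INT);
-- Hive now stores the table's data files under /user/hive/warehouse/demo/,
-- and the metastore records the table-to-directory mapping.
```

You can inspect the resulting directory with `hadoop fs -ls /user/hive/warehouse`.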
1.2 By default, Hive stores table data under /user/hive/warehouse on HDFS, which is determined by the hive.metastore.warehouse.dir parameter in the hive-site.xml configuration file.
- Installation of Hive
(1) Extract, rename, and set environment variables.
(2) In the $HIVE_HOME/conf directory, execute the command mv hive-default.xml.template hive-site.xml.
In the $HIVE_HOME/conf directory, execute the command mv hive-env.sh.template hive-env.sh.
(3) Modify the hadoop-env.sh configuration file with the following content:
export HADOOP_CLASSPATH=.:$CLASSPATH:$HADOOP_CLASSPATH:$HADOOP_HOME/bin
(4) In the $HIVE_HOME/bin directory, modify the hive-config.sh file and add the following lines:
export JAVA_HOME=/usr/local/jdk
export HIVE_HOME=/usr/local/hive
export HADOOP_HOME=/usr/local/hadoop
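Step (1) above can be sketched as follows (the archive name and install prefix are assumptions; substitute your actual version and paths):

```shell
# Extract the Hive archive (name assumed) and rename the directory.
tar -zxvf apache-hive-bin.tar.gz -C /usr/local
mv /usr/local/apache-hive-bin /usr/local/hive
# Add to /etc/profile (or ~/.bashrc), then run `source /etc/profile`:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
```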
- Installing MySQL
(1) Remove any existing MySQL packages using the command rpm -e <package-name> --nodeps.
Run the command rpm -qa | grep mysql to check that everything has been removed.
(2) Install the MySQL server by running rpm -i mysql-server-5.5.31-2.el6.i686.rpm.
(3) Start the MySQL server with the command mysqld_safe &.
(4) Install the MySQL client by running rpm -i mysql-client-5.5.31-2.el6.i686.rpm.
(5) Set the root password using the command mysql_secure_installation.
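As a quick sanity check (assuming the server is running and using the root password set above), log in and list the databases:

```shell
# Assumes mysqld is running; enter the root password when prompted.
mysql -u root -p -e "SHOW DATABASES;"
```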
- Using MySQL as the Hive Metastore
(1) Place the MySQL JDBC driver (found in Downloads) into the Hive lib directory.
Execute the command: cp mysql-connector-java-5.1.10.jar /usr/local/hive/lib/.
(2) Edit the hive/conf/hive-site.xml file and update the following properties:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop0:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>admin</value>
</property>
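With these properties in place, starting the Hive CLI should create the metastore tables in the MySQL `hive` database. One way to confirm this (assuming the credentials configured above):

```shell
# Assumes the connection settings above: user root, password admin.
mysql -uroot -padmin -e "USE hive; SHOW TABLES;"
# Metastore tables such as TBLS, DBS, and COLUMNS should be listed.
```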
- Internal Tables
CREATE TABLE t1(id int);
LOAD DATA LOCAL INPATH '/root/id' INTO TABLE t1; -- LOCAL loads from the local filesystem; omit LOCAL to load from HDFS
CREATE TABLE t2(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
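To illustrate the delimiter clause (the file path and contents here are hypothetical), a tab-separated file loads cleanly into t2:

```sql
-- /root/user.txt (hypothetical) contains tab-separated lines such as:
--   1<TAB>tom
--   2<TAB>mary
LOAD DATA LOCAL INPATH '/root/user.txt' INTO TABLE t2;
SELECT * FROM t2; -- each line splits into (id, name) on the tab character
```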
- Partitioned Tables
CREATE TABLE t3(id int) PARTITIONED BY (day int);
LOAD DATA LOCAL INPATH '/root/id' INTO TABLE t3 PARTITION (day=22);
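Each partition becomes a subdirectory (e.g. day=22) under the table's directory, and a query that filters on the partition column reads only that subdirectory:

```sql
SHOW PARTITIONS t3;              -- lists partitions such as day=22
SELECT * FROM t3 WHERE day = 22; -- scans only the day=22 subdirectory
```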
- Bucketed Tables
CREATE TABLE t4(id INT) CLUSTERED BY(id) INTO 4 BUCKETS;
SET hive.enforce.bucketing = true;
INSERT INTO TABLE t4 SELECT id FROM t3;
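Bucketing hashes rows on the clustered column into a fixed number of files, which makes efficient sampling possible, for example:

```sql
-- Read only the first of the four buckets (roughly a quarter of the rows).
SELECT * FROM t4 TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);
```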
- External Tables
CREATE EXTERNAL TABLE t5(id INT) LOCATION '/external';
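Unlike an internal (managed) table, dropping an external table removes only the metastore entry; the files under /external remain on HDFS:

```sql
DROP TABLE t5;
-- The data in /external is untouched; recreating the external table
-- with the same LOCATION makes it queryable again.
```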