To use LZO compression in an HBase environment running on Hadoop, you must compile the native LZO library and the corresponding Hadoop-LZO Java bridge from source. Older guides often reference the deprecated hadoop-gpl-compression library, which is incompatible with modern Hadoop versions. The following procedure covers Hadoop 3.1.2 and HBase 2.2.0 using the maintained hadoop-lzo repository.
1. Compiling the Native LZO Library
Begin by downloading and building the LZO source code. This ensures the native binaries are compatible with your system architecture.
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
tar -xzf lzo-2.10.tar.gz
cd lzo-2.10
Configure the build to enable shared libraries and specify the installation prefix. Then compile and install.
./configure --enable-shared --prefix=/opt/lzo
make && make install
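After installation, ensure the libraries are accessible to the dynamic linker. If /opt/lzo/lib is not already on the system library path, create symbolic links and refresh the linker cache. Both commands typically require root privileges.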
ln -s /opt/lzo/lib/* /usr/local/lib/
ldconfig
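As a quick sanity check on the installation above, the following sketch confirms that a prefix contains the LZO header and a shared library. The function name check_lzo_prefix is hypothetical, and the file names assume the default layout of an LZO 2.10 install.

```shell
#!/usr/bin/env bash
# check_lzo_prefix is a hypothetical helper, not part of LZO or Hadoop.
# It returns 0 when the prefix holds the lzo1x.h header and a liblzo2 shared library.
check_lzo_prefix() {
  local prefix="$1"
  if [ ! -f "$prefix/include/lzo/lzo1x.h" ]; then
    echo "missing header: $prefix/include/lzo/lzo1x.h"
    return 1
  fi
  # Shared libraries are versioned (for example liblzo2.so.2), so match by glob.
  if ! ls "$prefix"/lib/liblzo2.so* >/dev/null 2>&1; then
    echo "missing shared library: $prefix/lib/liblzo2.so*"
    return 1
  fi
  echo "LZO found under $prefix"
}

# Example: check_lzo_prefix /opt/lzo
```

Running ldconfig -p | grep lzo2 afterwards shows whether the dynamic linker cache also sees the library.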
2. Installing the LZOP Utility
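The lzop command-line tool is required for file-level operations such as compressing and decompressing .lzo files. Because LZO was installed under /opt/lzo rather than a default system location, pass its include and library directories to configure so that the LZO headers and library can be found.
wget http://www.lzop.org/download/lzop-1.04.tar.gz
tar -xzf lzop-1.04.tar.gz
cd lzop-1.04
./configure --prefix=/opt/lzop CPPFLAGS=-I/opt/lzo/include LDFLAGS=-L/opt/lzo/lib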
make && make install
ln -s /opt/lzop/bin/lzop /usr/bin/lzop
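To confirm that the installed lzop binary works end to end, a round trip on a small file is usually enough. The sketch below is illustrative: lzop_roundtrip is a hypothetical helper, and it relies only on the standard lzop options -c (write to stdout), -t (test archive), and -d (decompress).

```shell
#!/usr/bin/env bash
# lzop_roundtrip is a hypothetical helper: compress a file, test the archive,
# then compare the decompressed bytes with the original.
lzop_roundtrip() {
  local input="$1"
  local workdir
  workdir="$(mktemp -d)" || return 1
  local archive="$workdir/sample.lzo"
  lzop -c "$input" > "$archive" \
    && lzop -t "$archive" \
    && lzop -dc "$archive" | cmp -s - "$input"
  local status=$?
  rm -rf "$workdir"
  return "$status"
}

# Example: lzop_roundtrip /etc/hosts && echo "lzop round trip OK"
```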
3. Building the Hadoop-LZO Module
Retrieve the latest source code for the Hadoop-LZO connector. You must modify the build configuration to match your specific Hadoop version.
git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo
Edit the pom.xml file. Locate the properties section and update the Hadoop version to 3.1.2.
<properties>
<hadoop.current.version>3.1.2</hadoop.current.version>
<!-- Ensure other versions match your environment -->
</properties>
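If you prefer to script this edit rather than change the file by hand, something like the following could work. It is a minimal sketch: set_hadoop_version is a hypothetical helper, and it assumes the property appears on a single line as shown above.

```shell
#!/usr/bin/env bash
# set_hadoop_version is a hypothetical helper that rewrites the value of the
# hadoop.current.version property in a pom.xml (assumes a one-line element).
set_hadoop_version() {
  local pom="$1" version="$2"
  # Write to a temporary file and move it into place, which avoids relying on
  # the non-portable in-place flag of sed.
  sed "s|<hadoop.current.version>[^<]*</hadoop.current.version>|<hadoop.current.version>${version}</hadoop.current.version>|" \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Example: set_hadoop_version pom.xml 3.1.2
```

Maven also lets you override a POM property on the command line with -D, so passing -Dhadoop.current.version=3.1.2 to mvn may achieve the same result without touching the file.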
Prepare the build environment by exporting variables that tell the compiler and linker where your LZO installation lives. This is critical for the native components to compile and link correctly.
export CFLAGS=-m64
export CXXFLAGS=-m64
export C_INCLUDE_PATH=/opt/lzo/include
export LIBRARY_PATH=/opt/lzo/lib
Compile the project using Maven, skipping tests to speed up the process.
mvn clean package -Dmaven.test.skip=true
Once the build completes, you will find the JAR file and the native libraries in the target directory. These must be deployed to your Hadoop installation.
# Copy the JAR to the Hadoop classpath
cp target/hadoop-lzo-*.jar $HADOOP_HOME/share/hadoop/common/
# Extract and copy native libraries
cd target/native/Linux-amd64-64
tar -cBf - -C lib . | tar -xBvf - -C $HADOOP_HOME/lib/native/
Copy the JAR and the native libraries to the same locations on every node in the cluster.
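One way to script the distribution is a loop over a host list. The sketch below makes several assumptions that are not part of the original procedure: a hypothetical nodes file with one hostname per line, passwordless SSH, an identical $HADOOP_HOME on every node, and scp and rsync available on all machines. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# distribute_lzo is a hypothetical helper. Arguments:
#   $1  file listing one hostname per line (assumed, e.g. nodes.txt)
#   $2  path to the built hadoop-lzo JAR
#   $3  directory holding the native libraries
distribute_lzo() {
  local nodes_file="$1" jar="$2" native_dir="$3"
  local run=""
  # With DRY_RUN=1 each command is echoed rather than executed.
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    # stdin is redirected so the remote commands cannot consume the host list.
    $run scp "$jar" "$host:$HADOOP_HOME/share/hadoop/common/" </dev/null
    $run rsync -a "$native_dir/" "$host:$HADOOP_HOME/lib/native/" </dev/null
  done < "$nodes_file"
}

# Example (preview only):
#   DRY_RUN=1 distribute_lzo nodes.txt target/hadoop-lzo-*.jar target/native/Linux-amd64-64/lib
```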
4. Hadoop Configuration
Configure the environment variables in $HADOOP_HOME/etc/hadoop/hadoop-env.sh to ensure the native libraries are found.
export LD_LIBRARY_PATH=/opt/lzo/lib:$LD_LIBRARY_PATH
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
Update $HADOOP_HOME/etc/hadoop/core-site.xml to register the LZO codecs.
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.BZip2Codec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
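Before restarting, it can help to confirm that both codec class names actually made it into the file, since a typo here tends to surface only at job time. The following is a minimal sketch: check_lzo_codecs is a hypothetical helper that does a plain text search and does not validate the XML itself.

```shell
#!/usr/bin/env bash
# check_lzo_codecs is a hypothetical helper: it reports whether the two LZO
# codec class names appear in the given core-site.xml (text search only).
check_lzo_codecs() {
  local site_xml="$1"
  local status=0
  local codec
  for codec in com.hadoop.compression.lzo.LzoCodec com.hadoop.compression.lzo.LzopCodec; do
    if grep -qF "$codec" "$site_xml"; then
      echo "found: $codec"
    else
      echo "missing: $codec"
      status=1
    fi
  done
  return "$status"
}

# Example: check_lzo_codecs "$HADOOP_HOME/etc/hadoop/core-site.xml"
```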
Optionally, enable LZO compression for MapReduce intermediate outputs in mapred-site.xml.
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
Restart the Hadoop cluster to apply these changes.
5. HBase Configuration
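Integrate LZO support into HBase by copying the compiled JAR into the HBase library directory. Run this command from the root of the hadoop-lzo checkout, since the native library step earlier changed into target/native/Linux-amd64-64.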
cp target/hadoop-lzo-*.jar $HBASE_HOME/lib/
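Configure the native library path in $HBASE_HOME/conf/hbase-env.sh. The $HBASE_HOME/lib/native/Linux-amd64-64 directory listed here must contain the hadoop-lzo native libraries (libgplcompression), so copy them there as well or substitute the directory where you installed them, such as $HADOOP_HOME/lib/native.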
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/Linux-amd64-64/:/opt/lzo/lib
Finally, declare the supported compression codecs in $HBASE_HOME/conf/hbase-site.xml.
<property>
<name>hbase.regionserver.codecs</name>
<value>lzo</value>
</property>
Restart HBase. You can now create tables with LZO compression enabled (e.g., create 't1', {NAME => 'cf1', COMPRESSION => 'LZO'}).
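Before creating production tables, you can exercise the codec with the CompressionTest utility that ships with HBase. The wrapper below is a hedged sketch: verify_hbase_lzo and the default test path are illustrative choices, and it assumes the hbase launcher is on your PATH. It simply returns whatever exit status the utility reports.

```shell
#!/usr/bin/env bash
# verify_hbase_lzo is a hypothetical wrapper around HBase's CompressionTest.
# The first argument is a scratch file URI; the default below is an assumed
# local path, not something mandated by HBase.
verify_hbase_lzo() {
  local test_path="${1:-file:///tmp/lzo-compression-test}"
  hbase org.apache.hadoop.hbase.util.CompressionTest "$test_path" lzo
}

# Example: verify_hbase_lzo && echo "LZO codec loaded by HBase"
```

Running this on each RegionServer host is a reasonable way to catch a node where the native library was not deployed.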
Note on Legacy Dependencies
Using the legacy hadoop-gpl-compression JAR (often found in older tutorials) with Hadoop 3.x will result in java.lang.NoSuchFieldError: lzoCompressLevelFunc or ClassNotFoundException. Always use the hadoop-lzo module compiled against your specific Hadoop version to avoid these runtime failures.