Configuring LZO Compression for Hadoop 3.1.2 and HBase 2.2.0

To enable LZO compression in an HBase environment running on Hadoop, you must compile the native LZO library and the corresponding Hadoop-LZO Java bridge from source. Older guides often reference the deprecated hadoop-gpl-compression library, which is incompatible with modern Hadoop versions. The following procedure covers Hadoop 3.1.2 and HBase 2.2.0 using the maintained hadoop-lzo repository.

1. Compiling the Native LZO Library

Begin by downloading and building the LZO source code. This ensures the native binaries are compatible with your system architecture.

wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
tar -xzf lzo-2.10.tar.gz
cd lzo-2.10

Configure the build to enable shared libraries and specify the installation prefix. Then compile and install.

./configure --enable-shared --prefix=/opt/lzo
make && make install

After installation, ensure the libraries are accessible to the dynamic linker. Create symbolic links in the system library path if necessary.

ln -s /opt/lzo/lib/* /usr/local/lib/
ldconfig
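
To confirm that the shared library landed where expected, a small check along these lines could be used. This is only a sketch: the function name check_lzo_lib is hypothetical, and the default directory assumes the /opt/lzo prefix chosen above.

```shell
# check_lzo_lib DIR: succeed if DIR contains an LZO shared library.
# Hypothetical helper; DIR defaults to the install prefix used in this guide.
check_lzo_lib() {
    dir="${1:-/opt/lzo/lib}"
    for f in "$dir"/liblzo2.so*; do
        # The glob stays literal when nothing matches, so test for existence.
        if [ -e "$f" ]; then
            echo "found: $f"
            return 0
        fi
    done
    echo "no liblzo2.so* in $dir" >&2
    return 1
}
```

Calling check_lzo_lib /usr/local/lib afterwards would also confirm the symbolic links. On glibc systems, ldconfig -p | grep lzo2 is another way to see whether the linker cache picked up the library.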

2. Installing the LZOP Utility

The lzop command-line tool is required for file-level operations such as compressing and decompressing individual .lzo files.

wget http://www.lzop.org/download/lzop-1.04.tar.gz
tar -xzf lzop-1.04.tar.gz
cd lzop-1.04
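# Point configure at the LZO headers and libraries installed under /opt/lzo
./configure --prefix=/opt/lzop CPPFLAGS=-I/opt/lzo/include LDFLAGS=-L/opt/lzo/lib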
make && make install
ln -s /opt/lzop/bin/lzop /usr/bin/lzop
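
A quick round-trip through the freshly installed binary can show that it runs and links against the LZO library. The sketch below assumes lzop is on the PATH. The function name lzop_roundtrip and the LZOP_BIN override variable are hypothetical names introduced for illustration.

```shell
# lzop_roundtrip: compress a short string to stdout, decompress it again,
# and succeed only if the text survives unchanged.
# LZOP_BIN (hypothetical override) selects the binary; it defaults to lzop.
lzop_roundtrip() {
    tool="${LZOP_BIN:-lzop}"
    sample="hadoop-lzo round-trip check"
    # -c writes to stdout, -dc decompresses to stdout.
    result=$(printf '%s' "$sample" | "$tool" -c | "$tool" -dc) || return 1
    [ "$result" = "$sample" ]
}
```

Running lzop_roundtrip && echo OK would then print OK only when both directions work.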

3. Building the Hadoop-LZO Module

Retrieve the latest source code for the Hadoop-LZO connector. You must modify the build configuration to match your specific Hadoop version.

git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo

Edit the pom.xml file. Locate the properties section and update the Hadoop version to 3.1.2.

<properties>
    <hadoop.current.version>3.1.2</hadoop.current.version>
    <!-- Ensure other versions match your environment -->
</properties>
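
If you prefer not to edit the file by hand, the change could be scripted with sed. This is a minimal sketch that assumes the property sits on a single line, as in the snippet above. The function name set_hadoop_version is hypothetical.

```shell
# set_hadoop_version POM VERSION: rewrite the hadoop.current.version property.
# Hypothetical helper; assumes the opening and closing tags share one line.
set_hadoop_version() {
    pom="$1"
    version="$2"
    # -i.bak edits in place and keeps the original as POM.bak.
    sed -i.bak \
        "s|<hadoop.current.version>[^<]*</hadoop.current.version>|<hadoop.current.version>${version}</hadoop.current.version>|" \
        "$pom"
}
```

From the hadoop-lzo directory this would be invoked as set_hadoop_version pom.xml 3.1.2, and the pom.xml.bak copy can be restored if the build misbehaves.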

Prepare the build environment by setting the compiler flags and pointing the include and library search paths at your LZO installation. This is critical for the native components to compile and link correctly.

export CFLAGS=-m64
export CXXFLAGS=-m64
export C_INCLUDE_PATH=/opt/lzo/include
export LIBRARY_PATH=/opt/lzo/lib
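
Because a wrong path here tends to surface only as a native compile error deep in the Maven output, it can help to check the variables first. The following sketch assumes each variable holds a single directory, as in the exports above. The function name check_lzo_build_env is hypothetical.

```shell
# check_lzo_build_env: verify that the exported paths point at an LZO install.
# Hypothetical helper; assumes one directory per variable (no colon lists).
check_lzo_build_env() {
    # lzoconf.h is installed by LZO under <prefix>/include/lzo/.
    if [ ! -f "${C_INCLUDE_PATH:-}/lzo/lzoconf.h" ]; then
        echo "LZO headers not found under C_INCLUDE_PATH='${C_INCLUDE_PATH:-}'" >&2
        return 1
    fi
    if [ ! -d "${LIBRARY_PATH:-}" ]; then
        echo "LIBRARY_PATH='${LIBRARY_PATH:-}' is not a directory" >&2
        return 1
    fi
    echo "build environment looks usable"
}
```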

Compile the project using Maven, skipping tests to speed up the process.

mvn clean package -Dmaven.test.skip=true

Once the build completes, you will find the JAR file and the native libraries in the target directory. These must be deployed to your Hadoop installation.

# Copy the JAR to the Hadoop classpath
cp target/hadoop-lzo-*.jar $HADOOP_HOME/share/hadoop/common/

# Copy the native libraries (the tar pipe preserves symbolic links)
cd target/native/Linux-amd64-64
tar -cBf - -C lib . | tar -xBvf - -C $HADOOP_HOME/lib/native/

Distribute these files to all nodes in the cluster.
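
The distribution step could be scripted roughly as follows. This sketch assumes passwordless SSH, rsync on every node, an identical HADOOP_HOME path cluster-wide, and a plain-text host list such as $HADOOP_HOME/etc/hadoop/workers. The function name distribute_lzo and the DRY_RUN switch are hypothetical.

```shell
# distribute_lzo HOSTS_FILE: push the hadoop-lzo JAR and the native libraries
# to every host listed (one per line) in HOSTS_FILE.
# Hypothetical helper; set DRY_RUN=1 to print the commands instead of running them.
distribute_lzo() {
    hosts_file="$1"
    common="$HADOOP_HOME/share/hadoop/common"
    native="$HADOOP_HOME/lib/native"
    while IFS= read -r host; do
        # Skip blank lines and comments.
        case "$host" in ''|'#'*) continue ;; esac
        # </dev/null keeps rsync/ssh from swallowing the rest of the host list.
        ${DRY_RUN:+echo} rsync -a "$common"/hadoop-lzo-*.jar "$host:$common/" </dev/null
        ${DRY_RUN:+echo} rsync -a "$native/" "$host:$native/" </dev/null
    done < "$hosts_file"
}
```

A dry run such as DRY_RUN=1 distribute_lzo "$HADOOP_HOME/etc/hadoop/workers" lets you review the commands before anything is copied.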

4. Hadoop Configuration

Configure the environment variables in $HADOOP_HOME/etc/hadoop/hadoop-env.sh to ensure the native libraries are found.

export LD_LIBRARY_PATH=/opt/lzo/lib:$LD_LIBRARY_PATH
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

Update $HADOOP_HOME/etc/hadoop/core-site.xml to register the LZO codecs.

<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,
           org.apache.hadoop.io.compress.GzipCodec,
           org.apache.hadoop.io.compress.BZip2Codec,
           com.hadoop.compression.lzo.LzoCodec,
           com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
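
A typo in a class name here is easy to miss, so the effective value can be checked before restarting anything. The sketch below only inspects a string. The function name codecs_include_lzo is hypothetical, and the usage comment assumes the hdfs command is available on the node.

```shell
# codecs_include_lzo VALUE: succeed if VALUE lists both LZO codec classes.
# Hypothetical helper operating on the io.compression.codecs string.
codecs_include_lzo() {
    for cls in com.hadoop.compression.lzo.LzoCodec \
               com.hadoop.compression.lzo.LzopCodec; do
        case "$1" in
            *"$cls"*) ;;
            *) echo "missing codec: $cls" >&2; return 1 ;;
        esac
    done
    return 0
}

# Usage on a configured node:
#   codecs_include_lzo "$(hdfs getconf -confKey io.compression.codecs)"
```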

Optionally, enable LZO compression for MapReduce intermediate outputs in mapred-site.xml.

<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Restart the Hadoop cluster to apply these changes.

5. HBase Configuration

Integrate LZO support into HBase by copying the compiled JAR into the HBase library directory. Run the command from the root of the hadoop-lzo source tree.

cp target/hadoop-lzo-*.jar $HBASE_HOME/lib/

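Configure the native library path in $HBASE_HOME/conf/hbase-env.sh. The directories listed must contain the libgplcompression libraries built in step 3, so copy them into $HBASE_HOME/lib/native/Linux-amd64-64/ if they are not already there.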

export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/Linux-amd64-64/:/opt/lzo/lib

Finally, declare the supported compression codecs in $HBASE_HOME/conf/hbase-site.xml.

<property>
    <name>hbase.regionserver.codecs</name>
    <value>lzo</value>
</property>

Restart HBase. You can now create tables with LZO compression enabled (e.g., create 't1', {NAME => 'cf1', COMPRESSION => 'LZO'}).
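
Before creating production tables, it may be worth running HBase's bundled CompressionTest utility, which writes and reads a small file with the requested codec. The wrapper below is a sketch. The function name test_hbase_codec, the HBASE_CMD override, and the default scratch path are hypothetical choices, while the CompressionTest class itself ships with HBase.

```shell
# test_hbase_codec [CODEC] [SCRATCH_URI]: run HBase's CompressionTest tool.
# Hypothetical wrapper; HBASE_CMD can point at a specific hbase launcher.
test_hbase_codec() {
    codec="${1:-lzo}"
    scratch="${2:-file:///tmp/compression-test}"
    "${HBASE_CMD:-hbase}" org.apache.hadoop.hbase.util.CompressionTest "$scratch" "$codec"
}
```

Running test_hbase_codec lzo on each RegionServer host is a reasonable way to catch a node where the native library was not deployed.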

Note on Legacy Dependencies

Using the legacy hadoop-gpl-compression JAR (often found in older tutorials) with Hadoop 3.x will result in java.lang.NoSuchFieldError: lzoCompressLevelFunc or ClassNotFoundException. Always use the hadoop-lzo module compiled against your specific Hadoop version to avoid these runtime failures.

Posted on Mon, 18 May 2026 18:24:19 +0000 by neron-fx