Troubleshooting Common Errors in Big Data Environment Setup: Hadoop, Spark, HBase, Hive, and ZooKeeper

Hadoop Pseudo-Distributed Mode Issues

Configuration Parsing Failure in hdfs-site.xml

When you encounter FATAL conf.Configuration: error parsing conf hdfs-site.xml, the root cause is typically an encoding mismatch. Resolve it by opening the file and saving it with a uniform character encoding such as UTF-8.
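A minimal sketch of the check-and-convert step, assuming a /tmp copy stands in for $HADOOP_HOME/etc/hadoop/hdfs-site.xml (the GBK source encoding below is a common culprit on zh_CN hosts, but adjust -f to whatever `file` reports):

```shell
# Stand-in for $HADOOP_HOME/etc/hadoop/hdfs-site.xml so the sketch is safe to run anywhere
CONF=/tmp/hdfs-site.xml
printf '<configuration>\n</configuration>\n' > "$CONF"
file -b "$CONF"                                  # reports the detected encoding
iconv -f GBK -t UTF-8 "$CONF" > "$CONF.utf8" \
  && mv "$CONF.utf8" "$CONF"                     # re-save uniformly as UTF-8
```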

HDFS Command Deprecation Warning

The warning Use of this script to execute hdfs command is deprecated appears when HDFS operations are run through the legacy hadoop launcher. Replace hadoop dfs invocations with the equivalent hdfs dfs form to suppress it.

Missing NameNode Path Specification

org.apache.hadoop.hdfs.server.namenode.NameNode errors often indicate that Hadoop's installation prefix hasn't been exported. Edit hadoop-env.sh, located under $HADOOP_HOME/etc/hadoop/, and append:

export HADOOP_PREFIX=/usr/local/hadoop/hadoop-2.8.2

General Hadoop Configuration and Startup Errors

SSH Hostname Resolution Failure at Startup

Error: localhost: ssh: Could not resolve hostname localhost: Temporary failure in name resolution

This usually means the required environment variables are missing or haven't taken effect. Define them in /etc/profile:

export JAVA_HOME=/opt/java/jdk
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Then apply the changes:

source /etc/profile
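A quick way to confirm the sourced variables took effect; /tmp/profile.demo stands in for /etc/profile here so the sketch is safe to run anywhere:

```shell
# Write a miniature profile and source it, then check the variable is visible
cat > /tmp/profile.demo <<'EOF'
export JAVA_HOME=/opt/java/jdk
export HADOOP_HOME=/opt/hadoop/hadoop2.8
EOF
. /tmp/profile.demo
echo "$HADOOP_HOME"      # prints /opt/hadoop/hadoop2.8
```

On a real host, `echo $HADOOP_HOME` and `which hadoop` after `source /etc/profile` give the same confirmation.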

Directory Creation Failure for Hive Warehouse

If mkdir: '/user/hive/warehouse': No such file or directory occurs, the command format needs correction. Use:

$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse

Profile Parsing Due to Trailing Spaces

A message like bash:... : is a directory may stem from unexpected spaces in environment variable definitions inside /etc/profile. Remove trailing whitespace and re-source the file.
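A sketch of flagging and stripping the offending whitespace, with a /tmp copy standing in for /etc/profile:

```shell
# Fixture: a profile with a trailing space after the first export line
PROFILE=/tmp/profile.check
printf 'export JAVA_HOME=/opt/java/jdk \nexport PATH=$PATH:$JAVA_HOME/bin\n' > "$PROFILE"
grep -n '[[:space:]]$' "$PROFILE"        # prints the line numbers that end in whitespace
sed -i 's/[[:space:]]*$//' "$PROFILE"    # strip trailing whitespace in place
```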

Native Library Loading Warning

The warning Unable to load native-hadoop library for your platform... indicates that the bundled 32-bit library isn’t compatible with a 64-bit host.

Steps:

  • Obtain a pre-compiled 64-bit version, for example from http://dl.bintray.com/sequenceiq/sequenceiq-bin/.
  • Extract the archive into $HADOOP_HOME/lib and $HADOOP_HOME/lib/native.
  • Set environment variables:
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

  • Verify with hadoop checknative -a.
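The steps above can be sketched as follows, with a stand-in archive so the snippet runs anywhere (a real archive comes from the URL above, and /tmp/hadoop-demo stands in for $HADOOP_HOME):

```shell
# Create a placeholder archive, then extract it into lib/native as in the real procedure
HADOOP_HOME=/tmp/hadoop-demo
mkdir -p "$HADOOP_HOME/lib/native" /tmp/native-src
touch /tmp/native-src/libhadoop.so.1.0.0            # placeholder for the real 64-bit library
tar -cf /tmp/hadoop-native-64.tar -C /tmp/native-src .
tar -xf /tmp/hadoop-native-64.tar -C "$HADOOP_HOME/lib/native"
ls "$HADOOP_HOME/lib/native"                        # libhadoop.so.1.0.0
```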

NameNode Not Starting After Configuration

If the NameNode fails to launch, inspect the configuration files located in $HADOOP_HOME/etc/hadoop carefully for misconfigurations in the cluster setup.
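The NameNode log usually names the offending property directly; a sketch of scanning it for fatal lines, with a /tmp fixture standing in for $HADOOP_HOME/logs/hadoop-&lt;user&gt;-namenode-&lt;host&gt;.log (the FATAL message below is illustrative):

```shell
# Fixture log with one fatal line, as a real misconfigured NameNode would produce
LOG=/tmp/namenode-demo.log
printf 'INFO  namenode.NameNode: startup\nFATAL namenode.NameNode: Invalid URI for NameNode address\n' > "$LOG"
grep -E 'FATAL|ERROR' "$LOG"     # surfaces the failing line and its cause
```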

Spark Errors

JDO Transactional Connection Factory Creation Failure

When Spark SQL throws javax.jdo.JDOFatalInternalException: Error creating transactional connection factory, the JDBC driver is missing from the classpath. In spark-env.sh add:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/spark/spark2.2/jars/mysql-connector-java-5.1.41.jar

Alternatively, launch spark-sql with the driver directly:

spark-sql --driver-class-path /opt/spark/spark2.2/jars/mysql-connector-java-5.1.41.jar

Excessive Logging in spark-sql

Spark's log level defaults to INFO, producing verbose output. Switch to WARN in $SPARK_HOME/conf:

cp log4j.properties.template log4j.properties

Then edit log4j.properties and change:

log4j.rootCategory=WARN, console
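The same edit as a one-liner; a /tmp copy is used here so the sketch is safe to execute, but on a real install run it inside $SPARK_HOME/conf:

```shell
# Stand-in for $SPARK_HOME/conf/log4j.properties
cd /tmp
printf 'log4j.rootCategory=INFO, console\n' > log4j.properties
sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=WARN/' log4j.properties
grep rootCategory log4j.properties       # log4j.rootCategory=WARN, console
```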

SparkSQLCLIDriver Launch Error

If org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver fails, modify the spark-sql script in $SPARK_HOME/bin to include the required assembly jar:

exec "${SPARK_HOME}"/bin/spark-submit \
  --jars /opt/spark/spark1.6-hadoop2.4-hive/lib/spark-assembly-1.6.3-hadoop2.4.0.jar \
  --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@"

HBase Errors

Start Command Interpreted as Directory

-bash: /opt/hbase/hbase-1.2.6/bin: is a directory means the shell was pointed at the bin directory itself rather than a script inside it, such as bin/start-hbase.sh. Verify the command syntax and ensure the Hadoop services are active before starting HBase.

Java API Connection Timeout Due to Hostname Resolution

Exceptions like RetriesExhaustedException with SocketTimeoutException often occur when connecting via hostname without proper DNS resolution on the client machine. Edit the local hosts file (C:\Windows\System32\drivers\etc\hosts on Windows) and add entries:

192.169.0.23 master
192.169.0.24 slave1
192.169.0.25 slave2

Hive Errors

SessionHiveMetaStoreClient Instantiation Failure

Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient suggests the metastore schema hasn’t been initialized. Run:

schematool -dbType mysql -initSchema

Multiple SLF4J Bindings Warning

Class path contains multiple SLF4J bindings indicates conflicting logging jars. Remove one of the duplicate slf4j.jar files from either Hive’s or Hadoop’s lib directory.
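A sketch of locating the duplicate bindings so one copy can be removed; the /tmp fixtures and jar names below stand in for $HIVE_HOME/lib and Hadoop's lib directories:

```shell
# Fixture: one SLF4J binding jar in each product's lib directory
mkdir -p /tmp/hive/lib /tmp/hadoop/lib
touch /tmp/hive/lib/log4j-slf4j-impl-2.6.2.jar /tmp/hadoop/lib/slf4j-log4j12-1.7.10.jar
find /tmp/hive/lib /tmp/hadoop/lib -name '*slf4j*'   # shows both bindings
rm /tmp/hive/lib/log4j-slf4j-impl-2.6.2.jar          # keep exactly one binding
```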

JDBC Remote Connection Refused

HIVE2 Error: Failed to open new session with a RemoteException denotes missing proxy user configuration. Add to core-site.xml under $HADOOP_HOME/etc/hadoop:

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>

Restart Hadoop afterwards.

Connection Refused on Port 10000

When you see jdbc connection refused, first check if the HiveServer2 service is listening:

netstat -anp | grep 10000

If not, ensure hive-site.xml contains:

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>master</value>
</property>

Then launch the service manually:

hive --service hiveserver2

Deprecated hive.metastore.local Warning

WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist appears because this option was removed after Hive 1.0. Delete the corresponding <property> block from hive-site.xml.

Missing Scala Classes in Hive on Spark

java.lang.NoClassDefFoundError: scala/collection/Iterable occurs when the required Spark assembly jar isn’t available. Copy the jar (e.g., spark-assembly-1.6.3-hadoop2.4.0.jar from a spark-without-hive distribution) into $HIVE_HOME/lib and reference it in hive-env.sh.

Spark Task Execution Failure

Failed to execute spark task... Failed to create spark client. indicates Hive cannot connect to the Spark master. Ensure version compatibility and add to hive-site.xml:

<property>
  <name>spark.master</name>
  <value>spark://hserver1:7077</value>
</property>

Duplicate Key During Metastore Initialization

Error: Duplicate key name 'PCS_STATS_IDX' suggests a leftover metastore database. Remove the existing metastore_db directory and retry.
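The recovery sketched below uses /tmp/hive-demo as a stand-in for the directory Hive was launched from (where the leftover Derby metastore_db lives); re-running schematool afterwards requires the live MySQL configured in hive-site.xml, so that step is left commented:

```shell
# Simulate a leftover Derby metastore directory, then remove it
mkdir -p /tmp/hive-demo/metastore_db
cd /tmp/hive-demo
rm -rf metastore_db
# schematool -dbType mysql -initSchema   # then re-initialize against a live MySQL
```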

MySQL Access Denied During Schema Initialization

Access denied for user 'root'@'master' may persist even if the credentials are correct when the connection string uses a hostname. Change the JDBC URL in hive-site.xml to use the IP address instead.
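For example, the metastore connection URL in hive-site.xml rewritten to use the IP address (the address and database name below are illustrative):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.169.0.23:3306/hive?createDatabaseIfNotExist=true</value>
</property>
```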

MapReduce Memory Failure in Hive Joins

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask occurs due to insufficient memory. Increase limits:

set mapred.reduce.tasks=2000;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx16384m;

ZooKeeper Errors

Status Check Fails After Cluster Startup

Error contacting service. It is probably not running. reported by zkServer.sh status means the ensemble isn't set up correctly. Work through the following checks:

  • Disable firewalls on all nodes.
  • Verify myid matches the server.X entries in zoo.cfg, and that no extra spaces exist.
  • Confirm ZooKeeper processes are running with jps.
  • Query status only after every node in the cluster has been started.

Sample zoo.cfg:

dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

Corresponding myid files contain the IDs: 1, 2, and 3 respectively.
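The myid/zoo.cfg consistency check can be sketched as follows, with /tmp fixtures mirroring the sample above (on a real node, read myid from the dataDir configured in zoo.cfg):

```shell
# Fixture: the sample zoo.cfg server lines and a node's myid file
mkdir -p /tmp/zk/data
printf 'server.1=master:2888:3888\nserver.2=slave1:2888:3888\nserver.3=slave2:2888:3888\n' > /tmp/zk/zoo.cfg
printf '1\n' > /tmp/zk/data/myid
id=$(tr -d '[:space:]' < /tmp/zk/data/myid)     # strip stray whitespace, a common culprit
grep -q "^server\.$id=" /tmp/zk/zoo.cfg && echo "myid $id matches zoo.cfg"
```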

Tags: Hadoop Spark HBase Hive ZooKeeper

Posted on Wed, 13 May 2026 16:56:31 +0000 by ashutosh.titan