Hadoop Pseudo-Distributed Mode Issues
Configuration Parsing Failure in hdfs-site.xml
When you encounter FATAL conf.Configuration: error parsing conf hdfs-site.xml, the root cause is typically an encoding mismatch. Resolve it by opening the file and saving it with a uniform character encoding such as UTF-8.
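The re-save can be scripted on Linux; a minimal sketch, assuming iconv is available and the file's current encoding is GBK (a common case for files edited on a Chinese-locale Windows machine — check first with file -i):

```shell
# Sketch: re-save hdfs-site.xml as UTF-8. The GBK source encoding is an
# assumption; a stand-in file is created here in place of the real one.
CONF=hdfs-site.xml
printf '<configuration></configuration>\n' > "$CONF"
file -i "$CONF"                                        # report the detected charset
iconv -f GBK -t UTF-8 "$CONF" -o "$CONF.utf8" && mv "$CONF.utf8" "$CONF"
```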
HDFS Command Deprecation Warning
The warning Use of this script to execute hdfs command is deprecated appears because newer Hadoop releases deprecate the legacy hadoop launcher for HDFS operations. Replace hadoop with hdfs to suppress it, e.g. hdfs dfs -ls / instead of hadoop fs -ls /.
Missing NameNode Path Specification
org.apache.hadoop.hdfs.server.namenode.NameNode errors often indicate that the installation path hasn’t been set. Edit hadoop-env.sh, located under $HADOOP_HOME/etc/hadoop/, and append:
export HADOOP_PREFIX=/usr/local/hadoop/hadoop-2.8.2
General Hadoop Configuration and Startup Errors
SSH Hostname Resolution Failure at Startup
Error: localhost: ssh: Could not resolve hostname localhost: Temporary failure in name resolution
This usually means environment variables are missing or haven't taken effect. Define them in /etc/profile:
export JAVA_HOME=/opt/java/jdk
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
Then apply the changes:
source /etc/profile
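A quick way to confirm the variables took effect in the current shell; a sketch reusing the example paths above and writing the result to a scratch file for inspection:

```shell
# Sketch: set the example values, then record what actually resolved.
export JAVA_HOME=/opt/java/jdk
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
{ echo "JAVA_HOME=$JAVA_HOME"; echo "HADOOP_HOME=$HADOOP_HOME"; } > env_check.txt
cat env_check.txt
```

An empty value on either line means the export did not take effect and /etc/profile needs re-sourcing.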
Directory Creation Failure for Hive Warehouse
If mkdir: '/user/hive/warehouse': No such file or directory occurs, the command format needs correction. Use:
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
Profile Parsing Failure Due to Trailing Spaces
A message like bash:... : is a directory may stem from unexpected spaces in environment variable definitions inside /etc/profile. Remove trailing whitespace and re-source the file.
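The offending lines can be found mechanically before re-sourcing; a sketch using a sample file in place of /etc/profile:

```shell
# Sketch: locate and strip trailing whitespace in a profile file
# (a local sample file stands in for /etc/profile).
PROFILE=profile.sample
printf 'export HADOOP_HOME=/opt/hadoop/hadoop2.8 \nexport PATH=$PATH:$HADOOP_HOME/bin\n' > "$PROFILE"
grep -n '[[:space:]]$' "$PROFILE" || true   # list lines ending in whitespace
sed -i 's/[[:space:]]*$//' "$PROFILE"       # remove the trailing whitespace
```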
Native Library Loading Warning
The warning Unable to load native-hadoop library for your platform... indicates that the bundled 32-bit library isn’t compatible with a 64-bit host.
Steps:
- Obtain a pre-compiled 64-bit version, for example from http://dl.bintray.com/sequenceiq/sequenceiq-bin/.
- Extract the archive into $HADOOP_HOME/lib and $HADOOP_HOME/lib/native.
- Set environment variables:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
- Verify with hadoop checknative -a.
NameNode Not Starting After Configuration
If the NameNode fails to launch, inspect the configuration files located in $HADOOP_HOME/etc/hadoop carefully for misconfigurations in the cluster setup.
Spark Errors
JDO Transactional Connection Factory Creation Failure
When Spark SQL throws javax.jdo.JDOFatalInternalException: Error creating transactional connection factory, the JDBC driver is missing from the classpath. In spark-env.sh add:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/spark/spark2.2/jars/mysql-connector-java-5.1.41.jar
Alternatively (and preferably on Spark 2.x, where SPARK_CLASSPATH is deprecated), pass the driver jar on the command line:
spark-sql --driver-class-path /opt/spark/spark2.2/jars/mysql-connector-java-5.1.41.jar
Excessive Logging in spark-sql
Spark's log level defaults to INFO, producing verbose output. Switch to WARN in $SPARK_HOME/conf:
cp log4j.properties.template log4j.properties
Then edit log4j.properties and change:
log4j.rootCategory=WARN, console
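The edit can also be applied non-interactively; a sketch operating on a local copy rather than $SPARK_HOME/conf:

```shell
# Sketch: create a template-like file, then flip the root logger to WARN.
printf 'log4j.rootCategory=INFO, console\n' > log4j.properties
sed -i 's/^log4j.rootCategory=INFO, console$/log4j.rootCategory=WARN, console/' log4j.properties
cat log4j.properties    # log4j.rootCategory=WARN, console
```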
SparkSQLCLIDriver Launch Error
If org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver fails, modify the spark-sql script in $SPARK_HOME/bin to include the required assembly jar:
exec "${SPARK_HOME}"/bin/spark-submit \
  --jars /opt/spark/spark1.6-hadoop2.4-hive/lib/spark-assembly-1.6.3-hadoop2.4.0.jar \
  --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@"
HBase Errors
Start Command Interpreted as Directory
-bash: /opt/hbase/hbase-1.2.6/bin: is a directory means the shell was given the bin directory itself rather than a script inside it, or Hadoop isn't running. Start HBase with bin/start-hbase.sh, and ensure Hadoop services are active first.
Java API Connection Timeout Due to Hostname Resolution
Exceptions like RetriesExhaustedException with SocketTimeoutException often occur when connecting via hostname without proper DNS resolution on the client machine. Edit the local hosts file (C:\Windows\System32\drivers\etc\hosts on Windows) and add entries:
192.169.0.23 master
192.169.0.24 slave1
192.169.0.25 slave2
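Appending the mappings can be made idempotent so repeated runs don't duplicate entries; a sketch using a sample file (on Windows the real file is the hosts path given above):

```shell
# Sketch: add each mapping only if it is not already present.
HOSTS=hosts.sample
printf '127.0.0.1 localhost\n' > "$HOSTS"
for entry in '192.169.0.23 master' '192.169.0.24 slave1' '192.169.0.25 slave2'; do
  grep -qxF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
```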
Hive Errors
SessionHiveMetaStoreClient Instantiation Failure
Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient suggests the metastore schema hasn’t been initialized. Run:
schematool -dbType mysql -initSchema
Multiple SLF4J Bindings Warning
Class path contains multiple SLF4J bindings indicates conflicting logging jars. Remove one of the duplicate slf4j.jar files from either Hive’s or Hadoop’s lib directory.
JDBC Remote Connection Refused
HIVE2 Error: Failed to open new session with a RemoteException denotes missing proxy user configuration. Add to hadoop/conf/core-site.xml:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
Restart Hadoop afterwards.
Connection Refused on Port 10000
When you see jdbc connection refused, first check if the HiveServer2 service is listening:
netstat -anp | grep 10000
If not, ensure hive-site.xml contains:
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>master</value>
</property>
Then launch the service manually:
hive --service hiveserver2
Deprecated hive.metastore.local Warning
WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist appears because this option was removed in Hive 0.10. Delete the corresponding <property> block from hive-site.xml.
Missing Scala Classes in Hive on Spark
java.lang.NoClassDefFoundError: scala/collection/Iterable occurs when the required Spark assembly jar isn’t available. Copy the jar (e.g., spark-assembly-1.6.3-hadoop2.4.0.jar from a spark-without-hive distribution) into $HIVE_HOME/lib and reference it in hive-env.sh.
Spark Task Execution Failure
Failed to execute spark task... Failed to create spark client. indicates Hive cannot connect to the Spark master. Ensure version compatibility and add to hive-site.xml:
<property>
<name>spark.master</name>
<value>spark://hserver1:7077</value>
</property>
Duplicate Key During Metastore Initialization
Error: Duplicate key name 'PCS_STATS_IDX' suggests a leftover metastore database. Remove the existing metastore_db directory and retry.
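A sketch of the cleanup; the directory is created here only to illustrate, and schematool itself is left commented since it needs a live MySQL:

```shell
# Sketch: remove the leftover embedded metastore, then re-run schematool.
mkdir -p metastore_db/tmp          # stand-in for the leftover database
rm -rf metastore_db
# schematool -dbType mysql -initSchema   # re-run once the directory is gone
```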
MySQL Access Denied During Schema Initialization
Access denied for user 'root'@'master' may persist even if credentials are correct when the connection string uses a hostname. Change the JDBC URL in hive-site.xml to use the IP address instead.
MapReduce Memory Failure in Hive Joins
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask occurs due to insufficient memory. Increase limits:
set mapred.reduce.tasks=2000;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx16384m;
ZooKeeper Errors
Status Check Fails After Cluster Startup
Error contacting service. It is probably not running. when querying zkServer.sh status means the ensemble setup isn’t valid.
- Disable firewalls on all nodes.
- Verify myid matches the server.X entries in zoo.cfg, and that no extra spaces exist.
- Confirm ZooKeeper processes are running with jps.
- Query status only after every node in the cluster has been started.
Sample zoo.cfg:
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/dataLog
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
Corresponding myid files contain the IDs: 1, 2, and 3 respectively.
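The myid files can be derived from zoo.cfg instead of written by hand, which avoids the mismatches described above; a sketch using local sample files, with the hostname hard-coded for illustration:

```shell
# Sketch: extract this node's ID from its server.X line in zoo.cfg
# (local data/ directory stands in for the configured dataDir).
mkdir -p data
cat > zoo.cfg <<'EOF'
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
EOF
NODE=slave1                                     # assumed hostname for the example
ID=$(sed -n "s/^server\.\([0-9]*\)=${NODE}:.*/\1/p" zoo.cfg)
echo "$ID" > data/myid
cat data/myid    # 2
```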