Setting Up a Standalone Hadoop and Spark Environment

System Requirements

  • Operating System: CentOS 7 (virtual machine)
  • CPU: 2 cores
  • Memory: 2 GB
  • Disk: 40 GB

Software Versions

  • JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)
  • Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)
  • Scala: 2.12.2 (scala-2.12.2.tgz)
  • Spark: 1.6.3 (spark-1.6.3-bin-hadoop2.4-without-hive.tgz)
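
Note: Spark 1.6.x bundles its own Scala 2.10 runtime, so the standalone Scala 2.12 installed below provides the scala command-line tools rather than the version Spark itself runs on.

Later steps reference JAVA_HOME=/opt/java/jdk1.8, so unpack the JDK there first. A minimal sketch, assuming the jdk-8u144 tarball listed above (it extracts to jdk1.8.0_144):

mkdir -p /opt/java
tar -xvf jdk-8u144-linux-x64.tar.gz
mv jdk1.8.0_144 /opt/java/jdk1.8

Then add to /etc/profile:

export JAVA_HOME=/opt/java/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin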

Initial System Configuration

Set Hostname

hostnamectl set-hostname master
reboot

Configure Hosts Mapping

Edit /etc/hosts:

192.168.219.128 master

Disable Firewall

For CentOS 7:

systemctl stop firewalld.service
systemctl disable firewalld.service

Synchronize Time

Verify the system time with date. If it is wrong, set it manually:

date -s 'YYYY-MM-DD HH:MM:SS'
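
For ongoing accuracy, a one-shot NTP sync is an option; a sketch assuming the ntpdate package and a reachable public pool server:

yum install -y ntpdate
ntpdate pool.ntp.org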

Install Scala

Extract the Scala distribution and move it into place (creating the parent directory first):

mkdir -p /opt/scala
tar -xvf scala-2.12.2.tgz
mv scala-2.12.2 /opt/scala/scala2.1

Add to /etc/profile:

export SCALA_HOME=/opt/scala/scala2.1
export PATH=$PATH:$SCALA_HOME/bin

Reload profile and verify:

source /etc/profile
scala -version

Install Spark

Extract and move Spark (again creating the parent directory first):

mkdir -p /opt/spark
tar -xvf spark-1.6.3-bin-hadoop2.4-without-hive.tgz
mv spark-1.6.3-bin-hadoop2.4-without-hive /opt/spark/spark1.6-hadoop2.4-hive

Update /etc/profile:

export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export PATH=$PATH:$SPARK_HOME/bin

Reload environment:

source /etc/profile

Configure Spark

Navigate to $SPARK_HOME/conf and create spark-env.sh from the template:

cp spark-env.sh.template spark-env.sh

Edit spark-env.sh:

export JAVA_HOME=/opt/java/jdk1.8
export SCALA_HOME=/opt/scala/scala2.1
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
# Spark 1.6 reads SPARK_MASTER_IP; SPARK_MASTER_HOST was introduced in Spark 2.x
export SPARK_MASTER_IP=master
export SPARK_EXECUTOR_MEMORY=1g
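
Note that start-all.sh starts one worker per host listed in conf/slaves, falling back to localhost when the file is absent. To make the single-node layout explicit (still inside $SPARK_HOME/conf):

echo master > slaves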

Configure Hadoop
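
The steps below assume Hadoop is unpacked at /opt/hadoop/hadoop2.8, matching the HADOOP_HOME used in spark-env.sh. A sketch of that step, given the hadoop-2.8.2.tar.gz tarball listed earlier:

mkdir -p /opt/hadoop
tar -xvf hadoop-2.8.2.tar.gz
mv hadoop-2.8.2 /opt/hadoop/hadoop2.8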

Environment Variables

Add to /etc/profile:

export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$HADOOP_HOME/bin
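
Reload the environment so the new variables take effect:

source /etc/profile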

Core Configuration Files

core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

hadoop-env.sh

Set explicit JDK path:

export JAVA_HOME=/opt/java/jdk1.8

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/root/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/root/hadoop/dfs/data</value>
  </property>
  <!-- a single node can hold only one replica of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
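
Hadoop creates these directories on demand, but pre-creating them avoids permission surprises; the paths match the values configured above:

mkdir -p /root/hadoop/tmp /root/hadoop/dfs/name /root/hadoop/dfs/data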

mapred-site.xml

Create the file from the bundled template (it ships in $HADOOP_HOME/etc/hadoop), then configure:
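
cp mapred-site.xml.template mapred-site.xml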

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
</configuration>
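
Set Up Passwordless SSH

The start scripts launch each daemon over SSH, even on a single node, so master must be able to log in to itself without a password. A minimal sketch, assuming the default key location for the current user:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys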

Initialize and Start Hadoop

Format the NameNode:

$HADOOP_HOME/bin/hdfs namenode -format

Start services:

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

Verify via web UIs at http://<ip>:50070 (HDFS) and http://<ip>:8088 (YARN).
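
You can also confirm the daemons locally with jps; a healthy single node should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

jps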

Start Spark

Ensure Hadoop is running, then launch Spark:

$SPARK_HOME/sbin/start-all.sh

Access the Spark Web UI at http://192.168.219.128:8080. If unreachable, confirm firewall status, validate process presence with jps, and review all configuration paths.
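
As a final smoke test, submit the bundled SparkPi example to the cluster (prebuilt Spark 1.6 ships the examples jar under lib/; custom builds may not, hence the glob):

$SPARK_HOME/bin/spark-submit --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/lib/spark-examples-*.jar 10

The output should include a line like "Pi is roughly 3.14".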
