System Requirements
Operating System: CentOS 7 (virtual machine)
CPU: 2 cores
Memory: 2 GB
Disk: 40 GB
Software Versions
- JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)
- Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)
- Scala: 2.12.2 (scala-2.12.2.tgz)
- Spark: 1.6.3 (spark-1.6.3-bin-hadoop2.4-without-hive.tgz)
Initial System Configuration
Set Hostname
hostnamectl set-hostname master
reboot
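After the reboot, confirm the change took effect:
hostnamectl status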
Configure Hosts Mapping
Edit /etc/hosts:
192.168.219.128 master
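To confirm the mapping works (192.168.219.128 is this guide's example address), resolve the name:
ping -c 1 master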
Disable Firewall
For CentOS 7:
systemctl stop firewalld.service
systemctl disable firewalld.service
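Verify the service is stopped and disabled:
systemctl status firewalld.service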
Synchronize Time
Verify the system time with date. If it is off, set it manually:
date -s 'YYYY-MM-DD hh:mm:ss'
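For example, to set a hypothetical date and time:
date -s '2017-11-01 09:30:00'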
Install Scala
Extract and relocate the Scala distribution (creating the target directory first, since mv fails if /opt/scala does not exist):
tar -xvf scala-2.12.2.tgz
mkdir -p /opt/scala
mv scala-2.12.2 /opt/scala/scala2.1
Add to /etc/profile:
export SCALA_HOME=/opt/scala/scala2.1
export PATH=$PATH:$SCALA_HOME/bin
Reload profile and verify:
source /etc/profile
scala -version
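If the PATH is correct, the version check should report something like:
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.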
Install Spark
Extract and move Spark:
tar -xvf spark-1.6.3-bin-hadoop2.4-without-hive.tgz
mkdir -p /opt/spark
mv spark-1.6.3-bin-hadoop2.4-without-hive /opt/spark/spark1.6-hadoop2.4-hive
Update /etc/profile:
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export PATH=$PATH:$SPARK_HOME/bin
Reload environment:
source /etc/profile
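As a quick sanity check that the Spark binaries are on the PATH (this only prints the version banner; no running cluster is needed):
spark-submit --version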
Configure Spark
Navigate to $SPARK_HOME/conf and create spark-env.sh from the template:
cp spark-env.sh.template spark-env.sh
Edit spark-env.sh:
export JAVA_HOME=/opt/java/jdk1.8
export SCALA_HOME=/opt/scala/scala2.1
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export SPARK_MASTER_IP=master   # Spark 1.6.x reads SPARK_MASTER_IP; SPARK_MASTER_HOST replaced it in 2.x
export SPARK_EXECUTOR_MEMORY=1g
Configure Hadoop
Environment Variables
Add to /etc/profile:
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$HADOOP_HOME/bin
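Reload the profile and confirm the hadoop binary resolves:
source /etc/profile
hadoop version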
Core Configuration Files
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
hadoop-env.sh
Set explicit JDK path:
export JAVA_HOME=/opt/java/jdk1.8
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/root/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/root/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
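Hadoop generally creates these directories on demand, but it does no harm to create them in advance:
mkdir -p /root/hadoop/tmp /root/hadoop/dfs/name /root/hadoop/dfs/data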
mapred-site.xml
Create mapred-site.xml from its template and configure it:
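Hadoop 2.x ships only mapred-site.xml.template in $HADOOP_HOME/etc/hadoop, so copy it first, as with spark-env.sh above:
cp mapred-site.xml.template mapred-site.xml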
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
</configuration>
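Note that master:10020 only answers if the history server is actually running; it is not launched by start-dfs.sh or start-yarn.sh and must be started separately when needed:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver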
Initialize and Start Hadoop
Format the NameNode (required only on first setup):
$HADOOP_HOME/bin/hdfs namenode -format
Start services:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
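Check the daemons with jps; on a healthy single-node setup it should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself):
jps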
Verify via web UIs at http://<ip>:50070 (HDFS) and http://<ip>:8088 (YARN).
Start Spark
Ensure Hadoop is running, then launch Spark:
$SPARK_HOME/sbin/start-all.sh
Access the Spark Web UI at http://192.168.219.128:8080. If unreachable, confirm firewall status, validate process presence with jps, and review all configuration paths.
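As a final smoke test, assuming the example jars are included in this without-hive build and the master listens on Spark's default port 7077, the bundled SparkPi job can be submitted to the standalone cluster:
MASTER=spark://master:7077 $SPARK_HOME/bin/run-example SparkPi 10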