2017-01-11

Install Spark on Ubuntu

# Install Java

$ sudo apt install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt update
$ sudo apt install oracle-java8-installer

# Install Scala
$ sudo apt install scala
# Install Spark
Download a prebuilt Spark package from http://spark.apache.org/downloads.html
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
$ sudo mv spark-2.1.0-bin-hadoop2.7.tgz /opt
$ cd /opt
$ sudo tar xfz spark-2.1.0-bin-hadoop2.7.tgz
$ sudo ln -s spark-2.1.0-bin-hadoop2.7 spark
# Add environment variables and add path
$ vi ~/.profile
JAVA_HOME=/usr/lib/jvm/java-8-oracle
SCALA_HOME=/usr/share/scala
SPARK_HOME=/opt/spark
PYTHONPATH=$SPARK_HOME/python/lib/pyspark.zip:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip
export JAVA_HOME SCALA_HOME SPARK_HOME PYTHONPATH
PATH="$HOME/bin:$HOME/.local/bin:$PATH:$SPARK_HOME/bin"
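After reloading the profile (e.g. with `source ~/.profile`), it is worth sanity-checking that the variables actually took effect. A small hypothetical Python helper for that check (the function and variable list are illustrative, not part of Spark):

```python
import os

# The variables set in ~/.profile above.
REQUIRED = ["JAVA_HOME", "SCALA_HOME", "SPARK_HOME", "PYTHONPATH"]

def missing_vars(env, names):
    """Return the names that are absent or empty in the given mapping."""
    return [n for n in names if not env.get(n)]

missing = missing_vars(os.environ, REQUIRED)
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All Spark environment variables are set.")
```

If anything is reported as not set, re-check the `~/.profile` edits and log in again (or re-source the file).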
# Edit configuration (optional) to reduce logs displayed
$ cd $SPARK_HOME/conf
$ sudo cp log4j.properties.template log4j.properties
$ sudo vi log4j.properties
FIND : log4j.rootCategory=INFO, console
REPLACE : log4j.rootCategory=WARN, console
# Test if Spark works
$ run-example SparkPi 10
Pi is roughly 3.140963140963141
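SparkPi estimates pi with a Monte Carlo simulation: it throws random points at the unit square and counts how many land inside the quarter circle. A plain-Python sketch of the same idea (function name and sample count are illustrative; SparkPi distributes this loop across the cluster):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that fall inside the quarter circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))
```

More samples give a tighter estimate, which is why the `10` argument to SparkPi (the number of partitions, each contributing samples) matters.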
# Launch the Spark shell (Scala) or PySpark (Python)
$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
        
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
$ pyspark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Python version 2.7.12 (default, Nov 19 2016 06:48:10)
SparkSession available as 'spark'.
>>>
# Spark UI
http://localhost:4040
