{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting With SparkMagic on Hops\n", "\n", "To run large scale computations in a hops cluster from Jupyter we use sparkmagic, a livy REST server, and the pyspark kernel. \n", "\n", "The fact that the default computation on a cluster is distributed over several machines makes it a little different to do things such as plotting compared to when running code locally. \n", "\n", "This notebook illustrates how you can combine plotting and large-scale computations on a Hops cluster in a single notebook." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting Spark application\n" ] }, { "data": { "text/html": [ "
ID | YARN Application ID | Kind | State | Spark UI | Driver log |
---|---|---|---|---|---|
4 | application_1582818676081_0008 | pyspark | idle | Link | Link |
Magic | Example | Explanation |
---|---|---|
info | %%info | Outputs session information for the current Livy endpoint. |
cleanup | %%cleanup -f | Deletes all sessions for the current Livy endpoint, including this notebook's session. The force flag is mandatory. |
delete | %%delete -f -s 0 | Deletes a session by number for the current Livy endpoint. Cannot delete this kernel's session. |
logs | %%logs | Outputs the current session's Livy logs. |
configure | %%configure -f {"executorMemory": "1000M", "executorCores": 4} | Configures the session creation parameters. The force flag is mandatory if a session has already been created, in which case the session is dropped and recreated. See Livy's POST /sessions Request Body for a list of valid parameters. Parameters must be passed in as a JSON string. |
spark | %%spark -o df df = spark.read.parquet('... | Executes spark commands. The -o parameter makes the resulting dataframe available in the local Python context as a Pandas dataframe. |
sql | %%sql -o tables -q SHOW TABLES | Executes a SQL query against the variable sqlContext (Spark v1.x) or spark (Spark v2.x). The -o parameter stores the result locally as a Pandas dataframe; -q suppresses output of the result in the cell. |
local | %%local a = 1 | All the code in subsequent lines will be executed locally. Code must be valid Python code. |
send_to_spark | %%send_to_spark -o variable -t str -n var | Sends a variable from the local context to the spark cluster; -t gives the variable's type (e.g. str) and -n the name it receives on the cluster. |