Members of a project in Hopsworks can launch the following types of applications through a project's Jobs service:
If you are a beginner, it is highly recommended to click on the Spark button on the landing page under the available tours. It will guide you through launching your first Spark application, and the steps for launching any other job type are similar. Details on running Python programs are provided in the Python section below.
To create a new job, click on the Jobs tab from the Project Menu and follow the steps below:
1. Click on the New Job button in the top left corner.
2. Follow the new job wizard and click the Create button.
3. Click the Run button to launch your job. If no default arguments have been configured, a dialog textbox will ask for any runtime arguments the job may require. If the job requires no arguments, the field can be left empty. The figure below shows the dialog.
After creating a job by following the new job wizard, you can manage all jobs and their runs from the landing page of the Jobs service. The figure below shows a project with 6 jobs, 5 of which are shown per page. Once a job has run at least once, all of its past and current runs are shown in the UI.
Users can interact with the jobs in the following ways:
10. Export a job, which prompts the user to download a JSON file. A job can then be imported by clicking on the New Job button and then the Import Job button.
Additionally, users can click on a job to view additional information about its runs.
By default, all files and folders created by Spark are group writable (i.e. umask=007). If you want to change this
default umask, you can add the Spark property spark.hadoop.fs.permissions.umask-mode=<umask>
under More Spark Properties when you create a new job.
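For example, to make newly created files readable by everyone but writable only by their owner, you could add the line below under More Spark Properties. The value 022 is only an illustrative choice; pick whatever umask fits your project's needs.

    spark.hadoop.fs.permissions.umask-mode=022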
(Available in Hopsworks Enterprise only)
There are three ways of running Python programs in Hopsworks:
The GIF below demonstrates how to create a Python job from the Jobs UI by selecting a Python file that is already
uploaded in a Hopsworks dataset and attaching a few other files to be immediately available to the application at
runtime. Alternatively, any file can be made available to the application at runtime from within the Python program
itself by using the copy_to_local function of the hdfs module of the hops Python library
(http://hops-py.logicalclocks.com/hops.html#module-hops.hdfs).
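As a minimal sketch of that approach (the dataset path and file name below are placeholders, not files that exist in your project), the running program could fetch a file from a dataset like this:

    from hops import hdfs

    # Copy a file from the project's Resources dataset into the job's
    # local working directory ("Resources/my_data.csv" is a placeholder path).
    hdfs.copy_to_local("Resources/my_data.csv")

    # The file is now available locally under its base name.
    with open("my_data.csv") as f:
        print(f.read())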
You do not have to upload the Python program through the UI to run it. That can also be done from within a Python
program by using the upload function of the dataset module of the hops Python library
(http://hops-py.logicalclocks.com). To do that, first generate an API key for your project (see Generate an API key),
then use the project.connect() function of the same library to connect to a project of your Hopsworks cluster, and
finally call dataset.upload.
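A minimal sketch of that flow is shown below. The host, project name, file names and keyword arguments are placeholders and assumptions made for illustration; consult the hops-py API documentation for the exact signatures of project.connect and dataset.upload.

    from hops import project, dataset

    # Connect to a project on a Hopsworks cluster using a previously
    # generated API key (host, project and key file are placeholders).
    project.connect("my_project", "my.hopsworks.host",
                    api_key="/path/to/api_key_file")

    # Upload a local Python program into one of the project's datasets.
    dataset.upload("my_program.py", "Resources")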
(Available in Hopsworks Enterprise only)
The Docker job type in Hopsworks enables running your own Docker containers as jobs in Hopsworks. With the Docker job type, users are no longer restricted to running only Python, Spark/PySpark and Flink programs, but can now utilize the Hopsworks Jobs service to run any program/service, as long as it is packaged in a Docker container.
As seen in the screenshot below, users can set the following Docker job specific properties (advanced properties are optional):
Admin options
The following options can be set using the Variables service within the Admin UI of Hopsworks:
Below you can find some examples showing how to set various Docker job options. Although each job uses commands and arguments differently, the output of all of them is equivalent. You can choose whichever setup is convenient for your use case; keep in mind that defaultArgs and execution arguments are provided on a single line (a String value). If the job fails and no output/error logs are available, make sure the commands and arguments are properly formatted, for example that no trailing whitespace characters are present.
The command to run is /bin/sh -c sleep 10 && cp /Projects/p1/Jupyter/README.md /Projects/p1/Resources/README_Jupyter.md && ls /
Example 1: A job with multiple commands and no arguments
Example 2: A job with multiple commands and default arguments
Example 3: A job with multiple commands and no default arguments (arguments are requested upon execution)
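To make the difference between the three setups concrete, one possible way of distributing the command above is sketched below. The field layout shown here is an assumption for illustration only; the screenshots of each example are authoritative.

    Example 1: the entire line, including sleep 10 && cp ... && ls /, is placed
               in the command field and both argument fields are left empty.
    Example 2: /bin/sh -c is placed in the command field and the remainder is
               saved as defaultArgs on a single line.
    Example 3: as in Example 2, but defaultArgs is left empty and the arguments
               are typed into the runtime-arguments dialog when the job is run.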
Below you can see how to view the stdout and stderr job logs.
It is also possible to work on jobs while developing in your IntelliJ/PyCharm IDE by installing the Hopsworks Plugin from the marketplace.
Usage
The Python job type is only supported on Hopsworks-EE.
Actions
Support for running Flink jobs
1. Click Create Job to first create a Flink job in Hopsworks.
2. Click Run Job. This will first start a Flink cluster if there is no active running Flink job with the same job name; otherwise it will reuse the active running Flink cluster with that job name. Next, it will upload and submit your program to the running Flink cluster.
3. Set the main class of your program in the Main Class field in the preferences. To pass arguments, simply fill in the User Arguments field, with multiple arguments separated by spaces, e.g. --arg1 a1 --arg2 a2