{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "---\n", "title: \"Titanic Dataset with the Feature Store\"\n", "date: 2021-02-24\n", "type: technical_note\n", "draft: false\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Titanic Dataset for the Feature Store" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook prepares the Titanic dataset for use with the feature store.\n", "\n", "The Titanic dataset contains information about the passengers of the Titanic. The training and test data come as two CSV files, which can be downloaded from the Titanic Competition page on [Kaggle](https://www.kaggle.com/c/titanic/data).\n", "\n", "Download the `train.csv` and `test.csv` files, and upload them to the `Resources` folder of your Hopsworks Project. If you prefer to use the GUI, you can find the `Resources` folder by opening the **Data Sets** tab in the left menu bar.\n", "\n", "Once both files are uploaded to the `Resources` folder, you can proceed with the rest of the notebook." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting Spark application\n" ] }, { "data": { "text/html": [ "
ID | YARN Application ID | Kind | State | Spark UI | Driver log |
---|---|---|---|---|---|
0 | application_1614293057610_0001 | pyspark | idle | Link | Link |