{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "title: \"Create Training Data from Features\"\n", "date: 2021-02-24\n", "type: technical_note\n", "draft: false\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### HSFS training datasets\n", "\n", "Training datasets are the third building block of the Hopsworks Feature Store. Data scientists can query the feature store (see the [feature_exploration](./feature_exploration.ipynb) notebook) and materialize their queries as training datasets.\n", "\n", "Training datasets can be saved in an ML-framework-friendly format (e.g. TFRecord, CSV, NumPy) and then fed to a machine learning model for training.\n", "\n", "Training datasets can also be stored on external storage systems such as Amazon S3 or GCS, where external model training platforms can read them.\n", "\n", "As with the previous notebooks, the first step is to establish a connection with the Hopsworks feature store and get the feature store handle." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting Spark application\n" ] }, { "data": { "text/html": [ "
ID | YARN Application ID | Kind | State | Spark UI | Driver log |
---|---|---|---|---|---|
1 | application_1612782748969_0003 | pyspark | idle | Link | Link |