{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Store Quick Start\n", "\n", "This notebook gives you a quick overview of how you can intergrate the feature store service on Hopsworks into your machine learning pipeline. We'll go over four steps:\n", "\n", "1. Generate some sample data (rather than reading data from disk just to make this notebook stand-alone)\n", "2. Do some feature engineering on the data\n", "3. **Save the engineered features to the feature store**\n", "4. **Select a group of the features from the feature store and create a managed training dataset of tf records in the feature store**\n", "5. Train a model on the training dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports\n", "\n", "We'll use numpy and pandas for data generation, pyspark for feature engineering, tensorflow and keras for model training, and the hops `featurestore` library for interacting with the feature store." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting Spark application\n" ] }, { "data": { "text/html": [ "
| ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session? |
| --- | --- | --- | --- | --- | --- | --- |
| 13 | application_1549128638243_0017 | pyspark | idle | Link | Link | ✔ |