{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "title: \"Databricks Azure Feature Store Quickstart\"\n", "date: 2021-02-24\n", "type: technical_note\n", "draft: false\n", "---" ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "inputWidgets": {}, "nuid": "26dbe322-f2e0-4b58-96c8-ec0de1564127", "showTitle": false, "title": "" } }, "source": [ "# Databricks Azure Feature Store Quick Start\n", "\n", "This notebook gives a quick overview of how to integrate the Hopsworks Feature Store with Databricks and Azure Data Lake (ADL). We'll go over four steps:\n", "\n", "- Generate some sample data and store it on ADL\n", "- Do some feature engineering with Databricks and the data from ADL\n", "- Save the engineered features to the Feature Store\n", "- Select a group of features from the Feature Store and create a training dataset\n", "\n", "This requires configuring the Databricks cluster to interact with the Hopsworks Feature Store; see [Databricks Quick Start](https://docs.hopsworks.ai/feature-store-api/latest/integrations/databricks/configuration/)." ] }, { "cell_type": "markdown", "metadata": { "application/vnd.databricks.v1+cell": { "inputWidgets": {}, "nuid": "d72a3a79-1c6e-41e2-bda7-73ce32bb3a0d", "showTitle": false, "title": "" } }, "source": [ "### Imports\n", "\n", "We'll use numpy and pandas for data generation, pyspark for feature engineering, tensorflow and keras for model training, and the `hsfs` library to interact with the Hopsworks Feature Store." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "application/vnd.databricks.v1+cell": { "inputWidgets": {}, "nuid": "8b87abd1-30c0-4240-8460-bdf7d26024ea", "showTitle": false, "title": "" } }, "outputs": [ { "data": { "text/html": [ "\n", "