{ "cells": [ { "cell_type": "markdown", "id": "c78f87a8", "metadata": {}, "source": [ "---\n", "title: \"Snowflake: PySpark External Tables (On-Demand FGs)\"\n", "date: 2021-02-24\n", "type: technical_note\n", "draft: false\n", "---" ] }, { "cell_type": "markdown", "id": "cae35164", "metadata": {}, "source": [ "## Snowflake (On-Demand) Feature Group \n", "This tutorial notebook will show how you can define an external feature group for a table in Snowflake.\n", "\n", "In this notebook we assume that you already have snowflake account and did the getting started with snowflake tutorial." ] }, { "cell_type": "code", "execution_count": 1, "id": "bc509dff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting Spark application\n" ] }, { "data": { "text/html": [ "\n", "
IDYARN Application IDKindStateSpark UIDriver log
1application_1614610677629_0002pysparkidleLinkLink
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "SparkSession available as 'spark'.\n" ] } ], "source": [ "import hsfs" ] }, { "cell_type": "code", "execution_count": 2, "id": "a414c7a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected. Call `.close()` to terminate connection gracefully." ] } ], "source": [ "connection = hsfs.connection()\n", "# Get the feature store handle for the project's feature store\n", "fs = connection.get_feature_store()" ] }, { "cell_type": "code", "execution_count": null, "id": "7fe26256", "metadata": {}, "outputs": [], "source": [ "snowflake_conn = fs.get_storage_connector(\"sfconnector\")\n", "snowflake_conn.spark_options()\n", "sfOptions = snowflake_conn.spark_options()\n", "sfOptions[\"query\"] = \"select * from TELCO\"" ] }, { "cell_type": "code", "execution_count": 6, "id": "be059163", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+-----------+------+--------------+-------+----------+------+-------------+----------------+----------------+---------------+-------------+-----------------+------------+------------+----------------+--------------+-----------------+--------------------+---------------+-------------+-----+\n", "|CUSTOMER_ID|GENDER|SENIOR_CITIZEN|PARTNER|DEPENDENTS|TENURE|PHONE_SERVICE| MULTIPLE_LINES|INTERNET_SERVICE|ONLINE_SECURITY|ONLINE_BACKUP|DEVICE_PROTECTION|TECH_SUPPORT|STREAMING_TV|STREAMING_MOVIES| CONTRACT|PAPERLESS_BILLING| PAYMENT_METHOD|MONTHLY_CHARGES|TOTAL_CHARGES|CHURN|\n", "+-----------+------+--------------+-------+----------+------+-------------+----------------+----------------+---------------+-------------+-----------------+------------+------------+----------------+--------------+-----------------+--------------------+---------------+-------------+-----+\n", "| 7590-VHVEG|Female| false| Yes| No| 1| No|No phone service| DSL| No| Yes| No| No| No| No|Month-to-month| Yes| Electronic check| 29.85| 29.85| No|\n", "| 5575-GNVDE| Male| false| No| No| 34| Yes| No| DSL| Yes| No| Yes| No| No| No| One year| No| Mailed check| 56.95| 1889.5| No|\n", "| 3668-QPYBK| Male| false| No| No| 2| Yes| No| DSL| Yes| Yes| No| No| No| No|Month-to-month| Yes| Mailed check| 53.85| 108.15| Yes|\n", "| 7795-CFOCW| Male| false| No| No| 45| No|No phone service| DSL| Yes| No| Yes| Yes| No| No| One year| No|Bank transfer (au...| 42.3| 1840.75| No|\n", "| 9237-HQITU|Female| false| No| No| 2| Yes| No| Fiber optic| No| No| No| No| No| No|Month-to-month| Yes| Electronic check| 70.7| 151.65| Yes|\n", "| 9305-CDSKC|Female| false| No| No| 8| Yes| Yes| Fiber optic| No| No| Yes| No| Yes| Yes|Month-to-month| Yes| Electronic check| 99.65| 820.5| Yes|\n", "| 1452-KIOVK| Male| false| No| Yes| 22| Yes| Yes| Fiber optic| No| Yes| No| No| Yes| No|Month-to-month| Yes|Credit card (auto...| 89.1| 1949.4| No|\n", "| 6713-OKOMC|Female| false| No| No| 10| No|No phone service| DSL| Yes| No| No| No| No| No|Month-to-month| No| Mailed check| 29.75| 301.9| No|\n", "| 7892-POOKP|Female| false| Yes| No| 28| Yes| Yes| Fiber optic| No| No| Yes| Yes| Yes| Yes|Month-to-month| Yes| Electronic check| 104.8| 3046.05| Yes|\n", "| 6388-TABGU| Male| false| No| Yes| 62| Yes| No| DSL| Yes| Yes| No| No| No| No| One year| No|Bank transfer (au...| 56.15| 3487.95| No|\n", "+-----------+------+--------------+-------+----------+------+-------------+----------------+----------------+---------------+-------------+-----------------+------------+------------+----------------+--------------+-----------------+--------------------+---------------+-------------+-----+\n", "only showing top 10 rows" ] } ], "source": [ "df = spark.read.format(\"net.snowflake.spark.snowflake\") \\\n", " .options(**sfOptions) \\\n", " .load()\n", "df.show(10)" ] }, { "cell_type": "markdown", "id": "0a09e733", "metadata": {}, "source": [ "### External (On-Demand) Feature Group\n", "An on-Demand Feature Group stores the metadata for features in the feature store. The actual feature data is read from the external source using a storage connector. \n", "The next cell shows how to create an on-demand feature group using the snowflake connector." ] }, { "cell_type": "code", "execution_count": 7, "id": "88fff740", "metadata": {}, "outputs": [], "source": [ "telco_on_dmd = fs.create_on_demand_feature_group(name=\"telco_snowflake_scala\",\n", " version=2,\n", " query=\"select * from telco\",\n", " description=\"On-demand feature group for telecom customer data\",\n", " storage_connector=snowflake_conn,\n", " statistics_config=True)" ] }, { "cell_type": "code", "execution_count": 8, "id": "e73bcf1e", "metadata": {}, "outputs": [], "source": [ "telco_on_dmd.save()" ] }, { "cell_type": "code", "execution_count": 9, "id": "bbf76e3c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+-----------+----------------+-------------+-------------+-----+\n", "|customer_id|internet_service|phone_service|total_charges|churn|\n", "+-----------+----------------+-------------+-------------+-----+\n", "| 7590-VHVEG| DSL| No| 29.85| No|\n", "| 5575-GNVDE| DSL| Yes| 1889.5| No|\n", "| 3668-QPYBK| DSL| Yes| 108.15| Yes|\n", "| 7795-CFOCW| DSL| No| 1840.75| No|\n", "| 9237-HQITU| Fiber optic| Yes| 151.65| Yes|\n", "+-----------+----------------+-------------+-------------+-----+\n", "only showing top 5 rows" ] } ], "source": [ "telco_on_dmd.select(['customer_id', 'internet_service', 'phone_service', 'total_charges', 'churn']).show(5)" ] }, { "cell_type": "code", "execution_count": null, "id": "10661c79", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "PySpark", "language": "python", "name": "pysparkkernel" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "mimetype": "text/x-python", "name": "pyspark", "pygments_lexer": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }