{ "cells": [ { "cell_type": "markdown", "id": "italic-amplifier", "metadata": {}, "source": [ "## Snowflake (On-Demand) Feature Group \n", "This tutorial notebook will show how you can define an external feature group for a table in Snowflake.\n", "\n", "In this notebook we assume that you already have snowflake account and did the getting started with snowflake tutorial." ] }, { "cell_type": "code", "execution_count": 1, "id": "enormous-passing", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting Spark application\n" ] }, { "data": { "text/html": [ "\n", "
IDYARN Application IDKindStateSpark UIDriver log
0application_1614610677629_0001sparkidleLinkLink
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "SparkSession available as 'spark'.\n", "import com.logicalclocks.hsfs._\n" ] } ], "source": [ "import com.logicalclocks.hsfs._" ] }, { "cell_type": "code", "execution_count": 2, "id": "molecular-costume", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "connection: com.logicalclocks.hsfs.HopsworksConnection = com.logicalclocks.hsfs.HopsworksConnection@295412ba\n", "fs: com.logicalclocks.hsfs.FeatureStore = FeatureStore{id=67, name='project1_featurestore', projectId=119, featureGroupApi=com.logicalclocks.hsfs.metadata.FeatureGroupApi@4851643}\n" ] } ], "source": [ "val connection = HopsworksConnection.builder().build();\n", "val fs = connection.getFeatureStore();" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "val snowflakeConn = fs.getStorageConnector(\"sfconnector\")\n", "var sfOptions = snowflakeConn.getSparkOptionsInt\n", "sfOptions.put(\"query\", \"select * from TELCO\")" ] }, { "cell_type": "code", "execution_count": 4, "id": "ahead-lodging", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "df: org.apache.spark.sql.DataFrame = [CUSTOMER_ID: string, GENDER: string ... 19 more fields]\n", "+-----------+------+--------------+-------+----------+------+-------------+----------------+----------------+---------------+-------------+-----------------+------------+------------+----------------+--------------+-----------------+--------------------+---------------+-------------+-----+\n", "|CUSTOMER_ID|GENDER|SENIOR_CITIZEN|PARTNER|DEPENDENTS|TENURE|PHONE_SERVICE| MULTIPLE_LINES|INTERNET_SERVICE|ONLINE_SECURITY|ONLINE_BACKUP|DEVICE_PROTECTION|TECH_SUPPORT|STREAMING_TV|STREAMING_MOVIES| CONTRACT|PAPERLESS_BILLING| PAYMENT_METHOD|MONTHLY_CHARGES|TOTAL_CHARGES|CHURN|\n", "+-----------+------+--------------+-------+----------+------+-------------+----------------+----------------+---------------+-------------+-----------------+------------+------------+----------------+--------------+-----------------+--------------------+---------------+-------------+-----+\n", "| 7590-VHVEG|Female| false| Yes| No| 1| No|No phone service| DSL| No| Yes| No| No| No| No|Month-to-month| Yes| Electronic check| 29.85| 29.85| No|\n", "| 5575-GNVDE| Male| false| No| No| 34| Yes| No| DSL| Yes| No| Yes| No| No| No| One year| No| Mailed check| 56.95| 1889.5| No|\n", "| 3668-QPYBK| Male| false| No| No| 2| Yes| No| DSL| Yes| Yes| No| No| No| No|Month-to-month| Yes| Mailed check| 53.85| 108.15| Yes|\n", "| 7795-CFOCW| Male| false| No| No| 45| No|No phone service| DSL| Yes| No| Yes| Yes| No| No| One year| No|Bank transfer (au...| 42.3| 1840.75| No|\n", "| 9237-HQITU|Female| false| No| No| 2| Yes| No| Fiber optic| No| No| No| No| No| No|Month-to-month| Yes| Electronic check| 70.7| 151.65| Yes|\n", "| 9305-CDSKC|Female| false| No| No| 8| Yes| Yes| Fiber optic| No| No| Yes| No| Yes| Yes|Month-to-month| Yes| Electronic check| 99.65| 820.5| Yes|\n", "| 1452-KIOVK| Male| false| No| Yes| 22| Yes| Yes| Fiber optic| No| Yes| No| No| Yes| No|Month-to-month| Yes|Credit card (auto...| 89.1| 1949.4| No|\n", "| 6713-OKOMC|Female| false| No| No| 10| No|No phone service| DSL| Yes| No| No| No| No| No|Month-to-month| No| Mailed check| 29.75| 301.9| No|\n", "| 7892-POOKP|Female| false| Yes| No| 28| Yes| Yes| Fiber optic| No| No| Yes| Yes| Yes| Yes|Month-to-month| Yes| Electronic check| 104.8| 3046.05| Yes|\n", "| 6388-TABGU| Male| false| No| Yes| 62| Yes| No| DSL| Yes| Yes| No| No| No| No| One year| No|Bank transfer (au...| 56.15| 3487.95| No|\n", "+-----------+------+--------------+-------+----------+------+-------------+----------------+----------------+---------------+-------------+-----------------+------------+------------+----------------+--------------+-----------------+--------------------+---------------+-------------+-----+\n", "only showing top 10 rows\n", "\n" ] } ], "source": [ "val df = spark.read.format(\"net.snowflake.spark.snowflake\").options(sfOptions).load()\n", "df.show(10)" ] }, { "cell_type": "markdown", "id": "historic-distinction", "metadata": {}, "source": [ "### External (On-Demand) Feature Group\n", "An on-Demand Feature Group stores the metadata for features in the feature store. The actual feature data is read from the external source using a storage connector. \n", "The next cell shows how to create an on-demand feature group using the snowflake connector." ] }, { "cell_type": "code", "execution_count": 5, "id": "vocational-michael", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "telcoOnDmd: com.logicalclocks.hsfs.OnDemandFeatureGroup = com.logicalclocks.hsfs.OnDemandFeatureGroup@55afc64e\n" ] } ], "source": [ "val telcoOnDmd = (fs.createOnDemandFeatureGroup()\n", " .name(\"telco_snowflake_scala\")\n", " .version(1)\n", " .query(\"select * from CUSTOMER_CHURN\")\n", " .description(\"On-demand feature group for telecom customer data\")\n", " .storageConnector(snowflakeConn)\n", " .statisticsConfig(new StatisticsConfig(true, true, true))\n", " .build())" ] }, { "cell_type": "code", "execution_count": 6, "id": "veterinary-modeling", "metadata": {}, "outputs": [], "source": [ "telcoOnDmd.save()" ] }, { "cell_type": "code", "execution_count": 14, "id": "interested-remainder", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "import java.util.Arrays\n", "+-----------+----------------+-------------+-------------+-----+\n", "|customer_id|internet_service|phone_service|total_charges|churn|\n", "+-----------+----------------+-------------+-------------+-----+\n", "| 7590-VHVEG| DSL| No| 29.85| No|\n", "| 5575-GNVDE| DSL| Yes| 1889.5| No|\n", "| 3668-QPYBK| DSL| Yes| 108.15| Yes|\n", "| 7795-CFOCW| DSL| No| 1840.75| No|\n", "| 9237-HQITU| Fiber optic| Yes| 151.65| Yes|\n", "+-----------+----------------+-------------+-------------+-----+\n", "only showing top 5 rows\n", "\n" ] } ], "source": [ "import java.util.Arrays;\n", "telcoOnDmd.select(Arrays.asList(\"customer_id\", \"internet_service\", \"phone_service\", \"total_charges\", \"churn\")).show(5)" ] }, { "cell_type": "code", "execution_count": null, "id": "communist-protein", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Spark", "language": "scala", "name": "sparkkernel" }, "language_info": { "codemirror_mode": "text/x-scala", "mimetype": "text/x-scala", "name": "scala", "pygments_lexer": "scala" } }, "nbformat": 4, "nbformat_minor": 5 }