# Data ingestion  <a name="data_ingestion"></a> 
 
## Ingesting data from Redshift cluster
 
### Step 1) Create Redshift connector in Hopsworks
 
See the "Get started with Redshift and the Feature Store" notebook for a step-by-step guide on how to create a Redshift cluster.
 
#### Create a Redshift connector in Hopsworks for your Redshift cluster
![create-connector.png](images/redshift/create-connector.png)
 
Enter a unique name for your connector and, in the Redshift tab, fill in the cluster identifier, database driver name, endpoint, database name, database port, and database user fields.
You can use a password or an IAM role to connect to the database. If you use an IAM role, a temporary password is generated for the user every time you get the connector.
The IAM role needs a policy that allows it to get temporary credentials for the user. An example policy that allows GetClusterCredentials, CreateClusterUser, and JoinGroup on database `dev` is shown below.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowRedshiftTempCreds",
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials",
                "redshift:CreateClusterUser",
                "redshift:JoinGroup"
            ],
            "Resource": [
                "arn:aws:redshift:us-east-2:123456789011:dbuser:redshift-cluster-1/temp_creds_user",
                "arn:aws:redshift:us-east-2:123456789011:dbname:redshift-cluster-1/dev",
                "arn:aws:redshift:us-east-2:123456789011:dbgroup:redshift-cluster-1/auto_login_group"
            ]
        }
    ]
}
```
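The three resource ARNs in the policy share one pattern, so the document can be generated instead of typed by hand. The following is a minimal sketch; `build_temp_creds_policy` and its parameter names are hypothetical and are not part of Hopsworks or any AWS tooling. The values passed in are the ones from the example policy above.

```python
import json


def build_temp_creds_policy(region, account_id, cluster_id, db_user, db_name, db_group):
    """Return an IAM policy document (as a dict) shaped like the example above.

    Hypothetical helper for illustration only.
    """
    prefix = f"arn:aws:redshift:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowRedshiftTempCreds",
                "Effect": "Allow",
                "Action": [
                    "redshift:GetClusterCredentials",
                    "redshift:CreateClusterUser",
                    "redshift:JoinGroup",
                ],
                "Resource": [
                    # One ARN each for the database user, database, and group
                    f"{prefix}:dbuser:{cluster_id}/{db_user}",
                    f"{prefix}:dbname:{cluster_id}/{db_name}",
                    f"{prefix}:dbgroup:{cluster_id}/{db_group}",
                ],
            }
        ],
    }


policy = build_temp_creds_policy(
    "us-east-2", "123456789011", "redshift-cluster-1",
    "temp_creds_user", "dev", "auto_login_group",
)
print(json.dumps(policy, indent=4))
```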

A temporary password is only valid for 1 hour.
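Because the password expires, a long-running job may want to fetch the connector again before the hour is up. The sketch below shows one possible way to do that with a small cache; `ExpiringOptions`, the 50-minute refresh margin, and the stand-in fetch function are all assumptions made for illustration, not Hopsworks features.

```python
import time


class ExpiringOptions:
    """Cache connector options and re-fetch them once they reach a maximum age.

    Hypothetical helper: `fetch` is any zero-argument callable returning the options.
    """

    def __init__(self, fetch, max_age_seconds=50 * 60, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._options = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        # Fetch on first use, or when the cached options are too old
        if self._options is None or now - self._fetched_at >= self._max_age:
            self._options = self._fetch()
            self._fetched_at = now
        return self._options


# Demonstration with a stand-in fetch function and a fake clock (no cluster needed)
calls = []
fake_now = [0.0]


def fake_fetch():
    calls.append(fake_now[0])
    return {"password": f"temp-{len(calls)}"}


cache = ExpiringOptions(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()
fake_now[0] = 10 * 60   # 10 minutes later: the cached options are reused
second = cache.get()
fake_now[0] = 55 * 60   # 55 minutes later: the options are fetched again
third = cache.get()
```

In a notebook, `fetch` could be a function that gets the storage connector again and returns its `spark_options()`.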




### Get a storage connector by name

```python
import hsfs

# Create a connection
connection = hsfs.connection()
# Get the feature store handle for the project's feature store
fs = connection.get_feature_store()
# Get the connector created above
connector_redshift = fs.get_storage_connector("connector-1", "REDSHIFT")
# Options to pass to Spark's JDBC reader and writer
options = connector_redshift.spark_options()
```
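The options dictionary is what Spark's JDBC data source receives. As a rough illustration of what such a dictionary holds, the sketch below assembles one by hand from the connector fields, using the standard Spark JDBC option names (`url`, `driver`, `user`, `password`, `dbtable`). The helper name, the endpoint, the credentials, and the table name are placeholders, the driver class is only an example, and the exact keys Hopsworks returns may differ.

```python
def redshift_jdbc_options(endpoint, port, database, user, password, table,
                          driver="com.amazon.redshift.jdbc42.Driver"):
    """Build a dict of Spark JDBC options for a Redshift database.

    Hypothetical helper for illustration; not the Hopsworks implementation.
    """
    return {
        # Redshift JDBC URLs have the form jdbc:redshift://<endpoint>:<port>/<database>
        "url": f"jdbc:redshift://{endpoint}:{port}/{database}",
        "driver": driver,
        "user": user,
        "password": password,
        "dbtable": table,
    }


example_options = redshift_jdbc_options(
    endpoint="example-cluster.example.us-east-2.redshift.amazonaws.com",  # placeholder
    port=5439,
    database="dev",
    user="temp_creds_user",
    password="not-a-real-password",  # placeholder
    table="telco_customer_churn",    # placeholder table name
)
```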

### Create a table

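```python
# `spark` is the Spark session provided by the notebook.
# Read the CSV file with a header row and inferred column types.
telcom = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///Projects/redshift/redshift_Training_Datasets/telco_customer_churn.csv")
)

# Write the DataFrame to Redshift over JDBC, replacing the table if it exists.
(
    telcom.write
    .format("jdbc")
    .options(**options)
    .mode("overwrite")
    .save()
)
```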

### Read data from a table

```python
# Load the table over JDBC and show the first 10 rows
telcom = spark.read.format("jdbc").options(**options).load()
telcom.show(10)
```
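Spark's JDBC source can also send a SQL query to the database through the `query` option, which cannot be combined with `dbtable`. The following is a minimal sketch, assuming the options are a plain dictionary; `options_for_query` is a hypothetical helper, and the URL and table name are placeholders.

```python
def options_for_query(options, sql):
    """Return a copy of the JDBC options that reads a query instead of a table.

    Hypothetical helper: drops `dbtable`, since Spark does not accept it
    together with `query`, and leaves the original dictionary unchanged.
    """
    query_options = {key: value for key, value in options.items() if key != "dbtable"}
    query_options["query"] = sql
    return query_options


# Placeholder options standing in for the connector's real ones
base_options = {
    "url": "jdbc:redshift://example-endpoint:5439/dev",
    "dbtable": "telco_customer_churn",
}
query_options = options_for_query(
    base_options, "SELECT * FROM telco_customer_churn LIMIT 10"
)

# With a Spark session available:
# sample = spark.read.format("jdbc").options(**query_options).load()
```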