Many organizations use identity providers (IdPs) to authenticate users and to manage their attributes and group memberships for secure, efficient, and centralized identity management. You might be modernizing your data architecture with Amazon Redshift to enable access to your data lake and your data warehouse, and looking for a centralized and scalable way to define and manage data access based on IdP identities. AWS Lake Formation makes it straightforward to centrally govern, secure, and globally share data for analytics and machine learning (ML). Currently, you may have to map user identities and groups to AWS Identity and Access Management (IAM) roles, with data access permissions defined at the IAM role level within Lake Formation. This setup is inefficient: mapping IdP groups to IAM roles is time-consuming to set up and to maintain as new groups are created, and it makes it difficult to determine what data was accessed, from which service, and when.
Amazon Redshift, Amazon QuickSight, and Lake Formation now integrate with the new trusted identity propagation capability in AWS IAM Identity Center to authenticate users seamlessly across services. In this post, we discuss two use cases to configure trusted identity propagation with Amazon Redshift and Lake Formation.
Solution overview
Trusted identity propagation provides a new authentication option for organizations that want to centralize data permissions management and authorize requests based on IdP identity across service boundaries. With IAM Identity Center, you can configure an existing IdP to manage users and groups, and use Lake Formation to define fine-grained access control permissions on catalog resources for these IdP identities. Amazon Redshift supports identity propagation when querying data with Amazon Redshift Spectrum and with Amazon Redshift data sharing, and you can use AWS CloudTrail to audit data access by IdP identities to help your organization meet its regulatory and compliance requirements.
With this new capability, users can connect to Amazon Redshift from QuickSight with a single sign-on experience and create direct query datasets. This is enabled by using IAM Identity Center as a shared identity source. With trusted identity propagation, when QuickSight assets such as dashboards are shared with other users, the database permissions of each QuickSight user are applied by propagating their end-user identity from QuickSight to Amazon Redshift and enforcing their individual data permissions. Depending on the use case, the author can apply additional row-level and column-level security in QuickSight.
The following diagram illustrates an example of the solution architecture.
In this post, we walk through how to configure trusted identity propagation with Amazon Redshift and Lake Formation. We cover the following use cases:
Redshift Spectrum with Lake Formation
Redshift data sharing with Lake Formation
Prerequisites
This walkthrough assumes you have set up a Lake Formation administrator role or a similar role to follow along with the instructions in this post. To learn more about setting up permissions for a data lake administrator, see Create a data lake administrator.
Additionally, you must create the following resources as detailed in Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On:
An Okta account integrated with IAM Identity Center to sync users and groups
A Redshift managed application with IAM Identity Center
A Redshift source cluster with IAM Identity Center integration enabled
A Redshift target cluster with IAM Identity Center integration enabled (you can skip the section to set up Amazon Redshift role-based access)
Users and groups from IAM Identity Center assigned to the Redshift application
A permission set assigned to AWS accounts to enable Redshift Query Editor v2 access
Add the following permissions to the IAM role used by the Redshift managed application for integration with IAM Identity Center:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags"
            ],
            "Resource": "*"
        }
    ]
}
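One way to apply these permissions from the command line is to attach them to the role as an inline policy. The following is a minimal sketch under stated assumptions: the role name RedshiftIdcAppRole and the policy name LakeFormationGlueAccess are hypothetical placeholders, not names from this post, so substitute the role your Redshift managed application actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical names (assumptions): replace with your own role and policy name.
ROLE_NAME="${ROLE_NAME:-RedshiftIdcAppRole}"
POLICY_NAME="${POLICY_NAME:-LakeFormationGlueAccess}"

# Write the policy document from this section to the file given as $1.
write_policy() {
  cat > "$1" <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetTable",
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartitions",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag",
                "lakeformation:SearchTablesByLFTags",
                "lakeformation:SearchDatabasesByLFTags"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}

# Attach the policy file given as $1 to the role as an inline policy.
attach_policy() {
  aws iam put-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-name "$POLICY_NAME" \
    --policy-document "file://$1"
}

# Usage:
#   write_policy lf-policy.json && attach_policy lf-policy.json
```

An inline policy keeps the example short; a customer managed policy attached to the role would work equally well if you prefer to reuse it across roles.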
Use case 1: Redshift Spectrum with Lake Formation
Before you start this use case, complete the following prerequisite steps:
Log in to the AWS Management Console as an IAM administrator.
Open AWS CloudShell or a terminal with the AWS CLI configured, and run the following command to copy the data, replacing <bucketname> with your bucket name:
aws s3 sync s3://redshift-demos/data/NY-Pub/ s3://<bucketname>/data/NY-Pub/
In this post, we use an AWS Glue crawler to create the external table ny_pub stored in Apache Parquet format in the Amazon S3 location s3://<bucketname>/data/NY-Pub/. In the next step, we create the solution resources using AWS CloudFormation to create a stack named CrawlS3Source-NYTaxiData in us-east-1.
Download the .yml file or launch the CloudFormation stack.
The stack creates the following resources:
The crawler NYTaxiCrawler along with the new IAM role AWSGlueServiceRole-RedshiftAutoMount
The AWS Glue database automountdb
When the stack is complete, continue with the following steps to finish setting up your resources:
On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers.
Open NYTaxiCrawler and choose Edit.
Under Choose data sources and classifiers, choose Edit.
For Data source, choose S3.
For S3 path, enter s3://<bucketname>/data/NY-Pub/.
Choose Update S3 data source.
Choose Next and choose Update.
Choose Run crawler.
After the crawler is complete, you can see a new table called ny_pub in the Data Catalog under the automountdb database.
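If you prefer the AWS CLI to the console, the crawler steps above can be sketched roughly as follows. This is an illustrative sketch, not the post's official procedure: the function names crawler_targets and run_crawler are made up for this example, and the 15-second polling interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
CRAWLER="NYTaxiCrawler"

# Build the --targets JSON for the crawler from a bucket name ($1).
crawler_targets() {
  printf '{"S3Targets":[{"Path":"s3://%s/data/NY-Pub/"}]}' "$1"
}

# Point the crawler at the bucket ($1), run it, and wait until it is idle again.
run_crawler() {
  local bucket="$1"
  aws glue update-crawler --name "$CRAWLER" \
    --targets "$(crawler_targets "$bucket")" || return 1
  aws glue start-crawler --name "$CRAWLER" || return 1

  # The crawler state returns to READY when the run has finished.
  while [ "$(aws glue get-crawler --name "$CRAWLER" \
      --query 'Crawler.State' --output text)" != "READY" ]; do
    sleep 15
  done

  # Confirm that the table now exists in the Data Catalog.
  aws glue get-table --database-name automountdb --name ny_pub \
    --query 'Table.Name' --output text
}

# Usage:
#   run_crawler <bucketname>
```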
After you create the resources, complete the steps in the next sections to set up Lake Formation permissions on the AWS Glue table ny_pub for the sales IdP group and access them via Redshift Spectrum.
Enable Lake Formation propagation for the Redshift managed application
Complete the following steps to enable Lake Formation propagation for the Redshift managed application created in Integrate Okta with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On:
Log in to the console as an administrator.
On the Amazon Redshift console, choose IAM Identity Center connection in the navigation pane.
Select the managed application that starts with redshift-iad and choose Edit.
Select Enable AWS Lake Formation access grants under Trusted identity propagation and save your changes.
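The same setting can likely be applied with the AWS CLI. The sketch below assumes that the modify-redshift-idc-application command and its --service-integrations structure behave as described in the AWS CLI reference; verify the exact shape against your CLI version before relying on it. The helper names are hypothetical, and the application ARN is a value you look up yourself, for example with aws redshift describe-redshift-idc-applications.

```shell
#!/usr/bin/env bash
# Build the --service-integrations value. $1 is Enabled (default) or Disabled.
lf_service_integrations() {
  printf '[{"LakeFormation":[{"LakeFormationQuery":{"Authorization":"%s"}}]}]' \
    "${1:-Enabled}"
}

# Enable Lake Formation access grants on the managed application whose ARN is $1.
enable_lf_propagation() {
  aws redshift modify-redshift-idc-application \
    --redshift-idc-application-arn "$1" \
    --service-integrations "$(lf_service_integrations Enabled)"
}

# Usage:
#   enable_lf_propagation <redshift_idc_application_arn>
```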
Set up Lake Formation as an IAM Identity Center application
Complete the following steps to set up Lake Formation as an IAM Identity Center application:
On the Lake Formation console, under Administration in the navigation pane, choose IAM Identity Center integration.
Review the options and choose Submit to enable Lake Formation integration.
The integration status will update to Success.
Alternatively, you can run the following command:
aws lakeformation create-lake-formation-identity-center-configuration --cli-input-json '{"CatalogId": "<catalog_id>","InstanceArn": "<identitycenter_arn>"}'
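To avoid quoting mistakes when you substitute real values, you could build the input JSON in a small helper and then confirm the result with the matching describe call. This is a sketch with hypothetical helper names; the catalog ID is your AWS account ID and the instance ARN is your IAM Identity Center instance ARN.

```shell
#!/usr/bin/env bash
# Build the --cli-input-json payload from a catalog ID ($1) and instance ARN ($2).
lf_idc_input() {
  printf '{"CatalogId":"%s","InstanceArn":"%s"}' "$1" "$2"
}

# Create the integration, then print the resulting configuration.
create_lf_idc_config() {
  local catalog_id="$1" instance_arn="$2"
  aws lakeformation create-lake-formation-identity-center-configuration \
    --cli-input-json "$(lf_idc_input "$catalog_id" "$instance_arn")" || return 1
  aws lakeformation describe-lake-formation-identity-center-configuration \
    --catalog-id "$catalog_id"
}

# Usage:
#   create_lf_idc_config <catalog_id> <identitycenter_arn>
```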
Register the data with Lake Formation
In this section, we register the data with Lake Formation. Complete the following steps:
On the Lake Formation console, under Administration in the navigation pane, choose Data lake locations.
Choose Register location.
For Amazon S3 path, enter the bucket where the table data resides (s3://<bucketname>/data/NY-Pub/).
For IAM role, choose a Lake Formation user-defined role. For more information, refer to Requirements for roles used to register locations.
For Permission mode, select Lake Formation.
Choose Register location.
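The registration steps above can also be approximated with the AWS CLI. In this sketch, the helper names are hypothetical and the role ARN is whichever user-defined role you chose for registering locations. It assumes that leaving out the --hybrid-access-enabled flag corresponds to the Lake Formation permission mode selected in the console.

```shell
#!/usr/bin/env bash
# Convert an s3:// URI ($1) into the S3 ARN form that --resource-arn expects.
s3_uri_to_arn() {
  printf 'arn:aws:s3:::%s' "${1#s3://}"
}

# Register the S3 location ($1) with Lake Formation using the role ARN ($2).
register_location() {
  local s3_uri="$1" role_arn="$2"
  aws lakeformation register-resource \
    --resource-arn "$(s3_uri_to_arn "$s3_uri")" \
    --role-arn "$role_arn"
}

# Usage:
#   register_location "s3://<bucketname>/data/NY-Pub/" <registration_role_arn>
```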
Next, verify that the IAMAllowedPrincipals group doesn’t have permissions on the database.
On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
Select automountdb and on the Actions menu, choose View permissions.
If IAMAllowedPrincipals is listed, select the principal and choose Revoke.
Repeat these steps to verify permissions for the table ny_pub.
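The same check can be sketched with the AWS CLI, where the group is addressed by the identifier IAM_ALLOWED_PRINCIPALS. The helper names below are hypothetical, and revoking ALL is one reasonable choice for this walkthrough; review the listed permissions first if other workloads depend on them.

```shell
#!/usr/bin/env bash
# Resource descriptors for a database ($1) and a table ($1 = database, $2 = table).
db_resource() {
  printf '{"Database":{"Name":"%s"}}' "$1"
}
table_resource() {
  printf '{"Table":{"DatabaseName":"%s","Name":"%s"}}' "$1" "$2"
}

# List any permissions the group holds on the resource given as JSON in $1.
list_iam_allowed() {
  aws lakeformation list-permissions \
    --principal DataLakePrincipalIdentifier=IAM_ALLOWED_PRINCIPALS \
    --resource "$1" \
    --query 'PrincipalResourcePermissions[].Permissions' --output text
}

# Revoke the group's permissions on the resource given as JSON in $1.
revoke_iam_allowed() {
  aws lakeformation revoke-permissions \
    --principal DataLakePrincipalIdentifier=IAM_ALLOWED_PRINCIPALS \
    --resource "$1" \
    --permissions ALL
}

# Usage:
#   list_iam_allowed   "$(db_resource automountdb)"
#   revoke_iam_allowed "$(db_resource automountdb)"
#   list_iam_allowed   "$(table_resource automountdb ny_pub)"
#   revoke_iam_allowed "$(table_resource automountdb ny_pub)"
```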
Grant the IAM Identity Center group permissions on the AWS Glue database and table
Complete the following steps to grant database permissions to the IAM Identity Center group:
On the Lake Formation console, under Data catalog in the navigation pane, choose Databases.
Select the database automountdb and on the Actions menu, choose Grant.
Choose Grant database.
Under Principals, select IAM Identity Center and choose Add.
In the pop-up window, if this is the first time assigning users and groups, choose Get started.
Enter the IAM Identity Center group in the search bar and choose the group.
Choose Assign.
Under LF-Tags or catalog resources, automountdb is already selected for Databases.
Select Describe for Database permissions.
Choose Grant to apply the permissions.
Alternatively, you can run the following command:
aws lakeformation grant-permissions --cli-input-json '
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:identitystore:::group/<identitycenter_group_name>"
    },
    "Resource": {
        "Database": {
            "Name": "automountdb"
        }
    },
    "Permissions": [
        "DESCRIBE"
    ]
}'
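If you grant to several groups, a small wrapper that fills in the principal and database can reduce copy-and-paste errors. The sketch below keeps the ARN shape used in the command above and treats the group value as an opaque string that you supply; the function names are hypothetical, and the final list-permissions call is an optional check rather than a required step.

```shell
#!/usr/bin/env bash
# Build the grant payload. $1 = group value for the principal ARN,
# $2 = database name, $3 = permission (defaults to DESCRIBE).
grant_db_payload() {
  local group="$1" database="$2" permission="${3:-DESCRIBE}"
  cat <<EOF
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:identitystore:::group/${group}"
    },
    "Resource": {
        "Database": {
            "Name": "${database}"
        }
    },
    "Permissions": [
        "${permission}"
    ]
}
EOF
}

# Grant the permission, then list what the principal now holds on the database.
grant_db_permission() {
  local group="$1" database="$2"
  aws lakeformation grant-permissions \
    --cli-input-json "$(grant_db_payload "$group" "$database")" || return 1
  aws lakeformation list-permissions \
    --principal "DataLakePrincipalIdentifier=arn:aws:identitystore:::group/${group}" \
    --resource "{\"Database\":{\"Name\":\"${database}\"}}"
}

# Usage:
#   grant_db_permission <identitycenter_group_name> automountdb
```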
Next, you grant table permissions to the IAM Identity Center group.
Under Data catalog in the navigation pane, choose Databases.
Select the database automountdb and on the Actions menu, choose Grant.
Under Principals, select IAM Identity Center and choose Add.
Enter the IAM Identity Center group in the search bar and