Prepare S3 for technical lineage via AWS connection
For Collibra Data Lineage to access source files, such as SQL files, from Amazon S3, you can configure an AWS connection. Depending on your security requirements and where your Edge site is hosted, you can choose EC2 or IAM authentication types.
Use EC2 authentication if your Edge site runs on an Amazon EC2 instance and you want keyless authentication. Use IAM authentication if your Edge site is not hosted on AWS, or if you prefer to authenticate with an access key ID and secret access key.
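To illustrate the difference between the two authentication types, here is a minimal Python sketch using boto3. It is not how Collibra Edge is implemented; it only shows the general AWS SDK behavior that "keyless" refers to. The function names `s3_client_kwargs` and `can_list_bucket` are hypothetical, and the sketch assumes boto3 is installed wherever you run the bucket check.

```python
from typing import Optional


def s3_client_kwargs(auth_type: str,
                     access_key_id: Optional[str] = None,
                     secret_access_key: Optional[str] = None) -> dict:
    """Return the credential arguments an S3 client would need (hypothetical helper)."""
    if auth_type == "EC2":
        # Keyless: pass no credentials. On an EC2 instance, the SDK falls back
        # to the IAM role attached to the instance.
        return {}
    if auth_type == "IAM":
        if not access_key_id or not secret_access_key:
            raise ValueError("IAM authentication needs an access key ID and a secret access key")
        return {
            "aws_access_key_id": access_key_id,
            "aws_secret_access_key": secret_access_key,
        }
    raise ValueError(f"unknown authentication type: {auth_type}")


def can_list_bucket(bucket: str, **client_kwargs) -> bool:
    """Try to list one object in the bucket; return False if access fails."""
    # Assumption: boto3 is installed. Imported here so the helper above
    # can be used without it.
    import boto3
    from botocore.exceptions import ClientError, NoCredentialsError

    s3 = boto3.client("s3", **client_kwargs)
    try:
        s3.list_objects_v2(Bucket=bucket, MaxKeys=1)
        return True
    except (ClientError, NoCredentialsError):
        return False
```

For example, `can_list_bucket("my-bucket", **s3_client_kwargs("EC2"))` run on the EC2 instance would exercise the attached role, while the "IAM" variant exercises an explicit key pair.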
Prerequisites
Ensure that the following requirements are met for the authentication type you choose.
IAM authentication:
- You have administrative access to the AWS IAM console.
EC2 authentication:
- You have installed the Edge site using the bundled K3s installer.
- The Edge site runs on an Amazon EC2 instance.
- You have administrative access to the AWS IAM and EC2 consoles.
Steps
As the AWS administrator, you can implement this through existing roles or permission boundaries. The following procedures are validated examples.
IAM authentication:
- Go to AWS Identity and Access Management (IAM) and select Users. Create a new user or select an existing one.
- Open the details of the AWS user, and click Add permissions.
- Create an inline policy named read_from_bucket using the following JSON. Replace <the bucket> with the name of the S3 bucket that contains your source files.
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "S3ListStart",
        "Effect": "Allow",
        "Action": [
          "s3:List*",
          "s3:Get*"
        ],
        "Resource": [
          "arn:aws:s3:::<the bucket>",
          "arn:aws:s3:::<the bucket>/*"
        ]
      }
    ]
  }
- Create an access key (an access key ID and a secret access key) for that AWS user.
When you create the AWS connection, enter the access key ID and secret access key.
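If you prefer to script the IAM user steps rather than use the console, the following Python sketch shows one possible approach with boto3. It assumes boto3 is installed and that your local AWS credentials have IAM administrative rights. The function names `build_read_policy` and `grant_user_read_access` and the bucket name `example-lineage-bucket` are illustrative and do not come from the product documentation.

```python
import json


def build_read_policy(bucket: str) -> dict:
    """Build the read_from_bucket policy shown above for one bucket."""
    if not bucket or "<" in bucket or ">" in bucket:
        raise ValueError("pass the real bucket name, not the <the bucket> placeholder")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3ListStart",
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself (listing)
                    f"arn:aws:s3:::{bucket}/*",    # the objects in it (reading)
                ],
            }
        ],
    }


def grant_user_read_access(user_name: str, bucket: str) -> dict:
    """Attach the inline policy to an existing IAM user and create an access key."""
    # Assumption: boto3 is installed and admin credentials are configured.
    import boto3

    iam = boto3.client("iam")
    iam.put_user_policy(
        UserName=user_name,
        PolicyName="read_from_bucket",
        PolicyDocument=json.dumps(build_read_policy(bucket)),
    )
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    # AWS returns the secret access key only at creation time; store it securely.
    return {
        "access_key_id": key["AccessKeyId"],
        "secret_access_key": key["SecretAccessKey"],
    }


if __name__ == "__main__":
    # Print the policy only; nothing is changed in AWS.
    print(json.dumps(build_read_policy("example-lineage-bucket"), indent=2))
```

The two returned values are the ones you would enter when you create the AWS connection.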
EC2 authentication:
- Go to AWS Identity and Access Management (IAM).
- Create an IAM role. Do not attach permissions during role creation.
- Open the details of the newly created role, and click Add permissions.
- Create an inline policy named read_from_bucket using the following JSON. Replace <the bucket> with the name of the S3 bucket that contains your source files.
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "S3ListStart",
        "Effect": "Allow",
        "Action": [
          "s3:List*",
          "s3:Get*"
        ],
        "Resource": [
          "arn:aws:s3:::<the bucket>",
          "arn:aws:s3:::<the bucket>/*"
        ]
      }
    ]
  }
- In the Amazon EC2 console, attach the IAM role you created to the Amazon EC2 instance that hosts your Edge site.
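The EC2 role steps above can also be scripted. The following Python sketch is one possible approach with boto3, assuming boto3 is installed and your credentials have IAM and EC2 administrative rights. The names `EC2_TRUST_POLICY`, `read_policy`, and `attach_role_to_instance` are illustrative. Unlike the console, the API requires you to create an instance profile explicitly; the sketch gives it the same name as the role, which is an assumption rather than a requirement.

```python
import json

# Trust policy that allows the EC2 service to assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def read_policy(bucket: str) -> dict:
    """Build the read_from_bucket policy shown above for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3ListStart",
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:Get*"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def attach_role_to_instance(role_name: str, bucket: str, instance_id: str) -> None:
    """Create the role, add the inline policy, and attach it to the instance."""
    # Assumption: boto3 is installed and admin credentials are configured.
    import boto3

    iam = boto3.client("iam")
    ec2 = boto3.client("ec2")

    # Create the role with no permissions attached, as in the console steps.
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(EC2_TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="read_from_bucket",
        PolicyDocument=json.dumps(read_policy(bucket)),
    )
    # An EC2 instance receives a role through an instance profile.
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(InstanceProfileName=role_name, RoleName=role_name)
    # A new instance profile can take a few seconds to propagate, so this
    # call may need to be retried if it fails immediately after creation.
    ec2.associate_iam_instance_profile(
        IamInstanceProfile={"Name": role_name},
        InstanceId=instance_id,
    )
```

A call such as `attach_role_to_instance("edge-s3-read", "example-lineage-bucket", "i-0123456789abcdef0")` uses placeholder values; substitute your own role name, bucket, and instance ID.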