From c2ac88010918a2d763d53891085be15f43f70b2a Mon Sep 17 00:00:00 2001
From: Ian Vanagas <34755028+ivanagas@users.noreply.github.com>
Date: Mon, 15 Jul 2024 16:27:06 -0700
Subject: [PATCH] update s3 data warehouse

---
 contents/docs/data-warehouse/setup/s3.md | 80 ++++++++++++++++--------
 1 file changed, 54 insertions(+), 26 deletions(-)

diff --git a/contents/docs/data-warehouse/setup/s3.md b/contents/docs/data-warehouse/setup/s3.md
index d9b89a5064ed..038871f8c394 100644
--- a/contents/docs/data-warehouse/setup/s3.md
+++ b/contents/docs/data-warehouse/setup/s3.md
@@ -29,41 +29,61 @@ The data warehouse can link to data in your object storage system like S3. To st
 Next, we need to create a new user in our AWS console with programmatic access to our newly created bucket.
 
 1. Open [IAM](https://console.aws.amazon.com/iam/home) and create a new policy to enable access to this bucket
-2. On the left under "Access management," select "Policies," and click "Create policy"
-3. Under the service, choose "S3"
-4. Under "Actions," select:
-    1. "Write" -> "PutObject"
-    2. "Permission Management" -> "PutObjectAcl"
-5. Under "Resources," select "Specific," and click "object" -> "Add ARN"
-6. Specify your bucket name and choose "any" for the object name. In the example below, replace `posthog-s3-export` with the bucket name you chose in the previous section
-
-![bucket arn config](https://res.cloudinary.com/dmukukwp6/image/upload/v1710055416/posthog.com/contents/images/docs/apps/s3-export/bucket-arn.png)
-
-7. Your config should now look like the following
-
-![policy overview](https://res.cloudinary.com/dmukukwp6/image/upload/v1710055416/posthog.com/contents/images/docs/apps/s3-export/policy-config.png)
+2. On the left under **Access management**, select **Policies** and click **Create policy**
+3. Under the service, choose **S3**
+4. Under **Actions**, select:
+    1. **List** -> **ListBucket** and **ListBucketMultipartUploads**
+    2. **Read** -> **GetBucketLocation** and **GetObject**
+    2. **Write** -> **AbortMultipartUpload** and **PutObject**
+    3. **Permission Management** -> **PutObjectAcl**
+5. Under **Resources**, select **Specific**. Under object, click **Add ARNs**
+6. Specify your bucket name and choose **Any object name**. In the example below, replace `posthog-s3-export` with the bucket name you chose in the previous section
+
+![bucket arn config](https://res.cloudinary.com/dmukukwp6/image/upload/Clean_Shot_2024_07_15_at_15_19_29_2x_15416e8e84.png)
+
+7. Your policy in JSON should look like this:
+
+```json
+{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Sid": "VisualEditor0",
+            "Effect": "Allow",
+            "Action": [
+                "s3:PutObject",
+                "s3:GetObject",
+                "s3:PutObjectAcl",
+                "s3:ListBucket",
+                "s3:ListBucketMultipartUploads",
+                "s3:AbortMultipartUpload",
+                "s3:GetBucketLocation"
+            ],
+            "Resource": "arn:aws:s3:::posthog-s3-export/*"
+        }
+    ]
+}
+```
 
 6. Click "Next" until you end up on the "Review Policy" page
 7. Give your policy a name and click "Create policy"
 
 The final step is to create a new user and give them access to our bucket by attaching our newly created policy.
 
-1. Open [IAM](https://console.aws.amazon.com/iam/home) and navigate to "Users" on the left
-2. Click "Add Users"
-3. Specify a name and make sure to choose "Access key - Programmatic access"
-
-![create user](https://res.cloudinary.com/dmukukwp6/image/upload/v1710055416/posthog.com/contents/images/docs/apps/s3-export/create-user.png)
-
-4. Click "Next"
-5. At the top, select "Attach existing policies directly"
+1. Open [IAM](https://console.aws.amazon.com/iam/home) and navigate to **Users** on the left
+2. Click **Create user**, add a user name, and click **Next**
+5. Select **Attach policies directly**
 6. Search for the policy you just created and click the checkbox on the far left to attach it to this user
+6. Click **Next** and then click **Create user**
 
-![attaching the policy to our newly created user](https://res.cloudinary.com/dmukukwp6/image/upload/v1710055416/posthog.com/contents/images/docs/apps/s3-export/attach-policy.png)
+![User](https://res.cloudinary.com/dmukukwp6/image/upload/Clean_Shot_2024_07_15_at_16_16_34_2x_9f0f99d7a4.png)
 
-6. Click "Next" until you reach the "Create user" button. Click that as well.
-7. **Make sure to copy your "Access key" and "Secret access key". The latter will not be shown again.**
+7. Once created, search and click on the user name and then click **Create access key**
+8. Select **Application running outside AWS** and then click **Next**
+9. Add a description tag value and click **Create access key**
+10. Copy the access key and secret access key values to somewhere safe. You will need to recreate these values if you lose them.
 
-![showing our newly created api key and secret key](https://res.cloudinary.com/dmukukwp6/image/upload/v1710055416/posthog.com/contents/images/docs/apps/s3-export/access-keys.png)
+![AWS access keys](https://res.cloudinary.com/dmukukwp6/image/upload/Clean_Shot_2024_07_15_at_16_17_45_2x_e7dcb9dd39.png)
 
 ### Step 3: Add data to the bucket
 
@@ -97,4 +117,12 @@ The final step is to create a new user and give them access to our bucket by att
 
 ### Step 5: Query the table.
 
-Amazing! You can now [query](/docs/data-warehouse/query) your new table.
\ No newline at end of file
+Amazing! You can now [query](/docs/data-warehouse/query) your new table.
+
+## Troubleshooting
+
+- **Create table failed: Could not get columns**: Check that your **Files URL pattern** is correct and that your file format is correct. For example, make sure your columns don't have spaces and there aren't commas in cells in your `.csv` file.
+
+- **Create table failed: Access was denied when reading the provided file**: Make sure your access policies are correct.
+
+- **Create table failed: The provided file is not in Parquet format**: Make sure you've selected the correct file format.
\ No newline at end of file
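
The "Could not get columns" troubleshooting entry added by this patch names two `.csv` problems: spaces in column names and commas inside cells. Both could be checked locally before a file is uploaded to the bucket. Below is a minimal sketch, assuming a comma-delimited file with a header row. `find_csv_problems` is a hypothetical helper name for illustration and is not part of PostHog or AWS.

```python
import csv
import io


def find_csv_problems(text):
    """Return a list of human-readable problems found in CSV text.

    Hypothetical helper: flags the two issues named in the troubleshooting
    entry (spaces in column names, commas inside cells) plus rows whose
    cell count differs from the header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]

    # Column names with spaces are called out in the troubleshooting entry.
    for name in header:
        if " " in name:
            problems.append(f"column name contains a space: {name!r}")

    # Data rows start at row 2 (row 1 is the header).
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no} has {len(row)} cells, expected {len(header)}"
            )
        for cell in row:
            # A quoted cell can legally hold a comma, but the doc advises against it.
            if "," in cell:
                problems.append(f"row {row_no} has a comma inside a cell: {cell!r}")

    return problems


if __name__ == "__main__":
    sample = 'user id,name\n1,"Smith, J"\n'
    for problem in find_csv_problems(sample):
        print(problem)
```

An empty result means neither pattern was found. It does not guarantee that table creation succeeds, since the same error can also come from an incorrect **Files URL pattern** or file format.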