update s3 data warehouse
ivanagas committed Jul 15, 2024
1 parent 2d850d7 commit c2ac880
Showing 1 changed file with 54 additions and 26 deletions.
contents/docs/data-warehouse/setup/s3.md

@@ -29,41 +29,61 @@ The data warehouse can link to data in your object storage system like S3.
Next, we need to create a new user in our AWS console with programmatic access to our newly created bucket.

1. Open [IAM](https://console.aws.amazon.com/iam/home) and create a new policy to enable access to this bucket
2. On the left under "Access management," select "Policies," and click "Create policy"
3. Under the service, choose "S3"
4. Under "Actions," select:
1. "Write" -> "PutObject"
2. "Permission Management" -> "PutObjectAcl"
5. Under "Resources," select "Specific," and click "object" -> "Add ARN"
6. Specify your bucket name and choose "any" for the object name. In the example below, replace `posthog-s3-export` with the bucket name you chose in the previous section

![bucket arn config](https://res.cloudinary.com/dmukukwp6/image/upload/v1710055416/posthog.com/contents/images/docs/apps/s3-export/bucket-arn.png)

7. Your config should now look like the following

![policy overview](https://res.cloudinary.com/dmukukwp6/image/upload/v1710055416/posthog.com/contents/images/docs/apps/s3-export/policy-config.png)
2. On the left under **Access management**, select **Policies** and click **Create policy**
3. Under the service, choose **S3**
4. Under **Actions**, select:
1. **List** -> **ListBucket** and **ListBucketMultipartUploads**
2. **Read** -> **GetBucketLocation** and **GetObject**
3. **Write** -> **AbortMultipartUpload** and **PutObject**
4. **Permission Management** -> **PutObjectAcl**
5. Under **Resources**, select **Specific**. Under object, click **Add ARNs**
6. Specify your bucket name and choose **Any object name**. In the example below, replace `posthog-s3-export` with the bucket name you chose in the previous section

![bucket arn config](https://res.cloudinary.com/dmukukwp6/image/upload/Clean_Shot_2024_07_15_at_15_19_29_2x_15416e8e84.png)

7. Your policy in JSON should look like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:PutObjectAcl",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:AbortMultipartUpload",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::posthog-s3-export/*"
    }
  ]
}
```

6. Click "Next" until you end up on the "Review Policy" page
7. Give your policy a name and click "Create policy"
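
If you prefer doing this from code instead of clicking through the console, a minimal boto3 sketch could look like the following. The bucket name and policy name are placeholders, and the policy document simply mirrors the JSON above:

```python
import json

import boto3  # assumes AWS credentials with permission to manage IAM are configured locally

BUCKET_NAME = "posthog-s3-export"  # replace with the bucket you created earlier

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:PutObjectAcl",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
            ],
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
        }
    ],
}

iam = boto3.client("iam")

# Create the managed policy; the name here is only a placeholder
response = iam.create_policy(
    PolicyName="posthog-s3-export-policy",
    PolicyDocument=json.dumps(policy_document),
)

# Keep the ARN handy: you'll need it to attach the policy to a user
print(response["Policy"]["Arn"])
```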

The final step is to create a new user and give them access to our bucket by attaching our newly created policy.

1. Open [IAM](https://console.aws.amazon.com/iam/home) and navigate to **Users** on the left
2. Click **Create user**, add a user name, and click **Next**
3. Select **Attach policies directly**
4. Search for the policy you just created and click the checkbox on the far left to attach it to this user
5. Click **Next** and then click **Create user**

![User](https://res.cloudinary.com/dmukukwp6/image/upload/Clean_Shot_2024_07_15_at_16_16_34_2x_9f0f99d7a4.png)

6. Click "Next" until you reach the "Create user" button. Click that as well.
7. **Make sure to copy your "Access key" and "Secret access key". The latter will not be shown again.**
6. Once created, search for and click on the user name, then click **Create access key**
7. Select **Application running outside AWS** and then click **Next**
8. Add a description tag value and click **Create access key**
9. Copy the access key and secret access key values somewhere safe. You will need to recreate them if you lose them. If you'd rather script these steps, see the sketch below.

![AWS access keys](https://res.cloudinary.com/dmukukwp6/image/upload/Clean_Shot_2024_07_15_at_16_17_45_2x_e7dcb9dd39.png)
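
As with the policy, the user setup can also be scripted. Below is a rough boto3 sketch; the user name and policy ARN are placeholders, so substitute the ARN returned when you created your policy:

```python
import boto3  # assumes AWS credentials with permission to manage IAM are configured locally

USER_NAME = "posthog-data-warehouse"  # placeholder user name
POLICY_ARN = "arn:aws:iam::123456789012:policy/posthog-s3-export-policy"  # placeholder; use your policy's ARN

iam = boto3.client("iam")

# Create the user and attach the bucket policy created earlier
iam.create_user(UserName=USER_NAME)
iam.attach_user_policy(UserName=USER_NAME, PolicyArn=POLICY_ARN)

# Create an access key pair. The secret is only returned once, so store both
# values somewhere safe.
access_key = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
print(access_key["AccessKeyId"])
print(access_key["SecretAccessKey"])
```

Either way, what PostHog ultimately needs is the access key ID and secret access key of a user that has the bucket policy attached.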

### Step 3: Add data to the bucket

@@ -97,4 +117,12 @@

### Step 5: Query the table

Amazing! You can now [query](/docs/data-warehouse/query) your new table.

## Troubleshooting

- **Create table failed: Could not get columns**: Check that your **Files URL pattern** and file format are correct. For example, make sure your column names don't contain spaces and that cells in your `.csv` file don't contain stray commas. If a `.csv` keeps failing, converting it to Parquet can also help (see the sketch at the end of this section).

- **Create table failed: Access was denied when reading the provided file**: Make sure the access key and secret you entered belong to the user you created and that your bucket access policy is attached to that user.

- **Create table failed: The provided file is not in Parquet format**: Make sure you've selected the correct file format.
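
If a `.csv` file keeps triggering these errors, one workaround is to convert it to Parquet before uploading it to your bucket. A minimal sketch, assuming pandas and pyarrow are installed and using a placeholder filename:

```python
import pandas as pd  # assumes pandas plus a Parquet engine such as pyarrow is installed

# Read the problematic CSV and re-export it as Parquet. Parquet stores the
# schema alongside the data, which sidesteps delimiter and column-name issues.
df = pd.read_csv("events.csv")  # placeholder filename
df.to_parquet("events.parquet", index=False)
```

Upload the resulting `.parquet` file and select Parquet as the file format when creating the table.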
