# Docs update to document `COPY TO` for any S3 compatible storage (#33752)
---
title: "S3 Compatible Object Storage"
description: "How to export results from Materialize to S3 compatible object storage"
aliases:
  - /serve-results/s3-compatible/
menu:
  main:
    parent: sink
    name: "S3 Compatible Object Storage"
    weight: 10
---

> **Review comment:** Just wondering: when we sink to Snowflake via S3, can that be done via S3-compatible storage? If so, maybe add a note in the S3 setup section of https://preview.materialize.com/materialize/33752/serve-results/sink/snowflake/ stating that this can be done via S3-compatible storage, with a link to this page. Also mention that ...
>
> **Review comment:** Also, for the various sink pages, like the concepts and Sink results pages, the content hasn't changed since whenever it was written, other than some minor reorg. Should we incorporate that sinking is available via `COPY TO`?
>
> **Review comment:** Actually, for my second comment, I'll handle that in a separate PR. I want to link to some pages related to subscription-based sinks, so I can also incorporate `COPY TO`-based sinks as well. That change is #33792.
>
> **Reply:** Actually, we likely would have to re-work the entire S3 => Snowflake guide. It's not as simple as just using ...

This guide walks you through the steps required to export results from
Materialize to an S3 compatible object storage service, such as Google
Cloud Storage or Cloudflare R2.

## Before you begin

- Make sure that you have set up your bucket.
- Obtain the following for your bucket (instructions for obtaining these vary by provider):
  - The S3 compatible URI (`S3_BUCKET_URI`)
  - The S3 compatible access credentials (`ACCESS_KEY_ID` and `SECRET_ACCESS_KEY`)

## Step 1. Create a connection

1. In the [SQL Shell](https://console.materialize.com/), or your preferred SQL
   client connected to Materialize, create an [AWS connection](/sql/create-connection/#aws),
   replacing `<ACCESS_KEY_ID>` and `<SECRET_ACCESS_KEY>` with the credentials for your
   bucket. The AWS connection can be used to connect to any S3 compatible object
   storage service by specifying its endpoint and region.

   For example, to connect to Google Cloud Storage, you can run the following:

   ```mzsql
   CREATE SECRET secret_access_key AS '<SECRET_ACCESS_KEY>';
   CREATE CONNECTION bucket_connection TO AWS (
       ACCESS KEY ID = '<ACCESS_KEY_ID>',
       SECRET ACCESS KEY = SECRET secret_access_key,
       ENDPOINT = 'https://storage.googleapis.com',
       REGION = 'us'
   );
   ```

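   As another sketch, a Cloudflare R2 bucket can be reached through its S3 compatible
   endpoint. The account-scoped endpoint URL and the `auto` region below are assumptions,
   not part of this guide; confirm the exact values for your account in Cloudflare's
   documentation.

   ```mzsql
   CREATE SECRET r2_secret_access_key AS '<SECRET_ACCESS_KEY>';
   CREATE CONNECTION r2_bucket_connection TO AWS (
       ACCESS KEY ID = '<ACCESS_KEY_ID>',
       SECRET ACCESS KEY = SECRET r2_secret_access_key,
       -- Assumption: R2 exposes an S3 compatible endpoint scoped to your account ID.
       ENDPOINT = 'https://<ACCOUNT_ID>.r2.cloudflarestorage.com',
       -- Assumption: R2's S3 compatible API accepts 'auto' as the region.
       REGION = 'auto'
   );
   ```
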
1. Validate the connection you created using the
   [`VALIDATE CONNECTION`](/sql/validate-connection) command.

   ```mzsql
   VALIDATE CONNECTION bucket_connection;
   ```

   If no validation error is returned, you're ready to use this connection to
   run a bulk export from Materialize to your target bucket.

## Step 2. Run a bulk export

To export data to your target bucket, use the [`COPY TO`](/sql/copy-to/#copy-to-s3)
command and the AWS connection you created in the previous step. Replace
`<S3_BUCKET_URI>` with the S3 compatible URI for your target bucket.

{{< tabs >}}
{{< tab "Parquet">}}

```mzsql
COPY some_object TO '<S3_BUCKET_URI>'
WITH (
    AWS CONNECTION = bucket_connection,
    FORMAT = 'parquet'
);
```

For details on the Parquet writer settings Materialize uses, as well as data
type support and conversion, check the [reference documentation](/sql/copy-to/#copy-to-s3-parquet).

{{< /tab >}}

{{< tab "CSV">}} | ||
|
||
```mzsql | ||
COPY some_object TO '<S3_BUCKET_URI>' | ||
WITH ( | ||
AWS CONNECTION = bucket_connection, | ||
FORMAT = 'csv' | ||
); | ||
``` | ||
|
||
{{< /tab >}} | ||
|
||
{{< /tabs >}} | ||
|
||
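Depending on your needs, you may also be able to export the results of a query instead
of a whole object. The following is a minimal sketch using a hypothetical `sales`
relation; it assumes the `COPY (SELECT ...) TO` form is available, which you should
confirm in the [`COPY TO`](/sql/copy-to/#copy-to-s3) reference.

```mzsql
-- Sketch: export a filtered subset of a hypothetical `sales` relation,
-- reusing the connection created in Step 1.
COPY (
    SELECT sale_id, customer_id, amount
    FROM sales
    WHERE region = 'EMEA'
) TO '<S3_BUCKET_URI>'
WITH (
    AWS CONNECTION = bucket_connection,
    FORMAT = 'parquet'
);
```
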
## Step 3. (Optional) Add scheduling

Bulk exports to object storage using the `COPY TO` command are _one-shot_: every time
you want to export results, you must run the command again. To automate running bulk
exports on a regular basis, you can set up scheduling, for example using a
simple `cron`-like service or an orchestration platform like Airflow or
Dagster.