2 changes: 0 additions & 2 deletions doc/user/content/serve-results/sink/s3.md
@@ -10,8 +10,6 @@ menu:
weight: 10
---

{{< public-preview />}}

This guide walks you through the steps required to export results from
Materialize to Amazon S3. Copying results to S3 is
useful to perform tasks like periodic backups for auditing, or downstream
94 changes: 94 additions & 0 deletions doc/user/content/serve-results/sink/s3_compatible.md
@@ -0,0 +1,94 @@
---
title: "S3 Compatible Object Storage"
description: "How to export results from Materialize to S3 compatible object storage"
aliases:
- /serve-results/s3-compatible/
menu:
main:
parent: sink
name: "S3 Compatible Object Storage"
weight: 10
Contributor:

Just wondering: when we sink to Snowflake via S3, can that be done via
S3-compatible storage? If so, maybe add a note in the S3 setup section of the
Snowflake guide (https://preview.materialize.com/materialize/33752/serve-results/sink/snowflake/)
stating that this can be done via S3-compatible storage, with a link to this
page.

Contributor:

Also, for the various sink pages, like the concepts and "Sink results" pages,
the content hasn't changed since it was written, other than some minor reorg.
Should we incorporate that sinking is available via COPY TO in this PR, or
handle it in a separate PR at a later date?

Contributor:

Actually, for my second comment, I'll handle that in a separate PR. I want to
link to some content related to subscription-based sinks, so I can also
incorporate COPY TO-based sinks as well.

Contributor:

That change is #33792

Contributor Author:

> when we sink to Snowflake via S3 ... can that be done via S3-compatible?

In theory, yes. Let me add a note.

Contributor Author:

Actually, we would likely have to rework the entire S3 => Snowflake guide. It's
not as simple as just using COPY TO! If it's okay, I'll avoid making changes to
the Snowflake guide for now.

---

This guide walks you through the steps required to export results from
Materialize to an S3-compatible object storage service, such as Google
Cloud Storage or Cloudflare R2.

## Before you begin

- Make sure that you have set up your bucket.
- Obtain the following for your bucket. Instructions for obtaining these vary
  by provider.
  - The S3-compatible URI (`S3_BUCKET_URI`)
  - The S3-compatible access key credentials (`ACCESS_KEY_ID` and
    `SECRET_ACCESS_KEY`)

## Step 1. Create a connection

1. In the [SQL Shell](https://console.materialize.com/), or your preferred SQL
client connected to Materialize, create an [AWS connection](/sql/create-connection/#aws),
replacing `<ACCESS_KEY_ID>` and `<SECRET_ACCESS_KEY>` with the credentials for
your bucket. The AWS connection can be used to connect to any S3-compatible
object storage service by specifying the endpoint and the region.

For example, to connect to Google Cloud Storage, you can run the following (a sketch for Cloudflare R2 follows this list):

```mzsql
CREATE SECRET secret_access_key AS '<SECRET_ACCESS_KEY>';
CREATE CONNECTION bucket_connection TO AWS (
ACCESS KEY ID = '<ACCESS_KEY_ID>',
SECRET ACCESS KEY = SECRET secret_access_key,
ENDPOINT = 'https://storage.googleapis.com',
REGION = 'us'
);
```

1. Validate the connection you created using the
[`VALIDATE CONNECTION`](/sql/validate-connection) command.

```mzsql
VALIDATE CONNECTION bucket_connection;
```

If no validation error is returned, you're ready to use this connection to
run a bulk export from Materialize to your target bucket.
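
As a further illustration, a connection to Cloudflare R2 follows the same
pattern. This is a minimal sketch only: the `<ACCOUNT_ID>`-based endpoint
format and the `auto` region are assumptions based on Cloudflare's S3 API
conventions, so check your provider's documentation for the exact values.

```mzsql
-- Sketch only: the endpoint format and region are assumptions for Cloudflare R2.
CREATE SECRET r2_secret_access_key AS '<SECRET_ACCESS_KEY>';
CREATE CONNECTION r2_connection TO AWS (
    ACCESS KEY ID = '<ACCESS_KEY_ID>',
    SECRET ACCESS KEY = SECRET r2_secret_access_key,
    ENDPOINT = 'https://<ACCOUNT_ID>.r2.cloudflarestorage.com',
    REGION = 'auto'
);
```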

## Step 2. Run a bulk export

To export data to your target bucket, use the [`COPY TO`](/sql/copy-to/#copy-to-s3)
command and the AWS connection you created in the previous step. Replace `<S3_BUCKET_URI>`
with the S3-compatible URI for your target bucket.

{{< tabs >}}
{{< tab "Parquet">}}

```mzsql
COPY some_object TO '<S3_BUCKET_URI>'
WITH (
AWS CONNECTION = bucket_connection,
FORMAT = 'parquet'
);
```

For details on the Parquet writer settings Materialize uses, as well as data
type support and conversion, check the [reference documentation](/sql/copy-to/#copy-to-s3-parquet).

{{< /tab >}}

{{< tab "CSV">}}

```mzsql
COPY some_object TO '<S3_BUCKET_URI>'
WITH (
AWS CONNECTION = bucket_connection,
FORMAT = 'csv'
);
```

{{< /tab >}}

{{< /tabs >}}
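
The examples above export an existing object, such as a table, view, or
materialized view. `COPY TO` also accepts a query, which can be useful for
exporting a filtered or transformed result set. A minimal sketch, where
`orders` and `status` are hypothetical names:

```mzsql
-- Sketch: export the result of a query instead of a whole object.
-- `orders` and `status` are hypothetical names.
COPY (SELECT * FROM orders WHERE status = 'shipped') TO '<S3_BUCKET_URI>'
WITH (
    AWS CONNECTION = bucket_connection,
    FORMAT = 'parquet'
);
```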

## Step 3. (Optional) Add scheduling

Bulk exports to object storage using the `COPY TO` command are _one-shot_: every time
you want to export results, you must run the command. To automate running bulk
exports on a regular basis, you can set up scheduling, for example using a
simple `cron`-like service or an orchestration platform like Airflow or
Dagster.
4 changes: 1 addition & 3 deletions doc/user/content/sql/copy-to.md
@@ -39,9 +39,7 @@ Name | Values | Default value | Description
COPY (SUBSCRIBE some_view) TO STDOUT WITH (FORMAT binary);
```

## Copy to Amazon S3 {#copy-to-s3}

{{< public-preview />}}
## Copy to Amazon S3 and S3-compatible services {#copy-to-s3}

Copying results to Amazon S3 (or S3-compatible services) is useful to perform
tasks like periodic backups for auditing, or downstream processing in
17 changes: 17 additions & 0 deletions doc/user/content/sql/create-connection.md
@@ -173,6 +173,23 @@ CREATE CONNECTION aws_credentials TO AWS (

{{< /tabs >}}

### S3-compatible object storage

You can use an AWS connection to perform bulk exports to any S3-compatible
object storage service, such as Google Cloud Storage. When connecting to
S3-compatible object storage, you must provide static access key credentials
and specify the endpoint and the region.

To create a connection that uses static access key credentials:

```mzsql
CREATE SECRET secret_access_key AS '...';
CREATE CONNECTION gcs_connection TO AWS (
ACCESS KEY ID = 'ASIAV2KIV5LPTG6HGXG6',
SECRET ACCESS KEY = SECRET secret_access_key,
ENDPOINT = 'https://storage.googleapis.com',
REGION = 'us'
);
```
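
Once created, the connection can be referenced from a bulk export with
[`COPY TO`](/sql/copy-to/#copy-to-s3). A minimal sketch, where `my_view` and
`<S3_BUCKET_URI>` are placeholders:

```mzsql
-- Sketch: `my_view` and `<S3_BUCKET_URI>` are placeholders.
COPY my_view TO '<S3_BUCKET_URI>'
WITH (
    AWS CONNECTION = gcs_connection,
    FORMAT = 'parquet'
);
```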

### Kafka

A Kafka connection establishes a link to a [Kafka] cluster. You can use Kafka