diff --git a/doc/user/content/serve-results/sink/s3.md b/doc/user/content/serve-results/sink/s3.md
index 3937cfd9890c9..00e088e27520d 100644
--- a/doc/user/content/serve-results/sink/s3.md
+++ b/doc/user/content/serve-results/sink/s3.md
@@ -10,8 +10,6 @@ menu:
     weight: 10
 ---
 
-{{< public-preview />}}
-
 This guide walks you through the steps required to export results from
 Materialize to Amazon S3. Copying results to S3 is useful to perform tasks
 like periodic backups for auditing, or downstream
diff --git a/doc/user/content/serve-results/sink/s3_compatible.md b/doc/user/content/serve-results/sink/s3_compatible.md
new file mode 100644
index 0000000000000..3c2121b3a4ddb
--- /dev/null
+++ b/doc/user/content/serve-results/sink/s3_compatible.md
@@ -0,0 +1,94 @@
+---
+title: "S3-Compatible Object Storage"
+description: "How to export results from Materialize to S3-compatible object storage"
+aliases:
+  - /serve-results/s3-compatible/
+menu:
+  main:
+    parent: sink
+    name: "S3-Compatible Object Storage"
+    weight: 10
+---
+
+This guide walks you through the steps required to export results from
+Materialize to an S3-compatible object storage service, such as Google
+Cloud Storage or Cloudflare R2.
+
+## Before you begin
+
+- Make sure that you have set up your bucket.
+- Obtain the following for your bucket. Instructions for obtaining these
+  vary by provider:
+  - The S3-compatible URI (`S3_BUCKET_URI`)
+  - The S3-compatible access credentials (`ACCESS_KEY_ID` and `SECRET_ACCESS_KEY`)
+
+## Step 1. Create a connection
+
+1. In the [SQL Shell](https://console.materialize.com/), or your preferred SQL
+   client connected to Materialize, create an [AWS connection](/sql/create-connection/#aws),
+   replacing `<ACCESS_KEY_ID>` and `<SECRET_ACCESS_KEY>` with the credentials
+   for your bucket. The AWS connection can be used to connect to any
+   S3-compatible object storage service by specifying the endpoint and the
+   region.
+
+   For example, to connect to Google Cloud Storage, you can run the following:
+
+    ```mzsql
+    CREATE SECRET secret_access_key AS '<SECRET_ACCESS_KEY>';
+    CREATE CONNECTION bucket_connection TO AWS (
+        ACCESS KEY ID = '<ACCESS_KEY_ID>',
+        SECRET ACCESS KEY = SECRET secret_access_key,
+        ENDPOINT = 'https://storage.googleapis.com',
+        REGION = 'us'
+    );
+    ```
+
+1. Validate the connection you created using the
+   [`VALIDATE CONNECTION`](/sql/validate-connection) command.
+
+    ```mzsql
+    VALIDATE CONNECTION bucket_connection;
+    ```
+
+   If no validation error is returned, you're ready to use this connection to
+   run a bulk export from Materialize to your target bucket.
+
+## Step 2. Run a bulk export
+
+To export data to your target bucket, use the [`COPY TO`](/sql/copy-to/#copy-to-s3)
+command and the AWS connection you created in the previous step. Replace
+`<S3_BUCKET_URI>` with the S3-compatible URI for your target bucket.
+
+{{< tabs >}}
+{{< tab "Parquet">}}
+
+```mzsql
+COPY some_object TO '<S3_BUCKET_URI>'
+WITH (
+    AWS CONNECTION = bucket_connection,
+    FORMAT = 'parquet'
+);
+```
+
+For details on the Parquet writer settings Materialize uses, as well as data
+type support and conversion, check the [reference documentation](/sql/copy-to/#copy-to-s3-parquet).
+
+{{< /tab >}}
+
+{{< tab "CSV">}}
+
+```mzsql
+COPY some_object TO '<S3_BUCKET_URI>'
+WITH (
+    AWS CONNECTION = bucket_connection,
+    FORMAT = 'csv'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+## Step 3. (Optional) Add scheduling
+
+Bulk exports to object storage using the `COPY TO` command are _one-shot_: every
+time you want to export results, you must run the command. To automate running
+bulk exports on a regular basis, you can set up scheduling, for example using a
+simple `cron`-like service or an orchestration platform like Airflow or
+Dagster.
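+
+For example, a minimal approach is to pair `cron` with a small script that
+connects to Materialize and re-runs the `COPY TO` command. The Python sketch
+below is illustrative only: it assumes the `bucket_connection` and
+`some_object` names from the previous steps, a `MATERIALIZE_URL` connection
+string exported in the environment, and the `psycopg2` driver installed.
+
+```python
+# bulk_export.py -- illustrative sketch; adjust the object, connection, and
+# URI names to match your own setup before scheduling it.
+import os
+
+import psycopg2
+
+# Assumes a connection string like "postgres://<USER>@<HOST>:6875/materialize"
+# is available in the environment.
+MATERIALIZE_URL = os.environ["MATERIALIZE_URL"]
+
+# The same one-shot bulk export as in Step 2; replace <S3_BUCKET_URI> with the
+# S3-compatible URI for your target bucket.
+COPY_STATEMENT = """
+    COPY some_object TO '<S3_BUCKET_URI>'
+    WITH (
+        AWS CONNECTION = bucket_connection,
+        FORMAT = 'parquet'
+    );
+"""
+
+
+def run_bulk_export() -> None:
+    conn = psycopg2.connect(MATERIALIZE_URL)
+    conn.autocommit = True
+    try:
+        with conn.cursor() as cur:
+            cur.execute(COPY_STATEMENT)
+    finally:
+        conn.close()
+
+
+if __name__ == "__main__":
+    run_bulk_export()
+```
+
+A `cron` entry such as `0 2 * * * python /path/to/bulk_export.py` would then
+re-run the export nightly; an orchestration platform like Airflow or Dagster
+can wrap the same statement in a scheduled task instead.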
diff --git a/doc/user/content/sql/copy-to.md b/doc/user/content/sql/copy-to.md
index d557ce5b6407e..5bf871dadbcfe 100644
--- a/doc/user/content/sql/copy-to.md
+++ b/doc/user/content/sql/copy-to.md
@@ -39,9 +39,7 @@ Name | Values | Default value | Description
 COPY (SUBSCRIBE some_view) TO STDOUT WITH (FORMAT binary);
 ```
 
-## Copy to Amazon S3 {#copy-to-s3}
-
-{{< public-preview />}}
+## Copy to Amazon S3 and S3-compatible services {#copy-to-s3}
 
 Copying results to Amazon S3 (or S3-compatible services) is useful to perform
 tasks like periodic backups for auditing, or downstream processing in
diff --git a/doc/user/content/sql/create-connection.md b/doc/user/content/sql/create-connection.md
index 66804b3b2ca13..d38552a01bff0 100644
--- a/doc/user/content/sql/create-connection.md
+++ b/doc/user/content/sql/create-connection.md
@@ -173,6 +173,23 @@ CREATE CONNECTION aws_credentials TO AWS (
 
 {{< /tabs >}}
 
+### S3-compatible object storage
+
+You can use an AWS connection to perform bulk exports to any S3-compatible
+object storage service, such as Google Cloud Storage. When connecting to
+S3-compatible object storage, you need to provide static access key
+credentials and specify the endpoint and the region.
+
+To create a connection that uses static access key credentials:
+
+```mzsql
+CREATE SECRET secret_access_key AS '...';
+CREATE CONNECTION gcs_connection TO AWS (
+    ACCESS KEY ID = 'ASIAV2KIV5LPTG6HGXG6',
+    SECRET ACCESS KEY = SECRET secret_access_key,
+    ENDPOINT = 'https://storage.googleapis.com',
+    REGION = 'us'
+);
+```
+
 ### Kafka
 
 A Kafka connection establishes a link to a [Kafka] cluster. You can use Kafka