https://trino.io/docs/current/overview/use-cases.html#what-trino-is[Trino] is a tool designed to efficiently query vast amounts of data using distributed queries. It is not a database with its own storage but rather interacts with many different data stores. Trino connects to these data stores - or data sources - via https://trino.io/docs/current/connector.html[connectors].
8
+
{what-trino-is}[Trino] is a tool designed to efficiently query vast amounts of data using distributed queries.
9
+
It is not a database with its own storage but rather interacts with many different data stores.
10
+
Trino connects to these data stores - or data sources - via {trino-connector}[connectors].
6
11
Each connector enables access to a specific underlying data source such as a Hive warehouse, a PostgreSQL database or a Druid instance.
7
12
8
-
A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring workloads, and the Worker, which is responsible for executing specific tasks that together make up a workload. The workers fetch data from the connectors, execute tasks and share intermediate results. The coordinator collects and consolidates these results for the end-user.
13
+
A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring workloads, and the Worker, which is responsible for executing specific tasks that together make up a workload.
14
+
The workers fetch data from the connectors, execute tasks and share intermediate results.
15
+
The coordinator collects and consolidates these results for the end-user.
9
16
10
17
== [[catalogs]]Catalogs
11
18
@@ -24,9 +31,12 @@ Currently, the following connectors are supported:
24
31
25
32
== Catalog references
26
33
27
-
Within Stackable a `TrinoCatalog` consists of one or more (mandatory or optional) components which are specific to that catalog. A catalog should be re-usable within multiple Trino clusters. Catalogs are referenced by Trino clusters with labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.
34
+
Within Stackable a `TrinoCatalog` consists of one or more (mandatory or optional) components which are specific to that catalog.
35
+
A catalog should be re-usable within multiple Trino clusters.
36
+
Catalogs are referenced by Trino clusters with labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.
28
37
29
-
The following diagram illustrates this. Two Trino catalogs - each an instance of a particular connector - are declared with labels that are used to match them to a Trino cluster:
38
+
The following diagram illustrates this.
39
+
Two Trino catalogs - each an instance of a particular connector - are declared with labels that are used to match them to a Trino cluster:
30
40
31
41
image::catalogs.drawio.svg[A TrinoCluster referencing two catalogs by label matching]
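
The label matching described above can be sketched as a pair of manifest fragments.
This is a minimal illustration under stated assumptions, not the definitive setup: the resource names (`hive-catalog`, `simple-trino`), the label `trino: simple-trino` and the metastore ConfigMap name are hypothetical, and the field names should be checked against the CRD reference of the operator release in use.

[source,yaml]
----
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: hive-catalog            # hypothetical catalog name
  labels:
    trino: simple-trino         # hypothetical label the cluster selects on
spec:
  connector:
    hive:
      metastore:
        configMap: simple-hive  # hypothetical discovery ConfigMap of a Hive metastore
---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino            # hypothetical cluster name
spec:
  clusterConfig:
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino     # selects every TrinoCatalog carrying this label
  # image, coordinators and workers are omitted from this fragment
----

Because the cluster selects catalogs by label rather than by name, the same catalog can be picked up by several clusters whose selectors match its labels.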
docs/modules/trino/pages/getting_started/first_steps.adoc
+13-5
@@ -1,10 +1,13 @@
1
1
= First steps
2
+
:description: Deploy and verify a Trino cluster with Stackable Operator. Access via CLI or web interface, and clean up after testing.
2
3
3
-
After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Trino cluster and the required dependencies. Afterwards you can <<_verify_that_it_works, verify that it works>> by running some queries against Trino or visiting the Trino web interface.
4
+
After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Trino cluster and the required dependencies.
5
+
Afterwards you can <<_verify_that_it_works, verify that it works>> by running some queries against Trino or visiting the Trino web interface.
4
6
5
7
== Setup Trino
6
8
7
-
A working Trino cluster and its web interface require only the commons, secret and listener operators to work. Simple tests are possible without an external data source (e.g. PostgreSQL, Hive or S3), as internal data can be used.
9
+
A working Trino cluster and its web interface require only the commons, secret and listener operators to work.
10
+
Simple tests are possible without an external data source (e.g. PostgreSQL, Hive or S3), as internal data can be used.
8
11
9
12
Create a file named `trino.yaml` with the following content:
We use the https://trino.io/download.html[Trino CLI tool] to access the Trino cluster. This link points to the latest Trino version. In this guide we keep Trino cluster and client versions in sync and download the CLI tool from the https://repo.stackable.tech/[Stackable repository]:
60
+
We use the https://trino.io/download.html[Trino CLI tool] to access the Trino cluster.
61
+
This link points to the latest Trino version.
62
+
In this guide we keep Trino cluster and client versions in sync and download the CLI tool from the https://repo.stackable.tech/[Stackable repository]:
58
63
59
64
[source,bash]
60
65
----
@@ -100,9 +105,12 @@ Congratulations, you set up your first Stackable Trino cluster successfully.
100
105
101
106
=== Access the Trino web interface
102
107
103
-
With the port-forward still active, you can connect to the Trino web interface. Enter `https://localhost:8443/ui` in your browser and log in with the username `admin`. Since no authentication is enabled you do not need to enter a password.
108
+
With the port-forward still active, you can connect to the Trino web interface.
109
+
Enter `https://localhost:8443/ui` in your browser and log in with the username `admin`.
110
+
Since no authentication is enabled you do not need to enter a password.
104
111
105
-
WARNING: Your browser will probably show a security risk warning because it does not trust the self-generated TLS certificates. Just ignore that and continue.
112
+
WARNING: Your browser will probably show a security risk warning because it does not trust the self-generated TLS certificates.
113
+
Just ignore that and continue.
106
114
107
115
After logging in you should see the Trino web interface:
docs/modules/trino/pages/getting_started/index.adoc
+3-1
@@ -1,6 +1,8 @@
1
1
= Getting started
2
+
:description: Get started with Trino on Kubernetes using the Stackable Operator. Follow steps for installation, setup, and resource recommendations.
2
3
3
-
This guide will get you started with Trino using the Stackable Operator. It will guide you through the installation of the operator and its dependencies and setting up your first Trino cluster.
4
+
This guide will get you started with Trino using the Stackable Operator.
5
+
It will guide you through the installation of the operator and its dependencies and setting up your first Trino cluster.
docs/modules/trino/pages/index.adoc
+1-1
@@ -1,5 +1,5 @@
1
1
= Stackable Operator for Trino
2
-
:description: The Stackable operator for Trino is a Kubernetes operator that can manage Trino clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Trino versions.
2
+
:description: Manage Trino clusters on Kubernetes with the Stackable operator, featuring resource management, demos, and support for custom Trino versions.
3
3
:keywords: Stackable operator, Trino, Kubernetes, k8s, operator, data science, data exploration
docs/modules/trino/pages/usage-guide/configuration.adoc
+17-9
@@ -1,4 +1,5 @@
1
1
= Configuration
2
+
:description: Configure Trino clusters with properties, environment variables, and resource requests. Customize settings for performance and storage efficiently.
2
3
3
4
The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).
4
5
@@ -8,11 +9,11 @@ IMPORTANT: Do not override port numbers. This will lead to faulty installations.
8
9
9
10
For a role or role group, at the same level of `config`, you can specify `configOverrides` for:
10
11
11
-
- `config.properties`
12
-
- `node.properties`
13
-
- `log.properties`
14
-
- `password-authenticator.properties`
15
-
- `security.properties`
12
+
* `config.properties`
13
+
* `node.properties`
14
+
* `log.properties`
15
+
* `password-authenticator.properties`
16
+
* `security.properties`
16
17
17
18
For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference].
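
As a minimal sketch of such an override, the fragment below sets one property in `config.properties` for the `default` worker role group.
The property `query.max-memory-per-node` and its value are chosen purely for illustration, and the surrounding cluster definition is omitted.

[source,yaml]
----
workers:
  roleGroups:
    default:
      configOverrides:
        config.properties:
          # illustrative property and value; quoted so that it is passed as a string
          query.max-memory-per-node: "2GB"
----

Placing the same `configOverrides` block directly under `workers` would apply it at the role level, where a role group override of the same property takes precedence.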
18
19
@@ -46,9 +47,13 @@ All override property values must be strings. The properties will be passed on w
46
47
47
48
=== The security.properties file
48
49
49
-
The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
50
+
The `security.properties` file is used to configure JVM security properties.
51
+
It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
50
52
51
-
The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensitive to the contents of these caches and their performance is heavily affected by them. As of version 414, Trino performs poorly if the positive cache is disabled. To cache resolved host names, and thus speed up queries, you can configure the TTL of entries in the positive cache like this:
53
+
The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
54
+
Some products of the Stackable platform are very sensitive to the contents of these caches and their performance is heavily affected by them.
55
+
As of version 414, Trino performs poorly if the positive cache is disabled.
56
+
To cache resolved host names, and thus speed up queries, you can configure the TTL of entries in the positive cache like this:
52
57
53
58
[source,yaml]
54
59
----
@@ -124,7 +129,9 @@ workers:
124
129
capacity: 3Gi
125
130
----
126
131
127
-
In the above example, all Trino workers in the default group will store data (the location of the property `--data-dir`) on a `3Gi` volume. Additional role groups not specifying any resources will inherit the config provided on the role level (`2Gi` volume). This works the same for memory or CPU requests.
132
+
In the above example, all Trino workers in the default group will store data (the location of the property `--data-dir`) on a `3Gi` volume.
133
+
Additional role groups not specifying any resources will inherit the config provided on the role level (`2Gi` volume).
134
+
This works the same for memory or CPU requests.
128
135
129
136
By default, in case nothing is configured in the custom resource for a certain role group, each Pod will have a `2Gi` large local volume mount for the data location containing mainly logs.
130
137
@@ -168,4 +175,5 @@ spec:
168
175
capacity: '1Gi'
169
176
----
170
177
171
-
WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production. Please adapt according to your requirements.
178
+
WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production.
TIP: In case you are using OpenID Connect, use `--external-authentication` instead of `--password`. A browser window will be opened, which might require you to log in. Please note that you still need to pass the `--user` argument because of https://github.com/trinodb/trino/issues/11547[this Trino issue].
33
+
TIP: In case you are using OpenID Connect, use `--external-authentication` instead of `--password`.
34
+
A browser window will be opened, which might require you to log in.
35
+
Please note that you still need to pass the `--user` argument because of https://github.com/trinodb/trino/issues/11547[this Trino issue].
As the last step you can click on _Finish_ and start using the Trino connection.
55
58
56
-
TIP: In case you are using OpenID Connect, set the `externalAuthentication` property to `true` and don't provide any username or password. A browser window will be opened, which might require you to log in.
59
+
TIP: In case you are using OpenID Connect, set the `externalAuthentication` property to `true` and don't provide any username or password.
60
+
A browser window will be opened, which might require you to log in.
docs/modules/trino/pages/usage-guide/query.adoc
+3-1
@@ -1,6 +1,8 @@
1
1
= Testing Trino with Hive and S3
2
+
:description: Test Trino with Hive and S3 by creating a schema and table for Iris data in Parquet format, then querying the dataset.
2
3
3
-
Create a schema and a table for the Iris data located in S3 and query data. This assumes that the Iris data set, which can be downloaded https://www.kaggle.com/gpreda/iris-dataset/version/2?select=iris.parquet[here], is available in the S3 bucket in the `PARQUET` format.
4
+
Create a schema and a table for the Iris data located in S3 and query data.
5
+
This assumes that the Iris data set, which can be downloaded https://www.kaggle.com/gpreda/iris-dataset/version/2?select=iris.parquet[here], is available in the S3 bucket in the `PARQUET` format.
docs/modules/trino/pages/usage-guide/s3.adoc
+1
@@ -1,4 +1,5 @@
1
1
= Connecting Trino to S3
2
+
:description: Configure S3 connections in Trino either inline within the TrinoCatalog or via an external S3Connection resource for centralized management.
2
3
3
4
You can specify S3 connection details directly inside the TrinoCatalog specification or by referring to an external S3Connection custom resource.
4
5
This mechanism is used across the whole Stackable Data Platform; read the xref:concepts:s3.adoc[S3 concepts page] to learn more.
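
The two variants can be sketched as follows for a Hive catalog.
This is an illustration under assumptions rather than a complete specification: the S3Connection name `my-s3-connection`, the host, port and SecretClass name are hypothetical, and the exact fields should be verified against the S3 concepts page and the CRD reference.

[source,yaml]
----
# Variant 1: refer to an external S3Connection resource (hypothetical name)
spec:
  connector:
    hive:
      s3:
        reference: my-s3-connection
---
# Variant 2: specify the connection details inline (hypothetical values)
spec:
  connector:
    hive:
      s3:
        inline:
          host: minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
----

The reference variant keeps the connection details in one place, so several catalogs can share them and a change only has to be made once.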