Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add concept of connections #459

Closed
2 tasks
scsmithr opened this issue Jan 5, 2023 · 9 comments
Closed
2 tasks

Add concept of connections #459

scsmithr opened this issue Jan 5, 2023 · 9 comments
Assignees
Labels
feat 🎇 New feature or request

Comments

@scsmithr
Copy link
Member

scsmithr commented Jan 5, 2023

Summary

See https://www.notion.so/glaredb/Database-Objects-a7e52611154e40efabb7ac46bf0b2a76#fc55700503794ee6bdb784a4349f1768

Specifications

  • Ensure that this is a reasonable thing to do for the types of external tables we want to support in the future.
  • Add appropriate entries to the catalog.
  • Small refactor of creating external tables to use the new connections.

Rationale

Removes the need of specifying credentials every time an external table is added for a database. Also makes updating credentials easier (when we have support for that) since only one entry needs to be updated instead of multiple tables.

Impact

Will affect https://github.com/GlareDB/cloud/issues/431 and https://github.com/GlareDB/cloud/issues/473. Instead of executing one query, we'll need to execute two. However, doing these queries separately enables things like easily adding more that one table at a time, or even attaching an entire schema.

Tasks

  • Spec for statements based off of Materialize, Firebolt etc, FDW research
  • Implementation
@scsmithr scsmithr added the feat 🎇 New feature or request label Jan 5, 2023
@greyscaled
Copy link
Contributor

Documentation of GlareDB-specific statements, rework statements

@greyscaled
Copy link
Contributor

This is not to match a spec/standard, this is custom (no standard AFAIK for external table related statements)

Sean --> Firebolt and Materialize for inspiration on the structure for connection statements.
Rustom --> FDW (Foreign Data Wrapper statements)

Sean is going to write up a spec.

@RustomMS
Copy link
Contributor

RustomMS commented Jan 9, 2023

Once we have the concept of connections is the plan to remove the ability to create ad-hoc external tables with all the info provided?

Wondering if we should have both, the current syntax where we can provide all credentials needed and the the syntax leveraging saved connections?

@scsmithr
Copy link
Member Author

scsmithr commented Jan 9, 2023

Yeah the idea is to create an external table, a 'connection' of some sort is required. My rationale for that is I would rather have one well defined way of doing this.

This does bring up issues like what should we do if there's a public parquet file on S3 that you want to access. The 'connection' in this case wouldn't contain anything useful. So I'm still thinking through that...

@RustomMS
Copy link
Contributor

RustomMS commented Jan 9, 2023

Also we should consider some validation of the connection/credentials some time before querying. Currently I can provide a bad key and we will only find out when querying. Maybe as a follow-on task

@RustomMS
Copy link
Contributor

RustomMS commented Jan 9, 2023

Yeah the idea is to create an external table, a 'connection' of some sort is required. My rationale for that is I would rather have one well defined way of doing this.

I think that is a fair reason.

I will provide a scenario to consider. Say someone is using our product to query from their main datasources. They now need to do some ad-hoc transformations on a sizable number of public data sources. They don't need these persisted.

Maybe we consider having a temp schema that we automatically clean up on graceful shutdown as an alternative to having a ad-hoc compatible syntax on CREATE EXTERNAL TABLE. Likely not something that matters for the first iteration. Though I think having a way to batch remove datasources or have all data sources removed if a given connection is remove is something to keep in mind during the design.

@scsmithr
Copy link
Member Author

scsmithr commented Jan 9, 2023

Also we should consider some validation of the connection/credentials some time before querying. Currently I can provide a bad key and we will only find out when querying. Maybe as a follow-on task

100% agree.

I will provide a scenario to consider. Say someone is using our product to query from their main datasources. They now need to do some ad-hoc transformations on a sizable number of public data sources. They don't need these persisted.

Maybe we consider having a temp schema that we automatically clean up on graceful shutdown as an alternative to having a ad-hoc compatible syntax on CREATE EXTERNAL TABLE. Likely not something that matters for the first iteration.

I think there's two approaches we can take here. DuckDB has a parquet_scan (and others like it) that return table objects. I would like to do something similar for the ad-hoc use cases, but this would require datafusion to support table functions, which it currently doesn't (see e.g. apache/datafusion#212 and apache/datafusion#2177).

The second would be to allow for creating temp external tables via CREATE EXTERNAL TEMP TABLE ... that could match postgres semantics in that they're dropped when the session closes and aren't visible to other sessions.

Though I think having a way to batch remove datasources or have all data sources removed if a given connection is remove is something to keep in mind during the design.

Ideally we would eventually have DROP ... CASCADE support where dropping the connection would drop tables that depend on it.

@scsmithr scsmithr added the on hold ⚓ Issue is on hold (groomed, but not planned) label Jan 11, 2023
@greyscaled
Copy link
Contributor

2023-01-16 Notes

  1. What's left to do on this one?

@greyscaled greyscaled removed the on hold ⚓ Issue is on hold (groomed, but not planned) label Jan 16, 2023
@greyscaled
Copy link
Contributor

Effectively done with the connections stub + metastore work in prog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat 🎇 New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants