Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Default external volume not being consider on dbt-snowflake iceberg. #1266

Open
2 tasks done
maurofloriano opened this issue Dec 6, 2024 · 3 comments
Open
2 tasks done
Labels
enhancement New feature or request iceberg

Comments

@maurofloriano
Copy link

maurofloriano commented Dec 6, 2024

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Snowflake CREATE ICEBERG TABLE DDL gives EXTERNAL_VOLUME as an optional parameter with the following reasoning:

If you don’t specify this parameter, the Iceberg table defaults to the external volume for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account.

However, at the moment, dbt-snowflake expects that external_volume is set at the model level into the configs.

external_volume = '{config.get('external_volume')}'

Expected Behavior

If an external volume is set at account/database/schema level, dbt should ignore external volume field and just not pass this information to the statment.

Steps To Reproduce

  1. Set external volume to database
  2. Not pass this info to the model config
  3. Run dbt.

Relevant log output

External volume 'NONE' does not exist or not authorized.

Environment

- OS:
- Python:
- dbt-core:
- dbt-snowflake:

Additional Context

No response

@maurofloriano maurofloriano added bug Something isn't working triage labels Dec 6, 2024
@dataders
Copy link
Contributor

dataders commented Dec 6, 2024

@maurofloriano thanks for reporting!

three follow-up questions:

  1. can you confirm that you are not blocked by this as you can just pass the external volume as a config (even though its redundant)?
  2. are you satisfied with Snowflake's error message when you don't pass an EXTERNAL VOLUME and there isn't a default associated with the database or schema? or should dbt-snowflake provide something more specific/helpful?
  3. can you share more about why you think its a good idea to set the external volume as default on the database? asking because Snowflake Iceberg is rather new, so we're just starting to think about best practices.

@maurofloriano
Copy link
Author

maurofloriano commented Dec 6, 2024

  1. For now it might work in our case and we are not blocked by this, I do understand that is still not an official release and errors might happen. We are still on soft release for iceberg on snowflake also.
  2. I think is okay, but in my case that i had something defined at database level, was confusing in the beginning but then I looked into snowflake logs and saw that external catalog was being set anyway.
  3. In our scenario, we have different logic internally for different environments that will be hidden for our final user, and this is define by combination of database + external volume. So all those configurations we avoid making optional for users and configure ourselfs when possible.

I do have a suggestion for this issue, maybe something like:
if external volume is set:
pass external volume for the call
else:
expect is set at some level. (this will throw an error in case is not such as: external volume is not define.

Might be easy enough to implement and if users define will always take precedent into the default one

@dataders dataders added enhancement New feature or request iceberg and removed bug Something isn't working triage labels Dec 6, 2024
@dataders dataders changed the title [Bug] Default external volume not being consider on dbt-snowflake iceberg. [Feature] Default external volume not being consider on dbt-snowflake iceberg. Dec 6, 2024
@maurofloriano
Copy link
Author

  def get_iceberg_ddl_options(self, config: RelationConfig) -> str:
        
        base_location: str = f"_dbt/{self.database}/{self.schema}/{self.name}"

        if subpath := config.get("base_location_subpath"):
            base_location += f"/{subpath}"


        iceberg_ddl_predicates: str = f"""
        catalog = 'snowflake'
        base_location = '{base_location}'
        """

        if external_catalog := config.get('external_volume'):
            iceberg_ddl_predicates += f"external_volume = '{external_catalog}'\n"

        return textwrap.indent(textwrap.dedent(iceberg_ddl_predicates), " " * 10)

I had this tested locally and might be a solution, I would also suggest to add database to the base_location path, this way would make it possible for users that share external volumes to make sure to have the unique id for the table easily.

I tried to create a MR with this change to request review, but was not able, not sure if need any extra access or something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request iceberg
Projects
None yet
Development

No branches or pull requests

2 participants