Skip to content

Latest commit

 

History

History
399 lines (284 loc) · 12.6 KB

README.md

File metadata and controls

399 lines (284 loc) · 12.6 KB

dbt-codegen

Macros that generate dbt code, and log it to the command line.

Contents

Installation instructions

New to dbt packages? Read more about them here.

  1. Include this package in your packages.yml file — check here for the latest version number:
packages:
  - package: dbt-labs/codegen
    version: X.X.X ## update to latest version here
  1. Run dbt deps to install the package.

Macros

generate_source (source)

This macro generates lightweight YAML for a Source, which you can then paste into a schema file.

Arguments

  • schema_name (required): The schema name that contains your source data
  • database_name (optional, default=target.database): The database that your source data is in.
  • table_names (optional, default=none): A list of tables that you want to generate the source definitions for.
  • generate_columns (optional, default=False): Whether you want to add the column names to your source definition.
  • include_descriptions (optional, default=False): Whether you want to add description placeholders to your source definition.
  • include_data_types (optional, default=True): Whether you want to add data types to your source columns definitions.
  • table_pattern (optional, default='%'): A table prefix / postfix that you want to subselect from all available tables within a given schema.
  • exclude (optional, default=''): A string you want to exclude from the selection criteria
  • name (optional, default=schema_name): The name of your source
  • include_database (optional, default=False): Whether you want to add the database to your source definition
  • include_schema (optional, default=False): Whether you want to add the schema to your source definition
  • case_sensitive_databases (optional, default=False): Whether you want database names to be in lowercase, or to match the case in the source table — not compatible with Redshift
  • case_sensitive_schemas (optional, default=False): Whether you want schema names to be in lowercase, or to match the case in the source table — not compatible with Redshift
  • case_sensitive_tables (optional, default=False): Whether you want table names to be in lowercase, or to match the case in the source table — not compatible with Redshift
  • case_sensitive_cols (optional, default=False): Whether you want column names to be in lowercase, or to match the case in the source table

Outputting to a file

If you use the dbt run-operation approach it is possible to output directly to a file by piping the output to a new file and using the --quiet CLI flag:

dbt --quiet run-operation generate_source --args '{"table_names": ["orders"]}' > models/staging/jaffle_shop/_sources.yml

Usage:

  1. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_source('raw_jaffle_shop') }}

or for multiple arguments

{{ codegen.generate_source(schema_name= 'jaffle_shop', database_name= 'raw') }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_source --args 'schema_name: raw_jaffle_shop'

or

# for multiple arguments, use the dict syntax
$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "database_name": "raw", "table_names":["table_1", "table_2"]}'

or if you want to include column names and data types:

$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "generate_columns": true}'

or if you want to include column names without data types (the behavior dbt-codegen <= v0.9.0):

$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "generate_columns": true, "include_data_types": false}'
  1. The YAML for the source will be logged to the command line
version: 2

sources:
  - name: raw_jaffle_shop
    database: raw
    schema: raw_jaffle_shop
    tables:
      - name: customers
        description: ""
      - name: orders
        description: ""
      - name: payments
        description: ""
  1. Paste the output in to a schema .yml file, and refactor as required.

generate_base_model (source)

This macro generates the SQL for a base model, which you can then paste into a model.

Arguments:

  • source_name (required): The source you wish to generate base model SQL for.
  • table_name (required): The source table you wish to generate base model SQL for.
  • leading_commas (optional, default=False): Whether you want your commas to be leading (vs trailing).
  • case_sensitive_cols (optional, default=False): Whether your source table has case sensitive column names. If true, keeps the case of the column names from the source.
  • materialized (optional, default=None): Set materialization style (e.g. table, view, incremental) inside of the model's config block. If not set, materialization style will be controlled by dbt_project.yml

Usage:

  1. Create a source for the table you wish to create a base model on top of.
  2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_base_model(
    source_name='raw_jaffle_shop',
    table_name='customers',
    materialized='table'
) }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_base_model --args '{"source_name": "raw_jaffle_shop", "table_name": "customers"}'
  1. The SQL for a base model will be logged to the command line
with source as (

    select * from {{ source('raw_jaffle_shop', 'customers') }}

),

renamed as (

    select
        id,
        first_name,
        last_name,
        email,
        _elt_updated_at

    from source

)

select * from renamed
  1. Paste the output in to a model, and refactor as required.

create_base_models (source)

This macro generates a series of terminal commands (appended with the && to allow for subsequent execution) that execute the base_model_creation bash script. This bash script will write the output of the generate_base_model macro into a new model file in your local dbt project.

Note: This macro is not compatible with the dbt Cloud IDE.

Arguments:

  • source_name (required): The source you wish to generate base model SQL for.
  • tables (required): A list of all tables you want to generate the base models for.

Usage:

  1. Create a source for the table you wish to create a base model on top of.
  2. Copy the macro into a statement tab into your local IDE, and run your code
dbt run-operation codegen.create_base_models --args '{source_name: my-source, tables: ["this-table","that-table"]}'

base_model_creation (source)

This bash script when executed from your local IDE will create model files in your dbt project instance that contain the outputs of the generate_base_model macro.

Note: This macro is not compatible with the dbt Cloud IDE.

Arguments:

  • source_name (required): The source you wish to generate base model SQL for.
  • tables (required): A list of all tables you want to generate the base models for.

Usage:

  1. Create a source for the table you wish to create a base model on top of.
  2. Copy the macro into a statement tab into your local IDE, and run your code
source dbt_packages/codegen/bash_scripts/base_model_creation.sh "source_name" ["this-table","that-table"]

generate_model_yaml (source)

This macro generates the YAML for a list of model(s), which you can then paste into a schema.yml file.

Arguments:

  • model_names (required): The model(s) you wish to generate YAML for.
  • upstream_descriptions (optional, default=False): Whether you want to include descriptions for identical column names from upstream models and sources.
  • include_data_types (optional, default=True): Whether you want to add data types to your model column definitions.

Usage:

  1. Create a model.
  2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_model_yaml(
    model_names=['customers']
) }}

You can use the helper function codegen.get_models and specify a directory and/or prefix to get a list of all matching models, to be passed into model_names list.

{% set models_to_generate = codegen.get_models(directory='marts', prefix='fct_') %}
{{ codegen.generate_model_yaml(
    model_names = models_to_generate
) }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_model_yaml --args '{"model_names": ["customers"]}'
  1. The YAML for a base model(s) will be logged to the command line
version: 2

models:
  - name: customers
    description: ""
    columns:
      - name: customer_id
        data_type: integer
        description: ""
      - name: customer_name
        data_type: text
        description: ""
  1. Paste the output in to a schema.yml file, and refactor as required.

generate_model_import_ctes (source)

This macro generates the SQL for a given model with all references pulled up into import CTEs, which you can then paste back into the model.

Arguments:

  • model_name (required): The model you wish to generate SQL with import CTEs for.
  • leading_commas (optional, default=False): Whether you want your commas to be leading (vs trailing).

Usage:

  1. Create a model with your original SQL query
  2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_model_import_ctes(
    model_name = 'my_dbt_model'
) }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_model_import_ctes --args '{"model_name": "my_dbt_model"}'
  1. The new SQL - with all references pulled up into import CTEs - will be logged to the command line
with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

payments as (

    select * from {{ ref('stg_payments') }}

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders
    group by customer_id

),

customer_payments as (

    select
        orders.customer_id,
        sum(amount) as total_amount
    from payments
    left join orders on
         payments.order_id = orders.order_id
    group by orders.customer_id

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value
    from customers
    left join customer_orders
        on customers.customer_id = customer_orders.customer_id
    left join customer_payments
        on  customers.customer_id = customer_payments.customer_id

)

select * from final
  1. Replace the contents of the model's current SQL file with the compiled or logged code

Contributing

To contirbute code to this package, please follow the steps outlined in the integration_tests directory's README file.