This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Where to reference/install dbt packages with MWAA? #73

Open
drackham opened this issue Nov 18, 2022 · 4 comments

Comments

@drackham

drackham commented Nov 18, 2022

Where should I install dbt packages (or reference them, if they are deployed during a CI step) when using MWAA? I'm currently installing to /tmp/dbt/packages, but the test operator only succeeds about half the time. Here is what the graph view of my DAG looks like:

[Screenshot: DAG graph view]

This is the error log for when the runs fail:
[Screenshot: error log]

The deps operator works reliably, and so does dbt run, but the test operator fails about 50% of the time. Should I deploy the package files to a folder within the MWAA S3 folder and then update my dbt configuration to reference that location?

Environment context:
MWAA v2.0.2
dbt-core>=1.0
dbt-postgres>=1.0
airflow-dbt==0.4.0

@Falydoor
Contributor

I was able to use custom modules by keeping them in the dbt_modules folder inside my dbt project. The whole dbt project then lives in the dags folder on S3.

And finally, when using the dbt operator, I use the dir argument so dbt has access to dbt_modules:

DbtRunOperator(
  task_id='dbt_run',
  dbt_bin='/usr/local/airflow/.local/bin/dbt',
  profiles_dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
  dir='/usr/local/airflow/dags/{DBT_FOLDER}/'
)

Hopefully you can replicate the same idea for your dbt packages.
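
To avoid repeating the same three paths on every operator, the arguments above could be built once and reused. This is only a minimal sketch: `dbt_operator_kwargs` is a hypothetical helper (not part of airflow-dbt), the default paths are the MWAA locations from the snippet above, and `my_dbt_project` stands in for whatever `{DBT_FOLDER}` is in your setup.

```python
import posixpath

# Assumed MWAA locations, copied from the snippet above.
MWAA_DAGS_ROOT = "/usr/local/airflow/dags"
MWAA_DBT_BIN = "/usr/local/airflow/.local/bin/dbt"


def dbt_operator_kwargs(dbt_folder, dags_root=MWAA_DAGS_ROOT, dbt_bin=MWAA_DBT_BIN):
    """Return the keyword arguments shared by every dbt operator (hypothetical helper)."""
    # Worker paths on MWAA are POSIX paths, so join them with posixpath.
    project_dir = posixpath.join(dags_root, dbt_folder.strip("/")) + "/"
    return {
        "dbt_bin": dbt_bin,
        "profiles_dir": project_dir,  # where profiles.yml lives
        "dir": project_dir,           # where dbt_project.yml and dbt_modules live
    }


# "my_dbt_project" is a placeholder folder name.
shared = dbt_operator_kwargs("my_dbt_project")
print(shared["dir"])
```

The resulting dict could then be passed as `DbtRunOperator(task_id='dbt_run', **shared)`, or used as the DAG's `default_args`, so that the deps, run, and test operators all point at the same project folder.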

@sandervandorsten

sandervandorsten commented Feb 22, 2023

A slightly different question, but related to using airflow-dbt with MWAA: I can't get your example to run on MWAA, or for that matter the example in the airflow-dbt README.md.

I think the folder where the code runs in MWAA is non-writable? See the AWS MWAA docs on using dbt.

I can get my DAG to run locally using the following setup and code:
Environment (Python 3.10) in a Docker container using the Celery executor:

apache-airflow==2.4.3
dbt-core==1.3.2
airflow-dbt==0.4.0

Code:

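...
from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtRunOperator

# Arguments shared by every dbt operator in this DAG
default_args = dict(
  dbt_bin="dbt",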
  profiles_dir="/opt/airflow/dags/dbt",
  dir="/opt/airflow/dags/dbt"
)
with DAG(
    start_date=datetime(2022, 3, 14),
    schedule_interval="@once",
    dag_id="dbt_test",
    default_args=default_args,
    tags=["dbt", "development"],
) as dag:
    dbt_run = DbtRunOperator(
        task_id="dbt_run_royalty",
        models="+marts-report_team-royalty",
        target="dev-sandervd",
    )

But if I run a similar setup in MWAA, I get an error about writing logs. I use different default_args (required for MWAA):

default_args = dict(
  dbt_bin='/usr/local/airflow/.local/bin/dbt',
  profiles_dir='/usr/local/airflow/dags/dbt/',
  dir='/usr/local/airflow/dags/dbt/'
)

I get the following error:

*** Reading remote log from Cloudwatch log_group: airflow-airflow-dev-sandervd-Task log_stream: dag_id=dev_dbt/run_id=manual__2023-02-22T08_27_50.474777+00_00/task_id=dbt_run_royalty/attempt=1.log.
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1165}} INFO - Dependencies all met for <TaskInstance: dev_dbt.dbt_run_royalty manual__2023-02-22T08:27:50.474777+00:00 [queued]>
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1165}} INFO - Dependencies all met for <TaskInstance: dev_dbt.dbt_run_royalty manual__2023-02-22T08:27:50.474777+00:00 [queued]>
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1362}} INFO - 
--------------------------------------------------------------------------------
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1363}} INFO - Starting attempt 1 of 1
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1364}} INFO - 
--------------------------------------------------------------------------------
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1383}} INFO - Executing <Task(DbtRunOperator): dbt_run_royalty> on 2023-02-22 08:27:50.474777+00:00
[2023-02-22, 09:27:59 CET] {{standard_task_runner.py:55}} INFO - Started process 2349 to run task
[2023-02-22, 09:27:59 CET] {{standard_task_runner.py:82}} INFO - Running: ['airflow', 'tasks', 'run', 'dev_dbt', 'dbt_run_royalty', 'manual__2023-02-22T08:27:50.474777+00:00', '--job-id', '40', '--raw', '--subdir', 'DAGS_FOLDER/merlin_dags/dbt_dags/dag_dev_dbt.py', '--cfg-path', '/tmp/tmp3xtcgs41']
[2023-02-22, 09:27:59 CET] {{standard_task_runner.py:83}} INFO - Job 40: Subtask dbt_run_royalty
[2023-02-22, 09:27:59 CET] {{task_command.py:376}} INFO - Running <TaskInstance: dev_dbt.dbt_run_royalty manual__2023-02-22T08:27:50.474777+00:00 [running]> on host b7f1bf7e73bf
[2023-02-22, 09:27:59 CET] {{taskinstance.py:1590}} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=dev_dbt
AIRFLOW_CTX_TASK_ID=dbt_run_royalty
AIRFLOW_CTX_EXECUTION_DATE=2023-02-22T08:27:50.474777+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-02-22T08:27:50.474777+00:00
[2023-02-22, 09:27:59 CET] {{dbt_hook.py:117}} INFO - /usr/local/airflow/.local/bin/dbt run --profiles-dir /usr/local/airflow/dags/dbt --target dev-sandervd --models +marts-report_team-royalty
[2023-02-22, 09:27:59 CET] {{dbt_hook.py:126}} INFO - Output:
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - 08:28:02.215321 [error] [MainThread]: Encountered an error:
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - 08:28:02  Encountered an error:
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - 08:28:02.218674 [error] [MainThread]: Traceback (most recent call last):
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 135, in main
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     results, succeeded = handle_and_check(args)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 198, in handle_and_check
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     task, res = run_from_args(parsed)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 234, in run_from_args
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     setup_event_logger(log_path or "logs", level_override)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/events/functions.py", line 81, in setup_event_logger
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     file_handler = RotatingFileHandler(
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/handlers.py", line 155, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     BaseRotatingHandler.__init__(self, filename, mode, encoding=encoding,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/handlers.py", line 58, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     logging.FileHandler.__init__(self, filename, mode=mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/__init__.py", line 1169, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     StreamHandler.__init__(self, self._open())
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/__init__.py", line 1201, in _open
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     return open_func(self.baseFilename, self.mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - PermissionError: [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - /usr/local/airflow/.local/lib/python3.10/site-packages/watchtower/__init__.py:349 WatchtowerWarning: Received empty message. Empty messages cannot be sent to CloudWatch Logs
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - Traceback (most recent call last):
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING -   File "/usr/local/airflow/config/cloudwatch_logging.py", line 161, in emit
    self.sniff_errors(record)
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING -   File "/usr/local/airflow/config/cloudwatch_logging.py", line 211, in sniff_errors
    if pattern.search(record.message):
[2023-02-22, 09:28:02 CET] {{logging_mixin.py:137}} WARNING - AttributeError: 'LogRecord' object has no attribute 'message'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - 08:28:02  Traceback (most recent call last):
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 135, in main
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     results, succeeded = handle_and_check(args)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 198, in handle_and_check
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     task, res = run_from_args(parsed)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/main.py", line 234, in run_from_args
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     setup_event_logger(log_path or "logs", level_override)
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/local/airflow/.local/lib/python3.10/site-packages/dbt/events/functions.py", line 81, in setup_event_logger
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     file_handler = RotatingFileHandler(
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/handlers.py", line 155, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     BaseRotatingHandler.__init__(self, filename, mode, encoding=encoding,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/handlers.py", line 58, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     logging.FileHandler.__init__(self, filename, mode=mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/__init__.py", line 1169, in __init__
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     StreamHandler.__init__(self, self._open())
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -   File "/usr/lib/python3.10/logging/__init__.py", line 1201, in _open
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO -     return open_func(self.baseFilename, self.mode,
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:130}} INFO - PermissionError: [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log'
[2023-02-22, 09:28:02 CET] {{dbt_hook.py:132}} INFO - Command exited with return code 2
[2023-02-22, 09:28:02 CET] {{taskinstance.py:1851}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt/operators/dbt_operator.py", line 98, in execute
    self.create_hook().run_cli('run')
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow_dbt/hooks/dbt_hook.py", line 138, in run_cli
    raise AirflowException("dbt command failed")
airflow.exceptions.AirflowException: dbt command failed
[2023-02-22, 09:28:02 CET] {{taskinstance.py:1401}} INFO - Marking task as FAILED. dag_id=dev_dbt, task_id=dbt_run_royalty, execution_date=20230222T082750, start_date=20230222T082759, end_date=20230222T082802
[2023-02-22, 09:28:02 CET] {{standard_task_runner.py:100}} ERROR - Failed to execute job 40 for task dbt_run_royalty (dbt command failed; 2349)
[2023-02-22, 09:28:02 CET] {{local_task_job.py:159}} INFO - Task exited with return code 1
[2023-02-22, 09:28:02 CET] {{taskinstance.py:2623}} INFO - 0 downstream tasks scheduled from follow-on schedule check

I can get it to run in MWAA if I run it from the /tmp directory, which seems to be writable (as per AWS MWAA's example), using the setup below. However, that way I can't use the airflow-dbt operators anymore. Any suggestions?

cli_command = BashOperator(
    task_id="dbt_run_mwaa",
    bash_command=f"cp -R /usr/local/airflow/dags/dbt/ /tmp;\
    cd /tmp/dbt;\
    /usr/local/airflow/.local/bin/dbt run --project-dir /tmp/dbt/ -s incoming-royalty_etl-alibaba;\
    ",
)
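
For reference, the same copy-then-run idea can be sketched in plain Python, for example as the callable of a PythonOperator. This is only a sketch under stated assumptions: `stage_dbt_project` and `dbt_run_command` are hypothetical helpers, the paths and the model name in the demo are placeholders, and nothing here guarantees that a later task runs on the same worker (and so sees the same /tmp) as the task that did the copy.

```python
import shutil
import tempfile
from pathlib import Path


def stage_dbt_project(source_dir, writable_root):
    """Copy the dbt project into a writable directory and return the staged path."""
    source = Path(source_dir)
    staged = Path(writable_root) / source.name
    # dirs_exist_ok (Python 3.8+) lets a retry overwrite an earlier copy.
    shutil.copytree(source, staged, dirs_exist_ok=True)
    return staged


def dbt_run_command(dbt_bin, project_dir, select):
    """Build the argv for `dbt run`, mirroring the bash command above."""
    return [str(dbt_bin), "run", "--project-dir", str(project_dir), "-s", select]


if __name__ == "__main__":
    # Demo with a throwaway project so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as scratch:
        project = Path(scratch) / "dags" / "dbt"
        project.mkdir(parents=True)
        (project / "dbt_project.yml").write_text("name: demo\n")

        staged = stage_dbt_project(project, Path(scratch) / "tmp")
        # "my_model" is a placeholder selector.
        print(dbt_run_command("dbt", staged, "my_model"))
```

On MWAA the command list would be handed to `subprocess.run(..., check=True)` with the real dbt binary path; the demo only prints it.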

@drackham
Author

@sandervandorsten it looks like it might be an issue with your dbt_project.yml configuration.

In order to get around [Errno 13] Permission denied: '/usr/local/airflow/dags/dbt/logs/dbt.log' you need to update your config in dbt_project.yml to include:

target-path: "/tmp/dbt/target" # https://github.com/gocardless/airflow-dbt/issues/33
log-path: "/tmp/dbt/logs"

@sandervandorsten

Thanks! I changed my dbt_project.yml accordingly and it works like a charm. For completeness, here is the relevant part:

[other stuff]
...

# Changed the folders that dbt writes to at runtime,
# because on AWS MWAA the project directory on the worker node is non-writable
# https://github.com/gocardless/airflow-dbt/issues/33
packages-install-path: "/tmp/dbt/dbt_packages"
log-path: "/tmp/dbt/logs"
target-path: "/tmp/dbt/target"

...
[other stuff]
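
If you want to confirm that the three paths above are actually usable before dbt starts, a small preflight check along these lines could run at the start of a task. It is a sketch only: `nearest_existing_dir` and `is_writable_location` are hypothetical helpers, and the check assumes dbt creates the missing folders itself, so it only tests the closest directory that already exists.

```python
import os
from pathlib import Path


def nearest_existing_dir(path):
    """Walk up from `path` until an existing filesystem entry is found."""
    current = Path(path)
    while not current.exists():
        if current.parent == current:  # reached the top without a match
            break
        current = current.parent
    return current


def is_writable_location(path):
    """True if `path`, or its closest existing ancestor, is a writable directory."""
    base = nearest_existing_dir(path)
    return base.is_dir() and os.access(base, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # The paths from the dbt_project.yml fragment above.
    for configured in ("/tmp/dbt/dbt_packages", "/tmp/dbt/logs", "/tmp/dbt/target"):
        print(configured, "writable:", is_writable_location(configured))
```

A `False` for any of these would point to the same kind of `[Errno 13] Permission denied` failure shown in the log earlier in this thread.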
