Skip to content

Possible missing indices on the database tables task_instance and dag_run #60145

@benrifkind

Description

@benrifkind

Apache Airflow version

Other Airflow 3 version (please specify below)

If "Other Airflow 3 version" selected, which one?

3.1.2

What happened?

On upgrade to Airflow 3.1.2, there was a huge increase in the number of DAG versions for many of the DAGs. I believe this may have been fixed in Airflow 3.1.5 (#59091).

Before upgrading to get the fix, I wanted to clean up the database airflow db clean. This was to clean up around 360K rows. The non batch version ran for 2+ hours before I killed it. Then I tried

airflow db clean -t dag_version --clean-before-timestamp 2025-12-20 --batch-size 1000

This was also extremely slow. Around 6 minutes per batch.

The fix for me was to add (temporary) indices to the tables task_instance and dag_run

CREATE INDEX CONCURRENTLY idx_temp_task_instance_dag_version_id 
  ON task_instance(dag_version_id);

CREATE INDEX CONCURRENTLY idx_temp_dag_run_dag_version_id 
  ON dag_run(created_dag_version_id);

The airflow db clean then ran extremely fast - under 2 minutes.

What you think should happen instead?

airflow db clean should complete relatively quickly. Adding indices to the tables task_instance and dag_run on the DAG version id led to a huge speed up so these indices should likely exist.

How to reproduce

Create a large number of DAG versions and then attempt to drop them

Operating System

Debian GNU/Linux 12 (bookworm)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.16.0
apache-airflow-providers-celery==3.13.0
apache-airflow-providers-cncf-kubernetes==10.9.0
apache-airflow-providers-common-compat==1.8.0
apache-airflow-providers-common-io==1.6.4
apache-airflow-providers-common-messaging==2.0.0
apache-airflow-providers-common-sql==1.28.2
apache-airflow-providers-docker==4.4.4
apache-airflow-providers-elasticsearch==6.3.4
apache-airflow-providers-fab==3.0.1
apache-airflow-providers-ftp==3.13.2
apache-airflow-providers-git==0.0.9
apache-airflow-providers-google==18.1.0
apache-airflow-providers-grpc==3.8.2
apache-airflow-providers-hashicorp==4.3.3
apache-airflow-providers-http==5.4.0
apache-airflow-providers-microsoft-azure==12.8.0
apache-airflow-providers-mysql==6.3.4
apache-airflow-providers-odbc==4.10.2
apache-airflow-providers-openlineage==2.7.3
apache-airflow-providers-postgres==6.4.0
apache-airflow-providers-redis==4.3.2
apache-airflow-providers-sendgrid==4.1.4
apache-airflow-providers-sftp==5.4.1
apache-airflow-providers-slack==9.4.0
apache-airflow-providers-smtp==2.3.1
apache-airflow-providers-snowflake==6.6.0
apache-airflow-providers-ssh==4.1.5
apache-airflow-providers-standard==1.9.1

Deployment

Official Apache Airflow Helm Chart

Deployment details

helm on AWS EKS

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions