Memory Leak in PostgreSQL Extension While Exporting Tables to Parquet #274

Open
2 tasks done
vineetver opened this issue Dec 6, 2024 · 0 comments

What happens?

Memory usage keeps increasing as the export progresses.
It seems that the results of the PostgreSQL queries are not being released or cleaned up properly.

To Reproduce

I am using DuckDB to export tables from my local PostgreSQL database to Parquet files. Memory usage rises significantly during the export, which suggests a memory leak. Below is the code I am using:

import duckdb

# Open (or create) a local DuckDB database file.
con = duckdb.connect(database='my_database.duckdb')

# Install and load the PostgreSQL extension.
con.install_extension("postgres_scanner")
con.load_extension("postgres_scanner")
con.sql("SET memory_limit = '20GB';")
con.sql("SET threads TO 3;")
con.sql("SET enable_progress_bar = true;")
# Attach the local PostgreSQL database read-only under the alias "db".
con.sql("""
    ATTACH 'dbname=** user=** host=127.0.0.1 password=**' AS db (TYPE POSTGRES, READ_ONLY);
""")

# List the tables visible to DuckDB and collect their names.
all_tables = con.sql("SHOW ALL tables;").fetchdf()
tables = all_tables['name'].to_list()

# Export each table to its own Parquet file.
for table in tables:
    con.execute(f"COPY db.public.{table} TO '{table}.parquet' (FORMAT PARQUET);")
    print(f"Table {table} copied to {table}.parquet")

con.close()
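
To make the growth easier to quantify, the export loop could be wrapped so that the process's resident memory is logged after each table. The following is only a minimal sketch and not part of the original report: the helper names `current_rss_mb` and `export_with_memory_log` are hypothetical, and the measurement assumes a Linux host (it reads `/proc/self/statm`), falling back to the peak RSS reported by `resource.getrusage` elsewhere.

```python
import os
import resource
import sys


def current_rss_mb():
    """Return this process's resident set size in MiB (best effort)."""
    try:
        # Linux: the second field of /proc/self/statm is the resident page count.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20
    except (OSError, ValueError, IndexError):
        # Fallback: peak (not current) RSS; KiB on Linux, bytes on macOS.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak / 2**20 if sys.platform == "darwin" else peak / 1024


def export_with_memory_log(tables, export_one):
    """Call export_one(table) for each table and record the RSS afterwards.

    export_one is a hypothetical callback that performs the actual export.
    """
    log = []
    for table in tables:
        export_one(table)
        rss = current_rss_mb()
        log.append((table, rss))
        print(f"{table}: RSS {rss:.1f} MiB")
    return log
```

With the script above, the callback could be something like `lambda t: con.execute(f"COPY db.public.{t} TO '{t}.parquet' (FORMAT PARQUET);")`. An RSS that climbs steadily across tables and never drops back would be consistent with the behaviour described here, although it does not by itself show where the memory is held.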

OS:

Ubuntu, x86_64

DuckDB Version:

1.1.3

DuckDB Client:

Python

Hardware:

VM: 32 GB RAM, 8 Core

Full Name:

Vineet Verma

Affiliation:

Harvard University

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

No - I cannot easily share my data sets due to their large size

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

szarnyasg transferred this issue from duckdb/duckdb on Dec 7, 2024