Features/#325 import status2019 #326

Merged
merged 10 commits into from
Oct 23, 2024
13 changes: 13 additions & 0 deletions src/egon/data/airflow/dags/pipeline.py
@@ -98,6 +98,7 @@
from egon.data.datasets.zensus import ZensusMiscellaneous, ZensusPopulation
from egon.data.datasets.zensus_mv_grid_districts import ZensusMvGridDistricts
from egon.data.datasets.zensus_vg250 import ZensusVg250
from egon.data.datasets.scenario_path.import_status2019 import Import_Status2019

# Set number of threads used by numpy and pandas
set_numexpr_threads()
@@ -672,6 +673,18 @@
]
)

# import scenario status2019 from backup
import_status2019 = Import_Status2019(
    dependencies=[
        storage_etrago,
        hts_etrago_table,
        fill_etrago_generators,
        household_electricity_demand_annual,
        cts_demand_buildings,
        emobility_mit,
    ]
)

Why do you need all these dependencies?

Author

These were the dependencies used for the low and medium flexibility scenarios. Since our new scenarios are created in the same way, it makes sense to me to use the same dependencies. Now that you mention it, I think it is better to have this status2019 import and the task that creates the new scenarios in the same dataset.

I am sorry, but I don't really get what you want to change.
I think it is fine like this. It might help to add a comment in the code explaining why these dependencies are set.

Author

I meant these changes: 723295a
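
The reviewer's suggestion to document the dependencies could be followed with a short comment above the list. The sketch below is illustrative only: `Dataset` is a tiny stand-in rather than `egon.data.datasets.Dataset`, the dependencies are plain strings instead of dataset objects, and the comment wording is an assumption paraphrased from the author's reply in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for egon.data.datasets.Dataset."""

    name: str
    dependencies: list = field(default_factory=list)


# The status2019 import reuses the dependencies of the low and medium
# flexibility scenarios, because the new scenarios are created the same way.
import_status2019 = Dataset(
    name="import_status2019",
    dependencies=[
        "storage_etrago",
        "hts_etrago_table",
        "fill_etrago_generators",
        "household_electricity_demand_annual",
        "cts_demand_buildings",
        "emobility_mit",
    ],
)
```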


# ########## Keep this dataset at the end
# Sanity Checks
sanity_checks = SanityChecks(
87 changes: 87 additions & 0 deletions src/egon/data/datasets/scenario_path/import_status2019.py
@@ -0,0 +1,87 @@
"""
Read eTraGo tables for the status2019 and import it to db
"""
import os
import subprocess

import pandas as pd

from egon.data import config, db
from egon.data.datasets import Dataset


class Import_Status2019(Dataset):
    def __init__(self, dependencies):
        super().__init__(
            name="import_status2019",
            version="0.0.1",
            dependencies=dependencies,
            tasks=(import_scn_status2019,),
        )


def import_scn_status2019():
    """
    Read the status2019 scenario from the backup and import it into the db

    Parameters
    ----------
    *No parameters required

    """
    # Connect to the database
    con = db.engine()

    # Clean existing data for status2019
    tables = pd.read_sql(
        """
        SELECT tablename FROM pg_catalog.pg_tables
        WHERE schemaname = 'grid'
        """,
        con,
    )

    tables = tables[
        ~tables["tablename"].isin(
            [
                "egon_etrago_carrier",
                "egon_etrago_temp_resolution",
            ]
        )
    ]

    for table in tables["tablename"]:
        db.execute_sql(
            f"""
            DELETE FROM grid.{table} WHERE scn_name = 'status2019';
            """
        )

    my_env = os.environ.copy()
    my_env["PGPASSWORD"] = "data"

Is there an option to get the password from somewhere else?

Author

Now we can retrieve the password from datasets.yml: a83ca47


    config_data = config.settings()["egon-data"]
    database = config_data["--database-name"]
    host = config_data["--database-host"]
    port = config_data["--database-port"]
    user = config_data["--database-user"]

The password is stored in config_data as well, isn't it? At least it is listed in my config files.

Author

As you mentioned, the configuration file has a parameter called database-password. But the user can modify it at any time, and then it would no longer match the fixed password that the status2019 backup has.

But that is the same for all parameters, isn't it?
Other functions also access the password from there, so it would be a problem anyhow.

Author

If that is the case, shouldn't we take database-password out of the configuration file?

Author

Since I would like to merge this branch before proceeding with the scenario path creation, I will retrieve the password using the database-password parameter in the configuration file, as you suggested.

Author

Change applied in: a7cd357

Sorry, I completely forgot to reply. Looks good to me!
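
The outcome of this thread, reading the password from the configuration instead of hardcoding it, could be sketched roughly as follows. This is not the code from a7cd357: the helper `build_pg_env` is hypothetical, and the key `--database-password` is an assumption that follows the `--database-*` pattern of the other settings shown in the diff. `PGPASSWORD` is the standard libpq environment variable that `pg_restore` reads.

```python
import os


def build_pg_env(config_data, base_env=None):
    """Return a copy of the environment with PGPASSWORD taken from the config."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed key name, mirroring "--database-name", "--database-host", etc.
    env["PGPASSWORD"] = str(config_data["--database-password"])
    return env


# Hypothetical values, for illustration only.
example_config = {"--database-password": "example-secret"}
env = build_pg_env(example_config, base_env={"PATH": "/usr/bin"})
```

Copying the environment rather than mutating `os.environ` keeps the password scoped to the child process that receives `env`.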


    for table in tables["tablename"]:
        subprocess.Popen(
            [
                "pg_restore",
                "-d",
                database,
                "--host",
                host,
                "--port",
                port,
                "-U",
                user,
                "-a",
                "--single-transaction",
                f"--table={table}",
                "data_bundle_powerd_data/PoWerD_status2019-v2.backup",

Can you refer to the already uploaded backup? This way we would upload the same data again, right?

Author

There is a new version of status2019 in Zenodo. Now it is downloaded and used in the same dataset: 2bb1ae0 and 95495d2

            ],
            env=my_env,
        )
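
One thing to keep in mind with the loop above: `subprocess.Popen` returns immediately, so the restores are started without waiting for each other and their exit codes are never checked. If sequential restores with error checking are wanted, a variant along these lines might work. It is only a sketch and not part of this PR: `build_restore_command` and `restore_tables` are hypothetical helpers, and the `run` parameter exists only so the loop can be exercised without a database.

```python
import subprocess


def build_restore_command(database, host, port, user, table, backup_path):
    """Assemble a data-only pg_restore call for a single table."""
    return [
        "pg_restore",
        "-d", str(database),
        "--host", str(host),
        "--port", str(port),  # subprocess arguments must be strings
        "-U", str(user),
        "-a",
        "--single-transaction",
        f"--table={table}",
        str(backup_path),
    ]


def restore_tables(tables, env, run=subprocess.run, **connection):
    """Restore tables one after another, raising if pg_restore fails."""
    for table in tables:
        command = build_restore_command(table=table, **connection)
        # check=True raises CalledProcessError on a non-zero exit status.
        run(command, env=env, check=True)
```

`subprocess.run` blocks until the child exits, so each table is restored before the next one starts, and a failing `pg_restore` stops the task instead of passing silently.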