nodata returned for a query where there is extreme data #46

Open
veenstrajelmer opened this issue Mar 4, 2024 · 0 comments

  • ddlpy version: main
  • Python version: 3.11
  • Operating System: Windows

Description

When checking whether data is available, measurements_available sometimes returns False even though data exists. The measurements_available check has been part of measurements since #33, but it was temporarily disabled in #57 because of this problem. The check should be re-enabled once this is fixed in ddl.

What I Did

import ddlpy

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['NAP'])
bool_stations = locations.index.isin(['HOEKVHLD'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
selected = locations.loc[bool_grootheid & bool_hoedanigheid & bool_stations]

date_min = "1980-01-01"
date_max = "1980-01-05"
# if we pass one row to the measurements function you can get all the measurements
available = ddlpy.measurements_available(selected.iloc[0], date_min, date_max)
measurements = ddlpy.measurements(selected.iloc[0], date_min, date_max)
measurements.plot(y='Meetwaarde.Waarde_Numeriek', linewidth=0.5, figsize=(13, 8))

print("available:", available)
print("num meas:", len(measurements))

This prints available: False, but num meas: 28.

The reverse also happens: the check reports that data is available, but no measurements are returned.

import pandas as pd
import ddlpy

locations = ddlpy.locations()
bool_hoedanigheid = locations['Hoedanigheid.Code'].isin(['MSL'])
bool_stations = locations.index.isin(['VERDTLPNOT'])
bool_grootheid = locations['Grootheid.Code'].isin(['WATHTE'])
bool_groepering = locations['Groepering.Code'].isin(['NVT'])
selected = locations.loc[bool_grootheid & bool_hoedanigheid & bool_groepering & bool_stations]
loc = selected.iloc[0]

start_date = pd.Timestamp("1999-09-01")
end_date = pd.Timestamp("2000-04-01")

# check availability and get measurements (the check says True, but nothing is returned)
avai = ddlpy.measurements_available(loc, start_date=start_date, end_date=end_date)
meas = ddlpy.measurements(loc, start_date=start_date, end_date=end_date)
print("meas available:", avai)
print("num meas retrieved:", len(meas))

This prints:

meas available: True
num meas retrieved: 0
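
Both examples show the same kind of mismatch between the availability flag and the number of retrieved measurements. As a minimal sketch for reproducing and reporting such cases, the comparison could be wrapped in a small helper. The name classify_availability and its return labels are hypothetical and not part of ddlpy; the helper only compares two values that were already retrieved.

```python
def classify_availability(available: bool, num_measurements: int) -> str:
    """Compare the availability flag with the number of retrieved measurements.

    Hypothetical helper for illustration only, not part of ddlpy.
    """
    has_data = num_measurements > 0
    if available and not has_data:
        # the check says there is data, but the retrieval returns nothing
        return "false positive"
    if not available and has_data:
        # the check says there is no data, but the retrieval returns some
        return "false negative"
    return "consistent"


# values taken from the two examples in this issue
print(classify_availability(False, 28))
print(classify_availability(True, 0))
```

With the scripts above, it would be called as classify_availability(available, len(measurements)) and classify_availability(avai, len(meas)).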

The inconsistency is an issue in ddl, not in ddlpy; the corresponding issue is Rijkswaterstaat/wm-ws-dl#11.

After this is fixed, the measurements_available check can be re-introduced as part of measurements. This significantly speeds up retrieval for years without data, because those years can be skipped without requesting any measurements.
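
To illustrate why the check matters for performance, here is a rough sketch of a year-by-year retrieval loop that skips years for which the check reports no data. This is not how ddlpy implements measurements internally; the function name measurements_per_year and the parameters fetch and is_available are hypothetical. Both callables are assumed to accept (location, start_date, end_date) positionally with date strings, as ddlpy.measurements and ddlpy.measurements_available do in the first example.

```python
def measurements_per_year(location, start_year, end_year, fetch, is_available=None):
    """Retrieve measurements one year at a time (illustrative sketch).

    fetch and is_available are hypothetical injected callables taking
    (location, start_date, end_date). If is_available is given, years
    for which it returns False are skipped without calling fetch.
    Returns a list with one non-empty result per year.
    """
    results = []
    for year in range(start_year, end_year + 1):
        start = f"{year}-01-01"
        # assumption: the window ends at the start of the next year;
        # removing duplicate boundary timestamps is left to the caller
        end = f"{year + 1}-01-01"
        if is_available is not None and not is_available(location, start, end):
            continue  # cheap check says no data, skip the expensive request
        result = fetch(location, start, end)
        if len(result) > 0:
            results.append(result)
    return results
```

With ddlpy this could be called as measurements_per_year(selected.iloc[0], 1980, 1990, ddlpy.measurements, ddlpy.measurements_available). Until the ddl issue is resolved, passing is_available=None avoids skipping years that do contain data.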
