BTrDB duplicate data points with the same index / MDAL issue? #56
Comments
I'm pretty sure this is because some of the data points were written twice while we were developing the Green Button data ingester. You should be able to get rid of them when you insert into a DataFrame (in which case #55 should fix it).
Sum is mean times count.
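A quick pandas sketch of "sum is mean times count", assuming the raw readings are a timestamp-indexed Series. The readings and the 1-hour window are made-up illustration values, not MDAL output. It also shows why the duplicates matter for this trick: repeating every point leaves the mean unchanged but doubles the count, so the reconstructed sum doubles too.

```python
import pandas as pd

# Hypothetical 15-minute kWh readings; the values are made up for illustration.
idx = pd.date_range("2017-01-01 00:00", periods=4, freq="15min")
readings = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Per-window mean and count, the kind of statistics a mean/count API reports.
window_mean = readings.resample("1h").mean()
window_count = readings.resample("1h").count()

# Sum is mean times count.
window_sum = window_mean * window_count
print(window_sum.iloc[0])  # 10.0

# Writing every point twice leaves the mean alone but doubles count and sum.
doubled = pd.concat([readings, readings]).sort_index()
doubled_mean = doubled.resample("1h").mean()
doubled_sum = doubled_mean * doubled.resample("1h").count()
print(doubled_mean.iloc[0], doubled_sum.iloc[0])  # 2.5 20.0
```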
"Sum is mean times count"
MDAL mean (15min): 1120.0 kWh
Downloading the raw data with MDAL and doing all this in pandas has produced another issue that we are looking into.
You can definitely drop duplicated rows with the same index in pandas. In our scenario, we aren't going to have different values for the same index (timestamp), so this strategy should work. @marcopritoni you should also make a note of which streams have duplicate points so we can clean them up later. |
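A minimal pandas sketch of that strategy, assuming each stream arrives as a Series indexed by timestamp. Because the same timestamp should never carry different values here, it first checks that assumption and then keeps the first row per timestamp using `Index.duplicated`. The helper name `drop_duplicate_timestamps` is hypothetical.

```python
import pandas as pd


def drop_duplicate_timestamps(series: pd.Series) -> pd.Series:
    """Keep one row per timestamp (hypothetical helper, not part of MDAL/BTrDB)."""
    # Sanity check: every repeated timestamp should carry a single distinct value.
    distinct_values = series.groupby(level=0).nunique()
    if (distinct_values > 1).any():
        raise ValueError("same timestamp with different values")
    # duplicated(keep="first") marks every repeat after the first occurrence.
    return series[~series.index.duplicated(keep="first")]


idx = pd.to_datetime(["2017-01-01 00:00", "2017-01-01 00:00",
                      "2017-01-01 00:15", "2017-01-01 00:15"])
s = pd.Series([1.6395, 1.6395, 0.9959, 0.9959], index=idx)
clean = drop_duplicate_timestamps(s)
print(len(s), len(clean))  # 4 2
```

Raising on conflicting values, rather than silently keeping one, means a stream that breaks the "same timestamp, same value" assumption gets noticed instead of quietly losing data.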
Sure, I can make a list. Do you want me to add it here or keep it offline?
Offline would be better; thanks! It's pretty easy to fix the streams to remove duplicates (I already have 90% of the script done). Maybe you could make a spreadsheet and add the streams to it so I can mark them off when they're done? Shoot me an email.
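For the spreadsheet of affected streams, something like this could build the list. It assumes the downloaded data is a dict mapping stream UUID to a timestamp-indexed Series, which is the shape of the example in the issue. The function name `streams_with_duplicates` and the stream keys are placeholders.

```python
import pandas as pd


def streams_with_duplicates(data: dict) -> pd.DataFrame:
    """One row per stream that has repeated timestamps (hypothetical helper)."""
    rows = []
    for uuid, series in data.items():
        # Count repeats beyond the first occurrence of each timestamp.
        n_dup = int(series.index.duplicated().sum())
        if n_dup:
            rows.append({"uuid": uuid, "points": len(series), "duplicates": n_dup})
    return pd.DataFrame(rows, columns=["uuid", "points", "duplicates"])


# Placeholder stream keys; real keys would be the stream UUIDs.
idx = pd.to_datetime(["2017-01-01 00:00", "2017-01-01 00:00"])
data = {
    "stream-a": pd.Series([1.0, 1.0], index=idx),
    "stream-b": pd.Series([1.0], index=idx[:1]),
}
report = streams_with_duplicates(data)
# CSV text that can be pasted into (or saved for) a spreadsheet.
print(report.to_csv(index=False))
```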
#58 should help.
I just noticed that Green Button data from XBOS, downloaded as mdal.RAW data, has multiple (all?) points with the same timestamp (and value). I'm not sure whether the issue is in BTrDB or MDAL.
I would imagine we do not want to have the same timestamp for multiple points.
Example:
{'4d95d5ce-de62-3449-bd58-4dcad75b526d':
2017-01-01 00:00:00-08:00 1.6395
2017-01-01 00:00:00-08:00 1.6395
2017-01-01 00:15:00-08:00 0.9959
2017-01-01 00:15:00-08:00 0.9959
2017-01-01 00:30:00-08:00 1.6222
2017-01-01 00:30:00-08:00 1.6222
2017-01-01 00:45:00-08:00 1.6374
2017-01-01 00:45:00-08:00 1.6374
... }
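To answer the "(all?)" question for a given stream, one option is to load the values into a pandas Series and check whether every timestamp appears exactly twice. The sketch rebuilds only the eight points shown above by hand; for a full check you would run the same two lines on the Series returned for that UUID.

```python
import pandas as pd

# The eight points from the example above, rebuilt by hand.
times = pd.to_datetime([
    "2017-01-01 00:00:00-08:00", "2017-01-01 00:00:00-08:00",
    "2017-01-01 00:15:00-08:00", "2017-01-01 00:15:00-08:00",
    "2017-01-01 00:30:00-08:00", "2017-01-01 00:30:00-08:00",
    "2017-01-01 00:45:00-08:00", "2017-01-01 00:45:00-08:00",
])
s = pd.Series([1.6395, 1.6395, 0.9959, 0.9959,
               1.6222, 1.6222, 1.6374, 1.6374], index=times)

# How many times does each timestamp occur?
counts = s.index.value_counts()
every_point_doubled = bool((counts == 2).all())
# keep=False flags every member of a duplicate group, not just the repeats.
share_duplicated = s.index.duplicated(keep=False).mean()
print(every_point_doubled, share_duplicated)  # True 1.0
```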
I need to download this as raw data because it's energy (kWh), not power, so the readings should be summed, and the existing stats aggregation functions (mean, max, min, count) don't support summing.
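Once the raw points are in pandas, summing energy per interval client-side is short. This sketch assumes the raw download is a timestamp-indexed Series of 15-minute kWh readings with every point written twice, as in the example; the hourly window is an arbitrary choice. Dropping the repeated timestamps first matters, since otherwise every interval total comes out doubled.

```python
import pandas as pd

# Raw 15-minute kWh readings with every timestamp written twice (as in the example).
times = pd.date_range("2017-01-01 00:00", periods=4, freq="15min").repeat(2)
raw = pd.Series([1.6395, 1.6395, 0.9959, 0.9959,
                 1.6222, 1.6222, 1.6374, 1.6374], index=times)

# Drop the repeated timestamps first; otherwise each total is doubled.
clean = raw[~raw.index.duplicated(keep="first")]

# Energy is additive: total kWh per hour is the sum of the readings in that hour.
hourly_kwh = clean.resample("1h").sum()
doubled_total = raw.resample("1h").sum()
print(round(hourly_kwh.iloc[0], 4))  # 5.895
```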