Using phenodata as a library #3

Open
MarkusZehner opened this issue Jun 15, 2020 · 14 comments

Comments

@MarkusZehner

MarkusZehner commented Jun 15, 2020

Hi,

I am working on a database for a crop monitoring project. I really appreciate your work in this and the dwdweather2 repository!

What would be the best way to circumvent the console for this package? Is there an easy way to pipe the string to the options variable in phenodata.command.run()?

Thanks!
markus

@amotl
Member

amotl commented Jun 16, 2020

Dear Markus,

> I really appreciate your work on this.

Thanks for appreciating our work on this program.

> [...] and the dwdweather2 repository!

Regarding weather information from DWD/CDC, we would also like to point out the fine python_dwd by @gutzbenj, which sparked our interest just recently.

> I am working on a database for a crop monitoring project.

I see what you are doing over at rcm_archive within loadwd.py. Good luck and let us know about any help you might need.

> What would be the best way to circumvent the console for this package?
> Is there an easy way to pipe the string to the options variable in phenodata.command.run()?

You are actually asking how to use this module as a library? What about using these lines from phenodata.command.run() and trying to ramp it up from there?

cdc_client = DwdCdcClient(ftp=FTPSession())
humanizer = DwdPhenoDataHumanizer(language=options['language'], long_station=options['long-station'], show_ids=options['show-ids'])
client = DwdPhenoData(cdc=cdc_client, humanizer=humanizer, dataset=options.get('dataset'))

data = client.get_observations(options, humanize=options['humanize'])

With kind regards,
Andreas.

@amotl
Member

amotl commented Jun 16, 2020

> What about using these lines from phenodata.command.run() and trying to ramp it up from there?

Now I see that this might not be so easy. As a quick solution, here is a hack that fakes the parameters into sys.argv instead of shelling out to the phenodata program:

proc_string = [
    'phenodata',
    'list-stations',
    '--source=dwd',
    '--dataset=immediate',
    '--all', '--format=csv'
]

import sys
import phenodata.command

sys.argv = proc_string
phenodata.command.run()

The same would also work for

proc_string = ['phenodata', 'observations', '--source=dwd',
               '--dataset=' + str(dataset),
               '--partition=' + str(partition),
               '--filename=' + str(crops),
               '--station-id=' + str(stations),
               '--year=' + str(years),
               '--format=csv']
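One pitfall with the argument list above: if `stations`, `years`, or `crops` are Python lists, `str()` renders them with brackets. The following is a minimal sketch of a helper that joins such values with commas instead. It assumes the command line accepts comma-separated values for these options; the function name and the example values are hypothetical and only serve as illustration.

```python
def build_observations_argv(dataset, partition, crops, stations, years):
    """Assemble an argument list for the `observations` subcommand."""

    def joined(values):
        # Accept a single value or an iterable; emit "a,b" rather than "['a', 'b']".
        if isinstance(values, (str, int)):
            return str(values)
        return ",".join(str(value) for value in values)

    return [
        "phenodata", "observations", "--source=dwd",
        "--dataset=" + joined(dataset),
        "--partition=" + joined(partition),
        "--filename=" + joined(crops),
        "--station-id=" + joined(stations),
        "--year=" + joined(years),
        "--format=csv",
    ]


# Illustrative values only (assumption), not taken from a real query.
argv = build_observations_argv("immediate", "recent", ["Hasel", "Raps"], [1, 2], [2019, 2020])
```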

However, you would still have to parse STDOUT again, which is kind of sad.
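Parsing STDOUT in-process could look roughly like the sketch below. It temporarily swaps `sys.argv`, redirects `sys.stdout` into a buffer, and reads the captured CSV back with the standard library. It assumes the entry point writes its output through `sys.stdout`; `fake_run` is a hypothetical stand-in for `phenodata.command.run`, and its column names and rows are made up.

```python
import contextlib
import csv
import io
import sys


def run_and_parse_csv(entrypoint, argv):
    """Call a console entry point in-process and return its CSV output as dicts."""
    buffer = io.StringIO()
    saved_argv = sys.argv
    sys.argv = list(argv)
    try:
        with contextlib.redirect_stdout(buffer):
            entrypoint()
    finally:
        sys.argv = saved_argv  # always restore the real arguments
    buffer.seek(0)
    return list(csv.DictReader(buffer))


def fake_run():
    # Hypothetical stand-in for phenodata.command.run(); prints made-up rows.
    print("id,name")
    print("1,Station A")
    print("2,Station B")


rows = run_and_parse_csv(fake_run, ["phenodata", "list-stations", "--format=csv"])
```

With the real program, `fake_run` would be replaced by `phenodata.command.run`. The round trip through text remains, so this only hides the parsing step rather than avoiding it.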

@amotl
Member

amotl commented Jun 16, 2020

At [1], you can now find two basic examples of how to use the module as a library in order to yield pandas DataFrames for further downstream processing. That way, you will not have to convert the JSON or CSV output back, which would have been silly.

Currently, all options still have to be supplied, even if most of them are actually None. So, there's definitely room for improvement all over the place.
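Until that improves, a small helper can fill in the unset options. The sketch below uses only option names that appear in the snippets in this thread; the complete set of keys the library expects, and their default values, are assumptions. `make_options` is a hypothetical helper, not part of phenodata.

```python
# Option names taken from the snippets in this thread; the complete set
# expected by the library, and the defaults, are assumptions.
DEFAULT_OPTIONS = {
    "dataset": None,
    "partition": None,
    "filename": None,
    "station-id": None,
    "year": None,
    "language": None,
    "long-station": False,
    "show-ids": False,
    "humanize": False,
}


def make_options(**overrides):
    """Return a full options dict, filling unspecified entries with defaults."""
    # Allow Python-friendly keyword names: station_id -> "station-id".
    normalized = {key.replace("_", "-"): value for key, value in overrides.items()}
    unknown = set(normalized) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    return {**DEFAULT_OPTIONS, **normalized}


options = make_options(dataset="immediate", partition="recent", station_id=[1, 2])
```

The resulting dictionary could then be passed wherever the snippets above use `options`.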

Please let me know if this will help you along.

[1] https://github.com/hiveeyes/phenodata/tree/master/examples

@MarkusZehner
Author

Thanks for the examples!
Yes, this definitely helps. As you might have seen, the silly back-conversion to a dict had already taken place, but using the pandas DataFrames directly might be the easiest solution.

Also is there a reason why the cache is encrypted?

@MarkusZehner
Author

Thanks again, running the client directly is much more efficient!

@amotl
Member

amotl commented Jun 17, 2020

> Yes this definitely helps. Thanks for the examples!

You are welcome.

> Also is there a reason why the cache is encrypted?

Are you sure about this detail? Maybe dogpile.cache just serializes the data using pickle under the hood?
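To illustrate why pickled data can be mistaken for encrypted data, here is a minimal sketch using only the standard library. Whether the cache backend in use here really stores pickles is the conjecture made above, not something this snippet verifies; the payload is made up.

```python
import pickle

record = {"station": "Example", "year": 2020}  # made-up payload

blob = pickle.dumps(record)

# The serialized form is binary and not human-readable ...
print(blob[:20])

# ... but no key is needed to read it back, so it is not encryption.
restored = pickle.loads(blob)
```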

> I've stumbled upon the next problem: fcntl used by dogpile.cache is not compatible with windows. is there an easy fix for that? e.g. using sqlite3 as in dwdweather?

What about this one? Have you been able to resolve it?
Edit: Now I see https://github.com/MarkusZehner/rcm_archive/issues/1 by @Aranil. So, is that still an issue?
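One possible workaround, sketched under assumptions: pick the cache backend depending on whether `fcntl` can be imported. `dogpile.cache.dbm` and `dogpile.cache.memory` are backend names registered by dogpile.cache; how phenodata configures its cache region is not shown in this thread, so wiring this in is left open. `pick_cache_backend` is a hypothetical helper.

```python
import importlib.util


def pick_cache_backend():
    """Return a dogpile.cache backend name suited to the current platform."""
    if importlib.util.find_spec("fcntl") is not None:
        # POSIX: file-based DBM backend, whose default file lock uses fcntl.
        return "dogpile.cache.dbm"
    # No fcntl (e.g. on Windows): fall back to a non-persistent in-memory cache.
    return "dogpile.cache.memory"


backend = pick_cache_backend()
```

The returned name would go into the `configure()` call of a dogpile.cache region. The in-memory fallback does not survive a process restart, so it trades persistence for portability.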

@amotl changed the title from "using phenodata without console?" to "Using phenodata as a library" on Jun 17, 2020

@MarkusZehner
Author

@Aranil was testing rcm_archive on Windows; before that, I was not aware of fcntl. It is not a huge problem, since the final thing is intended to run on a Linux server (which is also why I deleted that comment).

I tried to use the sqlite3 solution from panodata/dwdweather2, but I'm not familiar with dogpile.cache, so I got lost quickly after list_plus and list_plus_real.
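For reference, a key/value cache on top of sqlite3 needs very little code and no fcntl. The sketch below is a standalone illustration using only the standard library. It is neither the approach taken in dwdweather2 nor a dogpile.cache backend, and it has no expiry; `SqliteCache` is a hypothetical name.

```python
import pickle
import sqlite3


class SqliteCache:
    """Tiny key/value cache on top of sqlite3 (stdlib only, no fcntl)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value BLOB)"
        )

    def set(self, key, value):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                (key, pickle.dumps(value)),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else pickle.loads(row[0])


cache = SqliteCache()  # pass a file path instead of ":memory:" to persist
cache.set("stations", ["A", "B"])
```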

@amotl
Member

amotl commented Oct 27, 2020

Dear Markus,

as I am just revisiting this issue, I wanted to take the chance to tell you about Wetterdienst. You might prefer it over dwdweather2 these days.

With kind regards,
Andreas.

cc @gutzbenj

@amotl
Member

amotl commented Jan 6, 2021

Dear Markus,

we recently worked on bringing phenodata and Grafana together, see [1]. The code at [2] might help you when trying to use this as a library.

With kind regards,
Andreas.

[1] https://github.com/panodata/grafana-pandas-datasource/tree/2d624da/examples/phenodata-mellifera
[2] https://github.com/panodata/grafana-pandas-datasource/blob/2d624da/examples/phenodata-mellifera/demo.py#L62-L101

@MarkusZehner
Author

Dear Andreas,

thanks for keeping me in the loop!
However, I'm currently in the last months of writing my thesis and no longer working on this project.

Kind regards,
Markus.

@amotl
Member

amotl commented Jan 6, 2021

> Currently I'm in the last months of writing my thesis.

We wish you a happy new year and every success with your thesis.

As we are modernizing phenodata these days, we will be happy about a star from all people who value our work here. </fishing> ;].

@MarkusZehner
Author

Thank you very much!

@amotl
Member

amotl commented Apr 11, 2023

Dear Markus,

we hope you are doing well, that you've finished your thesis properly, and that you are now travelling the world.

We just improved the documentation and added a dedicated section about how to use phenodata as a library (see footnote 1), sparked by your inquiry. Let me know if you find any details for improvement.

> I'm in the last months of writing my thesis, and no longer working on this project.

We will also be happy to hear back about what you used phenodata for, if you are allowed to talk about it now. If you have any resources available you can share, it would be nice to add them to the documentation as references.

With kind regards,
Andreas.

Footnotes

  1. https://phenodata.readthedocs.io/#library-use

@amotl
Member

amotl commented Apr 30, 2023

Dear @MarkusZehner,

we continued our endeavor of unlocking DWD CDC open access data, and converged the phenology observation data into corresponding SQLite databases, making it very convenient for querying and filtering. More information at footnotes 1, 2, and 3 ff. We hope you like it.

With kind regards,
Andreas.

/cc @Aranil, @MarkusRAdam, @lawacco

Footnotes

  1. https://phenodata.hiveeyes.org/

  2. https://phenodata.hiveeyes.org/data/dwd/

  3. https://community.hiveeyes.org/t/phenodata-ein-toolkit-zur-beschaffung-und-verarbeitung-von-open-access-phanologiedaten/2892/57
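
As a rough idea of what querying and filtering such a SQLite database looks like from Python, here is a sketch against an in-memory stand-in. The table name, columns, and rows are invented for illustration and do not reflect the schema of the published databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a downloaded database file

# Hypothetical schema and made-up rows, for illustration only.
conn.execute(
    "CREATE TABLE observations "
    "(station TEXT, species TEXT, year INTEGER, day_of_year INTEGER)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?)",
    [
        ("Station A", "hazel", 2021, 45),
        ("Station A", "hazel", 2022, 38),
        ("Station B", "hazel", 2022, 52),
    ],
)

# Filter by species and year using bound parameters.
rows = conn.execute(
    "SELECT station, day_of_year FROM observations "
    "WHERE species = ? AND year = ? ORDER BY station",
    ("hazel", 2022),
).fetchall()
```

With a real database file, the path would replace `":memory:"` and the query would use the table and column names the file actually provides.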
