Save raw source data in fetch #98

Closed
infojunkie opened this issue Nov 23, 2018 · 2 comments

infojunkie commented Nov 23, 2018

There are cases where a data source module may want to let the user specify special formatting instructions to be applied to the raw source data. Today, because the module's fetch function is expected to return a pd.DataFrame object, the module is forced to perform the formatting inside fetch. This means that a user has to re-fetch to apply new formatting instructions, which is counter-intuitive and wasteful of resources on all sides. One (ugly) workaround is to store some raw data in additional DataFrame columns that are read during render and cleaned up there.

Ideally, the fetch function would store the raw data, and the render function would have access to this raw data for formatting.
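
To make the request concrete, here is a minimal sketch of the split, not actual Workbench code. The signatures, the `decimals` parameter, the URL and the JSON payload are all hypothetical, and a plain list of dicts stands in for the table so the example runs with the standard library only.

```python
import json

FETCH_CALLS = 0  # counts simulated round-trips to the data source


def fetch(url: str) -> bytes:
    """Hypothetical fetch: return the source's raw bytes, unformatted."""
    global FETCH_CALLS
    FETCH_CALLS += 1
    # Stand-in for a real HTTP request; the payload below is made up.
    payload = [{"name": "a", "price": 1234.567}, {"name": "b", "price": 7.2}]
    return json.dumps(payload).encode("utf-8")


def render(raw: bytes, params: dict) -> list:
    """Hypothetical render: apply the user's formatting to the stored raw data."""
    decimals = params.get("decimals", 2)
    rows = json.loads(raw.decode("utf-8"))
    return [
        {"name": row["name"], "price": f"{row['price']:.{decimals}f}"}
        for row in rows
    ]


raw = fetch("https://example.com/data.json")  # one fetch...
first = render(raw, {"decimals": 2})          # ...rendered with one format
second = render(raw, {"decimals": 0})         # ...and again with another
```

Changing the formatting instruction only re-runs `render`; the source is contacted once.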

@adamhooper
Contributor

Hi @infojunkie,

I've updated Workbench's innards in the way you describe. Now, fetch() can output any file; and render() can read any file.

We have two minor hurdles. We could use your help.

  1. Module API. I lied a little: I haven't changed fetch() and render() yet. I changed the lower-level (undocumented, unstable-API) fetch_arrow() and render_arrow() functions. I'd like to double-check expectations with a module author before altering the real fetch() and render() signatures :).
  2. User interface. We don't have a design that separates a "Fetch" button from an "Update params" button. Again: the real task is to collaborate with a module author so we can design this properly.

Would you be interested in a little design session, in which we'd discuss the API and UI of your module? (We don't need your feedback, but it would help.)

I realize it took more than a year to get to this. We didn't forget about it -- it was simply hard to do.

This relates to #120: users should be able to set CSV-parsing options without fetching.
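
As a rough illustration of that CSV case, and not the real fetch_arrow()/render_arrow() API, the sketch below has a hypothetical `fetch` write raw bytes to a file and a hypothetical `render` parse that file with a user-chosen delimiter. The file name and contents are made up.

```python
import csv
import tempfile
from pathlib import Path


def fetch(dest: Path) -> None:
    """Hypothetical fetch: write the raw bytes to disk untouched."""
    # Stand-in for a download; the content below is made up.
    dest.write_bytes(b"id;city\n1;Montreal\n2;Ottawa\n")


def render(src: Path, delimiter: str = ",") -> list:
    """Hypothetical render: parse the stored file using the user's options."""
    with src.open(newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter=delimiter))


with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "fetched.csv"
    fetch(raw_path)
    # Default comma delimiter: each row comes back as a single column.
    wrong = render(raw_path)
    # The user changes the parsing option; the stored file is simply re-read.
    right = render(raw_path, delimiter=";")
```

Because the stored file is never modified, any number of parsing options can be tried against the same fetch result.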

@adamhooper
Contributor

Virtually all modules now make fetch() store raw data and render() transform it into tabular data.
