Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy Streaming of Tabs #291

Open
wants to merge 52 commits into
base: master
Choose a base branch
from
Open

Conversation

cjachekang
Copy link
Contributor

@cjachekang cjachekang commented Mar 2, 2021

Overview

This change (along with the frontend changes) implements lazy loading of the tabs. In essence, all tabs are initially greyed out and tabs are loaded one by one.

For now, the ordering of the calculation of tabs is just reversed (to put correlation last). We can add some sort of tab prioritization later on if needed.

Changes

I had to change a lot of the tests in terms of the tabs they were expecting since we are now lazily streaming them in, the order could be changed (I just changed the tests from using lists to using sets). I changed the way we compute actions. Instead of doing them all at once, I compute the first available one and display that (so we have anything to display). Then when ipython_display reaches its end, I start calculating the rest of the tabs. Also changed the button to a tabs instead.
Actions are now checked before computations in order to prevent unecessary tabs from being displayed initially. These checks are pretty negligible in terms of cost.

Example Output

Please see frontend PR: lux-org/lux-widget#71.

Performance Benefit

The Correlation action can make computing the widget take very long as datasets such as the communities dataset require around 40-45 seconds to be computed. By streaming in the tabs, we will be able to display the widget much earlier allowing users to interact with it almost right away and have the heavy-computation tabs be added in later. For example, Lux for the communities dataset now appears almost instantly instead of after 40ish seconds.

@cjachekang cjachekang added this to the S2: February 2021 milestone Mar 2, 2021
@cjachekang cjachekang linked an issue Mar 2, 2021 that may be closed by this pull request
@dorisjlee
Copy link
Member

Hi @cjachekang, Thanks for adding these in, the streaming implementation looks really great! The interactivity that we now have with this is incredible! Here are some notes and comments:

  1. Can you add in some description in the PR on the performance benefits of these changes?
  2. One unexpected thing that I saw is that the warnings are now displayed under the widget, as opposed to above like before. This is not necessarily a bad thing, I'm just curious why this is happening.
    image
  3. Also I'm wondering if there is a way to better style the tabs. I'm personally not a huge fan of the Pandas/Lux tabs nested in the action tabs, especially given that the two types of tabs look very different (e.g., one in ipywidget and one in JS). In added the nested Lux widget takes up a lot of unnecessary space with the double border one the left and right side. Is there something that we can do in the CSS that make this more closely resemble the toggle button that we had earlier (e.g. invisible div + buttons)?
  4. For the tabs that are streaming in, I'm wondering if showing some type of indicator that additional tabs are still being computed would be helpful. I remember we briefly experimented with the idea of the greyed-out tab that becomes active again after the results are in, is there a reason why we moved away from this implementation?
  5. Temporal shows up first which usually just involves a single visualization. I'm wondering if there is a way to show something that contains a couple more visualizations in the initial view.

@cjachekang
Copy link
Contributor Author

cjachekang commented Mar 7, 2021

@dorisjlee

  1. The warnings are displayed below the widget now since I initially display the pandas dataframe/tabs widget before any computations start (the warnings get displayed during computations).

  2. Something like this?

image

  1. The reason is that it was becoming hard to know what tabs (at least temporal) were actually going to be computed before we computed it. I saw in Will's demo in the research meeting and he had a bunch of new tabs. I think for extensibility purposes, it'll be easier (and will save dev time) to push out tabs once they're done since we won't have to write code to preprocess each new action (to know if it is going to be displayed) everytime we add one.

5.. Yeah! The tabs can be computed in any order - for now I just reversed the order so that correlation is the last to be computed. How do you want me to prioritize the actions? They come in alphabetical as default.

@jerrysong1324
Copy link
Contributor

jerrysong1324 commented Mar 10, 2021

  1. I personally think the old tab method was a bit better. Maybe you can just change the font to match what is within our widget as well as change the paradigm for when the table is not selected. (Our widget uses blue text, the ipython tabs seem to use grey backgrounds).

@codecov
Copy link

codecov bot commented Mar 14, 2021

Codecov Report

Merging #291 (c485d2d) into master (17dd0ee) will decrease coverage by 3.93%.
The diff coverage is 84.44%.

❗ Current head c485d2d differs from pull request most recent head 5f0f9c2. Consider uploading reports for the commit 5f0f9c2 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #291      +/-   ##
==========================================
- Coverage   85.09%   81.16%   -3.94%     
==========================================
  Files          52       50       -2     
  Lines        4033     3663     -370     
==========================================
- Hits         3432     2973     -459     
- Misses        601      690      +89     
Impacted Files Coverage Δ
lux/action/correlation.py 92.30% <ø> (+5.64%) ⬆️
lux/action/enhance.py 100.00% <ø> (ø)
lux/action/generalize.py 84.61% <ø> (-1.10%) ⬇️
lux/action/univariate.py 93.75% <ø> (+2.44%) ⬆️
lux/core/series.py 73.68% <19.04%> (+18.12%) ⬆️
lux/core/frame.py 71.31% <85.48%> (-9.50%) ⬇️
lux/action/default.py 97.46% <97.22%> (-2.54%) ⬇️
lux/_config/config.py 79.54% <100.00%> (-5.39%) ⬇️
lux/action/custom.py 100.00% <100.00%> (+3.70%) ⬆️
lux/executor/SQLExecutor.py 13.53% <0.00%> (-70.92%) ⬇️
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17dd0ee...5f0f9c2. Read the comment docs.

@@ -90,6 +90,7 @@ def __init__(self, *args, **kw):
self._min_max = None
self.pre_aggregated = None
self._type_override = {}
self.loadingBar = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this as loading_bar to be consistent with PEP8.

self._widget = rec_df.render_widget()

self._widget = rec_df.render_widget(
pandasHtml=rec_df.to_html(max_rows=5, classes="pandasStyle")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this 10 so that it corresponds to the default display.min_rows value which is 10

# re-render widget for the current dataframe if previous rec is not recomputed
elif show_prev:
rec_df.show_all_column_vis()
if lux.config.render_widget:
self._widget = rec_df.render_widget()
self._widget = rec_df.render_widget(
pandasHtml=rec_df.to_html(max_rows=5, classes="pandasStyle")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -675,11 +722,15 @@ def render_widget(self, renderer: str = "altair", input_current_vis=""):
import luxwidget

widgetJSON = self.to_JSON(self._rec_info, input_current_vis=input_current_vis)
if pandasHtml is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the purpose of backwards compatibility with other versions of lux-api right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps - there was an error in one of the tests that caused pandasHtml to be none, so I just put a null check here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, when we change the input to luxwidget, the missing traitlet variable can cause an error. So It's good that we have this check in here.

if loadingBar is not None:
loadingBar.value = progress

# # Pushing back correlation and geographical actions for performance reasons
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we currently determine what to compute first v.s. what to compute lazily? Or for now, are we explicitly saying that "correlation" and "geographical" will compute later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, actions are checked in alphabetical order (with correlation being first). I manually put correlation and geographical later because i noticed those two take the longest to compute/render. The order in which we check/compute is very flexible as we can just move the action names around in the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lazy streaming-based rendering
3 participants