Plotting large dataset in Dash app (#267)
* Plotting large dataset in Dash app
* Added dataset instructions
* Wording in README
* Date range in apps and more description in README
* Added description of app
* Added description in app

---------

Co-authored-by: Ben C <[email protected]>
Showing 13 changed files with 296 additions and 0 deletions.
@@ -0,0 +1,63 @@
# Plotting large datasets in Dash

Interactive Dash applications that plot large datasets using one of the following approaches:

- [**WebGL**](https://plotly.com/python/webgl-vs-svg/) (in the `webgl` folder): a technology that uses the GPU to accelerate rendering, so figures with many markers (the points drawn on a chart) stay responsive. It generally works well for figures with up to 100,000-200,000 markers, depending on the power of your GPU. For larger figures, it is often better to aggregate the data points first.

- [**`plotly-resampler`**](https://github.com/predict-idlab/plotly-resampler) (in the `resampler` folder): an external library that dynamically aggregates time-series data based on the current graph view. This approach downsamples your dataset at the cost of losing some detail.

- Combined approach (in the `combined` folder): `plotly-resampler` downsampling applied to a WebGL (`Scattergl`) trace.
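
To make "aggregate the data points first" concrete, here is a minimal sketch of one simple aggregation, min/max bucketing: split the series into equal-size buckets and keep only the lowest and highest point of each, so spikes such as very long delays survive. The function name `minmax_downsample` is hypothetical, and this is a simplified stand-in for illustration, not the aggregation algorithm `plotly-resampler` itself uses.

```python
from typing import List, Sequence


def minmax_downsample(y: Sequence[float], n_out: int) -> List[int]:
    """Return sorted indices of at most n_out points, keeping each bucket's min and max."""
    if n_out < 2:
        raise ValueError("n_out must be at least 2 (one min and one max per bucket)")
    n = len(y)
    if n <= n_out:
        return list(range(n))  # Already small enough: keep every point.
    n_buckets = n_out // 2  # Each bucket contributes up to two points.
    kept = set()
    for b in range(n_buckets):
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets
        bucket = range(start, end)  # Non-empty because n > n_out >= n_buckets.
        kept.add(min(bucket, key=lambda i: y[i]))
        kept.add(max(bucket, key=lambda i: y[i]))
    return sorted(kept)


delays = [0, 5, -3, 40, 2, 1, 90, -10, 4, 3]
print(minmax_downsample(delays, 4))  # [2, 3, 6, 7]
```

Here the series is split into two halves, and the kept indices point at the minimum and maximum delay of each half (-3 and 40, then 90 and -10), which is why outliers remain visible after downsampling.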

We use a commercial flight dataset that records information such as flight delays for the first half (1/1-6/30) of 2006. You can find it [here](https://github.com/vega/falcon/blob/master/data/flights-3m.csv). For this project, we focus on plotting departure delays.

Once you download the dataset, run `python csv-clean.py flights-3m.csv` to obtain the cleaned CSV file `flights-3m-cleaned.csv`. Copy the cleaned file into the `data` folder of each project folder (`webgl`, `resampler`, or `combined`) you want to test.
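
The key step in `csv-clean.py` is combining the flight date and the `HHMM` departure time into one timestamp. Below is a minimal per-row sketch using only the standard library, assuming (as the arithmetic in `csv-clean.py` implies) that `FL_DATE` is an integer such as `20060101` and `DEP_TIME` an integer such as `1505`. The helper name `to_departure_datetime` is hypothetical. Adding the time as a `timedelta` also handles the `2400` edge case by rolling over to midnight of the next day.

```python
from datetime import datetime, timedelta


def to_departure_datetime(fl_date: int, dep_time: int) -> datetime:
    """Combine a YYYYMMDD date and an HHMM time into one datetime (hypothetical helper)."""
    day = datetime.strptime(str(int(fl_date)), "%Y%m%d")
    hours, minutes = divmod(int(dep_time), 100)  # e.g. 1505 -> (15, 5)
    # timedelta(hours=24) turns a 2400 departure into midnight of the following day.
    return day + timedelta(hours=hours, minutes=minutes)


print(to_departure_datetime(20060101, 1505))  # 2006-01-01 15:05:00
print(to_departure_datetime(20060131, 2400))  # 2006-02-01 00:00:00
```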

## Description

On its home page, each app displays a scatter plot of flight departure delays (in minutes), captured below. The `webgl` app plots the first 100,000 rows of the roughly 3 million rows in the cleaned dataset (the rows are sorted by departure time), while `resampler` and `combined` let you select the date range you want to visualize.

- `webgl`

![](static/app_webgl.png)

- `resampler`

![](static/app_resampler.png)

- `combined`

![](static/app_combined.png)
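
The date-range selection in `resampler` and `combined` amounts to keeping the rows whose departure falls between the two picked dates. Because `csv-clean.py` sorts the rows by `DEP_DATETIME`, one way to find those rows is a binary search for the two boundaries. The sketch below uses the standard library's `bisect`; the function name `date_range_slice` is hypothetical, and the apps themselves filter with a pandas boolean mask instead. The upper bound is midnight after the end date, so flights on the end date itself are included.

```python
from bisect import bisect_left
from datetime import date, datetime, time, timedelta
from typing import List, Tuple


def date_range_slice(sorted_times: List[datetime], start: date, end: date) -> Tuple[int, int]:
    """Return (lo, hi) such that sorted_times[lo:hi] spans start through the whole end day."""
    lo = bisect_left(sorted_times, datetime.combine(start, time.min))
    # Exclusive bound: midnight at the beginning of the day after `end`.
    hi = bisect_left(sorted_times, datetime.combine(end + timedelta(days=1), time.min))
    return lo, hi


departures = [
    datetime(2006, 1, 1, 6, 0),
    datetime(2006, 1, 1, 23, 59),
    datetime(2006, 1, 2, 0, 0),
    datetime(2006, 1, 2, 12, 30),
    datetime(2006, 1, 3, 8, 0),
]
lo, hi = date_range_slice(departures, date(2006, 1, 2), date(2006, 1, 2))
print(lo, hi)  # 2 4
```

On a pandas column, `Series.searchsorted` performs the same binary search, which avoids scanning all rows on every date change.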

You can also click on the graph and drag your cursor to zoom into any part of it.

![](static/zoom_in.gif)

To revert the figure to its original state, click the `Reset axes` button in the upper right corner of the figure.

![](static/zoom_out.gif)
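
## Local testing

`cd` into the folder of the approach you want to test, then run `gunicorn app:server --bind 0.0.0.0:80`. You should then be able to open the app at `http://localhost:80`. Binding to port 80 may require administrator privileges on some systems; if it does, bind to another port such as `8000` and open that port instead.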
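
In `gunicorn app:server`, `app` is the module (`app.py`) and `server` is the variable inside it that gunicorn serves. In these apps, `server = app.server` is the Flask instance behind Dash, which is a WSGI callable. As a minimal sketch of that contract, the stand-in below defines a plain WSGI callable named `server` with no dependencies; it is for illustration only and not part of this project. Saved as a hypothetical `hello.py`, it could be served with `gunicorn hello:server --bind 0.0.0.0:8000`.

```python
def server(environ, start_response):
    """Minimal WSGI callable; gunicorn finds it via the `module:server` argument."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}".encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain; charset=utf-8"), ("Content-Length", str(len(body)))],
    )
    return [body]  # A WSGI app returns an iterable of byte strings.
```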

## Upload to Ploomber Cloud

Make sure you are in the correct project folder.

### Command line

Go to your app folder and set your API key with `ploomber-cloud key YOURKEY`. Next, initialize your app with `ploomber-cloud init` and deploy it with `ploomber-cloud deploy`. For more details, please refer to our [documentation](https://docs.cloud.ploomber.io/en/latest/user-guide/cli.html).

### UI

Zip `app.py` together with `requirements.txt` and the `data` folder, then upload the archive to Ploomber Cloud. For more details, please refer to our [Dash deployment guide](https://docs.cloud.ploomber.io/en/latest/apps/dash.html).
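
If you prefer to script the packaging step for the UI upload, here is a minimal sketch using Python's standard `zipfile` module. The function name `build_upload_zip` is hypothetical, and it assumes the archive should contain `app.py`, `requirements.txt`, and the `data` folder at its top level; check the Dash deployment guide for the exact layout Ploomber Cloud expects.

```python
import zipfile
from pathlib import Path
from typing import List


def build_upload_zip(project_dir: str, zip_path: str) -> List[str]:
    """Zip app.py, requirements.txt and data/ from project_dir; return the archive's entries."""
    root = Path(project_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ("app.py", "requirements.txt"):
            zf.write(root / name, arcname=name)
        # Store data files under data/ with forward slashes, as zip entry names use.
        for path in sorted((root / "data").rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(root).as_posix())
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

For example, `build_upload_zip("combined", "combined.zip")` run from the repository folder would package the `combined` app. Writing the archive outside the `data` folder keeps it from being included in itself.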

## Interacting with the App

Once the app starts running, you will see a page similar to the screenshots above. You can click on the graph and drag your cursor to zoom into any part of it.

![](static/zoom_in.gif)

To revert the figure to its original state, click the `Reset axes` button in the upper right corner of the figure.

![](static/zoom_out.gif)
@@ -0,0 +1,75 @@
from dash import dcc, html, Input, Output, Dash
import pandas as pd
from datetime import datetime as dt
import plotly.graph_objects as go
from plotly_resampler import FigureResampler

app = Dash(__name__)
server = app.server

N = 100000  # Maximum number of points plotly-resampler sends to the browser.

# Parse timestamps once at startup rather than on every callback.
df = pd.read_csv("data/flights-3m-cleaned.csv", parse_dates=["DEP_DATETIME"])

app.layout = html.Div(children=[
    html.H1("Plotting Large Datasets in Dash"),
    html.H2("""Downsampled figure: Departure delay time of around 3
    million flights in the first half (1/1-6/30) of 2006"""),
    html.P("Select range of flight dates to visualize"),
    dcc.DatePickerRange(
        id="date-picker-select",
        start_date=dt(2006, 1, 1),
        end_date=dt(2006, 4, 1),
        min_date_allowed=dt(2006, 1, 1),
        max_date_allowed=dt(2006, 7, 1),
        initial_visible_month=dt(2006, 1, 1),
    ),
    html.Div("""Click on the graph and drag
    your cursor around to zoom into any part of the graph you want.""",
             style={"margin-top": "10px"}),
    html.Div("""To revert the figure to its original state, click on the
    'Reset axes' button at the upper right corner of the figure.""",
             style={"margin-top": "10px"}),
    dcc.Graph(id='example-graph'),
])


@app.callback(
    Output("example-graph", "figure"),
    [
        Input("date-picker-select", "start_date"),
        Input("date-picker-select", "end_date"),
    ],
)
def update_figure(start, end):
    # pd.Timestamp parses both "YYYY-MM-DD" and full ISO strings from the picker.
    # The upper bound is midnight after the end date, so the whole end day is included.
    start_ts = pd.Timestamp(start)
    end_ts = pd.Timestamp(end) + pd.Timedelta(days=1)
    df_filtered = df[(df["DEP_DATETIME"] >= start_ts) & (df["DEP_DATETIME"] < end_ts)]

    fig = FigureResampler(go.Figure())

    fig.add_trace(
        go.Scattergl(
            mode="markers",  # Use "lines+markers" to also draw lines between points.
            showlegend=False,
            line_width=0.3,
            line_color="gray",
            marker={"colorscale": "Portland"},  # How marker color changes with the data value.
        ),
        hf_x=df_filtered["DEP_DATETIME"],
        hf_y=df_filtered["DEP_DELAY"],
        # Color and size come from the filtered rows, so they stay aligned with x and y
        # and are downsampled together with them.
        hf_marker_color=abs(df_filtered["DEP_DELAY"]),
        hf_marker_size=abs(5 + df_filtered["DEP_DELAY"] / 50),  # Non-negative marker size.
        max_n_samples=N,
    )

    fig.update_layout(
        title="Flight departure delay",
        xaxis_title="Flight date and time (24h)",
        yaxis_title="Departure delay (minutes)",
    )

    # Returned as a one-off downsample of at most N points; zooming only triggers
    # re-aggregation if plotly-resampler's update callback is registered with the app.
    return fig
@@ -0,0 +1,4 @@
dash
plotly-resampler
pandas
gunicorn
@@ -0,0 +1,27 @@
import sys

import pandas as pd

if __name__ == "__main__":
    if len(sys.argv) != 2 or not sys.argv[1].endswith(".csv"):
        raise ValueError("Usage: python csv-clean.py filename.csv")

    in_file = sys.argv[1]
    df = pd.read_csv(in_file)

    # Drop rows with a missing departure time or delay.
    df = df[df["DEP_TIME"].notnull() & df["DEP_DELAY"].notnull()].copy()

    # DEP_TIME is HHMM, and 2400 means midnight at the end of the day. Store it as 0000
    # so the hour is between 0 and 23, then move those flights to the next day below.
    is_2400 = df["DEP_TIME"] == 2400
    df.loc[is_2400, "DEP_TIME"] = 0

    # Combine FL_DATE (YYYYMMDD) and DEP_TIME (HHMM) into YYYYMMDDHHMM and parse it in one pass.
    stamp = (df["FL_DATE"] * 10000 + df["DEP_TIME"]).astype("int64").astype(str)
    df["DEP_DATETIME"] = pd.to_datetime(stamp, format="%Y%m%d%H%M")
    df.loc[is_2400, "DEP_DATETIME"] += pd.Timedelta(days=1)

    # Select relevant columns.
    df = df[["DEP_DATETIME", "DEP_DELAY"]].sort_values(["DEP_DATETIME"])
    print("Completed conversion. Resulting DataFrame:\n")
    print(df)

    out_file = in_file[:-4] + "-cleaned.csv"
    df.to_csv(out_file, sep=",", index=False)
@@ -0,0 +1,73 @@
from dash import dcc, html, Input, Output, Dash
import pandas as pd
from datetime import datetime as dt
import plotly.graph_objects as go
from plotly_resampler import FigureResampler

app = Dash(__name__)
server = app.server

N = 2000  # Maximum number of points plotly-resampler sends to the browser.

# Parse timestamps once at startup rather than on every callback.
df = pd.read_csv("data/flights-3m-cleaned.csv", parse_dates=["DEP_DATETIME"])

app.layout = html.Div(children=[
    html.H1("Plotting Large Datasets in Dash"),
    html.H2("""Downsampled figure: Departure delay time of around 3
    million flights in the first half (1/1-6/30) of 2006"""),
    html.P("Select range of flight dates to visualize"),
    dcc.DatePickerRange(
        id="date-picker-select",
        start_date=dt(2006, 1, 1),
        end_date=dt(2006, 4, 1),
        min_date_allowed=dt(2006, 1, 1),
        max_date_allowed=dt(2006, 7, 1),
        initial_visible_month=dt(2006, 1, 1),
    ),
    html.Div("""Click on the graph and drag
    your cursor around to zoom into any part of the graph you want.""",
             style={"margin-top": "10px"}),
    html.Div("""To revert the figure to its original state, click on the
    'Reset axes' button at the upper right corner of the figure.""",
             style={"margin-top": "10px"}),
    dcc.Graph(id='example-graph'),
])


@app.callback(
    Output("example-graph", "figure"),
    [
        Input("date-picker-select", "start_date"),
        Input("date-picker-select", "end_date"),
    ],
)
def update_figure(start, end):
    # pd.Timestamp parses both "YYYY-MM-DD" and full ISO strings from the picker.
    # The upper bound is midnight after the end date, so the whole end day is included.
    start_ts = pd.Timestamp(start)
    end_ts = pd.Timestamp(end) + pd.Timedelta(days=1)
    df_filtered = df[(df["DEP_DATETIME"] >= start_ts) & (df["DEP_DATETIME"] < end_ts)]

    fig = FigureResampler(go.Figure())

    fig.add_trace(
        go.Scatter(
            mode="markers",  # Use "lines+markers" to also draw lines between points.
            showlegend=False,
            line_width=0.3,
            line_color="gray",
            marker_colorscale="Portland",  # How marker color changes with the data value.
        ),
        hf_x=df_filtered["DEP_DATETIME"],
        hf_y=df_filtered["DEP_DELAY"],
        # Color and size come from the filtered rows, so they stay aligned with x and y
        # and are downsampled together with them.
        hf_marker_color=abs(df_filtered["DEP_DELAY"]),
        hf_marker_size=abs(5 + df_filtered["DEP_DELAY"] / 50),  # Non-negative marker size.
        max_n_samples=N,
    )

    fig.update_layout(
        title="Flight departure delay",
        xaxis_title="Flight date and time (24h)",
        yaxis_title="Departure delay (minutes)",
    )

    # Returned as a one-off downsample of at most N points; zooming only triggers
    # re-aggregation if plotly-resampler's update callback is registered with the app.
    return fig
examples/dash/plotly-large-dataset/resampler/requirements.txt (4 additions, 0 deletions)
@@ -0,0 +1,4 @@
dash
plotly-resampler
pandas
gunicorn
@@ -0,0 +1,47 @@
from dash import dcc, html, Dash
import pandas as pd
import plotly.graph_objects as go

app = Dash(__name__)
server = app.server

N = 100000  # Limit number of rows to plot.

# The cleaned CSV is sorted by DEP_DATETIME, so reading N rows keeps the earliest
# flights without loading all ~3 million rows into memory.
df = pd.read_csv("data/flights-3m-cleaned.csv", nrows=N)

fig = go.Figure()  # Initiate the figure.

fig.add_trace(go.Scattergl(
    x=df["DEP_DATETIME"],
    y=df["DEP_DELAY"],
    mode="markers",  # Use "lines+markers" to also draw lines between points.
    showlegend=False,
    line_width=0.3,
    line_color="gray",
    marker={
        "color": abs(df["DEP_DELAY"]),  # Map the absolute delay to a color.
        "colorscale": "Portland",  # How marker color changes with the data value.
        "size": abs(5 + df["DEP_DELAY"] / 50),  # Non-negative marker size based on the delay.
    },
))

fig.update_layout(
    title="Flight departure delay",
    xaxis_title="Flight date and time (24h)",
    yaxis_title="Departure delay (minutes)",
)

app.layout = html.Div(children=[
    html.H1("Plotting Large Datasets in Dash"),
    html.H2(f"""WebGL figure: Departure delay time of the first {N:,} flights
    (by departure time) in the first half (1/1-6/30) of 2006"""),
    html.Div("""Click on the graph and drag
    your cursor around to zoom into any part of the graph you want.""",
             style={"margin-top": "10px"}),
    html.Div("""To revert the figure to its original state, click on the
    'Reset axes' button at the upper right corner of the figure.""",
             style={"margin-top": "10px"}),
    dcc.Graph(id='example-graph', figure=fig),
])
@@ -0,0 +1,3 @@
dash
pandas
gunicorn