
Commit 28cbc6f

committed Dec 2, 2023
Initial problem
1 parent cdeeecb commit 28cbc6f

19 files changed, 5532 additions, 1 deletion
 

‎.gitignore

+2
@@ -158,3 +158,5 @@ cython_debug/
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/
+
+.DS_Store

‎README.md

+47-1
@@ -1 +1,47 @@
-# python-data-app-challenge
+# Data App Comparison
+
+This repo illustrates some differences between various web application frameworks.
+The purpose is to provide minimal, concrete examples of how to accomplish common development tasks in various Python web application frameworks, and to use those examples to help people learn their APIs.
+The frameworks we have so far are:
+
+- [Dash](https://plotly.com/dash/)
+
+- [Panel](https://panel.holoviz.org/reference/index.html)
+
+- [Shiny](https://shiny.posit.co/)
+
+- [Streamlit](https://streamlit.io/)
+
+## Running the examples
+
+Navigate to the example folder and install the dependencies in a virtual environment with
+
+``` bash
+pip install -r requirements.txt
+```
+
+| Framework | Command                |
+|-----------|------------------------|
+| Dash      | `python app.py`        |
+| Panel     | `panel serve app.py`   |
+| Streamlit | `streamlit run app.py` |
+| Shiny     | `shiny run app.py`     |
+
+# Submitting a new problem
+
+Please raise an issue to discuss and clarify the problem statement, and then submit a pull request with the problem statement in a README file.
+Ideally problems should have the following qualities:
+
+- Problems should be small and clear
+
+- Successful apps should stand alone and not require external APIs or system setup
+
+- Problems should focus on the capabilities of the web framework
+
+- For inspiration see [7guis](https://eugenkiss.github.io/7guis/) or [TodoMVC](https://todomvc.com/)
+
+# Submitting a new solution
+
+We want only one solution per framework, but please submit PRs with either solutions from a new framework or improvements to an existing solution.
+Your solution should focus on the framework's capabilities, and ideally have fairly few dependencies.
+For example, it's not a good idea to include a lot of JavaScript code in your Streamlit solution because that will tell the reader more about how to do something in JavaScript than it will about what they can do in Streamlit.

‎sampling-dashboard/README.md

+26
@@ -0,0 +1,26 @@
+# Problem description
+
+This exercise illustrates the common problem of sampling from a dataset and interrogating that dataset with matplotlib plots.
+You could imagine the sample being taken from a database or a larger-than-memory dataset, but in this case it's based on a small sample of the NYC Taxi data.
+
+## Requirements
+
+1. The application should have the following components:
+
+    - A proportion input which selects the proportion of the dataset to sample
+
+    - A log-scale input which selects whether the tip plot is on a log scale
+
+    - A plot showing the relationship between tips and prices
+
+    - A plot showing a histogram of prices
+
+2. The app should use matplotlib plots (which can be found in `plots.py`)
+
+3. The histogram plot should not rerender if the log-scale selector is changed
+
+4. The sample should only be retaken if the proportion slider changes
+
+5. Each time the proportion slider changes the app should take a new sample
+
+#
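Requirements 3 to 5 amount to a small dependency graph: the sample depends only on the proportion input, the histogram only on the sample, and the tip plot on both the sample and the log-scale input. The sketch below models those rules without any web framework, using only the standard library. The class name `SamplingDashboard`, its methods, and the `renders` counter are hypothetical names introduced for illustration and do not appear in the commit.

```python
import random


class SamplingDashboard:
    """Framework-free model of the update rules (illustrative only)."""

    def __init__(self, rows, seed=None):
        self._rows = list(rows)
        self._rng = random.Random(seed)
        self.log_scale = False
        # Count how often each plot would be redrawn.
        self.renders = {"tips": 0, "hist": 0}
        self.sample = []
        self.set_proportion(0.1)

    def set_proportion(self, proportion):
        # Requirements 4 and 5: a proportion change always draws a fresh
        # sample, and it is the only event that does.
        k = round(len(self._rows) * proportion)
        self.sample = self._rng.sample(self._rows, k)
        # Both plots read the sample, so both are redrawn.
        self.renders["tips"] += 1
        self.renders["hist"] += 1

    def set_log_scale(self, enabled):
        # Requirement 3: only the tip plot reads the scale input, so the
        # histogram is left alone and the sample is reused.
        self.log_scale = enabled
        self.renders["tips"] += 1
```

Toggling the scale on such an object leaves `sample` and `renders["hist"]` unchanged, while calling `set_proportion` again, even with the same value, replaces the sample and bumps both counters. Each framework solution in this commit has to express the same graph with its own primitives.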

‎sampling-dashboard/dash/app.py

+86
@@ -0,0 +1,86 @@
+import dash
+import dash_bootstrap_components as dbc
+import pandas as pd
+import plotly.express as px
+from dash import Input, Output, dcc, html
+
+app = dash.Dash(external_stylesheets=[dbc.themes.BOOTSTRAP])
+# the style arguments for the sidebar. We use position:fixed and a fixed width
+SIDEBAR_STYLE = {
+    "position": "fixed",
+    "top": 0,
+    "left": 0,
+    "bottom": 0,
+    "width": "16rem",
+    "padding": "2rem 1rem",
+    "background-color": "#f8f9fa",
+}
+
+CONTENT_STYLE = {
+    "margin-left": "18rem",
+    "margin-right": "2rem",
+    "padding": "2rem 1rem",
+}
+
+sidebar = html.Div(
+    children=[
+        dcc.Input(id="sample", type="number", min=0, max=1, value=0.1, step=0.01),
+        html.Div("Plot scale"),
+        dcc.RadioItems(["Linear", "Log"], id="scale"),
+    ],
+    style=SIDEBAR_STYLE,
+)
+
+content = html.Div(
+    id="page-content",
+    style=CONTENT_STYLE,
+    children=[
+        html.Div(id="max-value", style={"padding-top": "50px"}),
+        dcc.Graph(id="scatter-plot"),
+        dcc.Graph(id="histogram"),
+        dcc.Store(id="sampled-dataset"),
+    ],
+)
+
+app.layout = html.Div([dcc.Location(id="url"), sidebar, content])
+
+
+@app.callback(Output("sampled-dataset", "data"), Input("sample", "value"))
+def cache_dataset(sample):
+    df = pd.read_csv("nyc-taxi.csv")
+    df = df.sample(frac=sample)
+
+    # To cache data in this way we need to serialize it to JSON
+    json = df.to_json(date_format="iso", orient="split")
+    return json
+
+
+@app.callback(Output("max-value", "children"), Input("sampled-dataset", "data"))
+def update_max_value(sampled_df):
+    df = pd.read_json(sampled_df, orient="split")
+    return f'First taxi id: {df["taxi_id"].iloc[0]}'
+
+
+@app.callback(
+    Output("scatter-plot", "figure"),
+    Input("sampled-dataset", "data"),
+    Input("scale", "value"),
+)
+def update_scatter(sampled_df, scale):
+    df = pd.read_json(sampled_df, orient="split")
+    scale = scale == "Log"
+    fig = px.scatter(df, x="total_amount", y="tip_amount", log_x=scale, log_y=scale)
+    fig.update_layout(transition_duration=500)
+    return fig
+
+
+@app.callback(Output("histogram", "figure"), Input("sampled-dataset", "data"))
+def update_histogram(sampled_df):
+    df = pd.read_json(sampled_df, orient="split")
+    fig = px.histogram(df, x="total_amount")
+    fig.update_layout(transition_duration=500)
+    return fig
+
+
+if __name__ == "__main__":
+    app.run_server(debug=True)

‎sampling-dashboard/dash/nyc-taxi.csv

+1,000
Large diffs are not rendered by default.

‎sampling-dashboard/dash/plots.py

+34
@@ -0,0 +1,34 @@
+from matplotlib.pyplot import close
+from plotnine import (
+    aes,
+    geom_histogram,
+    geom_point,
+    ggplot,
+    scale_x_log10,
+    scale_y_log10,
+    theme_bw,
+)
+
+
+def plot_tips(sampled_data, log, color="black"):
+    plot = (
+        ggplot(sampled_data, aes("tip_amount", "total_amount"))
+        + geom_point(color=color)
+        + theme_bw()
+    )
+    if log:
+        plot = plot + scale_x_log10() + scale_y_log10()
+    fig = plot.draw()
+    close()
+    return fig
+
+
+def plot_hist(sampled_data, color="black"):
+    plot = (
+        ggplot(sampled_data, aes(x="total_amount"))
+        + geom_histogram(binwidth=5, color=color, fill=color)
+        + theme_bw()
+    )
+    fig = plot.draw()
+    close()
+    return fig

‎sampling-dashboard/gradio/app.py

+64
@@ -0,0 +1,64 @@
+import time
+
+from pandas import read_csv
+from plotnine import (
+    aes,
+    geom_histogram,
+    geom_point,
+    ggplot,
+    scale_x_log10,
+    scale_y_log10,
+    theme_bw,
+)
+
+import gradio as gr
+
+taxi = read_csv("nyc-taxi.csv")
+
+
+def sample_data(slider):
+    time.sleep(1)
+    out = taxi.sample(frac=slider)
+
+    return {sampled_data: out}
+
+
+def plot_tips(sampled_data, log):
+    plot = (
+        ggplot(sampled_data, aes("tip_amount", "total_amount"))
+        + geom_point()
+        + theme_bw()
+    )
+    if log:
+        plot = plot + scale_x_log10() + scale_y_log10()
+    return plot.draw()
+
+
+def plot_hist(sampled_data):
+    plot = (
+        ggplot(sampled_data, aes(x="total_amount"))
+        + geom_histogram(binwidth=5)
+        + theme_bw()
+    )
+    return plot.draw()
+
+
+with gr.Blocks() as demo:
+    sampled_data = gr.State(None)
+    with gr.Row():
+        with gr.Column(scale=2):
+            slider = gr.Slider(0, 1, value=0.1, step=0.01)
+            log_scale = gr.Checkbox(label="Log Scale")
+        with gr.Column(scale=10):
+            tip_plot = gr.Plot()
+            hist_plot = gr.Plot()
+
+    slider.change(sample_data, [slider], [sampled_data]).then(
+        plot_tips, [sampled_data, log_scale], [tip_plot]
+    ).then(plot_hist, [sampled_data], [hist_plot])
+
+    log_scale.change(plot_tips, [sampled_data, log_scale], [tip_plot])
+
+
+if __name__ == "__main__":
+    demo.launch()

‎sampling-dashboard/gradio/nyc-taxi.csv

+1,000
Large diffs are not rendered by default.

‎sampling-dashboard/gradio/plots.py

+34
@@ -0,0 +1,34 @@
+from matplotlib.pyplot import close
+from plotnine import (
+    aes,
+    geom_histogram,
+    geom_point,
+    ggplot,
+    scale_x_log10,
+    scale_y_log10,
+    theme_bw,
+)
+
+
+def plot_tips(sampled_data, log, color="black"):
+    plot = (
+        ggplot(sampled_data, aes("tip_amount", "total_amount"))
+        + geom_point(color=color)
+        + theme_bw()
+    )
+    if log:
+        plot = plot + scale_x_log10() + scale_y_log10()
+    fig = plot.draw()
+    close()
+    return fig
+
+
+def plot_hist(sampled_data, color="black"):
+    plot = (
+        ggplot(sampled_data, aes(x="total_amount"))
+        + geom_histogram(binwidth=5, color=color, fill=color)
+        + theme_bw()
+    )
+    fig = plot.draw()
+    close()
+    return fig

‎sampling-dashboard/panel/app.py

+44
@@ -0,0 +1,44 @@
+from pandas import read_csv
+import panel as pn
+
+from plots import plot_hist, plot_tips
+
+
+def first_taxi(data):
+    if data.empty:
+        return '## First taxi id: *NA*'
+
+    return f'## First taxi id: *{data["taxi_id"].iloc[0]}*'
+
+pn.extension(
+    sizing_mode="stretch_width",
+)
+
+data = pn.state.as_cached(
+    key="nyc-taxi", fn=read_csv, filepath_or_buffer="nyc-taxi.csv"
+)
+plot_hist = pn.cache(plot_hist)
+plot_tips = pn.cache(plot_tips)
+
+sample_input = pn.widgets.FloatSlider(
+    value=0.1, start=0, end=1, step=0.01, name="Sample"
+)
+scale_input = pn.widgets.Checkbox(name="Use Log Scale", margin=(20, 10, 0, 10))
+
+sample_data = pn.bind(data.sample, frac=sample_input)
+
+pn.template.FastListTemplate(
+    site="Panel",
+    title="NYC Taxi Data",
+    sidebar=[
+        "## NYC Taxi Data",
+        sample_input,
+        scale_input,
+    ],
+    main=[
+        pn.bind(first_taxi, sample_data),
+        pn.pane.Matplotlib(pn.bind(plot_tips, sample_data, scale_input), height=600),
+        pn.pane.Matplotlib(pn.bind(plot_hist, sample_data), height=600),
+    ],
+    main_max_width="850px",
+).servable()

‎sampling-dashboard/panel/nyc-taxi.csv

+1,000
Large diffs are not rendered by default.

‎sampling-dashboard/panel/plots.py

+34
@@ -0,0 +1,34 @@
+from matplotlib.pyplot import close
+from plotnine import (
+    aes,
+    geom_histogram,
+    geom_point,
+    ggplot,
+    scale_x_log10,
+    scale_y_log10,
+    theme_bw,
+)
+
+
+def plot_tips(sampled_data, log, color="black"):
+    plot = (
+        ggplot(sampled_data, aes("tip_amount", "total_amount"))
+        + geom_point(color=color)
+        + theme_bw()
+    )
+    if log:
+        plot = plot + scale_x_log10() + scale_y_log10()
+    fig = plot.draw()
+    close()
+    return fig
+
+
+def plot_hist(sampled_data, color="black"):
+    plot = (
+        ggplot(sampled_data, aes(x="total_amount"))
+        + geom_histogram(binwidth=5, color=color, fill=color)
+        + theme_bw()
+    )
+    fig = plot.draw()
+    close()
+    return fig

‎sampling-dashboard/requirements.txt

+9
@@ -0,0 +1,9 @@
+dash
+dash_bootstrap_components
+gradio
+pandas
+panel
+plotnine
+ruff
+shiny
+streamlit

‎sampling-dashboard/shiny/app.py

+42
@@ -0,0 +1,42 @@
+from pandas import read_csv
+from shiny import App, reactive, render, ui
+
+from plots import plot_hist, plot_tips
+
+app_ui = ui.page_sidebar(
+    ui.sidebar(
+        ui.input_slider("sample", "Sample Size", 0, 1, value=0.1, ticks=False),
+        ui.input_checkbox("log", "Log Scale"),
+    ),
+    ui.h3(ui.output_text("first_taxi_id")),
+    ui.card(ui.output_plot("tip_plot")),
+    ui.card(ui.output_plot("amount_histogram")),
+    title="Shiny",
+)
+
+
+def server(input, output, session):
+    @reactive.Calc
+    def dat():
+        df = read_csv("nyc-taxi.csv")
+        return df
+
+    @reactive.Calc
+    def sampled_dat():
+        return dat().sample(frac=input.sample())
+
+    @render.text
+    def first_taxi_id():
+        return f'Sample ID: {sampled_dat()["taxi_id"].iloc[0]}'
+
+    @render.plot
+    def tip_plot():
+        return plot_tips(sampled_dat(), input.log())
+
+    @output
+    @render.plot
+    def amount_histogram():
+        return plot_hist(sampled_dat())
+
+
+app = App(app_ui, server)

‎sampling-dashboard/shiny/nyc-taxi.csv

+1,000
Large diffs are not rendered by default.

‎sampling-dashboard/shiny/plots.py

+34
@@ -0,0 +1,34 @@
+from matplotlib.pyplot import close
+from plotnine import (
+    aes,
+    geom_histogram,
+    geom_point,
+    ggplot,
+    scale_x_log10,
+    scale_y_log10,
+    theme_bw,
+)
+
+
+def plot_tips(sampled_data, log, color="black"):
+    plot = (
+        ggplot(sampled_data, aes("tip_amount", "total_amount"))
+        + geom_point(color=color)
+        + theme_bw()
+    )
+    if log:
+        plot = plot + scale_x_log10() + scale_y_log10()
+    fig = plot.draw()
+    close()
+    return fig
+
+
+def plot_hist(sampled_data, color="black"):
+    plot = (
+        ggplot(sampled_data, aes(x="total_amount"))
+        + geom_histogram(binwidth=5, color=color, fill=color)
+        + theme_bw()
+    )
+    fig = plot.draw()
+    close()
+    return fig

‎sampling-dashboard/streamlit/app.py

+42
@@ -0,0 +1,42 @@
+import streamlit as st
+from pandas import read_csv
+
+from plots import plot_hist, plot_tips
+
+if "count" not in st.session_state:
+    st.session_state.count = 0
+
+
+def increment_counter():
+    st.session_state.count += 1
+
+
+with st.sidebar:
+    sample_ui = st.number_input(
+        "sample", 0.0, 1.0, value=0.1, step=0.01, on_change=increment_counter
+    )
+    log = st.checkbox("Log Scale")
+
+
+@st.cache_data
+def load_data():
+    df = read_csv("nyc-taxi.csv")
+    return df
+
+
+data = load_data()
+
+
+@st.cache_data(max_entries=2)
+def take_sample_busted(df, fraction, counter):
+    return df.copy().sample(frac=fraction)
+
+
+# We need to use this cache busting approach because otherwise the
+# sample will be retrieved from cache instead of taking a new sample each
+# time the sample size changes.
+busted_sample = take_sample_busted(data, sample_ui, st.session_state.count)
+
+st.subheader(f'Sample id: {busted_sample["taxi_id"].iloc[0]}')
+st.pyplot(plot_tips(busted_sample, log))
+st.pyplot(plot_hist(busted_sample))
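The cache-busting comment in the Streamlit solution is easier to follow with a framework-free analogy. The sketch below uses the standard library's `functools.lru_cache` as a stand-in for `st.cache_data`. It is an analogy under that assumption, not Streamlit's implementation, and `ROWS` and `take_sample` are hypothetical names added for illustration. A function cached on the fraction alone returns the stored sample on a repeat call, whereas adding an otherwise unused `counter` argument changes the cache key and forces a fresh draw.

```python
import random
from functools import lru_cache

ROWS = tuple(range(1000))  # hypothetical stand-in for the taxi data


@lru_cache(maxsize=2)
def take_sample(fraction):
    # Cached on `fraction` alone: a repeat call returns the stored sample.
    return tuple(random.sample(ROWS, round(len(ROWS) * fraction)))


@lru_cache(maxsize=2)
def take_sample_busted(fraction, counter):
    # `counter` is unused in the body; it only changes the cache key.
    return tuple(random.sample(ROWS, round(len(ROWS) * fraction)))
```

Calling `take_sample(0.1)` twice yields the same cached object, while `take_sample_busted(0.1, 0)` and `take_sample_busted(0.1, 1)` draw independently. In the app, the counter increments only in the `on_change` callback of the sample input, so toggling the log-scale checkbox reruns the script with an unchanged counter and reuses the cached sample.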

‎sampling-dashboard/streamlit/nyc-taxi.csv

+1,000
Large diffs are not rendered by default.

‎sampling-dashboard/streamlit/plots.py

+34
@@ -0,0 +1,34 @@
+from matplotlib.pyplot import close
+from plotnine import (
+    aes,
+    geom_histogram,
+    geom_point,
+    ggplot,
+    scale_x_log10,
+    scale_y_log10,
+    theme_bw,
+)
+
+
+def plot_tips(sampled_data, log, color="black"):
+    plot = (
+        ggplot(sampled_data, aes("tip_amount", "total_amount"))
+        + geom_point(color=color)
+        + theme_bw()
+    )
+    if log:
+        plot = plot + scale_x_log10() + scale_y_log10()
+    fig = plot.draw()
+    close()
+    return fig
+
+
+def plot_hist(sampled_data, color="black"):
+    plot = (
+        ggplot(sampled_data, aes(x="total_amount"))
+        + geom_histogram(binwidth=5, color=color, fill=color)
+        + theme_bw()
+    )
+    fig = plot.draw()
+    close()
+    return fig
