chat with csv #32

Merged
merged 3 commits into from
Oct 31, 2023

Conversation

@neelasha23 neelasha23 commented Oct 18, 2023
Contributor Author

Demo: https://misty-sunset-6594.ploomberapp.io/
(only sample dataset is working)

Few points:

  1. The app is currently a Python script that uses duckdb directly. Do we need to use JupySQL? (If so, I'll need to convert it to a notebook.)
  2. The OpenAI API is returning the entire code including import statements, although I have included "If you're asked to plot, you can return plotting code without any import statements or data loading statements" in the prompt. There is also some difference from the result returned by ChatGPT. E.g., for the query "scatter plot of bill length vs bill depth", the OpenAI API returns:
import matplotlib.pyplot as plt
# Assuming you want to plot the distribution of flipper lengths
plt.hist(penguins['flipper_length_mm']) 
plt.xlabel('Flipper Length (mm)')
plt.ylabel('Count')
plt.title('Distribution of Penguin Flipper Lengths') 
plt.show()

while chatGPT is returning:

import matplotlib.pyplot as plt

# Create a Matplotlib figure and axis
fig, ax = plt.subplots(figsize=(10, 6))

# Customize your scatter plot
ax.scatter(penguins["bill_length_mm"], penguins["bill_depth_mm"])

# Add labels and a title
ax.set_xlabel("Bill Length (mm)")
ax.set_ylabel("Bill Depth (mm)")
ax.set_title("Scatter Plot of Bill Length vs. Bill Depth")

# Show the plot
plt.show()

So right now there is a hardcoded plot in the code.

  3. I'm still trying out the ChatGPT-Solara example: https://itnext.io/python-how-to-build-a-chatgpt-interface-in-solara-fd6a1e15ef95. I'll update once I have some results.

@edublancas

@neelasha23 neelasha23 marked this pull request as ready for review October 18, 2023 12:10
@neelasha23 neelasha23 assigned edublancas and unassigned edublancas Oct 18, 2023
@neelasha23 neelasha23 requested a review from edublancas October 18, 2023 12:11
@neelasha23 neelasha23 self-assigned this Oct 18, 2023
@edublancas
Contributor

use the existing example as a reference: it already uses jupysql and also fixed the plotting problem: https://github.com/ploomber/doc/blob/main/examples/voila/chat-with-csv/notebook.ipynb

the code only works with histograms and boxplots but we can improve it later

@neelasha23
Contributor Author

use the existing example as a reference: it already uses jupysql and also fixed the plotting problem: https://github.com/ploomber/doc/blob/main/examples/voila/chat-with-csv/notebook.ipynb

the code only works with histograms and boxplots but we can improve it later

I'm using this notebook as a reference and just removed the jupysql part. So I need to build the Solara app in a notebook instead of a Python script, right? And what about the ipywidgets components like the Upload button, etc.? Do all of those need to be converted to Solara components? @edublancas

@edublancas
Contributor

yes. you can extract the jupysql + openAI logic from the voila app and then build the components with solara

@neelasha23
Contributor Author

Deployed the app here: https://damp-term-2947.ploomberapp.io/

There is an issue with the plotting currently: Solara discourages the use of pyplot, and jupysql currently returns axes created using pyplot rather than a matplotlib.figure.Figure. If we use pyplot, it disconnects the Solara app immediately. If we try to create a figure using fig.add_axes(ax), it throws an error:

Screenshot 2023-10-19 at 3 14 58 PM

@edublancas

@edublancas
Contributor

edublancas commented Oct 19, 2023

Deployed the app here: https://damp-term-2947.ploomberapp.io/

This looks like a Solara web app embedded in a Voila app? (because when I load it, it shows the Voila spinner)

image

There is an issue with the plotting currently: Solara discourages the use of pyplot, and jupysql currently returns axes created using pyplot rather than a matplotlib.figure.Figure. If we use pyplot, it disconnects the Solara app immediately. If we try to create a figure using fig.add_axes(ax), it throws an error:

try asking in Solara's GitHub or Discord.


btw, remember to delete applications that you're no longer using.

@neelasha23
Contributor Author

This looks like a Solara web app embedded in a Voila app? (because when I load it, it shows the Voila spinner)

Yes, I deployed using the Voila option. Can we deploy notebooks in Docker? I thought it's only for .py scripts.

btw, remember to delete applications that you're no longer using.

yes I've already deleted the ones not needed.

@edublancas

@edublancas
Contributor

Yes, I deployed using the Voila option. Can we deploy notebooks in Docker? I thought it's only for .py scripts.

why not deploy it as a regular solara app? .py file instead of ipynb

https://docs.cloud.ploomber.io/en/latest/apps/solara.html

@neelasha23
Contributor Author

Yes, I deployed using the Voila option. Can we deploy notebooks in Docker? I thought it's only for .py scripts.

why not deploy it as a regular solara app? .py file instead of ipynb

https://docs.cloud.ploomber.io/en/latest/apps/solara.html

That's what I did previously, but we need to use the JupySQL magics, right? So I converted it to a notebook. @edublancas

@edublancas
Contributor

ah. yeah, I forgot about the magic details.

then, you need to transform the magics into regular python code.

https://jupysql.ploomber.io/en/latest/api/python.html
https://github.com/ploomber/jupysql/blob/master/src/sql/run/run.py

many parts of the Python API are undocumented so you'll have to navigate the source code. I think once you get this working, we can document the missing parts of JupySQL's Python API

@neelasha23
Contributor Author

ah. yeah, I forgot about the magic details.

then, you need to transform the magics into regular python code.

https://jupysql.ploomber.io/en/latest/api/python.html https://github.com/ploomber/jupysql/blob/master/src/sql/run/run.py

many parts of the Python API are undocumented so you'll have to navigate the source code. I think once you get this working, we can document the missing parts of JupySQL's Python API

ok sure!

@neelasha23
Contributor Author

New dashboard: https://holy-voice-3121.ploomberapp.io/

I have converted the notebook to a Python script. The plot issue is also solved now. @edublancas

@edublancas
Contributor

a few things:

the app broke when I tried to create a boxplot:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/reacton/core.py", line 1661, in _render
    root_element = el.component.f(*el.args, **el.kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app.py", line 185, in Page
    ax = fn("my_data", column)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ploomber_core/dependencies.py", line 40, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ploomber_core/telemetry/telemetry.py", line 642, in wrapper
    result = func(_payload, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sql/plot.py", line 250, in boxplot
    set_ticklabels([column])
  File "/usr/local/lib/python3.11/site-packages/matplotlib/axes/_base.py", line 73, in wrapper
    return get_method(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matplotlib/_api/deprecation.py", line 297, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matplotlib/axis.py", line 2025, in set_ticklabels
    raise ValueError(
ValueError: The number of FixedLocator locations (2), usually from a call to set_ticks, does not match the number of labels (1).

(histogram worked fine). the second time I tried, it didn't crash but it showed a blank plot.

the data frame controls don't seem to work:

image

also, the last message is deleted when you send a new one, the app should keep the entire history. this looks relevant: https://itnext.io/python-how-to-build-a-chatgpt-interface-in-solara-fd6a1e15ef95
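
A minimal sketch of keeping the full history, assuming the messages are stored in a list that is extended on every send rather than overwritten. `Message`, `add_message`, and the role names are hypothetical; in a Solara app the list would presumably live in reactive state, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str      # "user" or "assistant" (hypothetical role names)
    content: str


def add_message(history: list[Message], role: str, content: str) -> list[Message]:
    """Return a new list with the message appended; earlier messages are kept."""
    return [*history, Message(role=role, content=content)]


history: list[Message] = []
history = add_message(history, "user", "create a histogram of body_mass_g")
history = add_message(history, "assistant", "Here is the histogram.")
history = add_message(history, "user", "show me the unique values of column sex")

# Every message sent so far is still present, in order.
for message in history:
    print(f"{message.role}: {message.content}")
```

Returning a new list instead of mutating the old one is a deliberate choice here: UI frameworks that compare state by identity tend to re-render only when the object is replaced.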

also, I see the SQL command is printed. This is nice but it should be hidden by default (perhaps add a toggle control to enable/disable this)

@neelasha23
Contributor Author

I have added the first version of chat interface: https://cool-sun-8688.ploomberapp.io/

There are a few glitches, like messages appearing multiple times and their order changing. I'll fix these one by one.

Some example queries:

Screenshot 2023-10-24 at 11 01 22 PM Screenshot 2023-10-24 at 11 02 10 PM

I think the boxplot is blank due to this issue: ploomber/jupysql#923

the data frame controls don't seem to work:

It wasn't working previously because there were only 5 records so there was just 1 page. I have changed it to 30 now.

@edublancas

@edublancas
Contributor

sounds good! please request my review when done!

@neelasha23
Contributor Author

neelasha23 commented Oct 25, 2023

Chat based interface : https://lively-limit-9311.ploomberapp.io/

The boxplot issue is fixed now. PR: ploomber/jupysql#924

I fixed the bugs that I found (except the clear dataset feature; I will look into it). Please find some samples below:

Screenshot 2023-10-25 at 10 51 45 PM Screenshot 2023-10-25 at 10 52 13 PM Screenshot 2023-10-25 at 10 52 36 PM

Please review.
Also, can you please share access to flaticon @edublancas

@edublancas
Contributor

  • try with a free image, we no longer have a flaticon account (we were not using it so I canceled it)

general feedback:

  • can we make the Enter key send the message? the only way right now is by clicking on the button
  • let's see if there's a way to remove the "This website runs on Solara" because it overlaps with the text box
  • can we add some margin here? the table view and the "download" button overlap - the download feature is a nice touch!
    image
  • is the visualization library working? since we're using jupysql's plotting features, I guess not. I tried it and it doesn't seem to be working (that's fine, just wondering what's the status)
    image
  • it's easy to crash this, for example if I ask "what plots can you make?" it crashes. perhaps we can do a try catch? if jupysql doesn't complain, then run it, otherwise just print a generic message that the bot can't answer that question?
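
A minimal sketch of the try/except idea from the last point, assuming the app has a single function that runs the translated command. `answer_or_fallback`, `fake_runner`, and the fallback text are hypothetical names; `fake_runner` only stands in for whatever executes the command through jupysql.

```python
from typing import Callable

FALLBACK = "Sorry, I can't answer that question."


def answer_or_fallback(run: Callable[[str], str], command: str) -> str:
    """Run the translated command; return a generic message if it raises."""
    try:
        return run(command)
    except Exception:  # broad on purpose: no failure should crash the app
        return FALLBACK


def fake_runner(command: str) -> str:
    # Hypothetical stand-in for the code that executes the command via jupysql.
    if not command.lower().startswith(("select", "%sqlplot")):
        raise ValueError(f"not a runnable command: {command!r}")
    return f"ran: {command}"


print(answer_or_fallback(fake_runner, "SELECT * FROM my_data LIMIT 20"))
print(answer_or_fallback(fake_runner, "I can make histograms and boxplots."))
```

Catching `Exception` broadly is a trade-off: it keeps the chat alive, but the original error should probably still be logged somewhere so that real bugs are not hidden.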

@neelasha23
Contributor Author

  • can we make the Enter key send the message? the only way right now is by clicking on the button

yes it was that way to allow sending multiple messages. but i think that's not required for our case so I'll remove it.

  • is the visualization library working? since we're using jupysql's plotting features, I guess not. I tried it and it doesn't seem to be working (that's fine, just wondering what's the status)

I just kept it as a placeholder because we had decided to add support for other libraries after we have the basic structure ready. We would need to write logic here to extract only the plotting code, because OpenAI returns the entire script including imports, data loading, etc.
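
One possible shape for that logic, as a rough sketch: parse the returned script with Python's `ast` module and drop top-level imports and data-loading statements. `extract_plotting_code` and `LOADER_NAMES` are hypothetical, the loader list is an assumed, non-exhaustive heuristic, and `ast.unparse` (Python 3.9+) discards comments and may reformat the code slightly.

```python
import ast

# Names of data-loading calls to drop; an assumed, non-exhaustive list.
LOADER_NAMES = {"read_csv", "load_dataset"}


def _calls_loader(node: ast.AST) -> bool:
    """True if the statement contains a call such as pd.read_csv(...)."""
    for child in ast.walk(node):
        if isinstance(child, ast.Call):
            func = child.func
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name in LOADER_NAMES:
                return True
    return False


def extract_plotting_code(source: str) -> str:
    """Return the code without top-level imports and data-loading statements."""
    tree = ast.parse(source)
    tree.body = [
        node
        for node in tree.body
        if not isinstance(node, (ast.Import, ast.ImportFrom))
        and not _calls_loader(node)
    ]
    return ast.unparse(tree)


generated = '''
import pandas as pd
import matplotlib.pyplot as plt

penguins = pd.read_csv("penguins.csv")
fig, ax = plt.subplots()
ax.scatter(penguins["bill_length_mm"], penguins["bill_depth_mm"])
'''

print(extract_plotting_code(generated))
```

The remaining code still refers to `plt` and `penguins`, so the app would have to provide those names when it executes the snippet.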

@edublancas

@edublancas
Contributor

yes it was that way to allow sending multiple messages. but i think that's not required for our case so I'll remove it.

sounds good.

I just kept it as a placeholder because we had decided to add support for other libraries after we have the basic structure ready. We would need to write logic here to extract only the plotting code, because OpenAI returns the entire script including imports, data loading, etc.

Ok. I think this app has a lot of potential and we can keep it as an ongoing effort (add more features and showcase it in our socials)

I think for v1, we can hide this control and limit to jupysql's plotting but in the near future we can work on v2 and expand the plotting capabilities

solara notebook voila

removed files

Python app

reverted deleted folder

Import changed

plots

chat based

lates version

zip

readme
@neelasha23
Contributor Author

Addressed the comments. Please let me know if you find more issues.
Latest dashboard: https://lucky-art-0278.ploomberapp.io/

A couple of things:

  1. Currently jupysql is being installed directly from the master branch (Dockerfile).
  2. We need to attribute free flaticon images by providing reference links beneath the image on the app page. For now I've just added the links as comments in the chat.py file. Is that ok?

@edublancas

@edublancas
Contributor

the top message:

This Solara app is designed for chatting with your data.

Examples of queries : unique column-name values ;
select top 20 rows from table ;

Example of queries that will return a plot : histogram on column ; boxplot on column

is a bit hard to read

I'd suggest:

Interact with your data using natural language.

Examples:
- show me the unique values of column {column name}
- create a histogram of {column name}
- create a boxplot of {column name}

Found an error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/reacton/core.py", line 1661, in _render
    root_element = el.component.f(*el.args, **el.kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app.py", line 231, in Chat
    ask_chatgpt()
  File "/app.py", line 191, in ask_chatgpt
    _, name, column = final.split(" ")
    ^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 3)

I think the query was: create a boxplot of body_mass_g for MALE observations (either that or histogram)
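
A sketch of a more tolerant parse, assuming `final` has the form `%sqlplot <name> <column>` optionally followed by extra text such as a filter. `parse_plot_command` is a hypothetical helper, and the command format is inferred from the traceback rather than confirmed.

```python
def parse_plot_command(final: str) -> tuple[str, str, str]:
    """Split '%sqlplot <name> <column> [rest]' into (name, column, rest)."""
    # maxsplit=3 keeps anything after the column together in one string,
    # so extra words no longer break the unpacking.
    parts = final.split(maxsplit=3)
    if len(parts) < 3:
        raise ValueError(f"expected '%sqlplot <name> <column>', got {final!r}")
    _, name, column, *rest = parts
    return name, column, (rest[0] if rest else "")


print(parse_plot_command("%sqlplot boxplot body_mass_g"))
print(parse_plot_command("%sqlplot boxplot body_mass_g WHERE sex = 'MALE'"))
```

The caller can then decide what to do with a non-empty `rest`: reject the request with a friendly message, or apply the filter before plotting.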

Contributor

@edublancas edublancas left a comment

see comment above

@neelasha23
Contributor Author

neelasha23 commented Oct 31, 2023

Addressed the above comments.
For the second case, the query is translated to %sqlplot boxplot body_mass_g WHERE sex = 'MALE'. I've added a try/except around it. But do we want to handle this case? Maybe we need to create a snippet for the male rows and then run the boxplot on it.
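
The two-step idea can be sketched as follows, using the standard-library `sqlite3` module purely as a stand-in for the DuckDB/JupySQL setup in the app. The table contents are made-up rows, and `male_rows` is a hypothetical name for the saved snippet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (sex TEXT, body_mass_g REAL)")
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?)",
    # Made-up rows for illustration only.
    [("MALE", 3750.0), ("FEMALE", 3800.0), ("MALE", 4675.0), ("FEMALE", 3250.0)],
)

# Step 1: save the filtered rows under a name (the "snippet").
conn.execute(
    "CREATE TEMP VIEW male_rows AS SELECT * FROM my_data WHERE sex = 'MALE'"
)

# Step 2: point the plotting step at the saved name instead of the raw table.
values = [
    row[0]
    for row in conn.execute(
        "SELECT body_mass_g FROM male_rows ORDER BY body_mass_g"
    )
]
print(values)
```

With this split, the plotting call only ever receives a table name and a column, which matches the shape the existing parsing code expects.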

New app: https://sweet-pine-0716.ploomberapp.io/

@edublancas

@edublancas edublancas merged commit 22c0443 into main Oct 31, 2023
@edublancas edublancas deleted the demo branch October 31, 2023 21:41